What does HackerNews think of datasets?

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Language: Python

#33 in Deep learning
#201 in Hacktoberfest
#6 in Monitoring
#15 in Tensorflow
"HuggingFace datasets" is an open source Python package: https://github.com/huggingface/datasets/

And they also have ready-to-use scripts for A LOT of the usual datasets: https://huggingface.co/datasets

These include LAION 400M and LAION 2B, for example: https://huggingface.co/datasets/laion/laion2B-en

Have a look at the datasets library [1], but as a shortcut, you can just create a file named "my_code.json" in jsonlines format with one line per source file that looks like:

   {"text": "contents_of_source_file_1"}
   {"text": "contents_of_source_file_2"}
   ...
Then pass my_code.json as the dataset name.

[1] https://github.com/huggingface/datasets
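
The shortcut described above could be sketched roughly as follows. This is only an illustration, not anything from the commenter: the helper names `write_jsonl` and `load_as_dataset`, the `src` directory, and the `*.py` pattern are all hypothetical. The only library call used is `load_dataset("json", data_files=...)` from the datasets package, which is imported lazily so that the file-writing part runs with the standard library alone.

```python
import json
from pathlib import Path


def write_jsonl(source_dir, out_path, pattern="*.py"):
    """Write one {"text": ...} JSON object per matching source file.

    Hypothetical helper: source_dir and pattern are placeholders for
    wherever your code lives. Returns the number of records written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(source_dir).rglob(pattern)):
            text = path.read_text(encoding="utf-8", errors="replace")
            # json.dumps escapes newlines, so each record stays on one line.
            out.write(json.dumps({"text": text}) + "\n")
            count += 1
    return count


def load_as_dataset(jsonl_path):
    """Load the JSON Lines file with the datasets library (must be installed)."""
    from datasets import load_dataset  # imported here so writing works without it

    return load_dataset("json", data_files=str(jsonl_path), split="train")


# Example usage (assumes a local "src" directory with Python files):
#   write_jsonl("src", "my_code.json")
#   ds = load_as_dataset("my_code.json")
#   print(ds[0]["text"][:80])
```

Whether a given training script accepts a local file path where it expects a dataset name depends on that script, so check its arguments before relying on this.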