I have received many requests about the projects I worked on to learn programming languages, data science, and machine learning. Although I host some basic project examples (e.g., Harry Plotter), I feel an overview of “training datasets” would be more valuable. The internet is full of open-source datasets that let you try out and learn algorithms, compete against others in predictive analytics (Kaggle), or build data science cases for your resume.

I am compiling a list below; if you have any datasets to add or share, please comment or contact me! If you need information on programming languages or algorithms, visit the overviews for R, Python, SQL, or general Data Science/Machine Learning/Statistics resources.

LAST UPDATED: 03-11-2017

Dataset Repositories

Company Datasets

Institutional Datasets

Text Datasets

Image Datasets

Network Datasets

Other Datasets