Category: datasets

Google's Dataset Search: Direct access to 25 million interesting datasets

I used to keep a repository of links to interesting datasets for learning data science. I can now retire that page, since Google has launched its new service, Dataset Search. The "world wide web" hosts millions of datasets on nearly any topic you can think of. Google's Dataset Search has indexed almost 25 million of these …

Caselaw Access Project: Structured data of over 6 million U.S. court decisions

Case.law seems like a very interesting data source for a machine learning or text mining project: The Caselaw Access Project (“CAP”) expands public access to U.S. law. Our goal is to make all published U.S. court decisions freely available to the public online, in a consistent format, digitized from the collection of the Harvard Law …

Datasets to practice and learn Programming, Machine Learning, and Data Science

I have received many requests for "training datasets" to practice programming. Fortunately, the internet is full of open-source datasets! I compiled a curated list of datasets and repositories below. If you have any additions, please comment or contact me! For information on programming languages or algorithms, visit the overviews for R, Python, SQL, or Data Science, …