Maven Analytics offers 21 datasets spanning a range of data types (time series, geospatial, user preferences) and topics such as business, sports, wine, financial stocks, and transportation.
This is a great starting point if you want to practice your data science, machine learning, and analysis skills on real-life data!
Maven Analytics provides e-learning courses on analysis and programming software. To provide a practical learning experience, their courses are often accompanied by real-life datasets for students to analyze.
The book covers everything from the foundations up to advanced theory and algorithms. I copied the table of contents below. It’s kind of math-heavy, but well explained with visual examples and pseudocode.
The book also contains multiple exercises to help you internalize the knowledge and skills.
As an added bonus, the professors teach a number of machine learning courses, and you can access their lecture slides and materials for free via the book’s website.
Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides a theoretical account of the fundamentals underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics, the book covers a wide array of central topics unaddressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for advanced undergraduates or beginning graduates, the text makes the fundamentals and algorithms of machine learning accessible to students and non-expert readers in statistics, computer science, mathematics and engineering.
The Open Source Society University offers a complete education in computer science using online materials.
According to their GitHub page, the curriculum is suited for people with the discipline, will, and good habits to obtain this education largely on their own, but who’d still like support from a worldwide community of fellow learners.
Intro CS: for students to try out CS and see if it’s right for them
Core CS: corresponds roughly to the first three years of a computer science curriculum, taking classes that all majors would be required to take
Advanced CS: corresponds roughly to the final year of a computer science curriculum, taking electives according to the student’s interests
Final Project: a project for students to validate, consolidate, and display their knowledge, to be evaluated by their peers worldwide
Pro CS: graduate-level specializations students can elect to take after completing the above curriculum if they want to maximize their chances of getting a good job
It is possible to finish Core CS within about 2 years if you plan carefully and devote roughly 18-22 hours/week to your studies. Courses in Core CS should be taken linearly if possible, but since a perfectly linear progression is rarely possible, each class’s prerequisites are specified so that you can design a logical but non-linear progression based on the class schedules and your own life plans.
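As a rough back-of-the-envelope check of that estimate, the sketch below converts the weekly range into total study hours. The 52-week year with no breaks is an assumption made for illustration, not a figure from the OSSU page, and `total_hours` is a hypothetical helper name.

```python
# Rough estimate of total study time for Core CS.
# Assumption: 52 study weeks per year with no breaks (not stated by OSSU).
WEEKS_PER_YEAR = 52


def total_hours(years: float, hours_per_week: float) -> float:
    """Return total study hours for a given duration and weekly load."""
    return years * WEEKS_PER_YEAR * hours_per_week


low = total_hours(2, 18)   # lower end of the quoted weekly range
high = total_hours(2, 22)  # upper end of the quoted weekly range
print(f"Core CS: roughly {low:.0f} to {high:.0f} hours in total")
```

Under that assumption, the two-year plan works out to somewhere between about 1,900 and 2,300 hours of study.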
Both in science and business, we often experience difficulties collecting enough data to test our hypotheses, either because target groups are small or hard to access, or because data collection entails prohibitive costs.
Such obstacles may result in data sets that are too small for the complexity of the statistical model needed to answer the questions we’re really interested in.
This unique book provides guidelines and tools for implementing solutions to issues that arise in small sample studies. Each chapter illustrates statistical methods that allow researchers and analysts to apply the optimal statistical model for their research question when the sample is too small.
This book will enable anyone working with data to test their hypotheses even when the statistical models required for answering their questions are too complex for the sample sizes they can collect. The covered statistical models range from the estimation of a population mean to models with latent variables and nested observations, and solutions include both classical and Bayesian methods. All proposed solutions are described in steps researchers can implement with their own data and are accompanied by annotated syntax in R.
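To give a feel for the simplest case mentioned above, estimating a population mean from a handful of observations, here is a minimal Python sketch (the book itself uses R). It contrasts the plain sample mean with a Bayesian estimate under a normal prior and a known observation variance. The data, the prior values, and the function name `posterior_mean` are made up for illustration and do not come from the book.

```python
from statistics import mean


def posterior_mean(data, prior_mean, prior_var, obs_var):
    """Posterior mean and variance of a normal mean.

    Assumes a conjugate normal prior and a known observation variance.
    """
    n = len(data)
    prior_precision = 1.0 / prior_var
    data_precision = n / obs_var
    post_var = 1.0 / (prior_precision + data_precision)
    # Precision-weighted average of the prior mean and the sample mean.
    post_mean = post_var * (prior_precision * prior_mean + data_precision * mean(data))
    return post_mean, post_var


# Hypothetical sample of five measurements (illustrative values only).
sample = [4.0, 6.0, 5.0, 7.0, 8.0]
classical = mean(sample)
bayes, bayes_var = posterior_mean(sample, prior_mean=5.0, prior_var=1.0, obs_var=4.0)

print(f"sample mean:    {classical:.2f}")
print(f"posterior mean: {bayes:.2f} (variance {bayes_var:.2f})")
```

With only five observations, the posterior mean lands between the prior mean and the sample mean. As the sample grows, the data term dominates and the two estimates converge.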