Data science is hot, and the path to becoming a data scientist can scorch. That's why there aren't many who possess all of the skills that fall under the title of data scientist. This scarcity means that data scientists cost a lot of money, but companies are looking to undercut the perks and compensation full-fledged data scientists can command by breaking the role up into multiple ones: data engineer, scientist, visualization designer, manager, and so on.

In other words, developers have a faster back door into the position. Someone must write the software, implement the models, run the machine learning libraries, and otherwise enable the data science. There is a need for programming specialists who know the subject. That can be you.

Data science bootcamps involve substantial investment of time over a short period as well as a significant payment. If that doesn't look appealing to you, here is a collection of no- or low-cost training and tools to get up to speed on the field of data science. (We've divided them into six categories that we will update over time.)

Foundational math

A free and self-paced introduction to probability, statistics, and basic statistical inference from Stanford University.

A large variety of math courses from MIT—including resources—available for free online use. Includes courses in single-variable calculus, multi-variable calculus, probability and statistics, and linear algebra.

A review of calculus, in case your differentiation and integration are a little rusty.

A free, self-paced course covering linear transformations, matrices, systems of linear equations, vector spaces, and other areas important to working with large datasets.

A free textbook on linear algebra, available for download or online reading. Also available is the exercise and solutions manual.
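To give a flavor of the "systems of linear equations" material these linear algebra resources cover, here is a minimal sketch in plain Python. It is not taken from any of the courses above; the function name `solve_linear_system` and the example system are made up for illustration. It uses Gauss-Jordan elimination with exact fractions, which is one of several ways to solve such a system.

```python
from fractions import Fraction


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination (hypothetical helper)."""
    n = len(a)
    # Build the augmented matrix [a | b] using exact rational arithmetic.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Use the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot entry becomes 1.
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


# Made-up system: 2x + y = 5 and x + 3y = 10
solution = solve_linear_system([[2, 1], [1, 3]], [5, 10])
print(solution)
```

In day-to-day work you would more likely reach for a library such as NumPy, but writing the elimination out once makes the course material easier to follow.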

Statistical modeling

A Stanford online course that focuses on regression and classification methods, with an emphasis on modern data analysis without heavy reliance on formulas and complex mathematics.

Another Stanford online course with an emphasis on statistical inference and understanding distribution and data relationships.

From the University of Bristol, this course moves from quantitative research to multilevel modeling of continuous and discrete data.

Data Analysis & Statistical Inference

A Duke University online course about how to collect and analyze data, make statistical inferences, and draw conclusions.

From the University of Texas at Austin (through edX), this archived course covers how to use data samples and inferential statistics, including t-tests, chi-square, ANOVA, and regression.
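As a rough illustration of the kind of inferential statistic these courses teach, the sketch below computes a two-sample t statistic using only Python's standard library. It assumes the unequal-variance (Welch) form of the test, which may differ from the variant a given course presents first; the function name `welch_t` and the sample values are invented for the example.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic for two independent samples."""
    # variance() is the sample variance (n - 1 in the denominator).
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Made-up measurements, for illustration only.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.2, 4.8, 4.4, 5.0, 4.6]
t_stat = welch_t(group_a, group_b)
print(round(t_stat, 3))
```

A large absolute value suggests the two group means differ by more than sampling noise would explain. Turning the statistic into a p-value requires the t distribution, which a statistics library would normally supply.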

Data science software tools

A free downloadable or online book on Bayesian statistics with implementations in Python.

A series of statistics courses from Duke (through Coursera) focused on using R and RStudio. R is a popular statistical programming language. The series runs $59 a month for a certificate, although it is possible to audit any of the courses in the series for no charge to see if they would be helpful.

A practical introduction to Jupyter Notebook and how to use it in data science projects.

A series of articles on using BeautifulSoup, a Python library used for parsing and useful in scraping data from websites.

A Stanford tutorial on using BeautifulSoup with Python to scrape data from websites.
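The parsing step that these scraping tutorials teach can be sketched without installing anything. The example below deliberately swaps in `html.parser` from Python's standard library in place of BeautifulSoup, so it shows the general idea rather than the BeautifulSoup API. The class name `LinkCollector` and the page fragment are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the link currently being read, if any
        self._text = []    # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical page fragment; a real scraper would download the HTML first.
page = (
    '<ul><li><a href="/stats">Intro to <b>statistics</b></a></li>'
    '<li><a href="/linalg">Linear algebra</a></li></ul>'
)
collector = LinkCollector()
collector.feed(page)
print(collector.links)
```

BeautifulSoup wraps this kind of event-driven parsing in a searchable tree, which is why the tutorials above are usually the more convenient route for real scraping work.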

A tutorial for using the Selenium library with Python to simulate web surfing and scrape data.

A scientific computing package for Python used to manipulate and manage n-dimensional arrays.

A Python-based ecosystem of open-source software for mathematics, science, and engineering. The libraries include NumPy, SciPy, Matplotlib, pandas, and more.

A Python statistics library to explore data and perform statistical tests.

A Python machine learning library.

A framework for a cluster computing engine that enables large-scale data processing.

A course from the University of California at Berkeley on using Spark.

An introduction to one of the most popular JavaScript libraries for creating data visualizations in a browser.

A Python data visualization library.

An information and research center on various databases in the software industry.

An infographic that maps the categories of various databases and analytics platforms. It's useful for choosing the right database for your use case. The 2016 version requires some info to download, but the 2014 version does not.

Advanced techniques

A multi-month, self-taught study plan for developers to understand and implement machine learning.

A guide to understanding the basics of machine learning, whether for building a chat bot or using big data to create a recommendation engine for an app.

A self-study course to begin using machine learning without a heavy background in mathematics.

The textbook for the Foundations of Data Science class at the University of California, Berkeley.

An overview and tutorial for natural language processing and NLP system design.

An introduction to computerized processing of natural languages.

A free web version of a book about undertaking the geospatial analysis of data.

Data visualization

An edX course from TU Delft on advanced techniques for robust data analysis in a business environment, including importing, summarizing, interpreting, analyzing, and visualizing data.

Practical, step-by-step tutorials on creating visualizations in JavaScript.

A guide to taking statistical analysis done in R and rendering the results visually.

A downloadable O'Reilly book on how to approach the design aspects of visually presenting data.

Cross-category training and tools

Although MIT's OpenCourseWare offerings are mentioned in the first section, this course in particular covers a wider range of topics. It starts with basic probability and works through applied statistics, including Bayesian inference, linear regression, and resampling methods.

This might be the resource you want to read first. It briefs you on the skills you need and the steps you may have to take to get into commercial data science. It's also useful if you're a recruiter vetting candidates for a data science position. Also available are Part 2 and Part 3.

A recently published, free book that provides the getting-started and reference topics needed to harness Python's many data science-related libraries.

A GitHub repo that collects free courses for learning data science.

A free ebook from O'Reilly about the skills necessary for real-world data science and how to approach the necessary coding.

A list of great resources on many aspects of data science.

A Duke University online series about performing sophisticated data analysis with Excel and Tableau.

Update 1:

Here are some additional resources recommended by the community and the TechBeacon editors after publication:

Over a decade ago, software engineering luminary Joel Spolsky wrote 12 questions to determine the maturity of your software engineering team. Domino Data Lab came up with its own Joel Test for assessing the maturity of your data science program and individual data scientists. This is a useful resource for building some of the baseline requirements you might want a data scientist candidate to have.

Similar to this article, Machine Learning for Product Managers is also a collection of resources for getting started with machine learning in your organization. There are 18 links to blogs, courses, and books, some of which you've already found in this collection.

The machine learning Coursera course from Stanford has gotten rave reviews and is taught by Andrew Ng, an associate professor at Stanford, a chief scientist at Baidu, and a co-founder of Coursera itself.

This is another good resource for testing the skills of someone claiming to be a data scientist. The authors assert, "If there is one language, every data science professional should know – it is SQL." The questions are multiple choice, and they focus on practical aspects and challenges people encounter while using SQL.

Jefferson Heard explores what he does as a data scientist with very detailed stories about his various experiences in the field. The article also contains a list of links to all of the tools he's used in his work.

Are there any resources that you find particularly useful in the field of data science? Share them in the comments below and we will consider adding them to our evolving guide.
