In this original blog, with an equally original title, Delip Rao poses twelve (+1) harsh truths about the real-world practice of machine learning. I found it quite enlightening to read a non-hyped article about ML for once, particularly because Delip's experiences seem to overlap quite nicely with the principles of software design and Agile working.
I've copied Delip's 12 truths as headers below. If they spark your interest, read more in his original post.
It has to work
No matter how hard you push and no matter what the priority, you can’t increase the speed of light
With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea
Some things in life can never be fully appreciated nor understood unless experienced firsthand
It is always possible to agglutinate multiple separate problems into a single complex interdependent solution. In most cases, this is a bad idea
It is easier to ignore or move a problem around than it is to solve it
You always have to trade off something
Everything is more complicated than you think
You will always under-provision resources
One size never fits all. Your model will make embarrassing errors all the time despite your best intentions
Every old idea will be proposed again with a different name and a different presentation, regardless of whether it works
Perfection has been reached not when there is nothing left to add, but when there is nothing left to take away
Delip also added a +1, his zero-indexed truth: You are Not a Scientist.
Yes, that’s all of you building stuff with machine learning with a “scientist” in the title, including all of you with PhDs, has-been-academics, and academics with one foot in the industry. Machine learning (and other AI application areas, like NLP, Vision, Speech, …) is an engineering research discipline (as opposed to science research).
With great pleasure I've studied and worked in the field of people analytics, where we seek to leverage employee, management, and business information to better organize and manage our personnel. Here, data has proven itself indispensable for the organization of the future.
Data and analytics have not traditionally been high on HR professionals' list of priorities. Fortunately, there is an increasing awareness that the 21st-century (HR) manager has to be data-savvy. But where to start learning? The sheer abundance of available resources can be daunting…
Have a look at these 100+ amazing books for (starting) people analytics specialists. My personal recommendations are included as pictures, but feel free to ask for more detailed suggestions!
Behavioural Psychology: focus on behavioural psychology and economics, including decision-making and the biases therein.
Technology: focus on the implications of new technology…
Ethics: … on society and humanity, and what can go wrong.
Digital & Data-driven HR: … for the future of work, workforce, and organization. Includes people analytics case studies.
Management: focus on industrial and organizational psychology, HR, leadership, and business strategy.
Statistics: focus on the technical books explaining statistical concepts and applied data analysis.
People analytics: … more technical books on how to conduct people analytics studies step-by-step in (statistical) software.
Programming: … technical books specifically aimed at (statistical) programming and data analysis.
Communication: focus on information exchange, presentation, and data visualization.
Disclaimer: This page contains links to Amazon’s book shop. Any purchases through those links provide us with a small commission that helps to host this blog.
The 2018 annual Society for Industrial and Organizational Psychology (SIOP) conference featured its first-ever machine learning competition. Teams competed for several months to predict employee turnover (or churn) in a large US company. A more complete introduction, as presented at the conference, is available online. All submissions had to be open source, and the winning submissions have been posted in a GitHub repository. The winning teams consisted of analysts working at WalMart, DDI, and HumRRO. They mostly built ensemble models in Python and/or R, combining algorithms such as gradient boosted trees (including LightGBM), neural networks, and random forests.