Tag: management

Artificial Stupidity – by Vincent Warmerdam @PyData 2019 London

PyData is famous for its great talks on machine learning topics. At this 2019 London edition, Vincent Warmerdam again managed to give a super inspiring presentation. This year he covers what he dubs Artificial Stupidity™. You should definitely watch the talk, which includes some great visual aids, but here are my main takeaways:

Vincent speaks of Artificial Stupidity, of machine learning gone HorriblyWrong™ — an example of which below — for which Vincent elaborates on three potential fixes:

Example of a model that goes HorriblyWrong™, according to Vincent’s talk.

1. Predict Less, but Carefully

Vincent argues you shouldn’t extrapolate your predictions outside of your observed sampling space. Even better: “Not predicting given uncertainty is a great idea.” As an alternative, we could for instance design a fallback mechanism, by including an outlier detection model as the first step of your machine learning model pipeline and only predict for non-outliers.
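To make the fallback idea concrete, here is a minimal sketch of a predictor that refuses to extrapolate. It is my own illustration and not code from the talk: the `GuardedPredictor` class, the simple range check used as the "outlier detector", and the toy data are all assumptions. A real pipeline would likely use a proper outlier detection model instead of a min/max check.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


class GuardedPredictor:
    """Hypothetical wrapper: check for outliers first, predict second."""

    def __init__(self, xs, ys, margin=0.0):
        # Assumption: "outlier" simply means outside the observed range.
        self.low = min(xs) - margin
        self.high = max(xs) + margin
        self.slope, self.intercept = fit_line(xs, ys)

    def is_outlier(self, x):
        return not (self.low <= x <= self.high)

    def predict(self, x):
        # Returning None signals "no automated prediction, use the fallback".
        if self.is_outlier(x):
            return None
        return self.slope * x + self.intercept


# Toy training data (made up for illustration).
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2.0, 4.0, 6.0, 8.0, 10.0]
model = GuardedPredictor(train_x, train_y)

inside = model.predict(3.5)    # within the observed range: a number
outside = model.predict(50.0)  # far outside the observed range: None
```

The caller decides what the fallback is when `predict` returns `None`, for example routing the case to a human.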

I definitely recommend you watch this specific section of Vincent’s talk because he gives some very visual and intuitive explanations of how extrapolation may go HorriblyWrong™.

Be careful! One thing we should maybe start talking about to our bosses: Algorithms merely automate, approximate, and interpolate. It’s the extrapolation that is actually kind of dangerous.

Vincent Warmerdam @ Pydata 2019 London

Basically, we can choose to not make automated decisions sometimes.

2. Constrain thy Features

What we feed to our models really matters. […] You should probably do something to the data going into your model if you want your model to have any sort of fairness guarantees.

Vincent Warmerdam @ Pydata 2019 London

Often, simply removing biased features from your data does not reduce bias to the extent we may have hoped. Fortunately, Vincent demonstrates how to remove biased information from your variables by applying some cool math tricks.
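This write-up does not spell out the math, so the following is only a sketch of one trick of this kind, under my own assumptions: subtract from a feature the part that is linearly explained by a sensitive attribute (an orthogonal projection). The function names and the toy data are hypothetical, and this removes linear information only.

```python
from statistics import mean


def center(values):
    """Subtract the mean so that projections ignore the overall level."""
    m = mean(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def remove_linear_information(feature, sensitive):
    """Return the centered feature minus its projection on the centered sensitive column."""
    f = center(feature)
    s = center(sensitive)
    coef = dot(f, s) / dot(s, s)
    return [fi - coef * si for fi, si in zip(f, s)]


# Toy data (made up): a group indicator and a feature that leaks it.
sensitive = [0, 0, 0, 1, 1, 1]
salary = [30.0, 32.0, 34.0, 40.0, 42.0, 44.0]

filtered = remove_linear_information(salary, sensitive)
# The filtered feature has zero covariance with the sensitive column,
# while the variation within each group is preserved.
leftover = dot(filtered, center(sensitive))
```

Nonlinear relationships between the feature and the sensitive attribute would survive this filter, which is one reason such a trick is a starting point rather than a guarantee.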

Unfortunately, doing so will often result in lower predictive accuracy. That is unsurprising, as you are no longer closely fitting the biased data. What makes matters more problematic, Vincent rightfully mentions, is that corporate incentives often do not really align here. It might feel like you need to pick: either more accuracy or more fairness.

However, there’s a nice solution that builds on point 1. We can take the highly accurate model and the highly fair model, make predictions with both, and treat any disagreement between them as a very good proxy for where you potentially don’t want to make a prediction. Hence, there may be observations/samples where we are comfortable making a fair prediction, whereas in other situations we may say “right, this prediction seems unfair, we need a fallback mechanism, a human being should look at this and we should not automate this decision”.
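The disagreement check described above could look roughly like this. It is a sketch under my own assumptions: the `decide` function, the tolerance of 0.1, and the scores are hypothetical, and the two scores stand in for the outputs of an accuracy-oriented and a fairness-constrained model.

```python
def decide(accurate_score, fair_score, tolerance=0.1):
    """Automate only when the accurate and the fair model roughly agree."""
    if abs(accurate_score - fair_score) > tolerance:
        # The models disagree: treat this as a signal not to automate.
        return {"automated": False, "score": None, "route": "human review"}
    # The models agree: we are comfortable using the fair prediction.
    return {"automated": True, "score": fair_score, "route": "model"}


agree = decide(accurate_score=0.80, fair_score=0.78)
disagree = decide(accurate_score=0.90, fair_score=0.40)
```

The tolerance is a design choice: a smaller value sends more cases to a human, a larger value automates more decisions.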

Vincent does note that this is only one trick to constrain your model for fairness, and that fairness may often only be fair in the eyes of the beholder. Moreover, in order to correct for these biases and unfairness, you need to know about these unfair biases. Although outside of the scope of this specific topic, Vincent proposes that this introduces new ethical issues.

Basically, we can choose to put our models on a controlled diet.

3. Constrain thy Model

Vincent argues that we should include constraints (based on domain knowledge, or common sense) into our models. In his presentation, he names a few. For instance, monotonicity, which implies that the relationship between X and Y should always be either entirely non-increasing, or entirely non-decreasing. Incorporating the previously discussed fairness principles would be a second example, and there are many more.

If we ever come up with a model where more smoking leads to better health, that’s bad. I have enough domain knowledge to say that that should never happen. So maybe I should just make a system where I can say “look this one column with relationship to Y should always be strictly negative”.

Vincent Warmerdam @ Pydata 2019 London

Basically, we can integrate domain knowledge or preferences into our models.
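As an illustration of a monotonicity constraint, here is a small sketch that forces a fitted curve to never increase, using the pool-adjacent-violators idea behind isotonic regression. This is my own example and not the method shown in the talk; the function names and the health scores are made up. Note that it enforces "non-increasing", which is slightly weaker than the "strictly negative" relationship in the quote.

```python
def fit_non_increasing(ys):
    """Least-squares fit to ys (ordered by x) that never goes up."""
    blocks = []  # each block holds [sum of values, number of values]
    for y in ys:
        blocks.append([y, 1])
        # Merge blocks while a later block's mean exceeds the previous one's.
        while (
            len(blocks) > 1
            and blocks[-1][0] / blocks[-1][1] > blocks[-2][0] / blocks[-2][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def is_non_increasing(values):
    return all(a >= b for a, b in zip(values, values[1:]))


# Hypothetical average health score per increasing level of smoking.
health_by_smoking = [9.0, 8.0, 8.5, 6.0, 6.5, 4.0]
fitted = fit_non_increasing(health_by_smoking)
```

The raw data suggests that health improves in two places as smoking increases; the constrained fit averages those violations away instead of learning them.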

Conclusion: Watch the talk!

People Analytics: Is nudging good employer practice or unethical?

The article is in Dutch only:

For Privacyweb, I recently wrote about people analytics and the nudging of employees that may result from it: small adjustments or pushes meant to steer people in the right direction. Tempting employees into good behavior, as it were. But who decides what is good, and when may employers nudge, when may they not, and when must they?

Read the full article here.

Books for the modern, data-driven HR professional (incl. People Analytics)

With great pleasure I’ve studied and worked in the field of people analytics, where we seek to leverage employee, management, and business information to better organize and manage our personnel. Here, data has proven itself valuable, if not indispensable, for the organization of the future.

Data and analytics have not traditionally been high on the list of HR professionals. Fortunately, there is an increased awareness that the 21st century (HR) manager has to be data-savvy. But where to start learning? The plentiful available resources can be daunting…

Have a look at these 100+ amazing books
for (starting) people analytics specialists.
My personal recommendations are included as pictures,
but feel free to ask for more detailed suggestions!


Categories (clickable)

  • Behavioural Psychology: focus on behavioural psychology and economics, including decision-making and the biases therein.
  • Technology: focus on the implications of new technology….
    • Ethics: … on society and humanity, and what can go wrong.
    • Digital & Data-driven HR: … for the future of work, workforce, and organization. Includes people analytics case studies.
  • Management: focus on industrial and organizational psychology, HR, leadership, and business strategy.
  • Statistics: focus on the technical books explaining statistical concepts and applied data analysis.
    • People analytics: …. more technical books on how to conduct people analytics studies step-by-step in (statistical) software.
    • Programming: … technical books specifically aimed at (statistical) programming and data analysis.
  • Communication: focus on information exchange, presentation, and data visualization.

Disclaimer: This page contains links to Amazon’s book shop.
Any purchases through those links provide us with a small commission that helps to host this blog.

Behavioural Psychology books

Technology books

Ethics in Data & Machine Learning

Digital & Data-driven HR

Management books

Statistics books

Applied People Analytics

Programming

You can find an overview of 20+ free programming books here.

Data Visualization books


A note of thanks

I want to thank the active people analytics community, publishing in management journals, but also on social media. I knew Littral Shemer Haim already hosted a people analytics reading list, and so did Analytics in HR (Erik van Vulpen) and Workplaceif (Manoj Kumar). After Jared Valdron called for book recommendations on people analytics on LinkedIn, and nearly 60 people replied, I thought: let’s merge these overviews.

Hence, a big thank you and acknowledgement to all those who’ve contributed directly or indirectly. I hope this comprehensive merged overview is helpful.

Univers Interview: “Algorithms haven’t replaced the HR manager yet”

The magazine of Tilburg University, Univers, recently interviewed me on my PhD research on People Analytics and data-driven Human Resource management. You can find the write-up by interviewer Ron Vaessen here; unfortunately, it is available in Dutch only.

The full text of my dissertation can be accessed in a flipbook here or downloaded directly via this link.

I have also dedicated several blogs to more background information. I posted a small extract on the ethics of people analytics and machine learning in HR here. Those interested in visualizing survival curves like I did can see this post. Curious about the cover design? Read this post.

Checklist to Optimize Training Transfer in Organizations

Ashley Hughes, Stephanie Zajac, Jacqueline Spencer, and Eduardo Salas recently wrote a research note for the International Journal of Training and Development. The research note is built around an evidence-based checklist of actionable insights for practitioners that will help to enhance the effectiveness of training interventions. These actionable insights would help to prevent the ‘transfer problem’, meaning that trained skills are not being used on the job.


Screenshot of the first page of the published research note, containing the abstract

Unfortunately, these published academic papers are often behind a paywall, but you may request a PDF from the authors here on ResearchGate.

Screenshot of the appendix of the research note containing the checklist for practitioners.

For the full details and scientific evidence behind each suggested action, I suggest you access the research note. Nevertheless, here’s my summary of their main advice on improving training transfer before, during, and after training implementation:

Before training

  • Conduct a training needs analysis to align the training’s content and participants with the organizational objectives
  • Involved stakeholders should be aware of training, understand its importance, and — obviously — be prepared for the training program. The scholars provide seven specific actions here, including the setting of personal training goals, and aligning resources and rewards with the training.
  • Training attendance should be framed as an opportunity, and the training’s anticipated benefits could be emphasized (e.g. improvement of work processes or on-the-job performance).
  • A climate which encourages learning should be created, with dedicated time (and opportunities) for post‐training learning and a sense of accountability for using trained knowledge, skills, and abilities.

During training

  • Piloting the training with a single department or subset of trainees is highly encouraged. This is one way that greatly helps to assess whether the training design is appropriate in terms of content and delivery.
  • Error‐encouragement framing can influence a trainee’s learning orientation and thus errors made during training should be framed as growth opportunities.

After training

  • Use of the trained skills should be supported and planned. For instance, participants could be given a small workload reduction to provide opportunities to apply the learned knowledge and skills once they return to their position. 
  • Management and training participants should be held accountable for their use of skills on the job.
  • Think about using just‐in‐time or refresher training and coaching, if needed.
  • Assess training effectiveness criteria including training transfer using metrics and analytics. Specifically, the scholars propose that the criteria measured in the training evaluation should correspond to the training needs identified through the training needs analysis that was conducted before the training. 
  • Training evaluation criteria should consider the scope and timeframe of the training. Take into account that distal outcomes such as ROI may take longer to realize. 

Privacy, Compliance, and Ethical Issues with Predictive People Analytics

On November 9th, 2018, I defended my dissertation on data-driven human resource management, which you can read and download via this link. On page 149, I discuss several of the issues we face when implementing machine learning and analytics within an HRM context. For the references and more detailed background information, please consult the full dissertation. More interesting reads on ethics in machine learning can be found here.


Privacy, Compliance, and Ethical Issues

Privacy can be defined as “a natural right of free choice concerning interaction and communication […] fundamentally linked to the individual’s sense of self, disclosure of self to others and his or her right to exert some level of control over that process” (Simms, 1994, p. 316). People analytics may introduce privacy issues in many ways, including the data that is processed, the control employees have over their data, and the free choice experienced in the workplace. In this context, ethics would refer to what is good and bad practice from a standpoint of moral duty and obligation when organizations collect, analyze, and act upon HRM data. The next section discusses people analytics specifically in light of data privacy, legal boundaries, biases, and corporate social responsibility and free choice.

Data Privacy

Technological advancements continue to change organizational capabilities to collect, store, and analyze workforce data and this forces us to rethink the concept of privacy (Angrave et al., 2016; Bassi, 2011; Martin & Freeman, 2003). For the HRM function, data privacy used to involve questions such as “At what team size can we use the average engagement score without causing privacy infringements?” or “How long do we retain exit interview data?” In contrast, considerably more detailed information on employees’ behaviors and cognitions can be processed on an almost continuous basis these days. For instance, via people analytics, data collected with active monitoring systems help organizations to improve the accuracy of their performance measurement, increasing productivity and reducing operating costs (Holt, Lang, & Sutton, 2016). However, such systems seem in conflict with employees’ right to solitude and their freedom from being watched or listened to as they work (Martin & Freeman, 2003) and are perceived as unethical and unpleasant, affecting employees’ health and morale (Ball, 2010; Faletta, 2014; Holt et al., 2016; Martin & Freeman, 2003; Sánchez Abril, Levin, & Del Riego, 2012). Does the business value such monitoring systems bring justify their implementation? One could question whether business value remains when a more long-term and balanced perspective is taken, considering the implications for employee attraction, well-being, and retention. These can be difficult considerations, requiring elaborate research and piloting.

Faletta (2014) asked American HRM professionals which of 21 data sources would be appropriate for use in people analytics. While some were considered appropriate from an ethical perspective (e.g., performance ratings, demographic data, 360-degree feedback), particularly novel data sources were considered problematic: data of e-mail and video surveillance, performance and behavioral monitoring, and social media profiles and messages. At first thought, these seem extreme, overly intrusive data that are not and will not be used for decision-making. However, in reality, several organizations already collect such data (e.g., Hoffmann, Hartman, & Rowe, 2003; Roth et al., 2016) and they probably hold high predictive value for relevant business outcomes. Hence, it is not inconceivable that future organizations will find ways to use these data for personnel-related decisions – legally or illegally. Should they be allowed to? If not, who is going to monitor them? What if the data are used for mutually beneficial goals – to prevent problems or accidents? These and other questions deserve more detailed discussion by scholars, practitioners, and governments – preferably together.

Legal Boundaries

Although HRM professionals should always ensure that they operate within the boundaries of the law, legal compliance does not seem sufficient when it comes to people analytics. Frequently, legal systems are unprepared to defend employees’ privacy against the potential invasions via the increasingly rigorous data collection systems (Boudreau, 2014; Ciocchetti, 2011; Sánchez Abril et al., 2012). Initiatives such as the General Data Protection Regulation in the European Union somewhat restore the power balance, holding organizations and their HRM departments accountable to inform employees what, why, and how personal data is processed and stored. The rights to access, correct, and erase their information are returned to employees (GDPR, 2016). However, such regulation may not always exist and, even if it does, data usage may be unethical, regardless of its legality.

For instance, should organizations use all personnel data for which they have employee consent? One could argue that there are cases where the power imbalance between employers and employees negates the validity of consent. For instance, employees may be asked to sign elaborate written declarations or complex agreements as part of their employment, without being fully aware of what they consent to. Moreover, employees may feel pressured to provide consent for fear of losing their job, losing face, or peer pressure. Relatedly, employees may be incentivized to provide consent because of the perks associated with doing so, without fully comprehending the consequences. For instance, employees may share access to personal behavioral data in exchange for mobile devices, wellness, or mobility benefits, in which case these direct benefits may bias their perception and judgement. In such cases, data usage may not be ethically responsible, regardless of the legal boundaries, and HRM departments in general and people analytics specialists in particular should take the responsibility to champion the privacy and the interests of their employees.

Automating Historic Biases

While ethics can be considered an important factor in any data analytics project, it is particularly so in people analytics projects. HRM decisions have profound implications in an imbalanced relationship, whereas the data within the HRM field often suffer from inherent biases. This becomes particularly clear when exploring applications of predictive analytics in the HRM domain.

For example, imagine that we want to implement a decision-support system to improve the efficiency of our organization’s selection process. A primary goal of such a system could be to minimize the human time (both of our organizational agents and of the potential candidates) wasted on obvious mismatches between candidates and job positions. Under the hood, a decision-support system in a selection setting could estimate a likelihood (i.e., prediction) for each candidate that he/she makes it through the selection process successfully. Recruiters would then only have to interview the candidates that are most likely to be successful, and save valuable time for both themselves and for less probable candidates. In this way, an artificially intelligent system that reviews candidate information and recommends top candidates could considerably decrease the human workload and thereby the total cost of the selection process.

For legal compliance as well as ethical considerations, we would not want such a decision-support system to be biased towards any majority or minority group. Should we therefore exclude demographic and socio-economic factors from our predictive model? What about the academic achievements of candidates, the university they attended, or their performance on our selection tests? Some of those are scientifically validated predictors of future job performance (e.g., Hunter & Schmidt, 1998). However, they also relate to demographic and socio-economic factors and would therefore introduce bias (e.g., Hough, Oswald, & Ployhart, 2001; Pyburn, Ployhart, & Kravitz, 2008; Roth & Bobko, 2000). Do we include or exclude these selection data in our model?

Maybe the simplest solution would be to include all information, to normalize our system’s predictions within groups afterwards (e.g., gender), and to invite the top candidates per group for follow-up interviews. However, which groups do we consider? Do we only normalize for gender and nationality, or also for age and social class? What about combinations of these characteristics? Moreover, if we normalize across all groups and invite the best candidate within each, we might end up conducting more interviews than in the original scenario. Should we thus account for the proportional representation of each of these groups in the whole labor population? As you notice, both the decision-support system and the subject get complicated quickly.
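The within-group normalization described above can be sketched as follows. This is an illustration under my own assumptions and not code from the dissertation: the candidate records, group labels, scores, and function names are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_within_groups(candidates):
    """Add a z-score of each candidate's score relative to their own group."""
    scores_by_group = defaultdict(list)
    for c in candidates:
        scores_by_group[c["group"]].append(c["score"])
    normalized = []
    for c in candidates:
        scores = scores_by_group[c["group"]]
        spread = pstdev(scores)
        z = 0.0 if spread == 0 else (c["score"] - mean(scores)) / spread
        normalized.append({**c, "z": z})
    return normalized


def top_per_group(normalized, k=1):
    """Invite the k candidates with the highest within-group z-score per group."""
    by_group = defaultdict(list)
    for c in normalized:
        by_group[c["group"]].append(c)
    invited = []
    for group in sorted(by_group):
        ranked = sorted(by_group[group], key=lambda c: c["z"], reverse=True)
        invited.extend(c["name"] for c in ranked[:k])
    return invited


# Hypothetical model scores for four candidates in two groups.
candidates = [
    {"name": "A", "group": "g1", "score": 0.9},
    {"name": "B", "group": "g1", "score": 0.7},
    {"name": "C", "group": "g2", "score": 0.6},
    {"name": "D", "group": "g2", "score": 0.4},
]

normalized = normalize_within_groups(candidates)
invited = top_per_group(normalized, k=1)
```

Without normalization, the two highest raw scores both come from group g1. With it, one candidate per group is invited, and the number of interviews grows with the number of groups, which is exactly the complication raised above.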

Even more problematic is that any predictive decision-support system in HRM is likely biased from the moment of conception. HRM data is frequently infested with human biases as bias was present in the historic processes that generated the data. For instance, the recruiters in our example may have historically favored candidates with a certain profile, for instance, red hair. After training our decision-support system (i.e., predictive model) on these historic data, it will recognize and copy the pattern that candidates with red hair (or with correlated features, such as a Northwest European nationality) are more likely to be successful. The system thus learns to recommend those individuals as the top candidates. While this issue could be prevented by training the model on a more objective operationalization of candidate success, most HRM data will include its own specific biases. For example, data on performance ratings will include not only the historic preferences of recruiters (i.e., only hired employees received ratings), but also the biases of supervisors and other assessors in the performance evaluation processes. Similar and other biases may occur in data regarding promotions, training courses, talent assessments, or compensation. If we use these data to train our models and systems, we would effectively automate our historic biases. Such issues greatly hinder the implementation of (predictive) people analytics without causing compliance and ethical issues.

Corporate Social Responsibility versus Free Choice

Corporate social responsibility also needs to be discussed in light of people analytics. People analytics could allow HRM departments to work on social responsibility agendas in many ways. For instance, people analytics can help to demonstrate what causes or prevents (un)ethical behavior among employees, to what extent HRM policies and practices are biased, to what extent they affect work-life balance, or how employees can be stimulated to make decisions that benefit their health and well-being. Regarding the latter case, a great practical example comes from Google’s people analytics team. They uncovered that employees could be stimulated to eat more healthy snacks by color-coding snack containers, and that smaller cafeteria plate sizes could prevent overconsumption and food loss (ABC News, 2013). However, one faces difficult ethical dilemmas in this situation. Is it organizations’ responsibility to nudge employees towards good behavior? Who determines what good entails? Should employees be made aware of these nudges? What do we consider an acceptable tradeoff between free choice and societal benefits?

When we consider the potential of predictive analytics in this light, the discussion gets even more complicated. For instance, imagine that organizations could predict work accidents based on historic HRM information. Should they be forbidden, allowed, or required to do so? What about health issues, such as stress and burnout? What would be an acceptable accuracy for such models? How do we feel about false positives and false negatives? Could they use individual-level information if that resulted in benefits for employees?
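To make the question about accuracy, false positives, and false negatives tangible, here is a small sketch that counts them for a made-up accident prediction. The labels and predictions are invented for illustration and do not come from any real model or dataset.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = accident)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p and a:
            counts["tp"] += 1  # predicted an accident that happened
        elif p and not a:
            counts["fp"] += 1  # false alarm
        elif a:
            counts["fn"] += 1  # missed accident
        else:
            counts["tn"] += 1  # correctly predicted no accident
    return counts


# Hypothetical outcomes for eight employees.
actual = [1, 0, 0, 1, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 0, 1, 1]

counts = confusion_counts(actual, predicted)
accuracy = (counts["tp"] + counts["tn"]) / len(actual)
```

A single accuracy figure hides the trade-off: here, two employees would be flagged without cause, while one actual accident would go unpredicted. Which of these errors is worse is an ethical question, not a statistical one.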

In conclusion, analytics in the HRM domain quickly encounters issues related to privacy, compliance, and ethics. In bringing (predictive) analytics into the HRM domain, we should be careful not to copy and automate the historic biases present in HRM processes and data. The imbalance in the employment relationship puts the responsibility in the hands of organizational agents. The general message is that what can be done with people analytics may differ from what should be done from a corporate social responsibility perspective. The spread of people analytics depends on our collective ability to harness its power ethically and responsibly, to go beyond the legal requirements and champion both the privacy as well as the interests of employees and the wider society. A balanced approach to people analytics – with benefits beyond financial gain for the organization – will be needed to make people analytics accepted by society, and not just another management tool.