November 9th 2018, I defended my dissertation on data-driven human resource management, which you can read and download via this link. On page 149, I discuss several of the issues we face when implementing machine learning and analytics within an HRM context. For the references and more detailed background information, please consult the full dissertation. More interesting reads on ethics in machine learning can be found here.
Privacy, Compliance, and Ethical Issues
Privacy can be defined as “a natural right of free choice concerning interaction and communication […] fundamentally linked to the individual’s sense of self, disclosure of self to others and his or her right to exert some level of control over that process” (Simms, 1994, p. 316). People analytics may introduce privacy issues in many ways, including the data that is processed, the control employees have over their data, and the free choice experienced in the work place. In this context, ethics would refer to what is good and bad practice from a standpoint of moral duty and obligation when organizations collect, analyze, and act upon HRM data. The next section discusses people analytics specifically in light of data privacy, legal boundaries, biases, and corporate social responsibility and free choice.
Technological advancements continue to change organizational capabilities to collect, store, and analyze workforce data and this forces us to rethink the concept of privacy (Angrave et al., 2016; Bassi, 2011; Martin & Freeman, 2003). For the HRM function, data privacy used to involve questions such as “At what team size can we use the average engagement score without causing privacy infringements?” or “How long do we retain exit interview data?” In contrast, considerably more detailed information on employees’ behaviors and cognitions can be processed on an almost continuous basis these days. For instance, via people analytics, data collected with active monitoring systems help organizations to improve the accuracy of their performance measurement, increasing productivity and reducing operating costs (Holt, Lang, & Sutton, 2016). However, such systems seem in conflict with employees’ right to solitude and their freedom from being watched or listened to as they work (Martin & Freeman, 2003) and are perceived as unethical and unpleasant, affecting employees’ health and morale (Ball, 2010; Faletta, 2014; Holt et al., 2016; Martin & Freeman, 2003; Sánchez Abril, Levin, & Del Riego, 2012). Does the business value such monitoring systems bring justify their implementation? One could question whether business value remains when a more long-term and balanced perspective is taken, considering the implications for employee attraction, well-being, and retention. These can be difficult considerations, requiring elaborate research and piloting.
Faletta (2014) asked American HRM professionals which of 21 data sources would be appropriate for use in people analytics. While some were considered appropriate from an ethical perspective (e.g., performance ratings, demographic data, 360-degree feedback), particularly novel data sources were considered problematic: data of e-mail and video surveillance, performance and behavioral monitoring, and social media profiles and messages. At first thought, these seem extreme, overly intrusive data that are not and will not be used for decision-making. However, in reality, several organizations already collect such data (e.g., Hoffmann, Hartman, & Rowe, 2003; Roth et al., 2016) and they probably hold high predictive value for relevant business outcomes. Hence, it is not inconceivable that future organizations will find ways to use these data for personnel-related decisions – legally or illegally. Should they be allowed to? If not, who is going to monitor them? What if the data are used for mutually beneficial goals – to prevent problems or accidents? These and other questions deserve more detailed discussion by scholars, practitioners, and governments – preferably together.
Although HRM professionals should always ensure that they operate within the boundaries of the law, legal compliance does not seem sufficient when it comes to people analytics. Frequently, legal systems are unprepared to defend employees’ privacy against the potential invasions via the increasingly rigorous data collection systems (Boudreau, 2014; Ciocchetti, 2011; Sánchez Abril et al., 2012). Initiatives such as the General Data Protection Regulation in the European Union somewhat restore the power balance, holding organizations and their HRM departments accountable to inform employees what, why, and how personal data is processed and stored. The rights to access, correct, and erase their information is returned to employees (GDPR, 2016). However, such regulation may not always exist and, even if it does, data usage may be unethical, regardless of its legality.
For instance, should organizations use all personnel data for which they have employee consent? One could argue that there are cases where the power imbalance between employers and employees negates the validity of consent. For instance, employees may be asked to sign written elaborate declarations or complex agreements as part of their employment, without being fully aware of what they consent to. Moreover, employees may feel pressured to provide consent in fear of losing their job, losing face, or peer pressure. Relatedly, employees may be incentivized to provide consent because of the perks associated with doing so, without fully comprehending the consequences. For instance, employees may share access to personal behavioral data in exchange for mobile devices, wellness, or mobility benefits, in which case these direct benefits may bias their perception and judgement. In such cases, data usage may not be ethically responsible, regardless of the legal boundaries, and HRM departments in general and people analytics specialists in specific should take the responsibility to champion the privacy and the interests of their employees.
Automating Historic Biases
While ethics can be considered an important factor in any data analytics project, it is particularly so in people analytics projects. HRM decisions have profound implications in an imbalanced relationship, whereas the data within the HRM field often suffer from inherent biases. This becomes particularly clear when exploring applications of predictive analytics in the HRM domain.
For example, imagine that we want to implement a decision-support system to improve the efficiency of our organization’s selection process. A primary goal of such a system could be to minimize the human time (both of our organizational agents and of the potential candidates) wasted on obvious mismatches between candidates and job positions. Under the hood, a decision-support system in a selection setting could estimate a likelihood (i.e., prediction) for each candidate that he/she makes it through the selection process successfully. Recruiters would then only have to interview the candidates that are most likely to be successful, and save valuable time for both themselves and for less probable candidates. In this way, an artificially intelligent system that reviews candidate information and recommends top candidates could considerably decrease the human workload and thereby the total cost of the selection process.
For legal compliance as well as ethical considerations, we would not want such a decision-support system to be biased towards any majority or minority group. Should we therefore exclude demographic and socio-economic factors from our predictive model? What about the academic achievements of candidates, the university they attended, or their performance on our selection tests? Some of those are scientifically validated predictors of future job performance (e.g., Hunter & Schmidt, 1998). However, they also relate to demographic and socio-economic factors and would therefore introduce bias (e.g., Hough, Oswald, & Ployhart, 2001; Pyburn, Ployhart, & Kravitz, 2008; Roth & Bobko, 2000). Do we include or exclude these selection data in our model?
Maybe the simplest solution would be to include all information, to normalize our system’s predictions within groups afterwards (e.g., gender), and to invite the top candidates per group for follow-up interviews. However, which groups do we consider? Do we only normalize for gender and nationality, or also for age and social class? What about combinations of these characteristics? Moreover, if we normalize across all groups and invite the best candidate within each, we might end up conducting more interviews than in the original scenario. Should we thus account for the proportional representation of each of these groups in the whole labor population? As you notice, both the decision-support system and the subject get complicated quickly.
Even more problematic is that any predictive decision-support system in HRM is likely biased from the moment of conception. HRM data is frequently infested with human biases as bias was present in the historic processes that generated the data. For instance, the recruiters in our example may have historically favored candidates with a certain profile, for instance, red hair. After training our decision-support system (i.e., predictive model) on these historic data, it will recognize and copy the pattern that candidates with red hair (or with correlated features, such as a Northwest European nationality) are more likely successful. The system thus learns to recommend those individuals as the top candidates. While this issue could be prevented by training the model on more objective operationalization of candidate success, most HRM data will include its own specific biases. For example, data on performance ratings will include not only the historic preferences of recruiters (i.e., only hired employees received ratings), but also the biases of supervisors and other assessors in the performance evaluation processes. Similar and other biases may occur in data regarding promotions, training courses, talent assessments, or compensation. If we use these data to train our models and systems, we would effectively automate our historic biases. Such issues greatly hinder the implementation of (predictive) people analytics without causing compliance and ethical issues.
Corporate Social Responsibility versus Free Choice
Corporate social responsibility also needs to be discussed in light of people analytics. People analytics could allow HRM departments to work on social responsibility agendas in many ways. For instance, people analytics can help to demonstrate what causes or prevents (un)ethical behavior among employees, to what extent HRM policies and practices are biased, to what extent they affect work-life balance, or how employees can be stimulated to make decisions that benefit their health and well-being. Regarding the latter case, a great practical example comes from Google’s people analytics team. They uncovered that employees could be stimulated to eat more healthy snacks by color-coding snack containers, and that smaller cafeteria plate sizes could prevent overconsumption and food loss (ABC News, 2013). However, one faces difficult ethical dilemmas in this situation. Is it organizations’ responsibility to nudge employees towards good behavior? Who determines what good entails? Should employees be made aware of these nudges? What do we consider an acceptable tradeoff between free choice and societal benefits?
When we consider the potential of predictive analytics in this light, the discussion gets even more complicated. For instance, imagine that organizations could predict work accidents based on historic HRM information, should they be forbidden, allowed, or required to do so? What about health issues, such as stress and burnout? What would be an acceptable accuracy for such models? How do we feel about false positive and false negatives? Could they use individual-level information if that resulted in benefits for employees?
In conclusion, analytics in the HRM domain quickly encounters issues related to privacy, compliance, and ethics. In bringing (predictive) analytics into the HRM domain, we should be careful not to copy and automate the historic biases present in HRM processes and data. The imbalance in the employment relationship puts the responsibility in the hands of organizational agents. The general message is that what can be done with people analytics may differ from what should be done from a corporate social responsibility perspective. The spread of people analytics depends on our collective ability to harness its power ethically and responsibility, to go beyond the legal requirements and champion both the privacy as well as the interests of employees and the wider society. A balanced approach to people analytics – with benefits beyond financial gain for the organization – will be needed to make people analytics accepted by society, and not just another management tool.