Tag: google

Google’s Guidebook for Developing AI Product Development

Google’s Guidebook for Developing AI Product Development

I came across another great set of curated resources by one of the teams at Google:

The People + AI Guidebook.

The People + AI Guidebook was written to help user experience (UX) professionals and product managers follow a human-centered approach to AI.

The Guidebook’s recommendations are based on data and insights from over a hundred individuals across Google product teams, industry experts, and academic research.

These six chapters follow the product development flow, and each one has a related worksheet to help turn guidance into action.

The People & AI guidebook is one of the products of the major PAIR project team (People & AI Research).

Here are the direct links to the six guidebook chapters:

Links to the related worksheets you can find here.

An ABC of Artificial Intelligence Concepts

An ABC of Artificial Intelligence Concepts

Yet another great resource by one of the teams at Google in collaboration with Oxford:

An ABC of Artificial Intelligence-related concepts!

The G is for GANs: Generative Adverserial Networks.

Want to know what GANs are all about?

Just read along with Google’s laymen explanation! Here’s an excerpt:

The P is for Predictions.

Currently the ABC is only available in English, but other language translations come available soon.

Check it out yourself!

Building a realistic Reddit AI that get upvoted in Python

Building a realistic Reddit AI that get upvoted in Python

Sometimes I find these AI / programming hobby projects that I just wished I had thought of…

Will Stedden combined OpenAI’s GPT-2 deep learning text generation model with another deep-learning language model by Google called BERT (Bidirectional Encoder Representations from Transformers) and created an elaborate architecture that had one purpose: posting the best replies on Reddit.

The architecture is shown at the end of this post — copied from Will’s original blog here. Moreover, you can read this post for details regarding the construction of the system. But let me see whether I can explain you what it does in simple language.

The below is what a Reddit comment and reply thread looks like. We have str8cokane making a comment to an original post (not in the picture), and then tupperware-party making a reply to that comment, followed by another reply by str8cokane. Basically, Will wanted to create an AI/bot that could write replies like tupperware-party that real people like str8cokane would not be able to distinguish from “real-people” replies.

Note that with 4 points, str8cokane‘s original comments was “liked” more than tupperware-party‘s reply and str8cokane‘s next reply, which were only upvoted 2 and 1 times respectively.

gpt2-bert on China
Example reddit comment and replies (via bonkerfield.org/)

So here’s what the final architecture looks like, and my attempt to explain it to you.

  1. Basically, we start in the upper left corner, where Will uses a database (i.e. corpus) of Reddit comments and replies to fine-tune a standard, pretrained GPT-2 model to get it to be good at generating (red: “fake”) realistic Reddit replies.
  2. Next, in the upper middle section, these fake replies are piped into a standard, pretrained BERT model, along with the original, real Reddit comments and replies. This way the BERT model sees both real and fake comments and replies. Now, our goal is to make replies that are undistinguishable from real replies. Hence, this is the task the BERT model gets. And we keep fine-tuning the original GPT-2 generator until the BERT discriminator that follows is no longer able to distinguish fake from real replies. Then the generator is “fooling” the discriminator, and we know we are generating fake replies that look like real ones!
    You can find more information about such generative adversarial networks here.
  3. Next, in the top right corner, we fine-tune another BERT model. This time we give it the original Reddit comments and replies along with the amount of times they were upvoted (i.e. sort of like likes on facebook/twitter). Basically, we train a BERT model to predict for a given reply, how much likes it is going to get.
  4. Finally, we can go to production in the lower lane. We give a real-life comment to the GPT-2 generator we trained in the upper left corner, which produces several fake replies for us. These candidates we run through the BERT discriminator we trained in the upper middle section, which determined which of the fake replies we generated look most real. Those fake but realistic replies are then input into our trained BERT model of the top right corner, which predicts for every fake but realistic reply the amount of likes/upvotes it is going to get. Finally, we pick and reply with the fake but realistic reply that is predicted to get the most upvotes!
What Will’s final architecture, combining GPT-2 and BERT, looked like (via bonkerfield.org)

The results are astonishing! Will’s bot sounds like a real youngster internet troll! Do have a look at the original blog, but here are some examples. Note that tupperware-party — the Reddit user from the above example — is actually Will’s AI.

COMMENT: 'Dune’s fandom is old and intense, and a rich thread in the cultural fabric of the internet generation' BOT_REPLY:'Dune’s fandom is overgrown, underfunded, and in many ways, a poor fit for the new, faster internet generation.'
bot responds to specific numerical bullet point in source comment

Will ends his blog with a link to the tutorial if you want to build such a bot yourself. Have a try!

Moreover, he also notes the ethical concerns:

I know there are definitely some ethical considerations when creating something like this. The reason I’m presenting it is because I actually think it is better for more people to know about and be able to grapple with this kind of technology. If just a few people know about the capacity of these machines, then it is more likely that those small groups of people can abuse their advantage.

I also think that this technology is going to change the way we think about what’s important about being human. After all, if a computer can effectively automate the paper-pushing jobs we’ve constructed and all the bullshit we create on the internet to distract us, then maybe it’ll be time for us to move on to something more meaningful.

If you think what I’ve done is a problem feel free to email me , or publically shame me on Twitter.

Will Stedden via bonkerfield.org/2020/02/combining-gpt-2-and-bert/

Google Fonts: 915 free font families

Google Fonts: 915 free font families

Looking for a custom typeface to use in your data visualizations? Google Fonts is an awesome databank of nearly a thousands font families you can access, download, and use for free.

If you’re into design, the website includes a blog featuring articles on font design.

Google Fonts among others provided the font for my dissertation cover so I definitely recommend it.

Privacy, Compliance, and Ethical Issues with Predictive People Analytics

Privacy, Compliance, and Ethical Issues with Predictive People Analytics

November 9th 2018, I defended my dissertation on data-driven human resource management, which you can read and download via this link. On page 149, I discuss several of the issues we face when implementing machine learning and analytics within an HRM context. For the references and more detailed background information, please consult the full dissertation. More interesting reads on ethics in machine learning can be found here.


Privacy, Compliance, and Ethical Issues

Privacy can be defined as “a natural right of free choice concerning interaction and communication […] fundamentally linked to the individual’s sense of self, disclosure of self to others and his or her right to exert some level of control over that process” (Simms, 1994, p. 316). People analytics may introduce privacy issues in many ways, including the data that is processed, the control employees have over their data, and the free choice experienced in the work place. In this context, ethics would refer to what is good and bad practice from a standpoint of moral duty and obligation when organizations collect, analyze, and act upon HRM data. The next section discusses people analytics specifically in light of data privacy, legal boundaries, biases, and corporate social responsibility and free choice.

Data Privacy

Technological advancements continue to change organizational capabilities to collect, store, and analyze workforce data and this forces us to rethink the concept of privacy (Angrave et al., 2016; Bassi, 2011; Martin & Freeman, 2003). For the HRM function, data privacy used to involve questions such as “At what team size can we use the average engagement score without causing privacy infringements?” or “How long do we retain exit interview data?” In contrast, considerably more detailed information on employees’ behaviors and cognitions can be processed on an almost continuous basis these days. For instance, via people analytics, data collected with active monitoring systems help organizations to improve the accuracy of their performance measurement, increasing productivity and reducing operating costs (Holt, Lang, & Sutton, 2016). However, such systems seem in conflict with employees’ right to solitude and their freedom from being watched or listened to as they work (Martin & Freeman, 2003) and are perceived as unethical and unpleasant, affecting employees’ health and morale (Ball, 2010; Faletta, 2014; Holt et al., 2016; Martin & Freeman, 2003; Sánchez Abril, Levin, & Del Riego, 2012). Does the business value such monitoring systems bring justify their implementation? One could question whether business value remains when a more long-term and balanced perspective is taken, considering the implications for employee attraction, well-being, and retention. These can be difficult considerations, requiring elaborate research and piloting.

Faletta (2014) asked American HRM professionals which of 21 data sources would be appropriate for use in people analytics. While some were considered appropriate from an ethical perspective (e.g., performance ratings, demographic data, 360-degree feedback), particularly novel data sources were considered problematic: data of e-mail and video surveillance, performance and behavioral monitoring, and social media profiles and messages. At first thought, these seem extreme, overly intrusive data that are not and will not be used for decision-making. However, in reality, several organizations already collect such data (e.g., Hoffmann, Hartman, & Rowe, 2003; Roth et al., 2016) and they probably hold high predictive value for relevant business outcomes. Hence, it is not inconceivable that future organizations will find ways to use these data for personnel-related decisions – legally or illegally. Should they be allowed to? If not, who is going to monitor them? What if the data are used for mutually beneficial goals – to prevent problems or accidents? These and other questions deserve more detailed discussion by scholars, practitioners, and governments – preferably together.

Legal Boundaries

Although HRM professionals should always ensure that they operate within the boundaries of the law, legal compliance does not seem sufficient when it comes to people analytics. Frequently, legal systems are unprepared to defend employees’ privacy against the potential invasions via the increasingly rigorous data collection systems (Boudreau, 2014; Ciocchetti, 2011; Sánchez Abril et al., 2012). Initiatives such as the General Data Protection Regulation in the European Union somewhat restore the power balance, holding organizations and their HRM departments accountable to inform employees what, why, and how personal data is processed and stored. The rights to access, correct, and erase their information is returned to employees (GDPR, 2016). However, such regulation may not always exist and, even if it does, data usage may be unethical, regardless of its legality.

For instance, should organizations use all personnel data for which they have employee consent? One could argue that there are cases where the power imbalance between employers and employees negates the validity of consent. For instance, employees may be asked to sign written elaborate declarations or complex agreements as part of their employment, without being fully aware of what they consent to. Moreover, employees may feel pressured to provide consent in fear of losing their job, losing face, or peer pressure. Relatedly, employees may be incentivized to provide consent because of the perks associated with doing so, without fully comprehending the consequences. For instance, employees may share access to personal behavioral data in exchange for mobile devices, wellness, or mobility benefits, in which case these direct benefits may bias their perception and judgement. In such cases, data usage may not be ethically responsible, regardless of the legal boundaries, and HRM departments in general and people analytics specialists in specific should take the responsibility to champion the privacy and the interests of their employees.

Automating Historic Biases

While ethics can be considered an important factor in any data analytics project, it is particularly so in people analytics projects. HRM decisions have profound implications in an imbalanced relationship, whereas the data within the HRM field often suffer from inherent biases. This becomes particularly clear when exploring applications of predictive analytics in the HRM domain.

For example, imagine that we want to implement a decision-support system to improve the efficiency of our organization’s selection process. A primary goal of such a system could be to minimize the human time (both of our organizational agents and of the potential candidates) wasted on obvious mismatches between candidates and job positions. Under the hood, a decision-support system in a selection setting could estimate a likelihood (i.e., prediction) for each candidate that he/she makes it through the selection process successfully. Recruiters would then only have to interview the candidates that are most likely to be successful, and save valuable time for both themselves and for less probable candidates. In this way, an artificially intelligent system that reviews candidate information and recommends top candidates could considerably decrease the human workload and thereby the total cost of the selection process.

For legal compliance as well as ethical considerations, we would not want such a decision-support system to be biased towards any majority or minority group. Should we therefore exclude demographic and socio-economic factors from our predictive model? What about the academic achievements of candidates, the university they attended, or their performance on our selection tests? Some of those are scientifically validated predictors of future job performance (e.g., Hunter & Schmidt, 1998). However, they also relate to demographic and socio-economic factors and would therefore introduce bias (e.g., Hough, Oswald, & Ployhart, 2001; Pyburn, Ployhart, & Kravitz, 2008; Roth & Bobko, 2000). Do we include or exclude these selection data in our model?

Maybe the simplest solution would be to include all information, to normalize our system’s predictions within groups afterwards (e.g., gender), and to invite the top candidates per group for follow-up interviews. However, which groups do we consider? Do we only normalize for gender and nationality, or also for age and social class? What about combinations of these characteristics? Moreover, if we normalize across all groups and invite the best candidate within each, we might end up conducting more interviews than in the original scenario. Should we thus account for the proportional representation of each of these groups in the whole labor population? As you notice, both the decision-support system and the subject get complicated quickly.

Even more problematic is that any predictive decision-support system in HRM is likely biased from the moment of conception. HRM data is frequently infested with human biases as bias was present in the historic processes that generated the data. For instance, the recruiters in our example may have historically favored candidates with a certain profile, for instance, red hair. After training our decision-support system (i.e., predictive model) on these historic data, it will recognize and copy the pattern that candidates with red hair (or with correlated features, such as a Northwest European nationality) are more likely successful. The system thus learns to recommend those individuals as the top candidates. While this issue could be prevented by training the model on more objective operationalization of candidate success, most HRM data will include its own specific biases. For example, data on performance ratings will include not only the historic preferences of recruiters (i.e., only hired employees received ratings), but also the biases of supervisors and other assessors in the performance evaluation processes. Similar and other biases may occur in data regarding promotions, training courses, talent assessments, or compensation. If we use these data to train our models and systems, we would effectively automate our historic biases. Such issues greatly hinder the implementation of (predictive) people analytics without causing compliance and ethical issues.

Corporate Social Responsibility versus Free Choice

Corporate social responsibility also needs to be discussed in light of people analytics. People analytics could allow HRM departments to work on social responsibility agendas in many ways. For instance, people analytics can help to demonstrate what causes or prevents (un)ethical behavior among employees, to what extent HRM policies and practices are biased, to what extent they affect work-life balance, or how employees can be stimulated to make decisions that benefit their health and well-being. Regarding the latter case, a great practical example comes from Google’s people analytics team. They uncovered that employees could be stimulated to eat more healthy snacks by color-coding snack containers, and that smaller cafeteria plate sizes could prevent overconsumption and food loss (ABC News, 2013). However, one faces difficult ethical dilemmas in this situation. Is it organizations’ responsibility to nudge employees towards good behavior? Who determines what good entails? Should employees be made aware of these nudges? What do we consider an acceptable tradeoff between free choice and societal benefits?

When we consider the potential of predictive analytics in this light, the discussion gets even more complicated. For instance, imagine that organizations could predict work accidents based on historic HRM information, should they be forbidden, allowed, or required to do so? What about health issues, such as stress and burnout? What would be an acceptable accuracy for such models? How do we feel about false positive and false negatives? Could they use individual-level information if that resulted in benefits for employees?

In conclusion, analytics in the HRM domain quickly encounters issues related to privacy, compliance, and ethics. In bringing (predictive) analytics into the HRM domain, we should be careful not to copy and automate the historic biases present in HRM processes and data. The imbalance in the employment relationship puts the responsibility in the hands of organizational agents. The general message is that what can be done with people analytics may differ from what should be done from a corporate social responsibility perspective. The spread of people analytics depends on our collective ability to harness its power ethically and responsibility, to go beyond the legal requirements and champion both the privacy as well as the interests of employees and the wider society. A balanced approach to people analytics – with benefits beyond financial gain for the organization – will be needed to make people analytics accepted by society, and not just another management tool.

Machine Learning and AI courses at Google

Machine Learning and AI courses at Google

Google has announced to provide open access to its artificial intelligence and machine learning courses. On their overview page, you will find many educational resources from machine learning experts at Google. They announced to share AI and machine learning lessons, tutorials and hands-on exercises for people at all experience levels. Simply filter through the resources and start learning, building and problem-solving.

For instance, up your game straight away with this 15-hour Machine Learning crash course. Zuri Kemp – who leads Google’s machine learning education program – said that over 18,000 Googlers have already enrolled in the course. Designed by the engineering education team, the courses explores loss functions and gradient descent and teached you to build your own neural network in Tensorflow.