Evaluating emerging technologies across four pillars of effective selection: validity, fairness, utility and reactions
Every once in a while, a new technology, an old problem, and a big idea turn into an innovation. – Dean Kamen, American inventor
Selection practitioners are operating in a dynamic time where new technologies and big ideas are emerging daily. Some of these innovations are developed by industrial and organizational (I/O) psychologists and many are not. These advances in technology have introduced new techniques, tools and challenges to selection practitioners. To be sure, people have not changed: we still vary on a limited number of job-related knowledge, skills, abilities and other characteristics (KSAOs), and measuring those attributes still helps organizations predict the relative probability of success for the candidates they consider hiring. However, until recently, many of our measurement practices and procedures relied on established I/O practices with only slight deviations and small steps forward. From paper-and-pencil testing to the early days of online proctored testing to the widespread adoption of unproctored testing, our selection technology evolved slowly until the last decade or so. In the last 10 years, the assessment landscape has grown through a rapid increase in technology-enabled processes and measurement techniques available to organizations. New tools, new techniques and new problems have surfaced and continue to redefine the “state of the art” in applied personnel selection. It has become a full-time job for selection practitioners to consume and consider these advances to stay relevant in a quickly changing selection context, with innovation coming from within and outside the field of I/O Psychology.
As technology evolves, organizations take advantage of efficiencies by automating everything from manufacturing processes to Human Resources payroll systems to artificial intelligence-driven marketing practices. As organizations realize the business problems emerging technology can solve, organizations also expect selection practitioners to apply advanced technology to the selection system. Unfortunately, in this time of fast-paced technological advancement, organizations often demand the adoption of new practices without first (or ever) performing the rigorous scientific research that is the hallmark of our field. This demand for new technology in the selection system requires our science to adapt and aggressively research these new developments to sift through the snake oil to find the new technologies that truly add value in solving the problems organizations face. People and talent take up the largest percentage of any organization's P&L sheet, and organizations will always pursue innovations to gain a competitive advantage in the selection of talent, whether or not those innovations are endorsed by I/O research. This growing demand for technology-enabled selection processes creates a challenge for I/O psychologists to show value, establish credibility, and rethink tried and true approaches in order to stay relevant.
This special issue was a call out to the I/O community to share research related to emerging technologies in personnel selection. The papers selected for this issue present research related to technology-enabled predictors such as interviews, situational judgment, personality and serious games. The authors also examine the use of technological advances such as video, audio, mobile-first design and gamification. Lastly, the special issue explores some unique individual differences measures relying on technology-enabled administration, which have not been possible previously with conventional measurement strategies.
Regardless of the technology involved, selection processes should always be evaluated on the same criteria that are foundational to the science of I/O Psychology. We refer to these criteria as the four pillars of personnel selection. The first three have been guiding personnel psychologists for decades: (1) validity, (2) utility and (3) fairness. The fourth, (4) candidate reactions, has emerged in recent years, some might say in response to the utilization of technology in selection processes. When technology is used to enhance a selection process, it impacts these criteria in some manner. Next, we discuss each pillar in relation to how it is impacted by technological innovations.
Validity
Validity refers to the psychometric accuracy of the selection instruments and their relevance to the position of interest. In general, valid predictors are accurate at measuring what they are intended to measure (construct), measuring the appropriate criterion domain for the target position (content) and related to job performance (criterion).
Technology can influence validity by improving measurement or detracting from it. For example, technological advances have provided an easy way to standardize test administration, reducing error by eliminating the reliance on a proctor to give instructions. Additionally, technology-enabled (e.g. computerized) assessments also ushered in automated scoring, reducing measurement error by eliminating subjective evaluations of constructs and/or manual scoring. These error/bias reductions make it easier for organizations to follow consistent procedures without threats to internal validity.
However, the benefits of consistent administration are only present for selection situations where all candidates are using the same device and are in similar testing environments. In recent years, research has focused on measurement equivalence across devices, specifically mobile versus non-mobile. Mobile assessment research has found that measurement properties can differ by device for some item types, such as traditional cognitive ability items (e.g. Arthur et al., 2017; Illingworth et al., 2015) and interactive simulations (Chang et al., 2016; O'Connell et al., 2016). In these cases, mobile device users were at a disadvantage. While more research is needed, one can conclude that technology has the potential to affect validity both positively and negatively.
Lastly, perhaps the greatest potential for technology to impact validity lies in leveraging new capabilities to change the way we measure KSAOs. Aside from resolving or creating threats to internal validity, new technology can also enhance prediction of important outcomes by building technology-enabled predictors of KSAOs that are not possible to measure with conventional testing strategies (e.g. simulation technology, trace data) (O'Connell et al., 2013). By using technology to build, for example, complex multitasking simulations and measures of the human information processing system, selection practitioners can enhance their coverage of the predictor domain in ways never before possible.
Utility
The utility pillar, as discussed here, is related to the return-on-investment (ROI) to the organization using a selection tool. A selection process with strong utility strikes a good balance between accuracy and efficiency: any time or cost the company invests is “worth it” because it yields valuable information about a candidate. Metrics that help organizations determine ROI are pass rates by selection phase, cost-per-hire and time in process. Utility is influenced by factors that relate to the resources required to administer the selection process, such as the cost of assessment tools, length of time in each phase, salary costs of recruiters involved in the process and high/low pass rates. For example, even a highly accurate assessment does not provide ROI if the cut score is set such that everyone passes it.
Many technology advancements impact utility because they often decrease candidate time in process and reduce recruiter salary costs by automating components of the process. Technology is used at almost every stage of the selection process starting with the resume screen and ending with the final reference and background checks. Any technological advances that are related to administration and scoring are likely to influence utility in a positive way.
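The utility metrics named above (pass rates by selection phase, cost-per-hire and time in process) can be illustrated with a minimal sketch. The phase names, candidate counts, costs and durations below are hypothetical values invented for illustration; they are not data from any study in this issue, and real utility analyses typically account for more factors than this.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    """One stage of a hypothetical selection funnel."""
    name: str
    entered: int   # candidates who started the phase
    passed: int    # candidates who advanced to the next phase
    cost: float    # total cost of the phase (tools plus recruiter time)
    days: float    # average days a candidate spends in the phase


def pass_rate(phase: Phase) -> float:
    """Share of candidates entering the phase who advance."""
    return phase.passed / phase.entered if phase.entered else 0.0


def funnel_summary(phases: list[Phase], hires: int) -> dict:
    """Summarize pass rates, cost-per-hire and time in process."""
    total_cost = sum(p.cost for p in phases)
    return {
        "pass_rates": {p.name: round(pass_rate(p), 3) for p in phases},
        "cost_per_hire": total_cost / hires if hires else float("inf"),
        "time_in_process_days": sum(p.days for p in phases),
    }


# Hypothetical funnel: the assessment's cut score lets everyone pass.
phases = [
    Phase("resume screen", entered=1000, passed=400, cost=2000.0, days=3),
    Phase("online assessment", entered=400, passed=400, cost=6000.0, days=2),
    Phase("interview", entered=400, passed=40, cost=12000.0, days=10),
]
summary = funnel_summary(phases, hires=20)
print(summary)
```

In this made-up funnel the online assessment has a pass rate of 1.0, so it adds cost and time without screening anyone out, which is the cut-score problem described above.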
Fairness
We focus our consideration of fairness on the legal compliance of a selection process. All assessment tools used to make selection decisions must be job-related and should minimize subgroup differences. Organizations are required to use valid predictors resulting in the least amount of adverse impact against protected classes. Decision strategies that select a disproportionate number of one protected subgroup over another (e.g. significantly more women pass than men) may result in compliance problems if an organization's process is challenged.
Recent technological advancements have had both positive and negative influences on fairness. As with validity, the consistency that comes with many technology-enabled administration practices can remove subjective bias from the decision-making process. For example, an unproctored, computerized assessment is unaware of the candidate's race, age or gender when calculating scores based on a mathematical algorithm. In this situation, one could say that all qualified candidates have the same chance of passing the assessment and making it to the next stage of the process regardless of class membership.
On the other hand, when certain technology is required to complete a selection process (e.g. device types, specific software), it could disadvantage candidates who do not have access to the technology. In recent years, access to technology has been examined with regard to race and gender. Survey results suggest that these protected groups are more likely than other groups to use mobile devices as their main Internet source (Pew, 2015). Given that user contexts such as smaller screen sizes tend to disadvantage candidates, these technologies can have a substantial impact on fairness. If an organization allows assessment administration on mobile devices (or cannot stop it), it could recruit a more diverse candidate pool, only to disproportionately disadvantage this more diverse segment of the candidate pool due to the challenges of taking an assessment on a small screen. Selection practitioners must be vigilant about investigating the fairness of any new technology applied to the selection process.
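The comparison of subgroup pass rates described at the start of this section can be sketched as a simple selection-rate ratio. The sketch below assumes the four-fifths (80%) rule of thumb from the Uniform Guidelines on Employee Selection Procedures as the screening threshold; that rule is not discussed in this article and is used here only for illustration. The group labels and counts are hypothetical, and a ratio below the threshold is a prompt for further review, not a statistical test or a legal conclusion.

```python
def selection_rate(passed: int, applied: int) -> float:
    """Proportion of a group's applicants who passed the selection step."""
    if applied <= 0:
        raise ValueError("applied must be positive")
    return passed / applied


def impact_ratios(groups: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each group to its selection rate divided by the highest group rate.

    `groups` maps a group label to (passed, applied).
    """
    rates = {g: selection_rate(p, a) for g, (p, a) in groups.items()}
    top = max(rates.values())
    if top == 0:
        # Nobody passed in any group, so no group is favored over another.
        return {g: 1.0 for g in rates}
    return {g: r / top for g, r in rates.items()}


def flagged(groups: dict[str, tuple[int, int]], threshold: float = 0.8) -> list[str]:
    """Groups whose impact ratio falls below the assumed four-fifths threshold."""
    return [g for g, ratio in impact_ratios(groups).items() if ratio < threshold]


# Hypothetical counts: more women than men pass the assessment.
example = {"women": (60, 100), "men": (40, 100)}
print(impact_ratios(example))
print(flagged(example))
```

The same calculation could be run separately for candidates who completed an assessment on mobile and non-mobile devices to check whether device type is associated with different pass rates.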
Candidate reactions
How candidates react to a selection tool or process is a relatively recent consideration in evaluating a selection system. As organizations pay more attention to their employment brand, positive and negative reactions of candidates have become a critical criterion in evaluating selection procedures. Positive reactions mean that qualified candidates are more likely to continue through the process and accept a job, while negative reactions can lead to the loss of skilled candidates as well as potential customers. Organizations seek selection procedures that strengthen the relationship between job candidates and the organization. Selection practitioners have become increasingly interested in understanding candidate reaction data, such as perceptions of job-relatedness and fairness, candidates' opinions about whether they had an opportunity to perform and candidate drop-out rates.
Candidates tend to react positively to interactive and highly job relevant assessments. Candidates also like having the opportunity to perform – they like to be able to “show” or discuss what they can do for the company (Hausknecht et al., 2004). Technology has the ability to create work samples and simulations that meet these criteria. However, technology can also be used to speed up the process and reduce human interaction, which can then lead to more negative reactions. This pattern is seen in research around asynchronous interviews (Blacksmith et al., 2016). As such, it is important to consider how the technology is implemented and how it affects these critical considerations when leveraging it within a selection process.
As new technologies emerge and are incorporated into the development of predictors or used to improve selection system design, practitioners and researchers should examine their impact on these four pillars characterizing effective selection systems. It is important to understand the potential consequences of new technologies and appropriately set expectations with the organizations using them.
What we learn in this issue
Our goal for this issue was to provide practitioners with a group of research studies that could help guide their use of emerging technologies in applied settings. We summarize the results of each paper within the pillar framework to provide some specific guidance in these key areas.
A few studies in this issue focused on developing predictors for deployment on mobile devices and leveraging gamification techniques. Martin, Capman, Boyce, Morgan, Gonzalez and Adler used a mobile-first design to develop a predictor of working memory as an alternative to traditional cognitive ability. They leveraged technology to focus on a construct that could only be measured with a technological device and designed the tool to maximize measurement equivalence across various user contexts and device types. This approach is a common theme in the set of papers in this issue. Across three of the studies presented here, the researchers established positive reactions, construct validity, criterion-related validity and measurement equivalence across device types. From a fairness perspective, Martin et al. reported subgroup differences substantially lower than those reported for traditional cognitive measures. The results of this study suggest that a technology-enabled measure of working memory is a valid predictor, elicits positive reactions and results in smaller subgroup differences. This paper is a great example of how selection researchers can leverage technology to improve several of the four pillars of effective selection.
Weidner and Landers also focused on designing a predictor specifically for deployment on mobile devices; however, these authors investigated personality measurement. They converted a well-known personality measure into a forced-choice “swipe right” item type designed for ease of use in small-screen user contexts. These researchers also leveraged the technology to measure response latency. They examined the construct validity of the new measure and found support for the item type; additionally, they found that response latency contributed to the convergence of their measure with conventional personality measurement. In their review of candidate reactions, they did not find that participants reacted more positively to the “swipe right” item type, but this more mobile-friendly approach also did not negatively impact reactions relative to a traditional Likert-type response format. This study provides additional support for the value of mobile-first design in producing construct-valid measures of personality across a variety of user contexts. This study also demonstrates that using game mechanics in designing an item type will not always improve assessment reactions, even if the “look and feel” of the assessment is enhanced.
Similar to Weidner and Landers, Landers, Auer and Abraham added game mechanics to traditional SJT item types to examine the effect on measurement and participant reactions. This research examined varying levels of immersion (e.g. text, audio, video) in a set of established SJT scenarios. Their results showed no measurement deterioration, supporting the construct validity of the measures where gamification interventions were applied. However, like the study by Weidner and Landers, the participant reaction results did not show strong positive reactions or significantly more positive reactions when the game elements were deployed. Their findings suggest that it is important for practitioners to consider the utility of gamification. The time and cost involved in adding game mechanics to tools that were not designed as games may not yield the level of positive results that organizations might expect.
The Ellison, McClure Johnson, Tomczak, Siemsen and Gonzalez research, which worked to confirm Gilliland's (1993) model of applicant reactions for a series of game-based assessments (GBAs), may provide a better understanding of why the previous studies did not observe expected increases in participant reactions. Ellison and colleagues found support for the general model and the tenet that fairness mediates reactions to assessments. Their results also show that job-relatedness and opportunity to perform had the strongest relationships with participant reactions, meaning that reactions will be most positive when the predictors are perceived as job-related and give participants the opportunity to demonstrate their skills. Additionally, their results suggest that GBAs could be perceived less positively by women than men. Practitioners who are interested in maximizing positive candidate reactions should consider the predictor design and potential gender differences with GBAs. It is clear from the Landers, Auer and Abraham and Ellison et al. papers that gamification is not the recipe for improved candidate reactions that organizations often assume. Rather, if the game elements do not enhance perceptions of fairness or provide an increased opportunity to perform, the intervention may have the opposite outcome.
As previously discussed, some individual difference measures would not be available to practitioners without advanced technology (e.g. response latency and working memory). Reddock, Auer and Landers encourage this use of technology to measure in unique ways. These authors leverage technology to build branching situational judgment items. They provide a theoretical framework for understanding different branching options, all of which require the use of technology to provide adequate measurement and fidelity to the candidate. This qualitative paper presents a framework that includes concepts related to procedural justice and face validity, which impact participant reactions (Gilliland, 1993). As such, their model suggests that branching would lead to positive reactions when used effectively. Additionally, branching could allow for the measurement of individual differences in cognitive strategy and judgment that are otherwise difficult to measure.
The last three studies are all related to technology-enabled interviews and participant reactions to this popular new approach. Langer, König and Hemsing conducted an experiment to examine reactions to an asynchronous interview. Participants recorded answers to a set of questions and then provided reactions to the experience. The researchers' results suggest that participants reported lower use of impression management in the technology-enabled interview, but they had more negative reactions to the interview due to reduced opportunity to perform. This study suggests to practitioners that there could be trade-offs when implementing asynchronous interviews. While they may provide utility for the organization, they could lead to more negative reactions from participants. Additionally, the reduced impression management could have a positive effect on the construct measurement (validity) of the interview.
Basch, Melchers, Kegelmann and Lieb also conducted an experiment comparing participant reactions to face-to-face, videoconference and asynchronous interviews. Their results suggest that when it comes to interviews, participants prefer options with more social interaction. Face-to-face interviewing was the most preferred and asynchronous interviewing was least preferred. These results confirm the practical implication of there being a trade-off between reactions and technology-enabled interviewing. The technology may improve the utility pillar, but it could decrease the reactions pillar. Practitioners should keep this in mind when designing selection systems.
Lastly, another study led by Langer (Langer, König, Sanchez and Samadi) compared participants' reactions to a video interview and a highly automated interview. Like the other two studies, their results suggest a trade-off between efficiency gains achieved through interview technology and participant interview reactions. While candidates do perceive machines as more consistent than people, they expect interpersonal contact (social presence) when they are told that they will participate in an interview.
What are the pearls of wisdom resulting from this set of papers to share with I/O practitioners looking for ways to build effective, technology-enabled selection systems? A general theme from the papers in this issue is that it is important to identify the selection system criterion (pillar) of greatest importance when building a selection system. All of the papers present information showing that technology can be used to improve one pillar, but often at the expense of another. It is critically important to evaluate the likely impact of any new technique across each of the four pillars to determine the likely trade-offs and how they relate to the organization's goals.
If your goal is prediction, gamification alone may not be enough to improve measurement and it may not gain you much in terms of reactions if the intervention does not increase opportunity to perform.
If you are interested in emerging predictors like GBAs, design them with device usage in mind. Establish measurement equivalence across user contexts for any new technique or you risk improving reactions at the cost of validity and/or fairness.
Validity gains may come from leveraging technology to measure new constructs/individual differences that are not possible with conventional measurement techniques. However, any such measure should be evaluated across the fairness, utility and reactions pillars before it is implemented.
The addition of technology meant to improve the efficiency of a selection process may improve utility metrics (e.g. time in process or cost per hire), but it may do so at the cost of applicant reactions. Several of the studies in this issue found that there is a trade-off between utility and reactions when using technology-enabled interviewing practices. The papers in this issue suggest that improvements in participant reactions are possible when technology enhancements are perceived as procedurally fair, especially with regard to job relevance and opportunity to perform. Organizations should evaluate whether or not improvements on this pillar are worth the expense if they do not also improve validity, fairness or utility. Different organizations will make different decisions based on their specific talent needs, industry and labor market.
