Test Validation and Development
Let’s begin the journey of validation with three distinct but related definitions of the concept: practical, legal, and academic.
Practically speaking, a valid selection procedure is one that measures the actual requirements of the job in a fair and reliable way. A valid selection procedure “hits the mark,” and does so consistently, with the mark being the core, essential requirements of the position that the selection procedure targets. In short, a valid selection procedure measures the qualifications that are genuinely needed for the job, and not much more or less.
In the legal realm, a selection procedure is valid if the employer can prove in litigation that it is “. . . job related and consistent with business necessity” (to address the requirements of the 1991 Civil Rights Act, Section 703[k][1][A][i]). This standard is usually met (or not) by arguing first how the selection procedure addresses the Uniform Guidelines* (1978), then the professional standards (i.e., the Standards and Principles, discussed below), and finally how parallel or lower courts have applied the standard in various settings.
Academically, the Principles (2003) and Standards (1999) have adopted the same definition for validity: “The degree to which accumulated evidence and theory support specific interpretations of test scores entailed by proposed uses of a test” (p. 184).
* While the Uniform Guidelines do not formally constitute a set of legal requirements, they have consistently been afforded “great deference” beginning as early as the Griggs v. Duke Power Company (401 US 424, 1971) case. They have also been adopted verbatim as a legal standard in several cases, e.g., Brown v. Chicago (WL 354922, N.D. Ill., 1998).
Overview of the Mechanics of Content and Criterion-Related Validity
Subsequent sections within BCGi Resources describe in detail how to validate various selection procedures, so only a cursory overview of validation mechanics is provided here, and this is provided only as a primer to the subsequent, more advanced discussions.
How is a content validation study conducted? What are the mechanistic parts involved? What are the basic elements of a criterion-related validity study? Mechanically speaking, content and criterion-related validity are very different. Let’s take a look at how they differ based on how they are constructed.
A content validity study is conducted by linking the essential parts of a job analysis (the job duties and/or knowledges, skills, and abilities) to the selection procedure. Thus, content validity is formed by creating a nexus between the job and the selection procedure. It relies on a process in which Job Experts (incumbents or immediate supervisors) provide judgments (usually ratings on surveys) regarding whether and how well the selection procedure represents and measures the important parts of the job.
A word processing test that measures skill in using word processing software to edit and format business correspondence would likely be content valid for a clerical job in which the incumbent performs these functions. Similarly, an entry-level physical ability test that measures fire-scene physical performance reflects a content validity approach for the position of firefighter.
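To make the linkage process concrete, the judgments described above can be tallied with simple arithmetic. The sketch below is purely illustrative: the components, the 1–5 rating scale, the ratings themselves, and the 3.0 retention threshold are all hypothetical assumptions, not values prescribed by the Uniform Guidelines or the professional standards.

```python
# Hypothetical tally of Job Expert linkage ratings for a clerical test.
# All data, the 1-5 scale, and the 3.0 cutoff are illustrative assumptions.
from statistics import fmean

# Each test component is rated by several Job Experts on how well it
# measures an important part of the job (1 = not at all, 5 = very well).
linkage_ratings = {
    "typing sample":       [5, 4, 5, 4, 5],
    "formatting exercise": [4, 4, 3, 5, 4],
    "trivia quiz":         [2, 1, 2, 2, 1],  # weak nexus to the job
}

RETAIN_AT = 3.0  # hypothetical cutoff for keeping a component

for component, ratings in linkage_ratings.items():
    mean = fmean(ratings)
    verdict = "retain" if mean >= RETAIN_AT else "drop"
    print(f"{component}: mean linkage {mean:.1f} -> {verdict}")
```

Components with weak mean linkage ratings (here, the “trivia quiz”) would be revised or dropped, since they lack the nexus to the job that content validity requires.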
Criterion-related validity is statistical. This type of validity is achieved when a selection procedure is statistically correlated with important aspects of job performance at a level that is “statistically significant” (with a probability value less than .05). One interesting benefit of this type of validity is that the employer is not pressed to define exactly what the selection procedure is measuring! While it is always a very good idea to know, and to describe to applicants, the KSAPCs that are measured by the selection procedure, it is not a requirement to do so because the selection procedure is scientifically related to job performance. By contrast, content validity specifically requires the employer to show and describe exactly what KSAPCs are being measured by the selection procedure and how they relate to the job (see Sections 15C(4)–(5) of the Uniform Guidelines).
Criterion-related validity can be achieved by correlating selection procedure scores to several different types of job performance measures, including both subjective and objective measures. The most typical subjective performance measures include supervisor ratings and/or peer ratings of work products (quality and/or quantity) or job performance, and performance review scores.* Objective measures can include quantifiable work output measures (e.g., number of widgets produced per hour), quality-related measures (e.g., number of widgets returned because of defects), absenteeism, turnover, disciplinary actions, safety incidents, and other aspects of performance that are gathered and recorded in a uniform and consistent manner.
* It is important to note that the Uniform Guidelines require that criterion measures consist of actual job performance, not ratings of the overall knowledge, skill, or abilities of the incumbents (see Section 15B).
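The statistical core of a criterion-related study can be sketched in a few lines. The example below computes a Pearson correlation between selection test scores and supervisor performance ratings and tests it against the .05 significance level; all of the data are fabricated for illustration, and the sample of 12 is far smaller than a real study would use.

```python
# Illustrative criterion-related validity computation with fabricated data.
import math
import statistics

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def t_statistic(r, n):
    """t statistic for testing H0: rho = 0 with n paired observations."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical selection test scores and supervisor ratings, 12 incumbents.
test_scores = [62, 70, 75, 78, 80, 83, 85, 88, 90, 92, 95, 98]
ratings = [2.1, 2.8, 3.0, 2.9, 3.4, 3.2, 3.8, 3.6, 4.0, 4.2, 4.1, 4.6]

r = pearson_r(test_scores, ratings)
t = t_statistic(r, len(test_scores))
# Two-tailed critical t for alpha = .05 with df = 10 is about 2.228.
print(f"r = {r:.3f}, t = {t:.2f}, significant: {abs(t) > 2.228}")
```

If the observed t exceeds the critical value (equivalently, if the probability value falls below .05), the correlation is “statistically significant” in the sense the text describes, and the test scores can be said to predict the criterion measure.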
Benefits of the Validation Process
Now that validation has been (briefly) defined, what is the value for the employer? Why validate selection procedures? “Validation is expensive” and “We are only required to validate a selection procedure if it has adverse impact” (a true statement) are objections that personnel consultants hear frequently. With formal validation studies sometimes costing tens of thousands of dollars, these are legitimate concerns.
Validation generates two major benefits for the employer. First, validation helps ensure that the selection process measures key, relevant job requirements in a reliable and consistent manner. This, of course, helps screen better workers into the workforce. Even if the validation process increases the effectiveness of a selection process only slightly, the results over years and hundreds of applicants can sometimes be astounding. Second, the validation process generates evidence (for use in litigation) that the selection procedures are “. . . job related and consistent with business necessity” (to address the requirements of the 1991 Civil Rights Act, Section 703[k][1][A][i]).
Related to this benefit, validated selection procedures can also dissuade potential plaintiffs from even beginning the lawsuit process when the relationship between the selection procedure and the job is self-evident (called “face validity”). Applicants are much less likely to challenge a selection procedure if it “smells and looks like the actual job.” Likewise, plaintiff attorneys will be discouraged from gambling the time and money necessary to wage a “validation war” if the employer has conducted good-faith validation studies.
Professional Standards for Validation
In the early 1950s, three different aspects of validity were discussed—content, criterion-related, and construct (Principles, 2003, p. 5). From the 1950s to the publication of the 1978 Uniform Guidelines, these three remained the concrete, “tried and true” validation strategies (especially in litigation settings). While the Uniform Guidelines set down these validation ground rules in 1978, the government anticipated that the educational and personnel testing fields would continue to advance the science and art of validation, so it left room for future developments when framing the criteria to be used for validating selection procedures:
For the purposes of satisfying these guidelines, users may rely upon criterion-related validity studies, content validity studies or construct validity studies, in accordance with the standards set forth in the technical standards of these guidelines, section 14 of this part. New strategies for showing the validity of selection procedures will be evaluated as they become accepted by the psychological profession (Section 5A).
Fulfilling this expectation, the psychological community authored the 1985 version of the Standards for Educational and Psychological Testing (published by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education) and Division 14 of the American Psychological Association (the Society for Industrial and Organizational Psychology, or SIOP) published the Principles for the Validation and Use of Personnel Selection Procedures (1987).
These two documents advanced the testing field to its then-current state of validation. Fourteen years later (in 1999), the Standards were substantially updated. Following suit, the Principles received a major update 16 years after their first edition, in 2003. While published by different associations, the Principles and Standards are virtually in agreement regarding the key aspects of validity (Principles, 2003, p. 4). Indeed, part of the motivation behind publishing the new Principles was to update the earlier (1987) version in light of the newly published Standards (1999).
At the heart of these two documents is how they define validity. The Standards and the Principles share the same voice on this matter: their current definition no longer frames validity as three conventional types (like those discussed in the Uniform Guidelines, below), but instead treats “. . . validity as a unitary concept with different sources of evidence contributing to an understanding of the inferences that can be drawn from a selection procedure” (Principles, 2003, p. 4).
The Standards and Principles allow five different “sources of evidence” to generate validity evidence under this “unitary concept” umbrella:
1. Relationships between predictor scores and other variables, such as selection procedure-criterion relationships;
2. Content (the questions, tasks, format and wording of questions, response formats, and guidelines for administration and scoring of the selection procedure; evidence based on selection procedure content may include logical or empirical analyses comparing the adequacy of the match between selection procedure content and work content, worker requirements, or outcomes of the job);
3. Internal structure of the selection procedure (e.g., how well items on a test cluster together);
4. Response processes (examples given in the Principles include (a) questioning test takers about their response strategies, (b) analyzing examinee response times on computerized assessments, or (c) conducting experimental studies where the response set is manipulated); and
5. Consequences of testing (Principles, 2003, p. 5).
The Principles explain that these five “sources of evidence” (used for showing validity under the unitary validity concept) are not distinct types of validity, but rather “. . . each provides information that may be highly relevant to some proposed interpretations of scores, and less relevant, or even irrelevant to others” (p. 5).
Uniform Guidelines Requirements for Validation
The current government treatise for validation is the 1978 Uniform Guidelines. This document was assembled through a joint effort of the US Equal Employment Opportunity Commission (EEOC), the Civil Service Commission, the Department of Labor, and the Department of Justice. The goal of publishing the Uniform Guidelines was to provide an objective standard by which testing and adverse impact concepts could be defined and used for government enforcement, arbitration, and litigation. Numerous earlier texts and enforcement guidelines existed prior to the Uniform Guidelines, but it is safe to say that the Uniform Guidelines constituted the most definitive treatise when published in 1978. The Uniform Guidelines remain mostly unchanged (only a few minor updates are pending at the time of this writing, which will constitute the first change since their original publication).
Three primary forms of validation are presented in the Uniform Guidelines: content, criterion-related, and construct (listed in the order most frequently used by employers):
Content validity. Demonstrated by data showing that the content of a selection procedure is representative of important aspects of performance on the job. See section 5B and section 14C.
Criterion-related validity. Demonstrated by empirical data showing that the selection procedure is predictive of or significantly correlated with important elements of work behavior. See sections 5B and 14B.
Construct validity. Demonstrated by data showing that the selection procedure measures the degree to which candidates have identifiable characteristics which have been determined to be important for successful job performance. See section 5B and section 14D.
Blending the Professional and Government Validation Standards into Practice
How are the professional standards different from the government standards? How are they similar? All three types of validation described in the Uniform Guidelines are also contained in the professional standards (the Principles and Standards):
- The content validity described in the Uniform Guidelines is similar to “validation evidence” sources #2 and #5 (to a limited degree) of the professional standards.
- The criterion-related validity described in the Uniform Guidelines is similar to #1 and #5 of the professional standards.
- The construct validity described in the Uniform Guidelines is similar to #1, #3, and #5 of the professional standards.
When conducting a validation study, which set of standards should a practitioner be most concerned about? The Principles? Standards? Uniform Guidelines? The conservative answer is all three. If one had to choose a “primary set” of criteria, here are a few reasons to consider using the Uniform Guidelines:
- They have the backing of the US government (the EEOC, OFCCP, Department of Labor, Department of Justice, and nearly every state fair employment office).
- They are regularly used as the set of criteria for weighing validity studies during enforcement audits conducted by the OFCCP and numerous other state fair employment offices.
- They have been afforded “great deference” by the courts and have consistently been used as the measuring stick by the courts for assessing the merit of validity studies. They have been referenced thousands of times in judicial documents. By contrast, as of the year 2000, the Principles had been referenced in only 13 federal court cases and the Standards in only ten, and they have sometimes been viewed as “lower on the totem pole” than the Uniform Guidelines.*
- If practitioners seek to address only the criteria in the Uniform Guidelines when conducting a validation study, there is a high likelihood that the key elements of the Standards and Principles will also be addressed (the reciprocal is also true, but only for some “sources of validation evidence” espoused by the professional standards).
This endorsement is provided with some hesitation, because the Principles and Standards offer a far more exhaustive set of guidelines and regulations than the Uniform Guidelines, and provide more complete guidance for many unique situations that emerge in testing situations.
Nonetheless, for the reasons stated above, the Uniform Guidelines are the primary set of criteria that will be addressed throughout this text as the standard for completing validation studies. Of the three validation types proposed in the Uniform Guidelines, only content and criterion-related validity will be reviewed. Construct validity will not be discussed further for a few key reasons. First, the author is not aware of any EEO-related case where a judge has endorsed a validation study based solely on construct validity. Because the concept is highly academic and theoretical, it is difficult for even advanced practitioners to build selection procedures based solely on construct validity. With this being the case, expert witnesses will find themselves hard-pressed to explain such concepts to a judge! Second, if one were to ask 100 validation experts to define construct validity, 50 or more unique definitions would probably emerge. Some would even contradict each other. Third, most forms of construct validity require some type of criterion-related validity evidence, which raises the question: why not just use criterion-related validity in the first place? For these reasons, the reader is referred to other texts if they desire to review the concept in more depth (see Cascio, 1998, pp. 108-111; Gatewood & Feild, 1994, pp. 220-221).
* For example, in Lanning v. Southeastern Pennsylvania Transportation Authority (181 F.3d 478, 80 FEPC., BNA, 221, 76 EPD P 46,160 3rd Cir.(Pa.) Jun 29, 1999 (NO. 98-1644, 98-1755), the court stated: “The District Court seems to have derived this standard from the Principles for the Validation and Use of Personnel Selection Procedures (“SIOP Principles”) . . . To the extent that the SIOP Principles are inconsistent with the mission of Griggs and the business necessity standard adopted by the Act, they are not instructive” (FN20).