Evaluating Selection Techniques and Decisions
Transcript of Evaluating Selection Techniques and Decisions
Characteristics of Effective Selection Techniques
• Legally defensible
Establishing the Usefulness of a Selection Device
Even when a test is both reliable and valid, it is not necessarily useful.
Validity is the degree to which inferences from scores on tests or assessments are justified by the evidence.
But just because a test is reliable does not mean it is valid.
There are five common strategies to investigate the validity of scores on a test: content, criterion, construct, face, and known-group.
- Reliability: the extent to which a score from a selection measure is stable and free from error
- Therefore, reliability is an essential characteristic of an effective measure.
- Test reliability is determined in four ways:
The extent to which similar items are answered in similar ways is referred to as internal consistency and measures item stability
Temporal stability: The test scores are stable across time and not highly susceptible to such random daily conditions as illness, fatigue, stress, or uncomfortable testing conditions.
With the test-retest reliability method, each of several people takes the same test twice.
"There is no standard amount of time that should elapse between the two administrations of the test. However, the time interval should be long enough so that the specific test answers have not been memorized, but short enough so that the person has not changed significantly."
Typical time intervals between test administrations range from 3 days to 3 months.
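As a rough illustration, test-retest reliability is commonly summarized by correlating the scores from the two administrations. The sketch below assumes a Pearson correlation is used; the function name and the scores are hypothetical and are not taken from the presentation.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same five people on two administrations
first_administration = [10, 12, 15, 18, 20]
second_administration = [11, 12, 14, 19, 20]

test_retest_reliability = pearson_r(first_administration, second_administration)
print(round(test_retest_reliability, 2))
```

A coefficient close to 1 would suggest that people keep roughly the same rank order across the two administrations.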
With the alternate-forms reliability method, two forms of the same test are constructed.
The scores on the two forms are then correlated to determine whether they are similar. If they are, the test is said to have form stability.
With alternate-forms reliability, however, the time interval should be as short as possible.
A test or inventory can have homogeneous items and yield heterogeneous scores and still not be reliable if the person scoring the test makes mistakes.
Item homogeneity: Another factor that can affect the internal reliability of a test
Three terms that refer to the method used to determine internal consistency:
1. split-half method
2. coefficient alpha
3. K-R 20 (Kuder-Richardson formula 20)
The split-half method is the easiest to use, as items on a test are split into two groups. Because the number of items in the test has been reduced, researchers have to use a formula called the Spearman-Brown prophecy formula to adjust the correlation.
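The adjustment can be sketched as follows. The presentation does not spell out the formula; this example assumes the standard Spearman-Brown correction for a test split into two halves, and the input value is hypothetical.

```python
def spearman_brown(split_half_r):
    """Estimate full-length test reliability from the correlation
    between two half-tests: 2r / (1 + r)."""
    if split_half_r <= -1:
        raise ValueError("correlation must be greater than -1")
    return (2 * split_half_r) / (1 + split_half_r)

# Hypothetical correlation between scores on the two halves of a test
half_test_correlation = 0.60
adjusted_reliability = spearman_brown(half_test_correlation)
print(round(adjusted_reliability, 2))
```

The corrected value is higher than the half-test correlation because a longer test tends to be more reliable than either half on its own.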
Cronbach’s coefficient alpha and the K-R 20 are more popular and accurate methods of determining internal reliability, although they are more complicated to use and thus are calculated by computer program rather than by hand.
The difference between the two is that the K-R 20 is used for tests containing dichotomous items (e.g., yes/no, true/false), whereas coefficient alpha can be used not only for dichotomous items but also for tests containing interval and ratio items such as five-point rating scales.
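A minimal sketch of both coefficients is shown below, assuming the usual textbook formulas and population variances throughout. The function names and the response data are hypothetical. With dichotomous items scored 0/1, the two coefficients give the same result under this variance convention.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """responses: one list of item scores per test-taker."""
    k = len(responses[0])  # number of items
    item_variances = [
        pvariance([person[i] for person in responses]) for i in range(k)
    ]
    total_variance = pvariance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

def kr20(responses):
    """K-R 20 for items scored 0 (wrong) or 1 (right)."""
    k = len(responses[0])
    n = len(responses)
    sum_pq = 0.0
    for i in range(k):
        p = sum(person[i] for person in responses) / n  # proportion passing item i
        sum_pq += p * (1 - p)
    total_variance = pvariance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum_pq / total_variance)

# Hypothetical answers from five test-takers on four true/false items
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(answers), 2), round(cronbach_alpha(answers), 2))
```

Both functions fail with a division by zero if every test-taker has the same total score, since the total variance is then zero.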
Evaluating the Reliability of a Test
When deciding whether a test demonstrates sufficient reliability, two factors must be considered:
- the magnitude of the reliability coefficient: to evaluate the coefficient, you can compare it with reliability coefficients typically obtained for similar types of tests
- the people who will be taking the test
- Content validity: the extent to which test items sample the content that they are supposed to measure.
- In industry, the appropriate content for a test or test battery is determined by the job analysis. A job analysis should first determine the tasks and the conditions under which they are performed. Next the KSAOs (knowledge, skills, abilities, and other characteristics) needed to perform the tasks under those particular circumstances are determined. All of the important dimensions identified in the job analysis should be covered somewhere in the selection process, at least to the extent that the dimensions (constructs) can be accurately and realistically measured. Anything that was not identified in the job analysis should be left out.
- Criterion validity refers to the extent to which a test score is related to some measure of job performance called a criterion.
- Criterion validity is established using one of two research designs: concurrent or predictive.
- With a concurrent validity design, a test is given to a group of employees who are already on the job; with a predictive validity design, the test is administered to a group of job applicants who are going to be hired.
- A major issue concerning the criterion validity of tests focuses on a concept known as validity generalization, or VG: the extent to which a test found valid for a job in one location is valid for the same job in a different location.
- Construct validity is the most theoretical of the validity types. Basically, it is defined as the extent to which a test actually measures the construct that it purports to measure. Construct validity is concerned with inferences about test scores, in contrast to content validity, which is concerned with inferences about test construction.
- Construct validity is usually determined by correlating scores on a test with scores from other tests. Some of the other tests measure the same construct, whereas others do not.
- Known-group validity is another method to measure construct validity. This method is not common and should be used only when other methods for measuring construct validity are not practical. With known-group validity, a test is given to two groups of people who are “known” to be different on the trait in question.
- Even though known-group validity usually should not be used to establish test validity, it is important to understand because some test companies use known-group validity studies to sell their tests, claiming that the tests are valid.
- Face validity: the extent to which a test appears to be job related.
- But just because a test has face validity does not mean it is valid.
- Barnum statements: statements so general that they can be true of almost everyone.
Choosing a Way to Measure Validity
With three common ways of measuring validity, one might logically ask which of the methods is the “best” to use. As with most questions in psychology, the answer is that “it depends.” In this case, it depends on the situation as well as what the person conducting the validity study is trying to accomplish.
Finding Reliability and Validity Information
One excellent source of this information is the Seventeenth Mental Measurements Yearbook (MMY), which contains information about thousands of different psychological tests as well as reviews by test experts.
Another excellent source of information is a compendium entitled Tests in Print VII.
Computer-adaptive testing (CAT): the computer “adapts” the next question to be asked on the basis of how the test-taker responded to the previous question or questions.
The advantages of CAT are that fewer test items are required, tests take less time to complete, finer distinctions in applicant ability can be made, test-takers can receive immediate feedback, and test scores can be interpreted not only on the number of questions answered correctly but also on which questions were correctly answered.
The Taylor-Russell tables are designed to estimate the percentage of future employees who will be successful on the job if an organization uses a particular test. To use the Taylor-Russell tables, three pieces of information must be obtained:
Criterion validity coefficient
- The best approach is to conduct a criterion validity study in which test scores are correlated with some measure of job performance. The higher the validity coefficient, the greater the possibility the test will be useful.
- Selection ratio: simply the percentage of applicants an organization must hire (number hired ÷ number of applicants).
The lower the selection ratio, the greater the potential usefulness of the test.
Proportion of Correct Decisions
Determining the proportion of correct decisions is easier to do but less accurate than the Taylor-Russell tables. The only information needed to determine the proportion of correct decisions is employee test scores and the scores on the criterion.
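A minimal sketch of the calculation follows. It assumes that a decision counts as correct when the test and the criterion agree: the person passes the test and is successful on the job, or fails the test and is unsuccessful. The cutoff values, function name, and scores are hypothetical.

```python
def proportion_correct(test_scores, criterion_scores, test_cutoff, criterion_cutoff):
    """Share of people for whom the test decision matches job success."""
    correct = 0
    for test, criterion in zip(test_scores, criterion_scores):
        predicted_success = test >= test_cutoff          # would have been hired
        actual_success = criterion >= criterion_cutoff   # rated successful
        if predicted_success == actual_success:
            correct += 1
    return correct / len(test_scores)

# Hypothetical test scores and performance ratings for six employees
test_scores = [3, 5, 6, 8, 9, 4]
criterion_scores = [2, 6, 4, 7, 9, 3]

result = proportion_correct(test_scores, criterion_scores,
                            test_cutoff=5, criterion_cutoff=5)
print(round(result, 2))
```

Comparing this figure with the share of current employees who are already successful gives a rough sense of whether the test adds anything.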
Base rate of current performance
- the percentage of employees currently on the job who are considered successful. This figure is usually obtained in one of two ways:
The first method is the simplest but the least accurate.
The second and more meaningful method is to choose a criterion measure score above which all employees are considered successful.
Making the Hiring Decision
• After valid and fair selection tests have been administered to a group of applicants, a final decision must be made as to which applicant or applicants to hire.
• If more than one criterion-valid test is used, the scores on the tests must be combined. Usually, this is done by a statistical procedure known as multiple regression, with each test score weighted according to how well it predicts the criterion.
By: Jansen Alyssa Datinguinoo
Alaiza Camille Fajardo
• The Taylor-Russell tables were designed to determine the overall impact of a testing procedure. But we often need to know the probability that a particular applicant will be successful.
• The Lawshe tables (Lawshe, Bolda, Brune, & Auclair, 1958) were created to do just that.
• Three pieces of information are needed.
Brogden-Cronbach-Gleser Utility Formula
• Another way to determine the value of a test in a given situation is by computing the amount of money an organization would save if it used the test to select employees.
• To use this formula, five items of information must be known:
1. Number of employees hired per year (n). This number is easy to determine: It is simply the number of employees who are hired for a given position in a year.
2. Average tenure (t). This is the average amount of time that employees in the position tend to stay with the company. The number is computed by using information from company records to identify the time that each employee in that position stayed with the company. The number of years of tenure for each employee is then summed and divided by the total number of employees.
3. Test validity (r). This figure is the criterion validity coefficient that was obtained through either a validity study or validity generalization.
4. Standard deviation of performance in dollars (SDy). For many years, this number was difficult to compute. Research has shown, however, that for jobs in which performance is normally distributed, a good estimate of the difference in performance between an average and a good worker (one standard deviation away in performance) is 40% of the employee’s annual salary (Hunter & Schmidt, 1982). The 40% rule yields results similar to more complicated methods and is preferred by managers (Hazer & Highhouse, 1997). To obtain this, the total salaries of current employees in the position in question should be averaged.
5. Mean standardized predictor score of selected applicants (m). This number is obtained in one of two ways. The first method is to obtain the average score on the selection test for both the applicants who are hired and the applicants who are not hired. The average test score of the nonhired applicants is subtracted from the average test score of the hired applicants. This difference is divided by the standard deviation of all the test scores.
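The five items above can be combined as in the sketch below. The presentation does not state the formula itself; this example assumes the commonly cited form, savings = n × t × r × SDy × m minus the cost of testing. The testing-cost term, the function names, and all figures are hypothetical.

```python
from statistics import mean, pstdev

def estimate_sdy(average_salary):
    """40% rule: SDy estimated as 40% of the average annual salary."""
    return 0.40 * average_salary

def mean_standardized_score(hired_scores, nonhired_scores):
    """(mean of hired - mean of nonhired) / SD of all test scores."""
    all_scores = hired_scores + nonhired_scores
    return (mean(hired_scores) - mean(nonhired_scores)) / pstdev(all_scores)

def utility_savings(n, t, r, sdy, m, testing_cost=0.0):
    """Estimated savings from using the test (assumed formula)."""
    return n * t * r * sdy * m - testing_cost

# Hypothetical inputs
sdy = estimate_sdy(average_salary=30_000)
savings = utility_savings(n=10, t=2, r=0.40, sdy=sdy, m=1.0,
                          testing_cost=100 * 10)  # 100 applicants at $10 each
print(round(savings))
```

The helper `mean_standardized_score` uses the population standard deviation; a sample standard deviation would be an equally defensible choice.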
• Once a test has been determined to be reliable and valid and to have utility for an organization, the next step is to ensure that the test is fair and unbiased.
• Although the terms fair and unbiased appear to be similar and do overlap, they have very different meanings.
• A test is considered biased if there are group differences (e.g., sex, race, or age) in test scores that are unrelated to the construct being measured.
• The term fairness can include bias, but also includes political and social issues.
• Typically, a test is considered fair if people of equal probability of success on a job have an equal chance of being hired.
• Though some people argue that a test is unfair if members of a protected class score lower than the majority (e.g., Whites, men), most I/O psychologists agree that a test is fair if it can predict performance equally well for all races, genders, and national origins.
Determining the Fairness of a Test
1. The first step in determining a test’s potential bias is finding out whether it will result in adverse impact.
2. Though determining the adverse impact of a test seems simple—which is done by comparing the hiring rates (hires ÷ applicants) of two groups—the actual nuts and bolts of the calculations can get complicated, and it is common that plaintiffs and defendants disagree on who is considered an “applicant” and who is considered a “hire.”
3. There are three criteria for a minimum qualification:
it must be needed to perform the job and not merely be a preference;
it must be formally identified and communicated prior to the start of the selection process;
and it must be consistently applied.
4. Remember that a legal defense for adverse impact is job relatedness and that a valid test is a job-related test. Thus, even if the test has adverse impact, it probably will be considered a legal test.
5. But even though the test might be considered valid, an organization still might not want to use it. If a test results in adverse impact, the organization may have to go to court to defend itself. Even though a valid test will probably allow the organization to win the case, going to court is expensive.
6. If the utility of the test is low, potential court costs will outweigh the minimal savings to the organization.
7. A test with adverse impact will lead to poor public relations with minority communities, which could hurt recruitment or marketing efforts by the organization.
8. Using the 80% rule to determine a test’s fairness means that an organization must wait until it has used the test to select employees, at which time damage already has been done.
9. A method of estimating adverse impact compares the average scores of minority applicants with those of White and male applicants.
10. This is most easily done by looking in the test manual to determine whether African Americans and Whites or men and women have significantly different test scores.
11. If they do, the test probably will have adverse impact, and an alternative test can be sought.
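The hiring-rate comparison behind the 80% rule can be sketched as follows. The presentation does not define the rule; this example assumes the usual four-fifths interpretation, in which adverse impact is flagged when one group's hiring rate is less than 80% of the other group's rate. The function names and counts are hypothetical.

```python
def hiring_rate(hires, applicants):
    """Hiring rate = hires / applicants."""
    return hires / applicants

def impact_ratio(rate_a, rate_b):
    """Lower hiring rate divided by the higher hiring rate."""
    low, high = sorted([rate_a, rate_b])
    return low / high

def has_adverse_impact(rate_a, rate_b, threshold=0.80):
    """Flag adverse impact when the ratio falls below the threshold."""
    return impact_ratio(rate_a, rate_b) < threshold

# Hypothetical counts for two applicant groups
group_a_rate = hiring_rate(hires=20, applicants=100)  # 0.20
group_b_rate = hiring_rate(hires=30, applicants=60)   # 0.50

print(round(impact_ratio(group_a_rate, group_b_rate), 2))
print(has_adverse_impact(group_a_rate, group_b_rate))
```

The sketch takes the counts as given; in practice, deciding who counts as an applicant or a hire is the contested part.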
1. In addition to adverse impact, an organization might also determine whether a test has single-group validity, meaning that the test will significantly predict performance for one group and not others.
2. To test for single-group validity, separate correlations are computed between the test and the criterion for each group.
3. If both correlations are significant, the test does not exhibit single-group validity and it passes this fairness hurdle.
4. If only one of the correlations is significant, the test is considered fair for only that one group.
Differential validity: a test is valid for two groups but more valid for one than for the other.
Single-group validity and differential validity are easily confused, but there is a big difference between the two.
Remember, with single-group validity, the test is valid only for one group. With differential validity, the test is valid for both groups, but it is more valid for one than for the other.
Unadjusted Top-Down Selection
• Applicants are rank-ordered on the basis of their test scores.
• Selection is then made by starting with the highest score and moving down until all openings have been filled.
• The advantage of top-down selection is that by hiring the top scorers on a valid test, an organization will gain the most utility (Schmidt, 1991).
• The disadvantages are that this approach can result in high levels of adverse impact and it reduces an organization’s flexibility to use nontest factors such as references or organizational fit.
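The procedure described above can be sketched in a few lines. The applicant names, scores, and function name are hypothetical.

```python
def top_down_select(applicant_scores, openings):
    """Rank applicants by test score and hire from the top down."""
    ranked = sorted(applicant_scores.items(),
                    key=lambda item: item[1], reverse=True)
    return [name for name, _score in ranked[:openings]]

# Hypothetical applicants and test scores
applicants = {"Ana": 88, "Ben": 92, "Cruz": 75, "Dee": 81}
hired = top_down_select(applicants, openings=2)
print(hired)
```

Tied scores keep their original order in this sketch; a real procedure would need an explicit tie-breaking rule.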
Compensatory approach to top-down selection
The assumption is that if multiple test scores are used, a low score on one test can be compensated for by a high score on another.
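One way to picture compensation is a weighted composite score. In the sketch below, the weights stand in for values that would normally come from a regression analysis; the weights, test names, and scores are all hypothetical.

```python
def composite_score(scores, weights):
    """Weighted sum of an applicant's test scores."""
    return sum(weights[test] * scores[test] for test in weights)

# Hypothetical weights (in practice, estimated statistically)
weights = {"cognitive": 0.6, "interview": 0.4}

# Applicant A: low cognitive score, high interview score
applicant_a = composite_score({"cognitive": 50, "interview": 90}, weights)
# Applicant B: higher cognitive score, lower interview score
applicant_b = composite_score({"cognitive": 70, "interview": 60}, weights)

print(round(applicant_a, 1), round(applicant_b, 1))
```

With these figures the two applicants end up with essentially the same composite, so the high interview score offsets the low cognitive score.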
Rule of Three
A technique often used in the public sector is the rule of three (or rule of five), in which the names of the top three scorers are given to the person making the hiring decision.
Passing scores are a means for reducing adverse impact and increasing flexibility.
Banding attempts to hire the top test scorers while still allowing some flexibility for affirmative action.