Transforming & Equating

on 25 May 2011

Transcript of Transforming & Equating

Equating
Jin-sun, Yoo and Da-un, Jeong in SWU

Tests are used to make important decisions
  • at the individual level
  • at the institutional level
  • at the public policy level

Tests can be administered on multiple occasions, and over many years to track educational trends over time. In these situations, we need equating!

Different test forms on different test dates might differ somewhat in difficulty.
Equating is a statistical process that is used to adjust scores on test forms so that scores on the forms can be used interchangeably.
The process of equating is used in situations where such alternate forms of a test exist and scores earned on different forms are compared to each other.
Equating adjusts for differences in difficulty, not for differences in content.

Transforming, Linking, Scaling, Equating
  • Equating: interchangeable scores (forms similar in content & statistical characteristics)
  • Comparable scores: different contents & levels

Scaling & Equating process
Score scales typically are established using a single test form. For subsequent test forms, the scale is maintained through an equating process that places raw scores from subsequent forms on the established score scale. Typically, raw scores on the new form are equated to raw scores on the old form, and these equated raw scores are then converted to scale scores using the raw-to-scale score transformation for the old form.

Steps in the equating process
1. Decide on the purpose for equating
As these steps in the equating process suggest, individuals responsible for conducting equating make choices about designs, operational definitions, statistical techniques, and evaluation procedures.

Properties of equating
  • Symmetry
  • Same specifications
  • Equity
  • Observed score equating
  • Group invariance

Symmetry
The symmetry property requires that the function used to transform a score on Form X to the Form Y scale be the inverse of the function used to transform a score on Form Y to the Form X scale.
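A minimal sketch of the symmetry property, assuming a linear equating function that matches means and standard deviations. The slides do not describe this method; it is used here only as an illustration, and the function names and score data are hypothetical. Converting a Form X score to the Form Y scale and back returns the starting score, whereas a least-squares regression line fitted in each direction generally does not invert itself.

```python
from statistics import mean, pstdev

# Hypothetical paired scores, used only for illustration.
form_x = [10, 12, 15, 18, 20, 24, 25, 30]
form_y = [12, 13, 18, 19, 24, 25, 29, 32]

def linear_equate(score, src, dst):
    """Map a score from the src scale to the dst scale by matching z-scores.

    Assumed illustrative method; not taken from the slides.
    """
    return mean(dst) + pstdev(dst) / pstdev(src) * (score - mean(src))

def regress(score, src, dst):
    """Predict a dst score from a src score with a least-squares line."""
    m_src, m_dst = mean(src), mean(dst)
    s_xy = sum((a - m_src) * (b - m_dst) for a, b in zip(src, dst))
    s_xx = sum((a - m_src) ** 2 for a in src)
    return m_dst + s_xy / s_xx * (score - m_src)

x = 22.0
# Linear equating: X -> Y -> X returns the starting score (symmetry holds).
round_trip_equate = linear_equate(linear_equate(x, form_x, form_y), form_y, form_x)
# Regression: X -> Y -> X shrinks toward the mean unless the correlation is perfect.
round_trip_regress = regress(regress(x, form_x, form_y), form_y, form_x)

print(round(round_trip_equate, 6))
print(round(round_trip_regress, 6))
```

The regression round trip lands closer to the Form X mean than the score it started from, so the two regression lines are not inverses of each other.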
This property rules out regression as an equating method.

Same specifications
Test forms must be built to the same content and statistical specifications if they are to be equated.

Equity
Lord's equity property holds if examinees with a given true score have the same distribution of converted scores on Form X as they would on Form Y. This property implies that examinees with a given true score would have identical observed score distributions (converted scores on Form X and scores on Form Y). Using Lord's equity property as the criterion, equating is either impossible or unnecessary.

Observed score equating
The converted scores on Form X have the same distribution as scores on Form Y. The equipercentile equating property implies that the cumulative distribution of equated scores on Form X is equal to the cumulative distribution of scores on Form Y.

Group invariance
Under the group invariance property, the equating relationship is the same regardless of the group of examinees used to conduct the equating.

Equating designs
  • Random groups design
  • Single group design
  • Single group design with counterbalancing
  • Common-item nonequivalent groups design
Also covered: NAEP reading anomaly (problems with common items), error, and evaluating the results of equating.

Random groups design
Examinees are randomly assigned the form to be administered.
Spiraling: Form X - Form Y - Form X - Form Y ...
The spiraling process typically leads to comparable, randomly equivalent groups.
Practical features
  • Each examinee takes only one form of the test, thus minimizing testing time relative to a design in which examinees take more than one form.
  • More than one new form can be equated at the same time by including the additional new forms in the spiraling process.
Limitations
  • All the forms must be available and administered at the same time.
  • Large sample sizes are typically needed.

Single group design
The same examinees are administered both Form X & Form Y.
Order effects: fatigue, familiarity
Because these are typically present, this design is rarely used in practice.

Single group design with counterbalancing
One way to deal with order effects in the single group design.
Form X+Form Y - Form Y+Form X - Form X+Form Y ...
Equating relationships: the effect of taking Form X after taking Form Y = the effect of taking Form Y after taking Form X.
Differential order effect: if these relationships differ from each other, the data for the form that is taken second might need to be disregarded.
In practice, the single group design with counterbalancing might be used instead of the random groups design when
  • administering two forms to examinees is operationally possible,
  • differential order effects are not expected to occur,
  • it is difficult to obtain participation of a sufficient number of examinees.

ASVAB problems with a Single Group Design
The Armed Services Vocational Aptitude Battery (ASVAB): a battery of ability tests that is used in the process of selecting individuals for the military.
The scores
  • on the old form: were used for selection.
  • on the new form: were not used for selection.
Many examinees could distinguish between the old and the new forms, and also knew that only the scores on the old form were to be used for selection purposes. The examinees were likely more motivated when taking the old form than when taking the new form.
The result of Maier's study (1993): motivation differences caused the scale scores on the new form to be too high when the new form was used to make selection decisions for examinees.

Error in estimating equating relationships
Estimated equating relationships typically contain estimation error.
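One way to see such estimation error is a small simulation. The sketch below assumes a random groups design and a hypothetical mean adjustment (the difference between the two form means). The slides specify neither, and every number is made up. It repeats the estimate over many random samples and reports how much the estimate varies for a small and a large sample size; that spread is what a standard error summarizes.

```python
import random
from statistics import mean, pstdev

def estimated_adjustment(n, rng):
    """Estimate the Form Y minus Form X mean difference from two random groups of size n."""
    # Hypothetical populations: Form X is 3 points easier on average.
    x_scores = [rng.gauss(53, 10) for _ in range(n)]
    y_scores = [rng.gauss(50, 10) for _ in range(n)]
    return mean(y_scores) - mean(x_scores)

def spread_of_estimates(n, replications=500, seed=0):
    """Empirical standard deviation of the estimate across repeated samples."""
    rng = random.Random(seed)
    estimates = [estimated_adjustment(n, rng) for _ in range(replications)]
    return pstdev(estimates)

small = spread_of_estimates(50)
large = spread_of_estimates(800)
print(f"n=50:  spread of estimate = {small:.2f}")
print(f"n=800: spread of estimate = {large:.2f}")
```

The estimate from the larger groups varies much less from sample to sample, even though the underlying form difference is the same in both cases.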
A major goal in designing and conducting equating is to minimize such equating error.

Random equating error
Random equating error is present whenever samples from populations of examinees are used to estimate parameters (e.g., means, standard deviations). It is quantified by the standard error of equating.
Sample size! As the sample size becomes larger, the standard error of equating becomes smaller.

Systematic equating error
Systematic equating error results from violations of the assumptions and conditions of equating. Although the amount of random error can be quantified using the standard error of equating, systematic error is much more difficult to quantify. Systematic error can arise
  • in the random groups design, if the spiraling process is inadequate for achieving group comparability;
  • in the single group design, if differential order effects cannot be controlled;
  • in the common-item nonequivalent groups design, if the assumptions of the statistical methods used to separate form and group differences are not met.

Evaluating the results of equating
After the equating is conducted, the results should be evaluated.
The criteria for equating
  • Standard errors of equating: to estimate random error (consistency of results).
  • The properties of equating: these also can be used to develop evaluative criteria. Observed score equating properties are especially important when equating is evaluated from an institutional perspective.

Transforming
We will discuss several different types of raw-score transformations that aid in the interpretation of test scores.

The raw-score problem
  • Difficult to interpret: an isolated raw score does not give any information about how one examinee's performance is related to the performance of the other examinees.
  • Difficult to compare across tests.

Two basic types of transforming scores
Linear transformation (Y = aX + b)
  • Standard and standardized scores & some formula scores.
  • The shape of the distribution of the transformed scores is the same as the shape of the distribution of the raw scores.
Nonlinear transformation (e.g., Y = X^2)
  • Percentiles, age and grade scores, expectancy tables, normalized scores, equal-interval scales & some formula scores.
  • These change correlations and the shape of the score distribution, so that the transformed-score distribution can be very different from the raw-score distribution.
Monotonic transformations will not alter an examinee's rank order in the sample.

Norm-referenced test: compares an examinee's performance to the performance of other examinees (the norm group).
Criterion-referenced test: asks whether the examinee has reached a certain specific criterion performance or mastered a specific task; does not require transformation.

Types of transformed scores
  • Percentiles
  • Age and Grade Scores
  • Expectancy Tables
  • Standard and Standardized Scores
  • Normalized Scores
  • Corrections for Guessing and Omissions
  • Equal-Interval Scales

Vertical equating: to equate different levels of a test so that an examinee will get the same score regardless of whether a level of the test is harder or easier (ex. the test on grade 4 - the test on grade 5).
Horizontal equating: to equate test forms within a specified difficulty level (ex. the test on grade 4 in 2010 - the test on grade 4 in 2011).

Percentiles
A norm group is a specified sample of examinees.
The percentile rank of a trait value is defined as the percentage of people in a norm group who have trait values less than or equal to that particular trait value.
Limitations
  • Percentiles can be assumed to form ordinal scales. Thus, arithmetical manipulations of percentiles can produce misleading results.
  • The distribution of percentiles within the norm group is rectangular, not normal (a rectangular distribution curve is a horizontal line). Therefore, researchers who desire to use common statistical techniques that assume normal distributions should avoid the use of percentiles.
  • Percentile scores may lead to exaggerated interpretations of small differences, especially when the test is short.

Age or grade equivalents
Ex. A third-grader may be said to read at the fifth-grade level or have the mental ability of a 10-year-old.
Limitations
  • These scores are assumed to form ordinal scales; arithmetical manipulations of these scores can lead to misleading results.
  • The interpretation of these scores is not as straightforward as it appears.
  • Score distributions for adjacent grades typically tend to have increasing overlap as grade level increases.
  • Schools may differ in their curricula and introduce topics at different rates.
  • The use of age or grade scores is only reasonable when the trait being measured increases (or decreases) monotonically with age or grade.
  • Interpolation between tests may be inaccurate.

NAEP reading anomaly
The National Assessment of Educational Progress (NAEP): the survey of the educational achievement of students in American schools.
The reading results showed a surprisingly large decrease from 1984 at age 17 and, to a lesser degree, at age 9... (Zwick, 1991)
1. In 1984, the test booklets administered to examinees contained reading and writing sections.
   In 1986, the booklets administered to examinees contained
     • reading, mathematics, and/or science sections at age 9;
     • reading, computer science, history, and/or literature at age 17.
2. The composition of the reading sections differed in 1984 and 1986.
     • The order of common items
     • The available time to complete common items
Context effects can lead to very misleading results.
very misleading results. This design often used when more than one form per test date cannot be administered because of test security or other practical concerns. Internal common items External common items When the score on the set of common items contributes to the examinee's score on the test When the score on the set of common items
does not contribute to the examinee's score on the test = miniversion in content & statistical charactersitcs of the total test form To accurately reflect group differences... To help common items behave similarly,
each common items should occupy a similar location(item number) in the two forms. be exactly the same in the old and new forms. Differences between means on Form X & Form Y examinee group differences
and test form differences can result from a combination of The central task in equating using this design is to seperate group differences and test form differences. Which of the two forms is easier?
What would have been the mean on Form X for Group 2 taken From X? Group 2 might be expected to correctly answer 10% more of the Form X items than would Group 1. The mean for Group 2 on Form X would be expected to be
82=72+10. Because Group 2 earned a mean of 77 on Form Y and has an expected mean of 82 on Form X, Form X appears to be than Form Y. 5 points easier The larger the differences between examinee groups,
the more difficult it becomes for the statistical methods
to seperate the group and form differences. the conditional distribution of criterion scores for different test scores. A counselor would advise
a student with a high pre-course test score to take the course,
perhaps with a warning that a few students from this test-score level
still did poorly in the course. Time or monetary considerations
or clear-cut criterion is not available. The expectancy table illustrates
the probabilistic nature of psychological prediction.
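A minimal sketch of how an expectancy table can be built, assuming hypothetical pre-course test scores and course grades. The records, the band cut points, and the function name `expectancy_table` are illustrative and do not come from the slides. Each row gives the proportion of students in a score band who earned each grade, that is, the conditional distribution of the criterion given the test score.

```python
from collections import Counter, defaultdict

# Hypothetical (pre-course test score, course grade) records.
records = [
    (85, "A"), (90, "A"), (78, "B"), (88, "B"), (92, "A"), (81, "C"),
    (65, "B"), (70, "C"), (62, "C"), (68, "B"), (74, "A"), (60, "D"),
    (45, "C"), (50, "D"), (55, "D"), (40, "F"), (58, "C"), (48, "F"),
]

def band(score):
    """Assign a score to an illustrative band (the cut points are assumptions)."""
    if score >= 75:
        return "high"
    if score >= 60:
        return "middle"
    return "low"

def expectancy_table(data):
    """Return {band: {grade: proportion}}, i.e. P(grade | score band)."""
    counts = defaultdict(Counter)
    for score, grade in data:
        counts[band(score)][grade] += 1
    return {
        b: {g: n / sum(c.values()) for g, n in sorted(c.items())}
        for b, c in counts.items()
    }

table = expectancy_table(records)
for b in ("high", "middle", "low"):
    row = ", ".join(f"{g}: {p:.2f}" for g, p in table[b].items())
    print(f"{b:>6}  {row}")
```

In this made-up data, half of the high-band students earned an A, but one in six earned a C, which mirrors the counselor's warning.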
A high pre-course test score does not guarantee an "A."
Limitations of expectancy tables
  • The sample must be large enough to ensure that the probabilities in the table are reasonably stable.

Standard and standardized scores
  • Standard scores: linear transformations of raw scores, often called z scores. Half the scores are negative.
  • Standardized scores (standard-score equivalents): eliminate the problem involved with negative numbers.
Limitations
  • Unless two frequency distributions have the same shape, it is difficult to compare scores on two standardized scales.
  • It is particularly risky to interpret small differences in standard scores.

Normalized scores
The transformation to normalized scores involves forcing the distribution of transformed scores to be as close as possible to a normal distribution by smoothing out, stretching, or condensing irregularities and departures from normality in the raw-score distribution.
Two normalized scores
  ① T scores are normalized scores with mean=50, standard deviation=10.
  ② Stanines are one-digit normalized scores with mean=5, standard deviation=approximately 2.
Read the manual carefully to determine whether scores are normalized or standardized.
  ‣ The use of normalized scores may not be reasonable if the underlying trait has a very non-normal distribution.

Corrections for guessing and omissions
Transformations of scores can also be made to adjust for the effects of guessing and the effects of omitting items.
On multiple-choice tests, examinees can get an item correct, without knowing the right answer, simply by guessing.
Formula scores: transformations that take into account guessing or omissions. These transformations can aid in the interpretation of an examinee's performance and in the comparison of the performances of different examinees.
  • When there are no omitted items, F1, a linear function of X, is perfectly correlated with X, and has the same reliability and validity as X.
  • When there are omitted items, F1 and X are not perfectly correlated.
  • F2 is the estimated number of items that would be correct if every blank item were replaced by a random guess.

Equal-interval scales
An equal-interval scale is particularly useful for measuring growth or change in a trait or behavior.
Raw scores can be transformed into a set of scores that does have equal intervals.
Thurstone's absolute scaling method hypothesizes that the continuous trait being measured by a test has a normal distribution in some specified population, and that raw scores on the test are monotonically related to trait values. (A monotonic relationship is one in which every increase in the raw score reflects an increase in the trait value.)

Note on percentiles: unlike the bell-shaped normal curve, a rectangular distribution curve looks like a horizontal line.

Steps in the equating process (continued)
2. Construct alternate forms
3. Choose a design for data collection
4. Implement the data collection design
5. Choose one or more operational definitions of equating
6. Choose one or more statistical estimation methods
7. Evaluate the results of equating
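Returning to the corrections for guessing and omissions: the slides describe the formula scores F1 and F2 but do not give their formulas. The sketch below assumes the conventional forms, F1 = R - W/(k - 1) and F2 = R + O/k, where R, W, and O are the numbers of right, wrong, and omitted items and k is the number of answer choices. The function name and the test dimensions are hypothetical.

```python
def formula_scores(right, wrong, omitted, choices):
    """Return (F1, F2) for one examinee on a multiple-choice test.

    F1 = R - W / (k - 1): treats wrong answers as evidence of guessing.
    F2 = R + O / k: credits omitted items as if they were guessed at random.
    These conventional formulas are an assumption; the slides do not state them.
    """
    f1 = right - wrong / (choices - 1)
    f2 = right + omitted / choices
    return f1, f2

# Hypothetical 40-item test with 4 answer choices per item.
n_items, k = 40, 4

# No omitted items: W = n - R, so F1 is a linear function of the raw score R.
no_omits = [formula_scores(r, n_items - r, 0, k)[0] for r in (20, 28, 36)]

# With omitted items, two examinees with the same raw score can differ on F1.
same_raw_a = formula_scores(right=28, wrong=12, omitted=0, choices=k)
same_raw_b = formula_scores(right=28, wrong=4, omitted=8, choices=k)

print(no_omits)
print(same_raw_a, same_raw_b)
```

Equal steps in the raw score give equal steps in F1 when nothing is omitted, which is why F1 and the raw score are then perfectly correlated; once omissions vary across examinees, the same raw score can map to different F1 values.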