Testing Times: Ensuring Success
- what is assessment?
- types and purposes
- criteria / desirable properties
- constructing / checking test items
- discuss issues in applied testing / evaluation
- assessment & technology

Michael Phillips:
- honours degree in environmental sciences [biology/geography]
- JET Programme - July 2001
- Nakama / Munakata / Tobata
- kindergarten, primary, junior-high, high-school, junior college, community centres, cafes, eikaiwas, and business school...
- Masters of Education (TESOL)
- computing

"In the context of language teaching and learning, 'assessment' refers to the act of collecting information and making judgments about a language learner's knowledge of a language and ability to use it."
Carol Chapelle and Geoff Brindley
"Language testing is the practice and study of evaluating the proficiency of an individual in using a particular language effectively."
Priscilla Allen

What do we mean by "language assessment"?

Welcome to: Kitakyushu, 11 June 2011
Testing Times

Today's Presentation Aims:
interesting

Group activity:
What are some of the main assessment issues in your situation?
How can they be managed?

Forum:
What do we mean by computer aided assessment?
- computer assisted testing
- computer based assessment
- computer based testing
- technology enhanced assessment
- technology supported evaluation
- internet based testing
- e-assessment

1. Validity
An assessment task is said to be valid when it tests what it sets out to test.
An assessment has validity to the degree that what is actually assessed (the construct) is what should be assessed in the context concerned, and that the information gained is an accurate representation of candidate proficiency (Council of Europe, 2001).

2. Reliability
Reliability is the extent to which the same rank order of candidates is replicated in two separate administrations of the same assessment.
Clark (1975) regards it as prerequisite to validity in performance assessment since a test must provide consistent, replicable information about candidates’ language performance.
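The "same rank order" idea can be quantified with Spearman's rank correlation between two administrations of a test. A minimal sketch (scores are invented, and no tied ranks are assumed):

```python
def ranks(scores):
    """Return the rank (1 = lowest) of each score, assuming no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman_rho(first, second):
    """Spearman's rank correlation: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(first)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(first), ranks(second)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Two administrations of the same test to five candidates (invented data):
admin_1 = [85, 72, 90, 60, 78]
admin_2 = [80, 70, 88, 65, 75]   # same rank order as admin_1
print(spearman_rho(admin_1, admin_2))   # 1.0
```

A coefficient of 1.0 means the candidates finished in exactly the same order both times; values nearer 0 indicate the kind of inconsistency that threatens reliability.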
In the endeavour to make tests more reliable, however, teachers should be wary of reducing their validity.

4. Washback [or backwash]
This refers to the effects of testing on teaching and learning and can be beneficial or harmful.
Fulcher and Davidson (2007) explain it in terms of 'consequential validity', referring to the effects tests have on instruction or feedback.
Tests can thus enhance (or hinder) acquisition factors such as attitudes, motivation or self-confidence, based on their content validity. They may also affect test takers' perceptions of the syllabus and of the assessment process itself (face validity).

The Bridge of Death
[and the Gorge of Eternal Peril] [2:56]

Group activity:
What considerations do you think are most important for creating or designing a test?

High-stakes summative testing!
While watching this video,
consider the following issues:
practicality

3. Authenticity
Bachman and Palmer (1996) defined this as the degree to which the characteristics of a given test task correspond to the features of a target language use task.
Essentially, when a test task is regarded as authentic, it is perceived as one that is likely to occur in the "real world".
Tests are felt to be more authentic if they use 'natural' language, use items that are contextualised, and provide some thematic organisation.

5. Practicality
This quality refers primarily to implementation issues such as human resources, material resources, and time.
Brown (2004) regards an effective test as one that is not excessively expensive, stays within appropriate time constraints, is easy to administer, and has an evaluation procedure that is clear and time-efficient.

We can place most tests [eg. proficiency, achievement, diagnostic, placement] into two major categories of use:
As sources of information for making decisions within educational contexts. These may be related to particular programmes, to students’ progress, or the quality of teaching.
As a measure of specific abilities that may be of interest to researchers, which may or may not have any immediate bearing upon teaching programmes.
There are some who claim that anything that is learnable can be tested, while others claim that there are outcomes that are difficult to define and therefore difficult to measure.

When tests are used as a source of evaluation, two assumptions are usually made:
First, that information regarding educational outcomes is important for the continued effectiveness of any educational programme, namely:
Accountability: being able to demonstrate the extent to which we have effectively and efficiently discharged responsibility. Without accountability in language teaching, students can pass several semesters of language courses with high grades and still be unable to use the language for reading, writing, or conversing with speakers of that language.
Feedback: information that is provided to teachers, students and others interested in the results or effects of the educational programme.

Group activity: What are the purpose and reason for these tests?
You have a new class to which you give a grammar test in order to find out which aspects they have not yet mastered.
You give a language test to a group of new international students. You then put the students into different classes of differing language ability.
You are a personnel manager for a hotel and speak Japanese fluently. You interview each applicant for a position in both English and Japanese. The applicant who seems most fluent gets the job.
At the end of the year, in which you taught a self-devised language programme, you give a test to find out how well your students have achieved the objectives of your programme.

Tests with items
Correction-for-guessing is usually applied in large-scale testing situations, but often not in the classroom. For this reason, classroom tests should not normally include T/F questions, since there is a 50% chance of guessing the correct answer.
It is generally better to have at least four choices, thus (in theory) increasing the reliability of the test. However, it is difficult to write distractors, and if they do not distract any of the students, then very little information regarding discernment is received.

Tests with discrete items (such as multiple-choice) are easy to score because the answer is either right or wrong. They tend to contribute to reliability. However, this type of test has been questioned by some on the grounds of validity.
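The arithmetic behind correction-for-guessing can be made concrete. A common version of the formula (one of several in use; the scores below are invented) subtracts a penalty of W/(k-1) from the raw score, where W is the number of wrong answers and k the number of choices per item:

```python
def corrected_score(right: int, wrong: int, choices: int) -> float:
    """Classic correction-for-guessing: R - W/(k - 1)."""
    return right - wrong / (choices - 1)

# A candidate who guesses blindly gets about n/k items right on an
# n-item test, so the corrected expected score works out to zero --
# the penalty cancels out lucky guesses on average.

# 40-item, 4-choice test: 28 right, 12 wrong
print(corrected_score(28, 12, 4))   # 24.0

# Pure guessing on a 40-item T/F test (k = 2): ~20 right, ~20 wrong
print(corrected_score(20, 20, 2))   # 0.0
```

The T/F case also shows why uncorrected classroom true/false scores are inflated: without the correction, a blind guesser still averages 50%.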
Tests with open ended questions (such as writing or speaking) tend to sample proficiency more globally, but are also more subjective in their marking, which can introduce larger measurement errors.

Threats to test reliability (consistency)
- intra-rater error variance
- inter-rater error variance
- fluctuations in administration
- length / difficulty / boundary effects
- test-wiseness

Threats to test validity
- inappropriate content selection
- imperfect examinee cooperation
- imperfect examiner cooperation
- inappropriate referent/norm group
- poor criteria selection
- use of invalid constructs

Forum:
- How do you judge weak test items?
- What should be done with double-negative questions?
- How do you isolate the one skill you want to test?
- How many items on a test are 'enough'?
- How do you determine the time allowed?
- How do you determine the weighting of components?
- When should you use very broad / very narrow tasks?
- Is testing vocab/grammar ok in CLT tests?
- Is there a role for L1 in assessing L2?

Pearson Vue - computer-based testing
at a Pearson Professional Center [4:11]

The initial diagnostic evaluation:
What do you know about me?
What do you know about today's presentation? Language assessment word cloud
[Wordle.net]

Area 1- Evaluation not involving either tests or measures
eg. the use of qualitative descriptions of student performance for diagnosing learning problems
Area 2- A non-test measure for evaluation
eg. teacher rankings used for assigning grades
Area 3- A test used for purposes of evaluation
eg. the use of an achievement test to determine student progress
Area 4- Non-evaluative use of tests and measures for research purposes
eg. the use of a proficiency test as a criterion in SLA research
Area 5- Non-evaluative non-test measures
eg. assigning code numbers to subjects in L2 research according to native languages

Assessment (Lynch, 2001)
Assessment (Bachman, 1990)
Group activity: What do you think are the differences between areas 1-5?

"The activity of developing and using language tests. ... Language testing traditionally was more concerned with the production, development and analysis of tests. Recent critical and ethical approaches to language testing have placed more emphasis on the uses of language tests.
The purpose of a language test is to determine a person’s knowledge and/or ability in the language and to discriminate that person’s ability from that of others. Such ability may be of different kinds, achievement, proficiency or aptitude. Tests, unlike scales, consist of specified tasks through which language abilities are elicited.
The term language assessment is used in free variation with language testing although it is also used somewhat more widely to include for example classroom testing for learning and institutional examinations."
Alan Davies (EALTA)

Bachman (1990) regards evaluation as "the systematic gathering of information for the purpose of making decisions". Lynch (2001) states that this decision or judgment needs to relate to individuals.
Both authors agree that assessment/evaluation is the superordinate term in relation to both measurement and testing.
Measurement is distinguished from evaluative and qualitative descriptions, since measurement is concerned with quantification.
Language proficiency, like many other constructs, needs to be quantified before any judgments can be made about it (Bachman, 1990). The third component in this model is testing, which consists of the use of specific test items.
“A psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual”.
Bachman (1990):
- a test is a measurement instrument which quantifies characteristics of individuals
- designed to obtain a specific sample of behavior according to explicit procedures
- extended the model from language testing to language teaching / learning / research.

Cambridge International General Certificate of Secondary Education [IGCSE] English: An Experiment in eAssessment [5:21]
21st Century Skills Assessment [6:07]
Assessment for Learning:
Classroom Assessment [3:50]
Test reliability [1:33]
IELTS fraud exposed [2:51]
Test anxiety [1:00]

The UK Assessment Reform Group (1999) identifies 'The Big 5 Principles of Assessment for Learning' as:
- The provision of effective feedback to students
- The active involvement of students in their own learning
- Adjusting teaching to take account of the results of assessment
- Recognition of the profound influence assessment has on the motivation and self-esteem of pupils, both of which are critical influences on learning
- The need for students to be able to assess themselves and understand how to improve

Interaction of learning and assessment
Forum: What is meant by "assessment for learning"? The qualities of desirable tests:
For Hughes (2003) it is relatively straightforward to introduce and explain the desirable qualities of language tests: validity, reliability, practicality, and beneficial backwash.
Brown (2004) agrees, although he adds a fifth cardinal criterion (authenticity) for 'testing a test'.

Forum: What are the assessment issues here?

Forum:
What are some of the issues involved with iBT/CBT vs PBT?
- short-term vs. long-term costs
- feedback to students
- location and timing of tests
- reliability of machine marking
- test item banking
- interactivity and multimedia

Some other issues in testing can involve:
- standardised test formats
- the impact of high-stakes tests
- tests with multiple choice items
- assessing written / oral output
- evaluating test structure / items
- managing the test cycle
- external considerations

For further reading: "Guess The Language" test

A second assumption is that it is possible to improve learning and teaching through appropriate changes in the programme, based on feedback.
All in all, without these assumptions there is no reason to test, since there are no decisions to be made, and therefore no information required (Bachman, 1990).

diagnosis