Learner assessment is an ongoing process in which teachers and learners gather and analyze data and use it to make educational decisions. Assessment instruments include tests, interviews, questionnaires, and work samples. In general, assessment serves at least three purposes: to identify individual goals, strengths, and needs (for initial planning); to check on learning and spot problems (for ongoing progress monitoring); and to assess learning over time (for outcomes measurement). Learner assessment takes different forms and is used in many areas, but this paper is primarily concerned with assessment measures in reading.
There are three types of assessment measures: standardized tests, classroom- or curriculum-based tests, and supplemental or alternative assessments. This paper compares and contrasts the first two types: standardized and curriculum-based tests.
Standardized tests are suitable for some assessment purposes. The term usually brings to mind achievement and intelligence tests (for example, the ACT and SAT, or TABE and CASAS in adult education), among many others. A standardized test is one in which a learner’s performance is interpreted by comparing it to the performance of a group of subjects of known demographics (a norm group). These tests are most often given to entire classes to compare one student’s ability with another’s, or with that of the class as a whole. Standardized tests are easily administered and scored, are most often of the pencil-and-paper variety, can be given in a group setting, and give results in easy-to-interpret age or grade equivalents. Most standardized tests have met established standards of reliability and validity.
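The norm-group comparison described above can be illustrated with a percentile rank, one common way of expressing a score relative to a norm group. This is only a minimal sketch: the function name `percentile_rank`, the half-weight treatment of tied scores, and the norm-group scores are assumptions made for illustration and are not taken from any of the tests named in this paper.

```python
def percentile_rank(score, norm_scores):
    """Return the percentage of the norm group scoring below `score`.

    Scores equal to `score` are counted as half (an assumed convention).
    """
    if not norm_scores:
        raise ValueError("norm_scores must not be empty")
    below = sum(1 for s in norm_scores if s < score)
    equal = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


# Hypothetical raw scores for a small norm group (invented for illustration).
norm_group = [12, 15, 18, 18, 20, 22, 25, 27, 30, 33]

# A learner with a raw score of 22 is compared against the norm group.
print(percentile_rank(22, norm_group))  # 55.0
```

Real standardized tests rely on much larger norm groups and on published conversion tables, so a calculation like this only shows the principle of norm-referenced interpretation.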
The term curriculum-based assessment (CBA) is often used to refer to customized assessment designed to assess regular education students in the regular education curriculum. Curriculum-based tests are closely related to instruction. Teacher-made tests and tests in workbooks and computer-assisted instructional programmes fall into this category.
Standardized tests are typically more reliable than less formal measures and may therefore provide more accurate results for developing reader profiles. For curriculum-specific learning, teachers may develop their own tests and measures that are sensitive to the content they have taught. These measures provide useful information for both teachers and learners.
Let us examine two tests, one standardized and one curriculum-based. The Measure of Academic Proficiency and Progress (MAPP) is an example of a standardized test; the Wheldall Assessment of Reading Passages (WARP) is an example of a curriculum-based reading test.
The MAPP test is a measure of college-level reading, mathematics, writing, and critical thinking in the context of the humanities, social sciences, and natural sciences. It is designed for colleges and universities to assess their general education outcomes so that they may improve the quality of instruction and learning. It focuses on the academic skills developed through general education courses rather than on the knowledge acquired about the subjects taught in these courses.
The main aim of the WARP is to provide a means of tracking and monitoring the performance of older low-progress readers toward functional literacy. It is a means both of measuring reading competence and of indexing progress in reading. A curriculum-based passage reading test typically requires students to read a grade-level passage from a basal reader for one minute. The number of words read correctly in that minute is the index of student reading performance.
While standardized tests are mostly used to assess students’ performance in several fields (e.g., sciences, mathematics, use of English, and reading), curriculum-based tests as a rule deal with one sphere of knowledge (e.g., reading performance). Another distinction is that standardized tests, as mentioned above, focus on the academic skills a student has developed, whereas curriculum-based tests focus on the knowledge acquired in the course of study.
The two tests differ in their instruments and administration procedures. The WARP comprises a series of specially written passages (currently 21), each exactly 200 words in length. Scores indicate the number of words read correctly in one minute (WPM), averaged over the number of passages administered. The MAPP is a multiple-choice test available in two forms: the standard form consists of 108 questions and takes 120 minutes, and the abbreviated form consists of 36 questions and takes 40 minutes. Scores are based on the number of questions answered correctly. Because there is no penalty for guessing, test takers are encouraged to answer every question. The reading and critical thinking questions do not ask for recall of information learned in specific subjects such as psychology, history, biology, or English. Instead, these questions focus on the reading and critical thinking skills that students should develop while studying these subjects. The questions are based on materials from three academic areas: humanities, social sciences, and natural sciences.
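The two scoring rules described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the publishers’ actual scoring procedure: the function names, the sample readings, and the answer key are all hypothetical, and the sketch assumes that a reader does not finish a 200-word passage within the minute.

```python
def words_correct(words_attempted, errors, passage_length=200):
    """Words read correctly during a one-minute reading of one passage."""
    if not 0 <= errors <= words_attempted <= passage_length:
        raise ValueError("expected 0 <= errors <= words_attempted <= passage_length")
    return words_attempted - errors


def warp_style_score(readings):
    """Average words read correctly per minute over all passages administered.

    `readings` is a list of (words_attempted, errors) pairs, one per passage.
    """
    if not readings:
        raise ValueError("at least one passage must be administered")
    per_passage = [words_correct(attempted, errors) for attempted, errors in readings]
    return sum(per_passage) / len(per_passage)


def number_right_score(responses, answer_key):
    """Count correct answers; wrong and omitted answers both add zero."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    return sum(1 for given, correct in zip(responses, answer_key) if given == correct)


# Hypothetical data: three one-minute readings as (words attempted, errors).
readings = [(112, 4), (105, 6), (120, 3)]
print(warp_style_score(readings))  # 108.0

# Hypothetical four-item multiple-choice test; None marks an omitted item.
answer_key = ["B", "D", "A", "C"]
responses = ["B", "A", None, "C"]
print(number_right_score(responses, answer_key))  # 2
```

Because an omitted item and a wrong answer both contribute zero under number-right scoring, a guess can only raise the score, which is why test takers are encouraged to answer every question.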
The results of the two tests also differ in form and meaning. The WARP result is simply the number of words read correctly per minute; words read incorrectly or not read at all are not counted. The test is meant to be taken repeatedly (usually once or twice a year) to track the student’s progress. Students who take the standard form of the MAPP test receive their scores on an individual score report that is sent to their chosen institution about a month after testing is completed. The individual report includes a total score, reported on a scale of 400 to 500, and seven subscores, each reported on a scale of 100 to 130, one for each type of skill and sphere of knowledge tested.
Two other important features of any test are its reliability and validity. Teachers and instructors want to be confident that the assessments they use truly reflect the abilities and level of knowledge of their students. If an assessment does not give a true measure, decisions based on its data may not lead to good results. Both features should be considered at every point in the process: choosing and developing assessments, administering and scoring them, and interpreting the results.
Reliability concerns the consistency or stability of scores. If scoring is reliable, different administrators evaluating the same test or performance should arrive at similar scores or ratings. A reliable instrument is also consistent over time. If a learner takes a test at two different times with no intervening instruction, the two scores should be the same or very similar, because one assumes that abilities do not change much without specific intervention. If the scores differ, they may reflect some feature of the instrument rather than the individual’s skills and knowledge. Of course, no measure is 100% reliable. Test developers and measurement experts use statistical methods to assess the types of reliability discussed above and assign reliability coefficients ranging from 0 (low) to 1.0 (high). These ratings are based on qualities and features of an instrument.
For example, research on the WARP has established reliability coefficients of 0.94 to 0.96 for its passages, which is considered quite high for a test of this kind. The reliability study of the MAPP test indicates that it is an accurate tool for identifying students’ level of skills and knowledge. All three of its reliability coefficients indicate that the MAPP is highly consistent over time and shows great stability in test responses. While .50 is an acceptable reliability coefficient for true score variability, the MAPP instrument exceeds that standard with coefficients of .95, .90, and .71.
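The test-retest idea described above is often quantified as the correlation between scores from two administrations of the same test. The sketch below computes a Pearson correlation coefficient for two sets of scores. It is a generic illustration, not the procedure used in the WARP or MAPP reliability studies, and the function name and the scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and of squared deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for six learners tested twice with no intervening instruction.
first_administration = [88, 102, 95, 120, 76, 110]
second_administration = [90, 100, 97, 118, 80, 108]

r = pearson_r(first_administration, second_administration)
print(round(r, 2))
```

A coefficient close to 1.0 means that learners keep nearly the same relative standing on both occasions. Inter-rater reliability can be examined in the same way by correlating the scores that two administrators assign to the same performances.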
Validity refers to the interpretation and use of test scores. Validity is extremely important because we make decisions on the basis of these scores, and we need to be confident that they accurately represent the abilities, both strengths and weaknesses, that we intend to measure and that our use of the scores for various purposes is appropriate. Extensive research by Wheldall and Madelaine has shown that the validity coefficients of the WARP range from 0.83 to 0.87.
1. McShane, S. (2005). Applying Research in Reading Instruction for Adults: First Steps for Teachers [Electronic version]. Retrieved March 9, 2007, from http://www.nifl.gov/partnershipforreading/publications/html/mcshane/chapter3.html
2. Wheldall, K., & Madelaine, A. (2000). A curriculum-based passage reading test for monitoring the progress of low-progress readers using standardized passages: The development of the WARP. International Journal of Disability, Development and Education, 47, 371–382.
3. Wisconsin Literacy Education and Reading Network Source (2005). Adult literacy: Standardised assessment. Retrieved March 9, 2007, from http://wilearns.state.wi.us/apps/default.asp?cid=648
4. Educational Testing Service (2006). The Measure of Academic Proficiency and Progress. Retrieved March 9, 2007, from http://campus.umr.edu/irinfo/mapp.pdf