Test Assembly

Some Practical Issues in Computerized Adaptive Testing With Response Times (RR 14-06)

Many standardized tests are now administered via computer rather than in paper-and-pencil format. The computer-based delivery mode brings certain advantages. One is the ability to adapt the difficulty level of the test to the ability level of the test taker, in what has been termed computerized adaptive testing, or CAT. Another is the ability to record not only the test taker’s response to each item (i.e., question) but also the amount of time the test taker spends considering and answering each item. Combining these two advantages, this report explores various methods for using response time data to select appropriate items for an individual test taker.
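The report does not detail its selection strategies here, but one common way to fold response times into CAT item selection is to pick the item that maximizes statistical information per expected second rather than raw information. The sketch below illustrates that idea under stated assumptions: a 2PL item response model, a hypothetical item pool, and a made-up `exp_time` field for each item's expected response time. It is an illustration of the general technique, not the strategies evaluated in the report.

```python
import math

def fisher_info(theta, a, b):
    # Fisher information of a 2PL item at ability theta:
    # I(theta) = a^2 * p * (1 - p), where p is the probability of a correct response.
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_item(theta, items):
    # RT-aware selection: maximize information per expected second,
    # so two equally informative items are broken by speed.
    return max(items, key=lambda it: fisher_info(theta, it["a"], it["b"]) / it["exp_time"])

# Hypothetical pool: discrimination a, difficulty b, expected response time in seconds.
pool = [
    {"id": 1, "a": 1.2, "b": 0.0, "exp_time": 60.0},
    {"id": 2, "a": 1.2, "b": 0.0, "exp_time": 30.0},
    {"id": 3, "a": 0.8, "b": 1.5, "exp_time": 20.0},
]

best = select_item(0.0, pool)
```

For a test taker at theta = 0, items 1 and 2 are equally informative, so the faster item 2 wins; a plain maximum-information rule could not distinguish them.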

Four strategies for incorporating response time data were evaluated, and the precision of the final test-taker score under each was compared against a benchmark that did not take response time information into account. Although differences in both measurement precision and testing time were expected, the strategies differed little in measurement precision; they did, however, differ in total testing time.

Additional reports in this collection

An Overview of Research on the Testlet Effect: Associated...

A mathematical model called item response theory is often applied to high-stakes tests to estimate test-taker ability level and to determine the characteristics of test questions (i.e., items). Often, these tests contain subsets of items (testlets) grouped around a common stimulus. This grouping often leads to items within one testlet being more strongly correlated with one another than with items from other testlets, which can result in moderate to strong testlet effects.
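The extra within-testlet correlation described above can be made concrete with a small simulation. The sketch below, assuming a simple latent-variable model of my own construction (continuous item scores equal to a person ability plus a shared testlet effect plus noise, all standard normal), shows that items sharing a testlet effect correlate more strongly with one another than with items from a different testlet. It is a toy illustration of the testlet effect, not the models surveyed in the report.

```python
import random

def pearson(x, y):
    # Plain Pearson correlation coefficient.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

random.seed(7)
n_persons = 2000
scores = []
for _ in range(n_persons):
    theta = random.gauss(0, 1)               # person ability
    g1, g2 = random.gauss(0, 1), random.gauss(0, 1)  # testlet effects
    # Two testlets with two items each; items in a testlet share its effect.
    scores.append([theta + g1 + random.gauss(0, 1),
                   theta + g1 + random.gauss(0, 1),
                   theta + g2 + random.gauss(0, 1),
                   theta + g2 + random.gauss(0, 1)])

cols = list(zip(*scores))
within = (pearson(cols[0], cols[1]) + pearson(cols[2], cols[3])) / 2
between = pearson(cols[0], cols[2])
```

With these variance choices the expected within-testlet correlation is about 2/3 versus about 1/3 between testlets; the gap is the testlet effect that standard item response theory models ignore.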

Robust Text Similarity and Its Applications for the LSAT...

Text similarity measurement provides a rich source of information and is increasingly being used in the development of new educational and psychological applications. However, given the high-stakes nature of educational and psychological testing, it is imperative that a text similarity measure be stable (or robust) in the presence of uncertainty in the data. The present research was sparked by this requirement. First, multiple sources of uncertainty that may affect the computation of semantic similarity between two texts are enumerated.
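To make the notion of a text similarity measure concrete, the sketch below implements one of the simplest examples: bag-of-words cosine similarity. The function and the sample sentences are my own illustrative choices, not the measures studied in the report; real robustness analyses would probe how such a score shifts under the sources of uncertainty the report enumerates.

```python
import math
from collections import Counter

def cosine_sim(text_a, text_b):
    # Bag-of-words cosine similarity over lowercase whitespace tokens.
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# A one-word substitution ("test" -> "exam") should barely move the score,
# while an unrelated sentence should score near zero.
a = "the test measures reading ability"
b = "the exam measures reading ability"
c = "weather forecast for tomorrow"
```

Here `cosine_sim(a, b)` stays high (4 of 5 tokens shared) while `cosine_sim(a, c)` is zero; a robust measure should keep behaving this way when the input texts are slightly perturbed.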