
Complete Reliability of Psychological Testing

Introduction to Reliability

(Anne Anastasi)

"Reliability of a test refers to consistency of scores obtained by the same person on same test in different administrations and different occasions".

In short, it is the repeatability of your measurement. A measure is considered reliable if a person's scores on the same test, given twice, are similar. It is important to remember that reliability is not measured; it is estimated.

Types of Reliability

1. Test/Retest Reliability

(Anne Anastasi)
"In this method reliability coefficient is simply the correlation between the scores obtained by the same person on two administrations of the same test".

Test/retest is the more conservative method of estimating reliability. Simply put, the idea behind test/retest is that you should get the same score on test 1 as you do on test 2. The three main components of this method are as follows (a code sketch follows the list):
  • implement your measurement instrument at two separate times for each subject;
  • compute the correlation between the two separate measurements; and
  • assume there is no change in the underlying condition (or trait you are trying to measure) between test 1 and test 2.
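A minimal sketch of the test/retest computation in Python (the scores below are invented for illustration):

```python
import numpy as np

# Hypothetical scores of 8 examinees on two administrations of the same test
test1 = np.array([12, 15, 11, 18, 14, 16, 13, 17])
test2 = np.array([13, 14, 12, 19, 15, 15, 13, 18])

# The reliability coefficient is simply the Pearson correlation
# between the two sets of scores
r_tt = np.corrcoef(test1, test2)[0, 1]
print(f"Test/retest reliability estimate: {r_tt:.3f}")
```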
Factors Affecting Test/Retest Reliability
  • Interval
  • Experience
  • Errors due to conditions of test takers
  • Errors due to uncontrolled test conditions
2. Parallel or Alternate-Form Reliability

In the parallel-form procedure, two tests that are equivalent, in the sense that they contain the same kind of items of equal difficulty but not the same items, are administered to the same examinees.
(Aiken)
Formulating the Second Form

The second form should contain the same number of items, the items should be expressed in the same form and should cover the same type of content, and the range and level of difficulty of the items should be equal. Instructions, time limits, and illustrative examples should also be equivalent.
(Anne Anastasi, Psychological Testing)

3. Internal Consistency Methods

Internal consistency estimates reliability by grouping questions in a questionnaire that measure the same concept. For example, you could write two sets of three questions that measure the same concept (say class participation) and after collecting the responses, run a correlation between those two groups of three questions to determine if your instrument is reliably measuring that concept.

In short, Cronbach's alpha splits all the questions on your instrument every possible way and computes correlation values for them all (we use a computer program for this part). In the end, your computer output generates one number for Cronbach's alpha - and just like a correlation coefficient, the closer it is to one, the higher the reliability estimate of your instrument. Cronbach's alpha is a less conservative estimate of reliability than test/retest.
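As a sketch, coefficient alpha can be computed directly from a persons-by-items score matrix using the variance form of the formula, α = [k/(k − 1)] × (1 − Σ item variances / total-score variance); the response data below are invented:

```python
import numpy as np

def cronbach_alpha(scores):
    """scores: 2-D array with rows = examinees and columns = items."""
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses of 6 examinees to 6 rating-scale items
data = np.array([
    [3, 4, 3, 4, 3, 4],
    [2, 2, 3, 2, 2, 3],
    [4, 5, 4, 5, 4, 5],
    [1, 2, 1, 2, 1, 1],
    [3, 3, 3, 3, 4, 3],
    [5, 4, 5, 4, 5, 5],
])
print(f"Cronbach's alpha: {cronbach_alpha(data):.3f}")
```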

I. Split-Half Reliability

Step 1: Divide the test into equivalent halves.

Step 2: Compute a Pearson r between scores on the two halves of the test.

Step 3: Adjust the half-test reliability using the Spearman-Brown formula.

Spearman-Brown Formula
  • used to estimate how much a test's reliability will increase when the length of the test is increased by adding parallel items:
    rLL = (L × r) / (1 + (L − 1) × r)
  • where L = the number of times longer the new test will be, and r is the reliability of the original test
  • an estimate of the SEM for a given test length can be obtained using SEM = σ√(1 − rLL), where σ is the standard deviation of the lengthened test (a code sketch of the prophecy formula follows)
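A quick sketch of the prophecy formula in code (the numbers are hypothetical):

```python
def spearman_brown(r, L):
    """Predicted reliability of a test made L times longer with parallel items."""
    return (L * r) / (1 + (L - 1) * r)

# e.g., a test with reliability 0.70 doubled in length (L = 2)
print(f"Predicted reliability: {spearman_brown(0.70, 2):.3f}")  # -> 0.824
```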
Cronbach's Alpha (α)

Suppose we compute one split-half reliability, then randomly divide the items into another pair of halves and recompute, and keep doing this until we have computed all possible split-half estimates of reliability. Cronbach's alpha is the geek equivalent of the average of all possible split-half estimates (although that's not how we actually compute it).

Saying we compute all possible split-half estimates doesn't mean that we go and measure a new sample each time! Instead, we calculate all the split-half estimates from the same sample. Because we measured our whole sample on each of the six items, all we have to do is have the computer analyze the various subsets of items and compute the resulting correlations.

For our six-item example, each such estimate can be labeled SH with a subscript (SH1, SH2, and so on). Just keep in mind that although Cronbach's alpha is equivalent to the average of all possible split-half correlations, we would never actually calculate it that way.
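To make this concrete, here is a small simulation sketch (my own illustration, not from the text). It computes alpha from the variance formula and also averages all possible split-half estimates; the equivalence is exact when each split-half is computed with the Rulon/Flanagan formula described below, rather than the Spearman-Brown-corrected Pearson r:

```python
import numpy as np
from itertools import combinations

# Simulate 50 examinees on 6 items sharing a common factor
rng = np.random.default_rng(0)
ability = rng.normal(size=(50, 1))
data = ability + rng.normal(size=(50, 6))

k = data.shape[1]
total = data.sum(axis=1)
alpha = (k / (k - 1)) * (1 - data.var(axis=0, ddof=1).sum() / total.var(ddof=1))

# Every possible 3-item/3-item split; Rulon's coefficient for each
estimates = []
for half_a in combinations(range(k), k // 2):
    half_b = [i for i in range(k) if i not in half_a]
    a = data[:, list(half_a)].sum(axis=1)
    b = data[:, half_b].sum(axis=1)
    estimates.append(1 - (a - b).var(ddof=1) / total.var(ddof=1))

print(f"Cronbach's alpha:                 {alpha:.4f}")
print(f"Mean of all split-half estimates: {np.mean(estimates):.4f}")  # matches alpha
```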

Rulon's Split-half Method
  • Split the test into two halves and create half-test scores
  • Compute the difference between the half-test scores
  • Compute the variances of the differences and of the total scores
  • reliability estimate = 1 − (σd²/σt²), where σd² is the variance of the difference scores and σt² is the variance of the total scores (see the sketch below)
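A sketch of Rulon's computation, assuming an odd/even split of a hypothetical 0/1 score matrix:

```python
import numpy as np

# Hypothetical right/wrong scores of 10 examinees on an 8-item test
scores = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 0, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 0, 1, 1, 1],
    [1, 0, 0, 1, 1, 0, 0, 1],
])

odd = scores[:, 0::2].sum(axis=1)     # half-test score on odd-numbered items
even = scores[:, 1::2].sum(axis=1)    # half-test score on even-numbered items

diff_var = (odd - even).var(ddof=1)   # variance of the difference scores
total_var = (odd + even).var(ddof=1)  # variance of the total scores
print(f"Rulon split-half reliability: {1 - diff_var / total_var:.3f}")
```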
Split-half testing

Split-half testing measures consistency by:
  • Dividing the test into two halves (at the mid-point, by odd/even item numbers, at random, or by some other method).
  • Administering them as separate tests.
  • Comparing the results from each half.
A problem with this is that the resulting half-tests are shorter and hence lose reliability. Split-half is thus better suited to tests that are fairly long in the first place.

Use the Spearman-Brown formula to correct for this shortness, estimating the correlation as if each part were full length:

r = (2rhh)/(1 + rhh)

(Where rhh is the correlation between the two halves.)
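Putting the procedure together as a sketch (the helper name and the simulated data are my own):

```python
import numpy as np

def split_half_reliability(scores):
    """Odd/even split-half reliability, corrected to full length with
    the Spearman-Brown formula r = (2 * r_hh) / (1 + r_hh)."""
    odd = scores[:, 0::2].sum(axis=1)
    even = scores[:, 1::2].sum(axis=1)
    r_hh = np.corrcoef(odd, even)[0, 1]  # correlation between the halves
    return (2 * r_hh) / (1 + r_hh)

# Simulated 0/1 scores: 30 examinees, 10 items driven by a common ability
rng = np.random.default_rng(1)
ability = rng.normal(size=(30, 1))
demo = (ability + rng.normal(size=(30, 10)) > 0).astype(int)
print(f"Corrected split-half reliability: {split_half_reliability(demo):.3f}")
```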

II. Kuder-Richardson Reliability or Coefficient Alpha

The appropriate use of this method requires that all items in the test be psychologically homogeneous; that is, every item should measure the same factor, or combination of factors, in the same proportion as every other item does.
(Frank S. Freeman)
The Kuder-Richardson reliability coefficient, or coefficient alpha, is relatively simple to compute, being based on a single administration of the test. It assesses the inter-item consistency of a test by looking at two sources of error:
  • Adequacy of content sampling
  • Heterogeneity of domain being sampled
It assumes that reliable tests contain more variance and are thus more discriminating. Higher heterogeneity of the domain leads to lower inter-item consistency. For right/wrong scores (dichotomous items):
Formula No 20
It is used where the items vary in difficulty level.

R = [k/(k − 1)] × [1 − Σpiqi/σ²]

Where:
  • pi = proportion of persons answering item i correctly.
  • qi = 1 − pi, the proportion answering item i incorrectly.
  • k = total number of items.
  • σ² = variance of the total scores on the test.
Formula No 21
It is used when only two response options are available and all items can be assumed to be of approximately equal difficulty.

R = [k/(k − 1)] × [1 − M(k − M)/(kσ²)]

Where M is the mean of the total scores.

When three or more response options are available, coefficient alpha is used:

α = [k/(k − 1)] × [1 − Σ(Si²/St²)]

Where Si² is the variance of item i and St² is the variance of the total scores. (A code sketch of the Kuder-Richardson formulas follows.)
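A sketch of KR-20 and KR-21 on dichotomous data (the scores are simulated; population variances, ddof=0, are used so that the pq term matches the item variances):

```python
import numpy as np

def kr20(scores):
    """KR-20 for a 0/1 scored item matrix (rows = examinees)."""
    k = scores.shape[1]
    p = scores.mean(axis=0)               # proportion passing each item
    q = 1 - p
    total_var = scores.sum(axis=1).var()  # variance of total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

def kr21(scores):
    """KR-21: additionally assumes all items are of equal difficulty."""
    k = scores.shape[1]
    total = scores.sum(axis=1)
    m, var = total.mean(), total.var()
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var))

# Simulated right/wrong scores: 40 examinees, 10 items
rng = np.random.default_rng(2)
ability = rng.normal(size=(40, 1))
data = (ability + rng.normal(size=(40, 10)) > 0).astype(int)
print(f"KR-20: {kr20(data):.3f}")
print(f"KR-21: {kr21(data):.3f}")  # close to KR-20 when difficulties are similar
```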
Equivalence of results (parallel form)

Seeks reliability through equivalence between two versions of the same test, comparing the results from each version (like split-half). It is better than test/retest in that both forms can be administered on the same day (reducing variation).

Parallel versions are useful in situations such as with graduates, who may take the same test several times.
An adverse effect occurs where different groups score differently (potential racial or other bias). This may require different versions of the same test, e.g. the MBTI for different countries.

Discussion

There are a number of procedural aspects that affect test reliability, including:
  • Test conditions,
  • Inconsistent administrative practices,
  • Variation in test marking,
  • Application of an inappropriate norm group,
  • Internal state of test-taker (tired, etc.)
  • Experience level of test-taker (e.g. having taken the test before).
Aptitude Tests: Speed and Power Tests

There are at least 5000 aptitude tests on the market at the moment. The types of question you can expect will depend on which aptitudes and abilities are needed in the job you are applying for. Aptitude and ability tests are classified as maximum performance tests because they test what you can achieve when you are making maximum effort. There are two different styles of maximum performance test: speed tests and power tests.

III. Speed and Power Tests
  • Speed Test
"A pure speed test is one in which individuals difference depends entirely on speed of performance".
In a speed test the scope of the questions is limited and the methods you need to use to answer them is clear. Taken individually, the questions appear relatively straightforward. Speed test are concerned with how many questions you can answer correctly in the allotted time.

For example:

139 + 235=

A) 372 B) 374 C) 376 D) 437
  • Power Test
"In power test the time is enough but the difficulty level of items so high that no one can solve all items at any cost".
A power test on the other hand will present a smaller number of more complex questions. The methods you need to use to answer these questions are not obvious, and working out how to answer the question is the difficult part. Once you have determined this, arriving at the correct answer is usually relatively straightforward.

In summary, speed tests contain more items than power tests although they have approximately the same time limit. Speed tests tend to be used in selection at the administrative and clerical level; power tests tend to be used more at the graduate, professional or managerial level. This is not an absolute distinction, however, since speed tests do give an accurate indication of likely performance in power tests. In other words, if you do well in speed tests then you will tend to do well in power tests as well.

These speed and power definitions apply only to maximum performance tests like aptitude and ability tests and not to personality tests.

Factors Affecting Reliability
  • Length of the test
  • Characteristics of the population (a more heterogeneous population yields a higher reliability estimate than a homogeneous one).
  • Characteristics of the test itself
  • Method itself
  • Range of age
  • Time interval
