HRSD by Max Hamilton


Clinical depression is an unfortunately common condition that afflicts a large portion of the global population. One of the hardest parts about it is that some people may not even know that what they are feeling is depression. Thankfully, we have measures to assess people who may be suffering from depression, and one such measure is the Hamilton Rating Scale for Depression, abbreviated as the HAM-D or HRSD.

The HRSD was developed in 1960 by Max Hamilton, a psychiatrist working at the University of Leeds in the United Kingdom. Developed as a means to accurately determine the symptoms of someone who may be suffering from depression, the test uses a rating scale (either a three-point or a five-point scale) with items pertaining to depression, covering upwards of seventeen variables. While some items are rated on an intensity scale, others offer answer choices that do not pertain to quantity or increasing intensity. Once the variables are defined, the items are scored against the results of the interview with the patient. The test is applicable to anyone over the age of eighteen and is administered as a paper questionnaire in a clinical setting.


One of the most important qualities of any test is its reliability. A test with a low amount of measurement error can be considered reliable. Many tests do have at least some measurement error, since it is nearly impossible to create an absolutely perfect test. The Hamilton Rating Scale for Depression has been around for over fifty years, so its measurement error must be relatively low for the test to have been relied on for as long as it has. There are certain types of reliability to look for when a test is assessed. The main three types, test-retest reliability, internal consistency, and interrater reliability, are all extremely important when considering whether a test is reliable. The reliability of the Hamilton Rating Scale for Depression will be assessed through examples of these three reliability types.

Test-retest reliability is the idea that, in order to determine the reliability of a measure, one can administer a test at one time and then administer the same test, possibly with a different ordering of items, at a later time. Depending on when the retest is given to the participants, error can arise in the form of practice effects if the retest comes too soon, or from the individuals encountering anything that would change their answers if the time period between tests is longer. Lin et al.'s study comparing unipolar depression, bipolar disorder, and healthy candidates used the HRSD weekly over the course of six weeks in order to make sure the retests were reliable and the answers were relatively constant. Ballard et al.'s use of the test-retest method was slightly different from Lin's. While Lin used test-retest effectively over the course of six weeks, Ballard gave the test eight times in a three-day period. Since Ballard was testing the effect of ketamine on the system during a depressive episode, it was understandable to test over a shorter span than Lin did, because the ketamine would have worn off over time, but this does not necessarily mean that the chances for error were on par with Lin's method. Because Ballard et al. administered the test eight times during a very short period, practice effects may well have contributed to any resulting error. Ballard's result of .44, corresponding to about 19% of variance, does indicate a fair amount of reliability, despite the results showing that after the second test depression scores had fallen back to roughly their original levels for the rest of the tests.
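As a point of reference for how the .44 figure and the 19% of variance relate, the proportion of variance one measurement explains in another is the square of their correlation. This is a general statistical identity, not a calculation reported by Ballard et al. beyond the numbers quoted above:

r^2 = (0.44)^2 \approx 0.19, \quad \text{or roughly } 19\% \text{ of the variance.}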

Internal consistency is another sub-type of reliability, but instead of addressing the best time for the test to be administered, as test-retest does, internal consistency concerns how reliably the items on a test measure a single construct. In other words, how much do the items on the test, the Hamilton Rating Scale for Depression specifically, measure the same construct, which in this case would be depression? Coefficient alpha (α) is used when denoting the results of internal consistency on a test.
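To make the idea concrete, coefficient alpha is conventionally computed from the item variances and the variance of the total score. The formula below is the standard definition of coefficient alpha, given for illustration; it is not a derivation or value reported in the studies discussed here:

\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_{i}}{\sigma^2_{\text{total}}}\right)

where k is the number of items (seventeen for the standard HRSD), \sigma^2_{i} is the variance of item i, and \sigma^2_{\text{total}} is the variance of the total score. Alpha approaches 1 when the items vary together, which is what "measuring the same construct" means in practice.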

Interrater reliability, the final type of reliability to be covered here, is possibly one of the simpler types to understand, although that does not make it any less important. Interrater reliability is simply the degree to which the administrators of the test agree in their findings among participants, and it is denoted by kappa (κ). Although interrater reliability can be inflated when the administrators of the test reach the same result coincidentally, kappa is used to remove the error caused by chance agreement. In the Lin et al. study, the Hamilton Rating Scale for Depression had a well above average score during their interviews, with kappa indicating 0.9.
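For illustration, Cohen's kappa is one standard way of expressing chance-corrected agreement; the general formula below is an assumption about the variant used, since Lin et al. do not specify which kappa statistic they computed:

\kappa = \frac{p_o - p_e}{1 - p_e}

where p_o is the observed proportion of agreement between raters and p_e is the proportion of agreement expected by chance. A kappa above 0.9, as Lin et al. report, means agreement far exceeds what chance alone would produce.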

A study conducted by Lin et al. on July 22, 2014 in China sought to find whether there were any behavioral differences between those who suffered from clinical, or as phrased by Lin, unipolar depression, and those who had depression but also dealt with symptoms of bipolar disorder, for which Lin used the term "soft bipolar spectrum." They hypothesized that the unipolar depression group, abbreviated UP, would behave differently from those affected by the soft bipolar spectrum, which Lin abbreviated SBP. The study's 736 participants were split into five groups: the strict UP group consisted of 219 participants, 98 participants were in the bipolar I group, 136 were in the bipolar II group, 81 were in the SBP group, and the final group, the control group, consisted of 200 individuals without mental illness. The age range for the study was 18 to 60 years old. The interrater reliability was shown by kappa to be higher than 0.9, indicating that agreement between the interviewers was exceptional. The HRSD, among other tests, was administered weekly as the patients were observed over a period of six weeks, showing good use of test-retest reliability. The HRSD and the other tests were also given to the control group to ensure they were free of mental illness, the HRSD scoring .6 with the control group. The possibility of error through the use of the test-retest method could come from carryover effects, or practice effects more specifically. Since the HRSD was given to participants weekly, they may have been able to score closer to what the interviewer was looking for.

The HRSD in the study conducted by Lin et al. can be seen as reliable, but how does the validity measure up? Face validity, whether the test appears to measure what it says it does, is definitely present. Lin et al. were attempting to measure behavioral differences between the bipolar, SBP, depression, and control groups, and did so using a battery of tests, the Hamilton Rating Scale for Depression being a prominent one in its weekly administrations. Content validity, whether the test accurately represents the construct it is testing for, also appears to be confirmed. Lin et al. address what they need to while designing the study: which mental disorders are represented, who would be separated into which groups, and which tests would give accurate measurements for those with differing disorders. Specifically, the HRSD does gauge whether depression symptoms are present in each participant. What needs to be measured is measured, making content validity present. Criterion validity, determining whether the criteria and the test measuring them are well correlated, may seem harder to interpret, but I believe it is not present for Lin et al. Since the depression symptoms being measured correlate with the questionnaire, those in the UP group, who are dealing solely with clinical depression, are among the lower groups for correlations, showing higher correlation rates only than the bipolar I group. Construct validity, whether the test correlates with other related tests, is present, although negatively. When matched up, the HRSD shows the highest amount of psychotic symptoms for the UP group, along with the lowest scores on the YMRS (Young Mania Rating Scale) and the HCL-15 (15-item Hypomania Symptom Checklist); this result could have been predicted as well, since the YMRS and the HCL-15 mainly measure manic symptoms, which heavily contrast with depressive symptoms.

Overall for Lin et al., I interpret the test as reliable and valid. Although the reliability may have run into some possible issues with test-retest reliability and carryover effects, the .6 score is not a bad result for the measure, and with interrater reliability at an excellent high of .9, the reliability scores for this measure seem to be in good standing. As for validity, things may not be as concrete as the above-average reliability scores, since the criterion validity is very low and the construct validity correlates negatively with the other measures used in the experiment, but the content validity makes up for it, covering questions for depression symptoms of varying intensity for all groups.

Reliability isn't the only factor to consider when analyzing a test: validity is just as important. Validity is the concept that the measurement data accurately reflect what the test claims to measure. There are multiple types of validity, all of which are important, and each focuses on a specific aspect of a measure. Validity is forever associated with reliability as well, since there cannot be a valid test without reliable measurement data.

Content validity is a type of validity evidence showing that a test or measure accurately portrays the content it says it does. This type of validity is important since it is the basis of most, if not all, items on a test; the point is to measure what is relevant to the construct being sought. The Hamilton Rating Scale for Depression's content validity would come from items that measure how intense depression symptoms may be, or that determine whether one is experiencing various depression symptoms at all.

Criterion validity is not as direct as the previously discussed content validity. Criterion validity concerns what the items on the test can be related to other than the construct being measured. The HRSD's questions have roots not just in outright depression, but also in other mental illnesses that include depression symptoms, such as bipolar disorder or borderline personality disorder. The "criterion" in criterion validity refers to just that: the other criteria to which a measure can be related.

The following two types of validity are technically parts of one higher-level form, construct validity. Construct validity is considered an overarching form of validity, so it is commonly broken into parts, the first being convergent validity. Evidence of convergent validity exists when the test or measure correlates with another test or measure of a related construct. Scores of participants who take the Hamilton Rating Scale for Depression frequently show some correlation with the Hamilton Anxiety Rating Scale (HAM-A).

Convergent validity makes up the first half of construct validity, while the second half measures the opposite. Discriminant validity provides evidence whenever a test or measure does not correlate with another, or simply put, shows what does not apply to the test. The HRSD, for example, measures the intensity of depression symptoms, and discriminant validity would predict that it correlates negatively with scales measuring a construct such as mania. Lin et al. touch on this exact example: when the HRSD was measured in their experiment, it correlated poorly with the Young Mania Rating Scale (YMRS) and the 15-item Hypomania Symptom Checklist (HCL-15). These negative correlations are predictable, since depression symptoms and mania symptoms rarely occur in tandem.

References:

1. Hamilton, M. (1960). Hamilton Rating Scale for Depression. PsycTESTS. doi:10.1037/t04100-000

2. Lin, K., Xu, G., Lu, W., Ouyang, H., Dang, Y., Guo, Y., … Lee, T. C. (2015). Neuropsychological performance of patients with soft bipolar spectrum disorders. Bipolar Disorders, 17(2), 194-204. doi:10.1111/bdi.12236

3. Ballard, E. D., Ionescu, D. F., Vande Voort, J. L., Niciu, M. J., Richards, E. M., Luckenbaugh, D. A., … Zarate, C. J. (2014). Improvement in suicidal ideation after ketamine infusion: Relationship to reductions in depression and anxiety. Journal of Psychiatric Research, 58, 161-166. doi:10.1016/j.jpsychires.2014.07.027
