Correlation and Regression

Correlation is a measure of the relatedness, or association, of two or more variables. The variables are existing data that a researcher can study to determine whether a relationship exists between them. When two variables are correlated, the value of one can be used to forecast the value of the other (Lanthier, 2002). An example is the correlation between national income and HIV/AIDS in African countries. One study reports that the lower the national income, the higher the chance that people in the country acquire HIV/AIDS. Because these variables are correlated, one can make a prediction about the status of HIV/AIDS in an African country if one knows its national income (Andoh et al., 2006).

A correlation can be positive or negative. Two variables are positively correlated when one tends to increase as the other increases, and negatively correlated when one tends to decrease as the other increases. A correlation can also be strong or weak. Its strength is expressed as a correlation coefficient between -1 and 1: perfectly positively correlated data have a coefficient of 1, perfectly negatively correlated data have a coefficient of -1, and data with no linear relationship have a coefficient of 0 (Lanthier, 2002).
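
The three reference values of the coefficient can be checked with a short computation. The following is a minimal sketch in Python, assuming the coefficient meant here is the usual Pearson coefficient; the function name pearson_r and the small data sets are made up for illustration and do not come from any of the cited studies.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("coefficient is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Illustrative data only (hypothetical values).
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # y rises with x: close to 1
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # y falls as x rises: close to -1
r_zero = pearson_r(x, [2, 1, 0, 1, 2])   # V-shaped pattern: close to 0

print(r_pos, r_neg, r_zero)
```

The third data set follows a clear V-shaped pattern, yet its coefficient is essentially 0. A coefficient of 0 therefore rules out a linear relationship, not every kind of relationship.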

Like correlation, regression is concerned with the relatedness, or association, of two or more variables. Regression is usually about finding the relationship between variables and, often, estimating the causal effect of one variable on another (Sykes, 2008). An example in public health is immunization coverage among long-term care facility residents. In this example, regression gives a quantitative measure of how immunization coverage among the residents is related to other variables (Bardenheier et al., 2005).

Linear regression is one of the most basic forms of regression. In this form, the relationship between two variables is described by a linear model of the form Y = a + bX + e, where Y is the dependent variable, X is the independent variable, a and b are the intercept and slope, chosen so that the sum of the squared residuals is as small as possible, and e is the residual (error) term with mean zero (Stanton, 2008).
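
The least-squares condition on a and b has a well-known closed-form solution for a single X variable. The sketch below applies it in Python; the function name fit_line and the data points are hypothetical and serve only to show how a, b, and the residuals e relate to one another.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing the sum of squared residuals of Y = a + bX."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must take at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                # slope
    a = mean_y - b * mean_x      # intercept: the line passes through the means
    return a, b


def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared differences between observed Y and a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Illustrative data only (hypothetical values).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)          # roughly a = 0.05, b = 1.99
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(a, b)
print(sum(residuals) / len(residuals))  # mean residual, zero up to rounding
```

Moving a or b away from the fitted values makes sum_squared_residuals larger, which is what "as small as possible" means in the model above.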

The main point of contrast is that correlation is not causation. Correlation alone cannot identify the cause of something. If a change in one variable brings about a change in another, that is causation. Thus, if there is causation, one can generally say that the variables are also correlated. When variables are only correlated, however, the correlation only indicates what value one variable is likely to take given the value of another (Lanthier, 2002). To illustrate, consider the correlation of influenza and respiratory syncytial virus with the total volume of emergency department visits in Los Angeles County. There is a strong correlation between these variables. Thus, if the total volume of emergency department visits increases, the number of influenza and respiratory syncytial virus cases may rise as well. However, one cannot say that influenza or respiratory syncytial virus is the cause of the high volume of emergency department visits (County of Los Angeles Public Health, 2005). Another example is the correlation between watching soap operas and eating disorders among girls. A study concluded that girls who watch soap operas are more likely to have eating disorders. Nevertheless, it is wrong to conclude that watching soap operas will cause a girl to develop an eating disorder. In such cases, correlation is not causation (STATS, 2007). There are many other cases in which correlation does not imply causation. Thus, the proper notion should be “causation implies correlation,” not the reverse.
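
One common way for two variables to be correlated without either causing the other is a hidden third factor that drives both. The following Python sketch simulates that situation with made-up random data; the variable names, noise levels, and seed are assumptions chosen for illustration and do not model the Los Angeles County or soap opera studies.

```python
import math
import random


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# A hidden factor that influences both observed variables.
hidden = [rng.gauss(0, 1) for _ in range(n)]

# x and y each depend on the hidden factor plus their own independent noise.
# Neither one appears in the formula for the other, so neither causes the other.
x = [h + rng.gauss(0, 0.5) for h in hidden]
y = [h + rng.gauss(0, 0.5) for h in hidden]

r_observed = pearson_r(x, y)

# Remove the hidden factor's contribution; only independent noise remains.
x_rest = [xi - h for xi, h in zip(x, hidden)]
y_rest = [yi - h for yi, h in zip(y, hidden)]
r_adjusted = pearson_r(x_rest, y_rest)

print(r_observed)   # strongly positive
print(r_adjusted)   # near zero
```

In this setup x and y are strongly correlated, yet the correlation all but disappears once the hidden factor is accounted for. An observed correlation by itself cannot distinguish this case from one in which x actually causes y.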

References

Andoh, S.Y., Umezaki, M., Nakamura, K., Kizuki, M. and Takano, T. (2006). Correlation between national income, HIV/AIDS and political status and mortalities in African countries [Abstract]. Public Health, 120 (7), 624-633. Retrieved February 5, 2009 from http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B73H6-4K42DPS-3&_user=10&_rdoc=1&_fmt=&_orig=search&_sort=d&view=c&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=3fa152633c914a9eeefa31206c248fec.

Bardenheier, B., Shefer, A., Barker, L., Winston, C., and Sionean, C. K. (2005). Public Health Application Comparing Multilevel Analysis with Logistic Regression: Immunization Coverage among Long-Term Care Facility Residents [Abstract]. Annals of Epidemiology, 15 (10), 749-755. Retrieved February 5, 2009 from http://www.annalsofepidemiology.org/article/S1047-2797(05)00062-1/abstract.

County of Los Angeles Public Health. (2005). Correlation of Influenza and Respiratory Syncytial Virus with Total Volume of Emergency Department Visits in Los Angeles County. Acute Communicable Disease Control: 2005 Special Reports. Retrieved February 5, 2009 from http://www.lapublichealth.org/acd/reports/spclrpts/spcrpt05/Syndromic_SS05.pdf.

Lanthier, E. (2002, March 29). Correlation. Northern Virginia Community College. Retrieved February 5, 2009 from http://www.nvcc.edu/home/elanthier/methods/correlation.htm.

Stanton, C. (2008). Linear regression. California State University, San Bernardino, Department of Mathematics. Retrieved February 5, 2009 from http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html.

STATS. (2007). What is the Difference Between Correlation and Causation? Retrieved February 5, 2009 from http://stats.org/faq_vs.htm.

Sykes, A. O. (2008). An Introduction to Regression Analysis. Chicago Working Paper in Law & Economics. Retrieved February 5, 2009 from http://www.law.uchicago.edu/Lawecon/WkngPprs_01-25/20.Sykes.Regression.pdf.