Calculated Risk, a site on our regular reading list, has a story today discussing the same chart we analyzed yesterday. (Thanks to reader "RB" for pointing this out.) Here is their conclusion:
We can be 99% confident that the YoY changes in real PCE are positively correlated with loosening mortgage lending standards.
Some observers are interpreting this statement to mean there is a .99 correlation -- a very different thing. (Perhaps this is also what Doug Kass meant in his comment on CNBC.)
The data presented have a correlation of .43, evidence of some relationship but far from a .99 correlation. With a .99 correlation, the scatter plot would show the points lying almost exactly on a straight line. This is far different from the plot we showed yesterday.
That plot has a data cloud plus a few outliers that define the relationship. Any good analyst looks at a scatter plot when doing regression and correlation. Looking at the residuals is the only way to know if a linear model is appropriate. In this case, the warning flags should go up.
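To make the point about outliers concrete, here is a minimal sketch using made-up numbers (not the PCE or mortgage data from the chart). The `cloud` and `outliers` points and the helper names are assumptions chosen purely for illustration; the sketch only shows how a couple of distant points can manufacture a correlation in an otherwise shapeless cloud.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_of(points):
    """Correlation for a list of (x, y) pairs."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return pearson_r(xs, ys)

# Hypothetical data cloud with no linear relationship at all.
cloud = [(-1, -1), (-1, 1), (1, -1), (1, 1), (0, 0)]
# Two hypothetical outliers sitting far away from the cloud.
outliers = [(6, 6), (7, 5)]

r_cloud = r_of(cloud)
r_all = r_of(cloud + outliers)
print(f"cloud only:    r = {r_cloud:.2f}")
print(f"with outliers: r = {r_all:.2f}")
```

The cloud alone has a correlation of zero, while adding just two distant points pushes it to roughly .9. A summary statistic alone cannot reveal this; the scatter plot can.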
A correlation of .43 means an r-squared of .19. The statistical interpretation of this is that about 19% of the variation in one series, as defined by the squared deviations from the regression line, is "explained" by the variation in the other series. This is how one looks at substantive significance -- whether the relationship is important.
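The link between the correlation and the share of variation "explained" can be checked directly. The sketch below uses a small made-up data set (the `xs` and `ys` values are assumptions, not the series from the chart) and confirms that squaring r gives the same number as one minus the ratio of residual to total squared deviations.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def explained_share(xs, ys):
    """Fit y = intercept + slope * x by least squares and return
    1 - (residual sum of squares / total sum of squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical illustrative data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.0, 1.0, 4.0, 3.0, 3.5, 6.0, 2.5, 5.0]

r = pearson_r(xs, ys)
print(f"r = {r:.3f}, r squared = {r * r:.3f}, "
      f"explained share = {explained_share(xs, ys):.3f}")

# Squaring the rounded correlation quoted in the text:
print(round(0.43 ** 2, 4))  # 0.1849
```

Squaring the rounded .43 gives .1849; the .19 quoted above presumably comes from the unrounded correlation. Either way, less than a fifth of the variation is accounted for.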
So where does the .99 come in? A linear regression analysis calculates a slope coefficient and an intercept. In this case, the slope coefficient for the equation is .10, as we noted on our chart. This means that every 1% change in the mortgage availability measure is associated with a one-tenth point change in year-over-year PCE. The slope coefficient has a standard error, calculated from the scatter of the data around the regression line and the degrees of freedom. The standard error for the slope coefficient is .02.
Since the coefficient is much larger than the standard error, we can be 99% sure (making some other assumptions about the typicality of the data we have) that the "true" slope coefficient is not zero. This is a test of statistical significance.
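The arithmetic behind that 99% statement can be sketched in a few lines. The slope (.10) and standard error (.02) come from the text; everything else is an assumption. In particular, the sample size is not given here, so this sketch uses the normal approximation for the critical value. With a small sample, the t distribution would give a somewhat larger cutoff. The function name `slope_significance` is hypothetical.

```python
from statistics import NormalDist

def slope_significance(slope, std_err, confidence=0.99):
    """Return the slope-to-standard-error ratio, the two-sided critical
    value (normal approximation), and the confidence interval."""
    ratio = slope / std_err
    # Two-sided critical value, e.g. the 99.5th percentile for 99% confidence.
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = critical * std_err
    return ratio, critical, (slope - margin, slope + margin)

# Values quoted in the text.
ratio, critical, (low, high) = slope_significance(0.10, 0.02)

print(f"slope / standard error = {ratio:.1f}")
print(f"99% critical value     = {critical:.2f}")
print(f"99% interval for slope = {low:.3f} to {high:.3f}")
print("interval excludes zero:", low > 0 or high < 0)
```

The coefficient is five standard errors from zero, well beyond the cutoff of about 2.58, so the interval excludes zero. Note that nothing in this calculation says the relationship is strong; it only says the slope is unlikely to be exactly zero.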
The confusion of statistical and substantive significance, and which measures are used for each, is one of the most common mistakes made by those without a strong background in research methods.
To summarize, the Calculated Risk statement that we can be 99% sure of some relationship between the two variables is correct.
Everything in our article yesterday is also correct. The degree of association is not as strong as the misleading graph suggests. The entire relationship rests upon something that happened for a year or so in the early nineties. The measure of mortgage availability is not very good for the purpose. There is not enough data. The resulting relationship is probably spurious.
As someone who taught these classes at the graduate level, I have no illusion that the average reader is going to appreciate these distinctions, however important. It is a good illustration of how easy it is to be fooled by the eyes, and how difficult it can be to reach the truth.




