There is a very common research mistake. It is pervasive in Wall Street research, even that presented by the big-name firms. My academic friends are not immune, partly because they have their own set of incentives.
My mission in this post is fourfold:
- Explain the problem in a way that can be readily understood;
- Show how you can spot it in practice;
- Provide a clear example;
- Suggest some other applications.
The Problem – Selecting the Right Data
If you take a course in research design, one of the first topics will be determining the right data and sources for your analysis. To help us stay open-minded, I will use one of my favorite approaches – a sports analogy.
Let us suppose that we wanted to predict the total points that would be scored in tonight's basketball game between Wisconsin and (The) Ohio State University. The game is being played as I write this, so the exercise is purely academic. Here are some possible choices for our analysis:
- Take the entire history of college basketball and use the average as our forecast.
- Use data only from the time since the shot clock was put in place.
- Use data only since the three-point basket was introduced.
- Use only data from the Big Ten, which might differ from other conferences.
- Use data only from Ohio State and Wisconsin, who might have different team tendencies.
- Use data only from these teams in the last few years, perhaps reflecting their current personnel.
Note that the data becomes more relevant as it gets more specific. Please also note that there is still plenty of data for our problem, since college teams play 30 or so games each year. Even a few years of data would provide 100 cases.
The Stock Market Comparison
Let us take what we learned in step one and consider how it applies to the stock market. Suppose that we wanted to forecast tomorrow's trading volume at the NYSE. Here are just a few of the major changes in stock trading (readers are invited to add more) since the 1792 agreement signed under a Buttonwood Tree.
- Stock quotes replaced a ticker tape.
- Securities regulation to provide information.
- Competitive commissions.
- The invention of computers.
- Options trading.
- Futures trading and arbitrage.
- Online trading.
- New NASDAQ rules and dark pools.
- Decimalization of stock prices.
- SOX and Regulation FD.
- High frequency trading.
- Individual stock circuit breakers.
There are other elements, including the more active role of the Fed, but you get the drift. If you were interested in predicting volume, you probably would not use data that was more than a few years old. Too much has changed.
Even when it comes to more general market analysis I am not interested in what happened in the Taft Administration, the FDR era, or even the Ike years. I do not care much about the Nixon years, or even Jimmy Carter. We at least need to get to the modern era of an active Fed, active stock trading with low commissions, and broader access to data through financial television and computers.
For the purposes of this post I want to use a very innocent example from two of my favorite sources – both valuable contributors to our understanding of markets and current issues.
Let us first look at this chart from my friend Doug Short:
This is a beautiful chart. It is accurate and provides the most comprehensive history available from any source. Doug notes that the long-term, inflation-adjusted increase in stock prices is an annualized growth of 1.73% and that current values are 48% above this trend.
When I look at Doug's chart my eye does not follow his proposed regression line, mostly because I am totally uninterested in the old data. I imagine a different line, starting with the post-war period – surely more relevant. I also imagine a line beginning in 1982, where the data become even more relevant.
The starting point of 1871 is represented not because it is best, but because that was the earliest year for which Dr. Shiller could generate data. There is not a strong research reason for the choice.
While I was pondering this question and considering developing my own chart, I discovered this presentation from Scott Grannis:
Scott's chart is not inflation-adjusted, but it also does not include dividends. The conclusion is dramatically different, showing that stocks are in the middle of the long-term trend – growing at almost 7% a year plus dividends.
Neither source gives any particular reason for the choice of starting point – and that is my main focus here.
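To make the starting-point issue concrete, here is a minimal Python sketch. It uses a synthetic price series, not Doug's or Scott's data. The function name trend_from and the 1% and 3% growth rates are assumptions chosen only for illustration. The sketch fits a straight line to the log of price and reports the implied annual growth, plus how far the latest value sits above or below the trend, for several starting years.

```python
import numpy as np

def trend_from(years, prices, start_year):
    """Fit log(price) = a + b * (year - start) using data from start_year on.

    Returns (annualized trend growth, latest value's deviation from trend).
    Hypothetical helper for illustration only.
    """
    years = np.asarray(years, dtype=float)
    prices = np.asarray(prices, dtype=float)
    mask = years >= start_year
    x0 = years[mask][0]                      # center the years for stability
    slope, intercept = np.polyfit(years[mask] - x0, np.log(prices[mask]), 1)
    fitted_last = np.exp(intercept + slope * (years[-1] - x0))
    growth = np.exp(slope) - 1.0             # annualized growth implied by the line
    deviation = prices[-1] / fitted_last - 1.0
    return growth, deviation

# Synthetic series (an assumption, not market data):
# 1% log growth per year before 1950, 3% per year afterwards.
years = np.arange(1871, 2012)
log_growth = np.where(years < 1950, 0.01, 0.03)
prices = 100.0 * np.exp(np.cumsum(log_growth))

for start in (1871, 1950, 1982):
    g, d = trend_from(years, prices, start)
    print(f"start {start}: trend growth {g:.2%}, latest vs. trend {d:+.1%}")
```

On this made-up series the full-history line understates recent growth and shows the latest value as above trend, while the line fitted from 1950 shows the latest value sitting on trend. Same data, different starting point, different conclusion.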
Great analysis begins with choosing the right data. Everyone has heard the expression "Garbage in, garbage out." This is where it starts.
If you understand this problem, you have jumped the first (and most important) hurdle in identifying strong research.
It will help you grasp the mistakes of most recession and business cycle forecasters. They simply do not have enough relevant cases to do a good job of ex-post analysis.
You can see the mistakes of those whose research identifies "bad times to invest." They also do not have enough past cases, so the inferences are unsound.
You can see the shortcomings of leading academics. They get respect for exhaustive and thorough analysis, finding data that others have missed. That is fine for their book reviews. You and I need to apply a higher (different?) standard. The popular book about why "this time is different" has only a handful of truly relevant cases.
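Why does a handful of cases matter so much? A rough sketch, assuming independent cases with the same spread: the uncertainty around an average shrinks only with the square root of the number of cases. The function name and the case counts below are hypothetical, chosen to contrast a few cases with the 100 or so from the basketball example.

```python
import math

def standard_error(sample_std, n_cases):
    """Standard error of a sample mean, assuming independent cases."""
    return sample_std / math.sqrt(n_cases)

# Same spread (1.0, in arbitrary units), different numbers of cases.
for n in (5, 10, 100):
    print(f"{n:>3} cases: standard error {standard_error(1.0, n):.3f}")
```

With five cases the estimate is more than four times as uncertain as with 100, before even asking whether the five are truly comparable.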
The world wants "actionable investment advice." Fair enough. I have been acting on the principle described here for several years – with weekly articles to explain.
The basic conclusion is that many of the popular pundits, despite their apparent use of data, have developed inaccurate and over-fit models. It is better to have simple models with more relevant data. These may not seem as impressive at first glance, but prove to be more robust in practice.
More to come on this important theme...