I did not really understand what regression analysis was until I took my second quantitative analysis class for my master’s degree. Sure, I had heard the term regression thrown around, but my understanding was truly limited. Once I dug into the material and embraced my inner dork, I was amazed at the predictive power that data could provide. Regression quickly became one of my favorite tools for analysis.
I used regression liberally to monitor my improvement while studying for the CFA level 3 exam. Here is how I did in Schweser’s Q-bank for Economic Analysis:
Look at that improvement!
The solid line, sometimes referred to as the line of best fit, is on a nice trajectory. The upward slope suggests that as I take more exams, my understanding of economic analysis improves, with my score rising by roughly the slope of the line (in percentage points) for each additional test. Technically, this is not a great example because there are only 5 tests in the sample. Below is a better example that includes all 92 tests I took in Schweser’s Q-bank while studying for the CFA level 3 exam.
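The line of best fit is usually an ordinary least squares line: the slope is the sum of cross-deviations of x and y divided by the sum of squared x-deviations, and the line passes through the point of means. Here is a minimal sketch in plain Python. `fit_line` is a hypothetical helper, and the five scores are invented for illustration, since the actual scores are not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical scores for five practice tests (not the actual Q-bank results).
tests = [1, 2, 3, 4, 5]
scores = [60, 65, 70, 72, 78]
slope, intercept = fit_line(tests, scores)
print(round(slope, 2), round(intercept, 2))  # 4.3 56.1
```

With these made-up scores, the slope of 4.3 reads as "about 4.3 percentage points gained per additional test."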
Yes. I took 92 Schweser Q-bank quizzes.
I started out averaging just under 70% and ended up at around 83%. Notice that while I took 92 tests, I never scored higher than 96% on any of them. I practically bludgeoned myself to death with CFA study materials and still never managed a perfect 100% score.
How many more tests would I need to take before I scored a perfect 100%?
If you extend the line further, you get an idea. There is some dispersion around the line, so as the line crosses 90% I would probably score a perfect test at some point. If you keep extending the line out to the 300th test, I would have a fabulous average score in excess of 100%. Go further still and, at some point, I would never score below 100%.
Therein lies the problem with extrapolating a regression line. If you look at just my Private Wealth Management 2 Q-bank test scores, you can see that if I had taken 8 exams instead of just 6, the line would have breached the 100% mark. Except that this is not even possible.
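A small sketch of why extrapolation breaks down. It assumes, purely for illustration, a straight line running through about 70% at test 1 and about 83% at test 92 (the rough averages quoted above, not the actual fitted coefficients). `predict` and `crossing_point` are hypothetical helpers.

```python
def predict(test_number, slope, intercept):
    """Straight-line prediction; nothing in the formula caps it at 100%."""
    return intercept + slope * test_number


def crossing_point(target, slope, intercept):
    """Test number at which the fitted line reaches target (may be fractional)."""
    if slope == 0:
        raise ValueError("a flat line never reaches a different target")
    return (target - intercept) / slope


# Rough assumption: a line through about 70% at test 1 and about 83% at test 92,
# taken from the description above rather than from the actual fitted values.
slope = (83 - 70) / (92 - 1)
intercept = 70 - slope * 1

print(round(crossing_point(100, slope, intercept)))  # 211
print(round(predict(300, slope, intercept), 1))  # 112.7
```

Under that assumption the line reaches 100% around test 211 and predicts roughly 112.7% by test 300: a score that cannot exist, yet the straight line has no way of knowing that.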
Simple linear regression, the kind I used here, fits a straight line through the data. Yet improvement in almost any endeavor is rarely linear. Generally speaking, the more you improve, the harder it becomes to improve further. Test score improvements would probably look more like a curve, or even a hockey stick, that never quite touches the 100% upper limit.
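To make the "curve that never quite touches 100%" idea concrete, here is a minimal sketch of one such curve: each additional test closes the same fraction of the remaining gap to 100%. This exponential model is a swapped-in illustration, not something fit to the actual scores; the function name and the calibration to the rough 70% and 83% figures are assumptions.

```python
import math

CEILING = 100.0  # a test score cannot exceed 100%


def bounded_score(test_number, start, rate):
    """Score that closes the same fraction of the remaining gap to 100% per test.

    Mathematically the curve approaches CEILING but never reaches it for any
    finite test number (floating point will eventually round to 100.0 for
    extremely large inputs).
    """
    gap_at_start = CEILING - start
    return CEILING - gap_at_start * math.exp(-rate * (test_number - 1))


# Hypothetical calibration: choose the rate so the curve passes through the
# rough 70% (test 1) and 83% (test 92) figures from the text.
start = 70.0
rate = math.log((CEILING - start) / (CEILING - 83.0)) / (92 - 1)

for n in (1, 92, 300, 1000):
    print(n, round(bounded_score(n, start, rate), 1))
```

Unlike the straight line, this curve keeps rising but flattens out, with early gains that are large and later gains that get ever smaller.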
This reminds me of a Greek mathematical paradox about Achilles and the tortoise. Achilles is going to race a tortoise and, being a sporting fellow, gives the tortoise a head start. When the tortoise is a certain distance away, say 100m, Achilles takes off and begins to close the gap. At one instant, Achilles is 50m away. At another instant, he is 25m away. Then 12.5m. Then 6.25m. Then 3.125m. Continuing this exercise ad nauseam, it would seem that Achilles will never catch up to the tortoise: no matter how minuscule the distance between them, it can always be divided by 2. Mathematically, Achilles appears trapped in an endless cycle of getting ever closer to the tortoise without ever quite getting there. Yet everyone knows that Achilles will catch and even pass the tortoise, hence the paradox.
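One standard way out of the paradox is that the halving gaps form a geometric series with a finite sum. Assuming, for illustration, that Achilles closes the gap at a constant speed $v$ (the story gives no speeds), the infinitely many steps add up to a finite distance covered in a finite time:

```latex
\sum_{k=1}^{\infty} \frac{100}{2^{k}} = 50 + 25 + 12.5 + \cdots = 100 \text{ m},
\qquad
t = \sum_{k=1}^{\infty} \frac{100 / 2^{k}}{v} = \frac{100}{v}.
```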
This is what people did before Facebook and Pokémon Go: they wrestled with the concepts of space, time, and infinity. While Achilles will eventually pass the tortoise, my actual Q-bank improvement curve would never quite reach 100% (unless I only ever scored 100%, which I have demonstrated is well beyond my abilities).
If you draw lines connecting my worst Q-bank results across the 92-exam regression, you get a clearer picture of what the rate of improvement probably looks like.
Massive improvements in the early testing, then slow gains, followed by ever slower gains. Just like Achilles versus the tortoise. Also, note that the dispersion of exam results above the line of best fit is relatively stable compared with the dispersion below. This is another clue that regression analysis may not be the perfect tool for this data. My data exhibits heteroskedasticity: the dispersion of scores around the line is not constant, and it shrinks as I take more tests, which violates the constant-variance assumption behind ordinary least squares. Ironically, this is exactly what I wanted going into the CFA level 3 exam, but it is not good for regression analysis.
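One rough way to check for shrinking scatter is to fit the line, then compare the variance of the residuals from the first half of the tests with the variance from the second half. This is the idea behind the Goldfeld-Quandt test, stripped of its formal F-test. The helper names and the sample scores below are hypothetical.

```python
from statistics import fmean, pvariance


def residuals(xs, ys):
    """Observed minus fitted values from an ordinary least squares line."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def spread_ratio(xs, ys):
    """Variance of early residuals divided by variance of late residuals.

    Assumes xs is sorted in test order. A ratio well above 1 means the scatter
    around the line shrinks over time (heteroskedasticity); a ratio near 1
    means the scatter is roughly constant.
    """
    res = residuals(xs, ys)
    half = len(res) // 2
    early, late = res[:half], res[-half:]
    return pvariance(early) / pvariance(late)


# Hypothetical scores: an upward trend with scatter that narrows over time.
tests = [1, 2, 3, 4, 5, 6, 7, 8]
scatter = [6, -6, 4, -4, 1, -1, 0.5, -0.5]
scores = [70 + t + s for t, s in zip(tests, scatter)]
print(round(spread_ratio(tests, scores), 1))
```

With real data you would want far more than eight points before reading much into the ratio.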
There is yet another issue with my regression analysis: an important piece of data is completely missing. Between each of the 92 quizzes I took were hours of studying. Taking the quizzes did not, in and of itself, improve my comprehension of the material. Rather, my Q-bank performance was likely more directly tied to the amount of quality time I spent reviewing the material. The improvement happened before I took the quizzes, not during them. That data would have been difficult to collect, and, believe it or not, regression analysis of my quiz scores was not my top priority while preparing for the CFA level 3 exam.
So, should we just ignore regression analysis?
No. Even the fact that the dispersion of my scores decreased as I took more Q-bank exams is useful: I did improve, and the probability that I would pass the CFA level 3 exam increased the more I practiced. I was missing important data (time spent studying); however, my quiz scores are probably highly positively correlated with the amount of time I spent studying. Regression can bring data to life, and it is especially appropriate for measuring things that are not constrained by hard limits such as the 100% ceiling on test scores.
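The strength of a linear relationship like "study time versus quiz score" is usually summarized with the Pearson correlation coefficient. A minimal sketch follows; study time was never recorded, so the hours and scores below are invented, and `pearson_r` is a hypothetical helper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # r runs from -1 (perfect negative) to +1 (perfect positive linear relationship).
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours studied before each quiz and the resulting score.
hours = [1.0, 2.0, 2.5, 3.0, 4.0]
scores = [65, 70, 72, 74, 80]
print(round(pearson_r(hours, scores), 3))
```

A value close to +1 would support the hunch that more study time goes with higher scores, although correlation alone says nothing about which one drives the other.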
For example, if you placed a population’s height on the y-axis and weight on the x-axis, regression analysis would be far better suited. People can grow as tall and round as the laws of nature allow. The data would also look a bit different from my test score data.
A good rule of thumb: for regression analysis to be most useful, the bulk of your data should form a roughly elliptical cloud.
So, I was a bit off in using regression analysis to measure my progress. However, I would do it again in exactly the same manner. There are three points to always keep in mind:
- Never give a tortoise a head start in a race or you will have to divide distances into infinitesimal amounts until the end of space and time.
- Know what the issues are with regression analysis before you rely on it too heavily.
- There are thousands of questions in Schweser’s CFA level 3 Q-bank.
Really, I guess just the second point is the main one here.