When historical averages let you down

Patrick Lougheed

The past is often the best predictor of the future: when forecasting what’s going to happen, our best source of information is frequently what has happened before. But there are times when we need to be careful, because looking backwards can let us down if we aren’t intentional about it.

When we’re forecasting student headcount and course enrolments, using historical averages for student retention is a common practice. Some institutions look only at the immediately previous year, while others look at two, three, five, or even more years and average them. That average is then applied going forward to produce an expected number of retained students.
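As a minimal sketch of the averaging described above, the Python below takes a list of past retention rates, averages the most recent one, three, or five years, and applies the result to a headcount. The rates and the headcount of 200 are hypothetical numbers chosen for illustration, and `trailing_average` is a helper name made up for this example.

```python
# Hypothetical retention rates for the past five years, oldest first (illustrative only).
past_rates = [0.78, 0.82, 0.80, 0.79, 0.81]
current_headcount = 200  # hypothetical number of students who could return


def trailing_average(rates, years):
    """Average the most recent `years` retention rates."""
    recent = rates[-years:]
    return sum(recent) / len(recent)


for years in (1, 3, 5):
    rate = trailing_average(past_rates, years)
    expected = current_headcount * rate
    print(f"{years}-year average: {rate:.1%} -> about {expected:.0f} retained students")
```

With these made-up inputs, the one-year window gives a slightly higher rate than the three- and five-year windows, which is exactly the kind of choice each institution ends up making.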

But this number will so rarely be exactly correct that it might as well be never. Yes, it will be in the right neighbourhood, but it won’t be bang on. Each student follows their own decision-making path, and because those individual decisions go one way in some years and another way in others, the overall result moves around from year to year.

There are several situations where relying on historical averages can let us down:

Small Groupings

When we look at small groups of students, like a small program or a particular slice within a program, historical averages can let us down. Let’s say we have a program with 20 students, and historically 80% of them have been retained. We might then expect 16 students to be retained.

That won’t always happen. Each student has two potential outcomes (retained or not retained), so we can model each one as a weighted coin flip, and, treating the students’ decisions as independent, the total number retained follows a binomial distribution. If we work out the binomial probabilities, 16 is the most likely outcome, but it only happens roughly 1 time out of 5 (about 22%). We’re almost as likely to see 15 (17.5%) or 17 (20.5%) students retained, which isn’t a big deal. But we might also get more extreme outcomes, like 12 students retained (2%) or all 20 retained (1%), and even lower numbers of retained students are possible.
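The percentages above can be reproduced with the binomial formula, P(k) = C(n, k) p^k (1 - p)^(n - k). The sketch below assumes, as the binomial model does, that each student’s decision is independent and carries the same 80% chance of retention; `binomial_pmf` is a hypothetical helper built on Python’s standard `math.comb`.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability that exactly k of n students are retained, each with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 20, 0.80  # 20 students, 80% historical retention rate

for k in (12, 15, 16, 17, 20):
    print(f"{k} retained: {binomial_pmf(k, n, p):.1%}")

# Chance of falling two or more students short of the expected 16
print(f"14 or fewer retained: {sum(binomial_pmf(k, n, p) for k in range(15)):.1%}")
```

The last line shows why the tails matter: in this model, ending up two or more students below the "expected" 16 happens in roughly one year out of five.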

Note that this problem doesn’t disappear in large programs (we still won’t be perfect), but the differences will likely be smaller in relative terms. Being off by one student in a 20-student program is a 5 percentage point swing in the retention rate, while being off by one student in a 1,000-student program is only a 0.1 percentage point swing.
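To put the small-versus-large comparison into numbers, this sketch computes the one-student swing (100 / n percentage points) alongside the standard deviation of the observed retention rate under the same binomial assumption, sqrt(p(1 - p) / n). The 80% rate and both function names are illustrative assumptions.

```python
from math import sqrt


def one_student_swing_pp(n):
    """Percentage-point change in the retention rate caused by a single student."""
    return 100 / n


def retention_rate_sd_pp(n, p):
    """Standard deviation of the observed retention rate, in percentage points,
    assuming each of n students is independently retained with probability p."""
    return 100 * sqrt(p * (1 - p) / n)


for n in (20, 1000):
    print(
        f"{n} students: one student = {one_student_swing_pp(n):.1f} pp, "
        f"standard deviation of the rate = {retention_rate_sd_pp(n, 0.80):.1f} pp"
    )
```

Under these assumptions, the 20-student program’s observed rate has a standard deviation of roughly 9 percentage points, compared with roughly 1.3 points for the 1,000-student program.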

New Programs

Historical data is great…when you have it. But for new programs, you won’t.

One option is to estimate retention rates from other, similar programs at your institution, in which case you’ll need to decide which programs really are similar. Or you could use rates from similar programs at other institutions, if that data is available. None of these is likely to be exactly right, but they’ll get you into the right neighbourhood.
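If you do borrow rates from comparable programs, one choice to make is whether to pool them or average them. The sketch below contrasts the two using made-up counts for three hypothetical comparator programs: pooling weights each program by its size, while a simple average gives a small program’s noisier rate the same weight as a large program’s.

```python
# Hypothetical comparator programs: (students eligible to return, students retained).
comparators = {
    "Program A": (120, 96),
    "Program B": (45, 34),
    "Program C": (300, 246),
}


def pooled_rate(programs):
    """Total retained divided by total eligible, so larger programs carry more weight."""
    eligible = sum(e for e, _ in programs.values())
    retained = sum(r for _, r in programs.values())
    return retained / eligible


def simple_average_rate(programs):
    """Unweighted mean of each program's own retention rate."""
    rates = [r / e for e, r in programs.values()]
    return sum(rates) / len(rates)


print(f"Pooled estimate: {pooled_rate(comparators):.1%}")
print(f"Simple average:  {simple_average_rate(comparators):.1%}")
```

Neither number is "the" answer for a new program; the gap between them is a useful reminder of how much the choice of comparators and weighting can move the estimate.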

When Circumstances Change

We’ve learned many things from the COVID-19 pandemic, and one of these is that unexpected events can throw a wrench into our expectations. Underlying the use of historical averages is the assumption that our general circumstances will be similar to past years, but that’s not always the case.

These changed circumstances may be external, like COVID or changes to government student aid policies. They may be internal, like changes to our own institutional policies or program delivery models. Or they could be specific to students’ own lives. Regardless, when circumstances change, we need to revisit our assumptions.

So, What Do We Do Instead?

Historical averages aren’t perfect. How can we use them in a way that gives us more confidence?

One option is Monte Carlo simulation. Instead of running one forecast and assuming the historical average will hold, we run many simulated forecasts, each with its own randomly drawn outcomes. We’ll still have a “middle of the road” outcome we can use for planning purposes, but Monte Carlo also gives us a range of outcomes (with associated probabilities), which lets us say “we think the outcome is going to be somewhere in this range.”
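A bare-bones version of this idea, assuming each student is independently retained with the historical probability, is sketched below: it simulates 10,000 years for the 20-student program from earlier and reports the median and the middle 80% of outcomes. The function name, trial count, and seed are arbitrary choices for illustration, not a recommended configuration.

```python
import random
import statistics


def simulate_retained(n, p, trials=10_000, seed=1):
    """Simulate `trials` years; in each, every one of n students is retained with probability p."""
    rng = random.Random(seed)  # seeded so the results are reproducible
    return [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]


outcomes = simulate_retained(n=20, p=0.80)
deciles = statistics.quantiles(outcomes, n=10)  # 9 cut points: 10th, 20th, ..., 90th percentiles

print(f"Median outcome: {statistics.median(outcomes)} retained")
print(f"Middle 80% of simulated years: {deciles[0]} to {deciles[-1]} retained")
```

The median plays the role of the “middle of the road” planning number, while the 10th and 90th percentiles give the range. A real forecast would layer this over every program and cohort, which is where the computational cost adds up.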

Another option is to run forecasts that layer human input on top of the historical averages. For example, if our historical average retention rate is 80% but our institution has just implemented a new co-curricular program that we believe will increase retention, we can use that judgment to model what would happen in that case. By trying different adjustments (a 1 percentage point increase, a 2 pp increase, and so on), we can see how each one changes the forecast.
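The adjustment approach can be sketched as a simple loop over scenarios. Here the baseline rate is the 80% from the example, while the 500 eligible students and the list of adjustments are hypothetical inputs; the adjusted rate is clamped so it stays between 0% and 100%.

```python
BASELINE_RATE = 0.80           # historical average retention from the example
ELIGIBLE_STUDENTS = 500        # hypothetical number of students who could return
ADJUSTMENTS_PP = [0, 1, 2, 3]  # hypothetical analyst adjustments, in percentage points


def scenario(rate, adjustment_pp, eligible):
    """Apply a percentage-point adjustment and return (adjusted rate, expected retained)."""
    adjusted = min(max(rate + adjustment_pp / 100, 0.0), 1.0)  # keep the rate within [0, 1]
    return adjusted, adjusted * eligible


for adj in ADJUSTMENTS_PP:
    rate, retained = scenario(BASELINE_RATE, adj, ELIGIBLE_STUDENTS)
    print(f"{adj:+d} pp: {rate:.0%} retention -> about {retained:.0f} students retained")
```

Each scenario here is still a single point estimate. Running a simulation for each adjusted rate, rather than multiplying it out directly, would turn every scenario into a range instead.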

Of course, the ideal situation would be having both of these options.

This is where it gets tricky. Doing either of these in a tool like Excel is difficult at best, and it requires some computational horsepower. But that’s another blog post…
