Happy Independence Day to our American clients, colleagues, and friends.
Learning to work with daily or real-time data feels like learning to drink from a firehose. As more universities and colleges move to having near-real-time data, these tips can be helpful:
- You’re building on someone’s greatness
- Be conscious that it can be overwhelming
- Make sure key milestones are a click away
- Make sure you can answer the questions your institution cares about
- Consult experts who have done this before
You’re building on someone’s greatness
Thanks to Elisa Davis @ItsElisaDavis for this tip on Twitter.
You may feel differently sometimes, but when the existing system was developed, it was the best that could be done with the team’s talent and resource constraints. Whether state of the art or a system that just got the job done, status quo helped you get where you are today.
I really appreciated Elisa’s point here – your role is to uplift and help those who came before. Working collaboratively with them on improvements is more likely to result in buy-in and even championing.
Be conscious that it can be overwhelming
At a university I previously worked at, our developers built an amazing datamart that could help answer nearly any question about students and applicants. I was in heaven with this level of sophistication.
But my colleagues hadn’t spent their previous ten years building and designing data warehouses like I had. They’d been primarily focused on analysis – what did they uncover, and why did it matter.
To me, having 37 joins for our admissions reports was an inconvenience, but I could work with it easily enough. I could make changes easily with code. My colleagues didn’t have that background.
The volume of information in the datamart was utterly overwhelming at first for our team.
We were able to help improve this situation for them with a combination of training and practice and designing features like reporting views that simplified the structure for those who were new to data warehousing.
Make sure key milestones are a click away
With snapshot data, you can always say “On July 1, the data looked like this”. The downside is snapshots take too much space.
In a data warehouse with daily or real-time data, you’d typically see effective from and effective to dates that are associated with records.
These date ranges can then by combined with other information:
- A Point In Time table might define particular time periods that matter – for example, maybe your registration cycle for Fall begins May 1, and the Fall term runs until December 31. In a Point In Time table you might have a “cycle” of “Registration”, and all the dates between May 1 and December 31 of that year. Often, rather than displaying the actual date, the Point In Time table will have a Date Id field.
- The Date Id field is then a foreign key to a Date table, which tells you additional information about that date id: the actual date, perhaps the weekday, day of month, day of year, week of year, etc.
With this technique, we can create the notion of a “snapshot” without having to build one individually. For example, if we were considering the applicant pipeline, we could have a recordset like this joining the 3 tables (Applicants, Point In Time, Dates).
Here we’re trying to say: “Was this student registered on July 1?”
How this technique works is this: we filter based on the specific milestone from the Point in Time table (Registration Deadline) or the date (July 1, 2022). In our join to the Applicants, we specify that we want to keep effective from dates <= July 1 and effective to dates >= July 1. Row 1 does not meet our criteria (because the effective to date is < July 1). But Row 2 does meet both our criteria. That becomes part of our “snapshot”.
To me, the biggest advantages of this technique are:
- We can make a snapshot of any day we want (assuming our data warehouse was running).
- We need way less storage – for example, the two rows above in theory replace 59 rows of data – one for each day between May 3 and July 1. Imagine how many rows that is if we have every student.
Monitoring your data warehouse is important because you want to be alerted if something goes wrong. For this, you can use purpose-built monitoring tools like DataDog, or can start with simpler things like automations that will send you a Slack message if something is out of place.
Make sure you can answer the questions your institution cares about
As with so many data opportunities – the real opportunity lies in better serving your constituents. As data professionals, we’re often surprised that many users don’t share our Tipsame passion for swimming through every record in the database. What most want is actually well structured data sources that help them answer key questions:
- How many students do we have this term? Over time?
- How many applicants do we have? How many will convert?
- What is our financial aid spend? What’s the trend? Is there money left on the table?
- Which courses are over- or under-subscribed?
- How is our enrollment compared to state funding models?
- How does our instructor workload compare with our goals?
- And so on….
In almost all cases, your users will want to slice and dice the data by helpful dimensions – nationality, ethnicity, gender identity, academic level, faculty / division / school, program, faculty rank, and so on.
And of course, build in flexibility to be able to expand to answer future questions. You won’t get them all right away.
Consult experts who have done this before
There are great communities who have been down this road before and have members who often are happy to discuss:
- IT-focused: Educause or CANHEIT,
- Institutional research-focused: AIR or CIRPA.
Additionally, independent experts like Plaid Analytics (my company) are on a mission to help higher education institutions advance their analytics. One of the big advantages of working with a group like ours is our experience working inside higher education: we understand how different that is from the corporate world, and as a bonus, we are experts on Banner, PeopleSoft, WorkDay, an many other higher ed systems. This experience, and working with dozens of institutions since launching Plaid, helps us hit the ground running because we truly get what matters in higher education. We’re always happy to discuss our experiences and/or build your daily or real-time data strategy and architecture.
The best time to build your new data warehouse was yesterday, but the second best time is today. If you haven’t started, I’d suggest the following:
- Find some colleagues who can help or provide support such as some from IT, IR, and from the business areas you serve.
- Put together a project charter for a pilot. Start small with something you can demonstrate success on. Expand later.
- Select a technology that will meet the goals outlined in the project charter.
- Begin experimenting with appropriate database designs. Don’t be afraid to try something, discover it wasn’t quite right, and iterate.
- Inventory ways you could improve in the future.
Good luck! If you’d ever like to swap ideas or war stores, or get a little help to move through the process quicker, I’m happy to connect via direct message.