Andrew Drinkwater

Two decades of data wrangling, and I’m still learning new things

Click here to view the original post.

I’ve spent a long time wrangling data. In the early 2000’s, I had some unique experiences with university. The program that I was a part of was originally part of the Technical University of British Columbia, a publicly funded university that closed and was absorbed into Simon Fraser University in 2002. I joined during the transition, but many students had attended TechBC one to three years prior. Once the decision was made to close our programs and transition them to “regular” SFU programs, we students felt that the walls were closing in: that we better take our courses before they were phased out and no longer an option. I became a student advocate during this time, meeting regularly with the university administration to discuss student concerns. Little did I know I was inadvertently auditioning for a future job opportunity. As I got nearer to graduation and the realization that I didn’t want to work in IT (this irony is not lost on me), I applied for a student advisor-recruiter position in what is now known as the School of Interactive Arts and Technology. In turn, little did I know, there were plans to overhaul the fledgling curriculum only a few years into the new school establishment. And who better to help plan that? Yeah. Me! (And others – it takes a village.)

Overall, I think we did a pretty good job helping students through two major curriculum upheavals in approximately five years. The key, as our registrar reminded me years later, was focusing on putting the learner at the centre of our planning and helping them navigate the pathways.

But where did data come in? We ended up recruiting a ton of students to our new school, growing from 400 to 1,200 in 3 years.

If you work in business your first reaction might be “wow, that’s amazing!” but if you work in academia, you look at numbers like this with dread. Hiring seldom keeps pace with that kind of growth, and we were no different.

Data to the rescue. I had recently read Competing on Analytics by Tom Davenport and was inspired at what could be done with data. I thought, we must have some information that could help us plan! And with a little help from my friends, I was able to get permission to connect to the university’s reporting database and learn more about who our students were, how far through the program they were, and importantly - what courses they would likely need going forward. This helped us figure out who to hire to ensure we could help all these students graduate.

In those days I had to use hand-curated SQL export to Excel, and then visualized using Tableau (then a new and untested tool. See Tableau Changed My Life).

More recently I learned of a tool called FME. Little did I know, it existed way back in the early 2000s. Oh how I wish I knew! I could have skipped the SQL and Excel, though Tableau remains the best tool for visual analysis.


FME, made by Safe Software, is a tool that allows you to connect applications, transform data, and automate workflows.

Disclosure: my company, Plaid Analytics, is an FME Associate Partner. We provide consulting services, training, and workflow development for universities and colleges related to FME. We do not receive any compensation from Safe Software but do have access to demonstration licenses.

In my view, FME can help professionals working with education data be more responsive, facilitate decision making, and improve data quality.

FME helps me by making things I used to do the hard way so much easier. For example, I can now build a visual workflow with drag and drop to transform my data from one form to another, such as being able to make timely and accurate submissions to the government. It also helps me do things I always wished I could do but weren’t all that straightforward, such as notifying a student or an advisor if the student drops a class and falls below the cut-off for financial aid.

Here’s an example of that government submission workflow. On the left, we connect to our local Student Information System (eg: Banner). In the middle, we create the provincial government submissions as tables. On the right, we create the federal government submission as space-delimited text. This makes my submission as easy as clicking the run button, and then moving the files to the appropriate repository (FME can help with that move, too).

No alt text provided for this image

To me, one of the biggest advantages here is that I can see what is happening at every step of the process. If I missed a join key, I could see exactly where my data doubled or vanished. One of the major plusses for me is not needing to debug nested queries, which can be very difficult in SQL.


We recently presented at the FME World Fair on use cases in higher education. These are, of course, the tip of the iceberg - watch our site or subscribe to YouTube to learn more.

1. Building an automated data pipeline and data warehouse

Leveraging FME, we’ve been able to build an automated data pipeline and data warehouse. This allows the college we’re working with to better track their students' academic progress, and importantly ensure access for their diverse populations.

2. Combining campus SIS and cloud LMS data

When I worked in Institutional Research, very few had deep experience working with cloud data which is often delivered via an Application Programming Interface, or API. Unlike a traditional relational database or spreadsheet, with an API you typically need to request your data in small batches - maybe 100 records at a time. FME allows you to connect to services like this quickly and easily using built-in transformers and some information from the API.

It can also be the interface between your platforms. If a student adds a class in PeopleSoft, FME Server can instantly run to get them access to your Canvas Learning Management system. Or vice-versa.

3. Projecting alumni economic impact - how much do my graduates earn?

This one was a ton of fun. We built our analysis leveraging the Sudmant method, whereby earnings were estimated over a 40-year career, comparing university to high school graduates. The differential, net of the costs of tuition and educating the student, form what is called the educated workforce premium. Traditionally, most universities in Canada have used a similar method, though some have simplified. What we saw in our background research is that most institutions focused on their local market (for example the University of Calgary used Alberta earnings; SFU used Vancouver) but not beyond. We used FME to scale this out with additional geography: bringing in Statistics Canada regions and earnings and relating those to where graduates live and work allowed us to say that there was an economic impact for BC graduates both within BC and across Canada. As FME has the best support for spatial data, we could also split up the various regions based on user selection - for example, if this analysis was run for the University of Northern British Columbia (they’re not our client, just for demonstration), we could split Prince George from the rest of the surrounding region, allowing us to estimate very local impacts, regional impacts, and cross-country impacts.

Here’s the result, while noting that the institution and number of graduates is made up.

No alt text provided for this image

4. Enabling government reporting

I have yet to meet an education data professional who relishes government reporting (though many are really good at it anyways!). FME makes this simpler. With a few transformers, we can take data from our Banner Student Information System, transform it into the tables required by the provincial government, and then transform it the space delimited text file required by the federal government. All with drag and drop.

5. Creating an open data portal

Demands for transparency and accountability on institutions are increasing. FME can help you build an open data portal where your users can select the type of file they need and what data is of interest.

Safe Software had a great blog post on exactly this recently.

6. Empowering your data team with the FME education subscription

The education subscription, in my view, is a game changer. This allows you to get an enterprise subscription, with as many licenses as you need (both desktop and server). Pricing is based on the number of students at your institution:

Why does this matter?

Many institutions we work with struggle with the subscription models employed by many software vendors. Each time you onboard new staff, there is an added license cost that you didn’t plan for in your budget. But with FME, the cost of adding new staff to the platform is exactly zero. This allows you to encourage your teams to build data integrations, experiment, and learn new things without blowing a hole in your licensing budget.

So, what else can you do with FME?

The sky is the limit. If it involves data, FME can help you get there. Maybe you want your data warehouse to run overnight; maybe you want it to refresh as soon as a change is made (such as a student dropping a class); perhaps you want amazing campus maps to help students navigate. What about using Augmented Reality so you could embed visual instructions right into student’s phone screen? One I find compelling: how can I get from classroom A to B without encountering any stairs? What about starting to clean up all those spreadsheets? We all know that despite ERPs, many institutions have tons of spreadsheets. What if you could build an easy way to make sure those spreadsheets are properly recorded? Or maybe you want to understand what future outcomes await students who waitlist for a course. Do they ever enroll? If so, when?

We’ll be running webinars on the above over the course of the summer. Links are in the comments.

Are you curious about what is possible with FME? Please let me know in the comments or reach out directly. I’d love to hear from you.