Benefits of Mapping Data Lineage

Published on 05 Oct 2022

by Clara Yeung

Data can be used to tell a story, but the story can be difficult to follow if you only have data points from its beginning and its end, leaving you to ask, “How did we get here?” As a simplified example, let’s say you want to know what factors may have contributed to fluctuating enrolment data. You notice that over the past two years, enrolment in Faculty X has decreased, yet student satisfaction scores have increased. Is the story that smaller class sizes increased student satisfaction? And what about the decrease in enrolment: what factors drove it down? Data lineage fills in the rest of the story by letting you examine the data’s life cycle, from its origin through its changes over time, revealing the how and the why. Following the data on its journey helps you understand its story from creation to consumption, and then share its significance with your institution.

Understanding your data’s lineage also builds trust in the data, empowering you and your team to make decisions that benefit the institution. Data mapping tools can trace end-to-end data lineage across platforms, datasets, ETL/ELT pipelines, charts, dashboards, and beyond. Ideally, your data mapping tool can extract lineage from a wide range of sources, including modern cloud warehouses, web connections and APIs, data transformation tools, and business intelligence tools such as Tableau.

Your data mapping tool should also let you explore upstream and downstream entities to gain insight into how data is produced, transformed, and consumed across your organization. Upstream lineage reveals the sources of your data, helping you determine whether they are reliable and trustworthy. Downstream lineage shows where the data goes and how it is used, allowing you to validate the quality of the resulting data product. Visibility in both directions also helps you understand how a breaking change in one dataset will affect everything that depends on it downstream.
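To make the upstream/downstream idea concrete, here is a minimal Python sketch that models lineage as a directed graph and walks it in both directions. The asset names (for example sis.enrolment_raw and dashboard.faculty_x) and the LINEAGE dictionary are hypothetical, invented for illustration only; a real data mapping tool builds and stores this graph for you, and its API will differ.

```python
from collections import deque

# Hypothetical lineage graph (illustration only): each key is a data asset,
# each value lists the assets that read from it directly (its consumers).
LINEAGE = {
    "sis.enrolment_raw": ["etl.enrolment_clean"],
    "survey.satisfaction_raw": ["etl.satisfaction_clean"],
    "etl.enrolment_clean": ["mart.faculty_summary"],
    "etl.satisfaction_clean": ["mart.faculty_summary"],
    "mart.faculty_summary": ["dashboard.faculty_x"],
    "dashboard.faculty_x": [],
}


def _reverse(graph):
    """Invert the edges so each asset maps to its immediate upstream sources."""
    reversed_graph = {node: [] for node in graph}
    for source, consumers in graph.items():
        for consumer in consumers:
            reversed_graph.setdefault(consumer, []).append(source)
    return reversed_graph


def _reachable(graph, start):
    """Breadth-first search returning every asset reachable from start."""
    seen = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


def downstream(asset):
    """Every asset that could be affected if `asset` changes (impact analysis)."""
    return _reachable(LINEAGE, asset)


def upstream(asset):
    """Every source that feeds into `asset`, directly or indirectly."""
    return _reachable(_reverse(LINEAGE), asset)


if __name__ == "__main__":
    print(sorted(downstream("sis.enrolment_raw")))
    print(sorted(upstream("dashboard.faculty_x")))
```

Calling downstream on the raw enrolment feed lists every dataset and dashboard that a breaking change there could affect, while calling upstream on the Faculty X dashboard lists every source worth checking when its numbers look wrong.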

Documenting lineage metadata is also very beneficial. Metadata can include a description, ownership, tags, and so on. This information helps you visualize how data flows across your entire data ecosystem, see the dependencies of a given data product, and list its downstream consumers. Combined with schema history, it also lets your institution trace how a dataset’s schema has changed over time and identify upstream changes that may be the root cause of an issue.
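As an illustration of how schema history supports root-cause analysis, the following minimal Python sketch compares two recorded snapshots of one dataset’s schema. The snapshot contents and the diff_schemas helper are hypothetical, not the format or API of any particular data mapping tool; the point is that a renamed column (faculty becoming faculty_code) or a changed type would surface immediately as a likely cause of a broken downstream report.

```python
# Hypothetical schema snapshots (illustration only) for one dataset,
# recorded at two points in time. Each maps column name -> column type.
SCHEMA_2021 = {"student_id": "INTEGER", "faculty": "TEXT", "term": "TEXT", "credits": "INTEGER"}
SCHEMA_2022 = {"student_id": "INTEGER", "faculty_code": "TEXT", "term": "TEXT", "credits": "DECIMAL"}


def diff_schemas(old, new):
    """Report columns added, removed, or retyped between two schema snapshots."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # Columns present in both snapshots whose type changed: (name, old_type, new_type).
    retyped = sorted(
        (col, old[col], new[col])
        for col in set(old) & set(new)
        if old[col] != new[col]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


if __name__ == "__main__":
    print(diff_schemas(SCHEMA_2021, SCHEMA_2022))
```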

Data mapping tools typically also support identifying data owners, data stewards, and other roles important to a successful data governance initiative. This helps promote trust in your data by making it clear who is accountable for the accuracy of a particular dataset and whom to turn to if questions arise.

Implementing a data mapping tool may seem daunting, but its benefits are far-reaching. A story with no middle, only a beginning and an end, is both confusing and unreliable.

Trace your data lineage to learn where your data came from, how it evolved over time, and where it is now!