Lecture Notes: Data Governance
The first step in implementing a data governance program is to identify the owners or custodians of the enterprise's data assets. A policy must specify who is accountable for each portion or aspect of the data, including its accuracy, accessibility, consistency, completeness, and updating. Processes must define how the data is stored, archived, backed up, and protected from mishaps, theft, or attack. Standards and procedures must define how authorized personnel may use the data. Finally, controls and audit procedures must be put in place to ensure ongoing compliance with government regulations.
Several points about how data quality fits into the governance framework are worth noting:
- Data quality is not a separate, single activity. Data quality is incorporated into every aspect of the framework. The prime function of governance is to improve and maintain the quality of the data; thus, to be successful at governance, quality must be continuously measured and the results continuously fed back into the governance process.
- Data quality does not equal data cleansing and scrubbing. Data quality is a much more involved process that focuses an organization’s resources on addressing quality issues at the source rather than after the fact in the warehouses and analytic/reporting platforms.
- Source systems and their data stores are included in the framework. As anyone who has tackled data quality knows, it is far easier and cheaper to fix data issues at the source than to rely entirely on data cleansing and scrubbing. For this reason, the framework explicitly links source systems and files to the data governance entity, and ownership of and accountability for data quality must reside with the business owners of the source systems. A dose of reality is in order, however. Some consider it naive to expect that all, or even any, data quality issues will be solved at the source: a large enterprise may have multiple conflicting sources, and the real challenge is often getting different divisions to agree on a single source system. It is therefore usually wise to build rigorous structures into the data quality management architecture that verify and remedy source data quality and certify the quality of the target analytic data.
- Technology is not explicitly highlighted as a separate component either. Technology is an absolute requisite to ensure the success of any governance and quality effort and is, therefore, integrated within the entire framework.
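The idea of verifying data quality at the source, mentioned in the source-systems point above, can be sketched as a small rule-based gate applied where records enter a system. This is only an illustration under stated assumptions, not a prescribed design: the field names (`customer_id`, `email`, `country`), the rules themselves, and the helpers `validate_record` and `split_by_quality` are all hypothetical.

```python
from typing import Callable

# Hypothetical rules: each field name maps to a predicate its value must satisfy.
RULES: dict[str, Callable[[object], bool]] = {
    "customer_id": lambda v: isinstance(v, int) and v > 0,
    "email": lambda v: isinstance(v, str) and "@" in v,
    "country": lambda v: isinstance(v, str) and len(v) == 2,
}


def validate_record(record: dict) -> list[str]:
    """Return the names of fields that are missing or fail their rule."""
    failures = []
    for field, rule in RULES.items():
        if field not in record or not rule(record[field]):
            failures.append(field)
    return failures


def split_by_quality(records: list[dict]):
    """Separate clean records from those to be routed back to the source owner."""
    accepted, rejected = [], []
    for record in records:
        failures = validate_record(record)
        if failures:
            # Keep the failing field names so the owner knows what to remedy.
            rejected.append((record, failures))
        else:
            accepted.append(record)
    return accepted, rejected
```

Returning the rejected records together with the names of the failing fields, rather than silently dropping or patching them, keeps accountability with the business owner of the source system, which is the point the notes make about fixing issues at the source.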
Data quality is a perception or an assessment of data's fitness to serve its purpose in a given context. Aspects of data quality include:
- Accuracy
- Completeness
- Update status
- Relevance
- Consistency across data sources
- Reliability
- Appropriate presentation
- Accessibility
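Some of these aspects lend themselves to simple numeric measures that can be tracked over time. The sketch below shows one possible way to score completeness, update status, and consistency across two sources. The record layout, the `updated_on` field, and the function names are assumptions made for illustration; aspects such as relevance or appropriate presentation usually need human judgment and are not covered.

```python
from datetime import date


def completeness(records, required_fields):
    """Fraction of required field slots that are populated (not None or empty)."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    return filled / total


def freshness(records, as_of, max_age_days, date_field="updated_on"):
    """Fraction of records updated within max_age_days of the as_of date."""
    if not records:
        return 1.0
    fresh = sum(
        1
        for record in records
        if record.get(date_field) is not None
        and (as_of - record[date_field]).days <= max_age_days
    )
    return fresh / len(records)


def consistency(source_a, source_b, key, field):
    """Fraction of keys present in both sources whose field values agree."""
    a = {r[key]: r.get(field) for r in source_a}
    b = {r[key]: r.get(field) for r in source_b}
    shared = a.keys() & b.keys()
    if not shared:
        return 1.0
    return sum(1 for k in shared if a[k] == b[k]) / len(shared)
```

Each function returns a value between 0 and 1, so the scores can be reported side by side and fed back into the governance process on a regular schedule.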