DMBOK is the Data Management Body of Knowledge, the framework we use when putting our data management strategy in place. In the image below, sourced from dataninjago, we can see the components of the framework.
At the very bottom, we have data governance, which is the framework around everything we do with data. This is about forming an organizational structure (roles & responsibilities), processes, policies and skillsets which will help you to meet your business objectives using data. Part of this is documenting the AS-IS scenario and conducting a gap analysis to understand the current lay of the land and to define transitional steps that move us from our current position to our desired state.
Data quality is the first element we’ll look at in the next layer. Ultimately, it’s about ensuring that data is fit for purpose and ready to use in a strategic manner. Data quality is defined by DMBOK as the accuracy, completeness, consistency, integrity, reasonability (is it within reasonable, expected ranges?), timeliness, uniqueness (duplicate records management), validity and accessibility. So this stage is about understanding all of these factors for your dataset; setting up a quality monitoring & reporting process and ensuring that your governance processes support the improvement towards high data quality. This may also include improving data literacy across the company as you strive for a better understanding and accessibility of the data.
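As a rough illustration of how a few of these dimensions (completeness, uniqueness, reasonability) might be scored in a monitoring process, here is a minimal sketch; the dataset, field names and valid ranges are all hypothetical, not from DMBOK:

```python
def quality_report(records, required_fields, valid_ranges):
    """Score a few DMBOK quality dimensions for a list of record dicts."""
    def within_range(record):
        for field, (lo, hi) in valid_ranges.items():
            value = record.get(field)
            if value is None or not (lo <= value <= hi):
                return False
        return True

    total = len(records)
    complete = sum(all(r.get(f) is not None for f in required_fields) for r in records)
    unique = len({tuple(sorted(r.items())) for r in records})
    reasonable = sum(within_range(r) for r in records)
    return {
        "completeness": complete / total,
        "uniqueness": unique / total,
        "reasonability": reasonable / total,
    }

# Hypothetical sample: one duplicate record and one missing value.
rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 34},
    {"id": 2, "age": 34},    # duplicate record
    {"id": 3, "age": None},  # incomplete record
]
report = quality_report(rows, required_fields=["id", "age"], valid_ranges={"age": (0, 120)})
print(report)  # → {'completeness': 0.75, 'uniqueness': 0.75, 'reasonability': 0.75}
```

In practice you’d feed these scores into your reporting process and alert when a dimension drops below an agreed threshold.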
Architecture is about the infrastructure and toolsets that you have but also about defining enterprise level data models; understanding lineage of data; defining data flow diagrams & understanding the recipients (people & systems) of the data.
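One lightweight way to capture lineage, for example, is a simple upstream map over dataset names; the dataset names below are made up for illustration:

```python
# Illustrative lineage map: dataset -> datasets it is derived from.
lineage = {
    "dashboard.revenue": ["sales.orders"],
    "sales.orders": ["raw.order_events"],
    "raw.order_events": [],
}

def upstream_of(dataset):
    """Walk the lineage map to find every upstream source of a dataset."""
    seen, stack = set(), list(lineage.get(dataset, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(lineage.get(current, []))
    return seen

print(upstream_of("dashboard.revenue"))  # the dashboard ultimately depends on the raw events
```

From a structure like this you can generate data flow diagrams and answer "who is affected if this source breaks?" questions.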
Metadata is data about your data. Within this segment, you’ll need to deliver a comprehensive data catalog / inventory; documented schemas; documented relationships between data and defined storage details (e.g. retention periods).
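A catalog entry might hold fields along these lines; this is a hypothetical sketch of the kind of record a catalog would store, with made-up dataset names:

```python
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    """One entry in an illustrative data catalog / inventory."""
    name: str
    owner: str
    schema: dict            # column -> type
    upstream: list          # lineage: datasets this one is derived from
    retention_days: int     # storage detail, e.g. retention period

orders = CatalogEntry(
    name="sales.orders",
    owner="data-platform-team",
    schema={"order_id": "string", "amount": "decimal", "created_at": "timestamp"},
    upstream=["raw.order_events"],
    retention_days=365,
)
```

Real catalog tools store much richer metadata, but the core idea is the same: schema, ownership, relationships and storage details documented in one queryable place.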
Now, we move onto the blue chunk of the diagram starting with data security. This is a very serious part of the data management strategy. We must define access control policies and audits in addition to physical & technical security considerations around the data. We should also set up a governance board who will approve/deny any use-cases proposed around sensitive data to ensure that data is being processed with reasonable cause, care and attention.
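As a minimal sketch of how such a policy could be expressed, sensitive datasets might require both a role grant and a use-case approved by the governance board; all names here are illustrative assumptions:

```python
# Illustrative policy data: roles, sensitive datasets and board-approved use-cases.
SENSITIVE = {"customers.pii"}
APPROVED_USE_CASES = {("churn-model", "customers.pii")}  # granted by the governance board
ROLE_GRANTS = {
    "analyst": {"sales.orders"},
    "data-engineer": {"sales.orders", "customers.pii"},
}

def can_access(role, dataset, use_case=None):
    """Sensitive data needs a role grant AND an approved use-case; other data just a grant."""
    granted = dataset in ROLE_GRANTS.get(role, set())
    if dataset in SENSITIVE:
        return granted and (use_case, dataset) in APPROVED_USE_CASES
    return granted
```

Every decision made by a function like this would also be written to an audit log so access can be reviewed later.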
Data modelling & design is another key function. We need to have clear definitions around how data is modelled; how it relates to other data; how we maintain integrity; how data is manipulated (lineage) and much more. By understanding this, we can provide far better transparency around the data & this paves the way to a self-service data environment.
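One small example of maintaining integrity is a referential check between related datasets; the two tables below are hypothetical:

```python
# Illustrative tables: every order should reference an existing customer.
orders = [
    {"order_id": 1, "customer_id": 10},
    {"order_id": 2, "customer_id": 99},  # dangling reference
]
customers = [{"customer_id": 10}, {"customer_id": 11}]

known_customers = {c["customer_id"] for c in customers}
orphans = [o for o in orders if o["customer_id"] not in known_customers]
print(orphans)  # orders whose customer does not exist
```

A database would enforce this with a foreign key constraint; running the same check against files or lake tables is a common way to surface modelling problems before consumers hit them.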
When we have our data stored, we must define lifecycle policies via the data storage and operations phase of the DMBOK pyramid. The idea here is to define a clear ruleset around what happens to the data. For example, after N days, data should be archived to a cheaper type of storage; after X days, user identifiers should be hashed or after N days the data should be entirely deleted. It’s about having clearly defined rules around the data & what should happen to it over time.
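The rules above can be sketched as a simple policy function; the thresholds here are placeholder assumptions, not recommended values:

```python
from datetime import date

# Hypothetical lifecycle thresholds (days) — real values come from your retention policy.
ARCHIVE_AFTER = 90        # move to cheaper storage
ANONYMISE_AFTER = 365     # hash user identifiers
DELETE_AFTER = 7 * 365    # remove entirely

def lifecycle_action(created: date, today: date) -> str:
    """Return the action the lifecycle policy requires for data of this age."""
    age = (today - created).days
    if age >= DELETE_AFTER:
        return "delete"
    if age >= ANONYMISE_AFTER:
        return "hash-identifiers"
    if age >= ARCHIVE_AFTER:
        return "archive"
    return "keep"
```

Most cloud storage services let you declare equivalent rules as lifecycle configuration rather than code, but the policy itself should be defined in your governance documentation either way.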
After these steps, we start to get to the usability of the data, looking at middleware & ETL solutions; analytics; reporting and data science. We will cover the data management concepts associated with these in an upcoming post.