One of the key parts of any information governance strategy is policy creation (and, of course, enforcement). It comes back to the notion that showing you tried to protect customer data and failed is better than not having tried at all.
With data, it is important to have the below policies in place:
- Data retention, archiving and disposal policy
- Data classification policy
- Information sharing policy
- Data quality policy
- Subject Access Request (SAR) policy
Let’s start with the data retention, archiving and disposal policy. This is all about how long you need the data for. The below is an example of how that might work.
The idea here is that you need ‘immediate’ access to the data within table 1 for 30 days. After that time, it doesn’t really need to be accessed very often (if ever) and can therefore be archived, but must be retained for audit purposes.
The benefit of this is that even fewer individuals need access to the archived files, and we can add additional encryption to them. Even if it takes some time to access the raw information, it’s not a problem, as long as we have instant access to the newest data.
So our security increases on historical files. In the below, we can see that once a file moves to archive storage, standard users no longer have access to it – only admins can access the files. The file has also been encrypted and we have an additional layer of security to prevent unauthorised access to the file.
These things can be done systematically too. For example, with AWS (Amazon Web Services), a popular cloud provider, you can automatically send files to an archive after X days and automatically delete them after a further Y days. This takes human error out of the equation and ensures that your data governance policy is actioned.
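As a rough sketch of what such a rule can look like, the snippet below builds a dictionary in the shape of an Amazon S3 lifecycle configuration. The prefix, the day counts and the helper name `build_lifecycle_rule` are made up for illustration, and the structure should be checked against the AWS documentation before it is applied to a real bucket (for instance through boto3's `put_bucket_lifecycle_configuration`).

```python
import json


def build_lifecycle_rule(prefix: str, archive_after_days: int,
                         further_days_before_delete: int) -> dict:
    """Build one rule in the shape of an S3 lifecycle configuration (illustrative)."""
    return {
        "ID": f"retain-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Move objects to an archive storage class once they reach this age.
        "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
        # Expiration is counted from object creation, so the two periods are added.
        "Expiration": {"Days": archive_after_days + further_days_before_delete},
    }


# Hypothetical example: archive order files after 30 days, delete 335 days later.
config = {"Rules": [build_lifecycle_rule("orders/", 30, 335)]}
print(json.dumps(config, indent=2))
```

Note that the deletion age is expressed from the creation date, so "a further 335 days after archiving" becomes 365 days in total.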
One final observation is that each file can have a different ruleset. For example, your accounts may need to be retained for 10 years, so your archiving and deletion policy would reflect that. Your customer order information, on the other hand, may only be required for 1 year, so you can drop those files sooner. Each file or data source should be treated independently and have its own archiving rules.
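To make the idea of per-source rules concrete, here is a minimal sketch in Python. The source names, the 30-day archive threshold and the `lifecycle_action` helper are assumptions chosen to mirror the examples above; they are not legal or regulatory guidance.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionRule:
    archive_after_days: int  # move to archive storage at this age
    delete_after_days: int   # dispose of the data at this age


# Hypothetical rule set: each data source carries its own rule.
RULES = {
    "accounts": RetentionRule(archive_after_days=30, delete_after_days=3650),
    "customer_orders": RetentionRule(archive_after_days=30, delete_after_days=365),
}


def lifecycle_action(source: str, created: date, today: date) -> str:
    """Return 'keep', 'archive' or 'delete' for a file from the given source."""
    rule = RULES[source]
    age_days = (today - created).days
    if age_days >= rule.delete_after_days:
        return "delete"
    if age_days >= rule.archive_after_days:
        return "archive"
    return "keep"
```

With these rules, a one-year-old customer order file would be deleted, while an accounts file of the same age would simply sit in the archive.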
Next, we have the data classification policy. This is a really important piece of work that organizations must carry out. It’s all about classifying how important data is and hence, you can decide how to protect it. Many organizations will have a scale, like the below:
- Public: this kind of data is available to anyone. For example, the company’s quarterly accounts are available on the company website for anyone to download and consume. Hence, we should consider these files low risk.
- Internal: this data is slightly more sensitive. It could be that these are the yet-to-be-released quarterly accounts. So, these documents should not be communicated outside of the company. The security policy should reflect that.
- Confidential: this is data about our customers, like their name and address. It is highly sensitive and we need to make sure we protect it at all costs. It should only be accessible by people that absolutely require it, and the security around it should be extremely tight.
- In strictest confidence: this is the next level of sensitive information. It could include biometric information about our customers (e.g. fingerprints) or their payment details. This data should be accessible to even fewer individuals in the company and should have the tightest controls around it.
This policy may seem like an administrative headache, but it’s absolutely necessary. We can use this policy to drive our investment into our technologies to protect customer data; it also tells your employees what they can and can’t do with that data.
That is a very important point. We may think it’s obvious what classification a piece of data should have, but it is a subjective thing. Your workforce might consider a file to be confidential, whereas you consider it to be the next step up: in strictest confidence. We can remove that subjectivity and confusion by tagging all data with its classification.
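One way to picture tagging is to attach the classification to each file as an ordered label, so that systems can compare it with a person’s clearance. The sketch below is a simplification under stated assumptions: the class names and the `can_access` helper are hypothetical, and a real policy would also consider whether the person actually needs the data, not just their level.

```python
from dataclasses import dataclass
from enum import IntEnum


class Classification(IntEnum):
    # Ordered from least to most sensitive, following the scale above.
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    STRICTEST_CONFIDENCE = 4


@dataclass(frozen=True)
class TaggedFile:
    name: str
    classification: Classification  # the tag removes any guesswork


def can_access(clearance: Classification, file: TaggedFile) -> bool:
    """Allow access only when the clearance is at least the file's tag."""
    return clearance >= file.classification


# Hypothetical files used purely for illustration.
accounts = TaggedFile("published_quarterly_accounts.pdf", Classification.PUBLIC)
fingerprints = TaggedFile("customer_fingerprints.db", Classification.STRICTEST_CONFIDENCE)
```

Because the tag travels with the file, nobody has to decide on the spot whether something is confidential or in strictest confidence.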
The information sharing policy is exactly what you’d think. It tells our workforce what information they can share and with whom. This includes sharing information internally between teams but also how they communicate with third parties.
If you have a third-party service provider that supports your business, can you share customer information with them? Well, read your information sharing policy to find out!
The data quality policy is important too. We need to make sure we keep accurate records about our customers. If you think about it from a healthcare perspective, if we see that someone’s blood pressure was 15,000, we know it’s inaccurate. These inaccuracies do, however, occur, and all roads lead back to our good friend, human error.
We can therefore implement some systematic checks and prevent people creating records with incorrect data – we call these domain restrictions. For example, in a hospital, you might set the age field to be anywhere from 0 to 120. In the event that someone mistypes 40 as 400, it will not allow them to create the record.
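A domain restriction like the age example can be sketched in a few lines of Python. The `DOMAIN_RULES` table and the `create_record` function are hypothetical names; in practice the same check would more likely live in a database constraint or a form validator.

```python
# Allowed range per field, taken from the hospital example (0 to 120 for age).
DOMAIN_RULES = {"age": (0, 120)}


def create_record(fields: dict) -> dict:
    """Return a copy of the record, or raise ValueError if a value is out of range."""
    for name, (low, high) in DOMAIN_RULES.items():
        if name in fields and not low <= fields[name] <= high:
            raise ValueError(
                f"{name}={fields[name]!r} is outside the allowed range {low} to {high}"
            )
    return dict(fields)
```

Calling `create_record({"name": "A. Patient", "age": 400})` raises a `ValueError`, so the mistyped record is never created.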
We can also implement regular data audits to check the distribution of the data. Where we see extreme outliers, we can have those flagged automatically and we can follow the process defined within this policy to rectify the data inaccuracies.
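As an illustration of automatic flagging, the sketch below marks readings that sit far from the median, measured in units of the median absolute deviation (MAD). This is only one possible outlier rule, chosen here because a single extreme value does not distort it; the threshold `k=5.0` and the sample readings are assumptions.

```python
from statistics import median


def flag_outliers(values, k=5.0):
    """Return (index, value) pairs lying more than k MADs from the median."""
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    if mad == 0:
        # No spread to compare against, so this simple rule flags nothing.
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - centre) > k * mad]


# Hypothetical systolic blood pressure readings with one data entry error.
readings = [118, 122, 125, 130, 121, 119, 15000, 127]
flagged = flag_outliers(readings)  # [(6, 15000)]
```

The flagged entries would then go through whatever correction process the policy defines.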
We can also look at the consistency of our data. What does that mean? Well, it’s very likely that the same piece of data resides in multiple places. It’s terrible, but it’s almost certainly true. So, if I have a system that stores the patient’s blood pressure, I can validate the data quality by making sure that it matches the other systems that store the same data.
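A consistency check can be as simple as comparing the values two systems hold for the same patient. In this sketch each system is represented by a plain dictionary keyed by patient ID; the system names and the readings are invented for illustration.

```python
def find_mismatches(system_a: dict, system_b: dict) -> list:
    """Return the IDs present in both systems whose stored values differ."""
    shared_ids = system_a.keys() & system_b.keys()
    return sorted(pid for pid in shared_ids if system_a[pid] != system_b[pid])


# Hypothetical blood pressure readings held in two separate systems.
ward_system = {"p1": "120/80", "p2": "135/85", "p3": "110/70"}
gp_system = {"p1": "120/80", "p2": "153/85", "p4": "125/82"}

mismatches = find_mismatches(ward_system, gp_system)  # ['p2']
```

Only patients that appear in both systems are compared; records that exist in just one system are a completeness question rather than a consistency one.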
We can also look at how many incomplete records we have. We need to ensure both the accuracy and completeness of our data. We therefore need gaps to be flagged and we actively need to work to populate the missing data. If you were a nurse, you could do that the next time you see the patient.
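Flagging gaps can be sketched as a scan for required fields that are missing or empty. The list of required fields and the function names below are assumptions made for the example.

```python
# Hypothetical set of fields every patient record is expected to contain.
REQUIRED_FIELDS = ("name", "date_of_birth", "blood_pressure")


def missing_fields(record: dict) -> list:
    """List the required fields that are absent, None or an empty string."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


def completeness_report(records: dict) -> dict:
    """Map each incomplete record ID to the fields that still need populating."""
    report = {}
    for record_id, record in records.items():
        gaps = missing_fields(record)
        if gaps:
            report[record_id] = gaps
    return report


patients = {
    "p1": {"name": "A", "date_of_birth": "1980-01-01", "blood_pressure": "120/80"},
    "p2": {"name": "B", "date_of_birth": "", "blood_pressure": None},
}
report = completeness_report(patients)  # {'p2': ['date_of_birth', 'blood_pressure']}
```

A report like this gives staff a worklist of exactly which details to collect at the next appointment.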
Finally, if we see that we have bounced emails, where the email address is not valid, we should reflect that in our system. We should then contact our customers via other means to fix the data at source and make sure it’s accurate.
The subject access request (SAR) policy is all about how we provide our customers with access to their own data. This is a complex policy as we need to make it easy to find out where all the customer data is stored. In large corporate businesses, this could span tens of systems and be quite difficult to collate.
This policy needs to outline how to obtain all the information and also what criteria the customer must meet in order to be provided with it.
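To give a feel for the collation step, the sketch below gathers one customer’s records from several systems, each modelled as a dictionary keyed by customer ID. The identity check stands in for whatever criteria your own policy sets; it, the system names and the function name are assumptions for illustration only.

```python
def collate_subject_data(customer_id: str, systems: dict, identity_verified: bool) -> dict:
    """Collect one customer's data from every system that holds a record for them."""
    if not identity_verified:
        # Assumed criterion: nothing is released until the requester is verified.
        raise PermissionError("identity must be verified before releasing data")
    return {
        system_name: store[customer_id]
        for system_name, store in systems.items()
        if customer_id in store
    }


# Hypothetical systems holding customer data.
systems = {
    "crm": {"c42": {"name": "A. Customer", "email": "a@example.com"}},
    "billing": {"c42": {"last_invoice": "2023-11"}},
    "marketing": {"c7": {"opted_in": True}},
}
bundle = collate_subject_data("c42", systems, identity_verified=True)
```

Here `bundle` contains the CRM and billing entries for customer `c42`, and the marketing system is skipped because it holds nothing for them.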