The cloud makes scaling up simple. But before we scale up, we need to think about several concerns, including data sovereignty and data integrity. I would argue the latter is more important.
Organizations are worried about data sovereignty, and so may not take full advantage of cloud computing. When Scotland held its separation vote last week, there were concerns about data stored there. Companies are afraid to let their data go and argue they need to keep it close. But vendors like Cisco Systems Inc. are providing solutions to these concerns. These solutions let you carve off big chunks of data and declare one set of rules for everything in each chunk.
In contrast to data sovereignty, data integrity cannot be solved in big chunks. Each field and relationship you build or gather must be considered individually, with its own way of keeping that data clean. The cloud makes it easier to just keep everything, but we must be careful not to clutter up our enterprise the way hoarders clutter up their homes.
As IT professionals who want our ethics to be trusted, we must be aware that once data is stored in a computer, it is assumed to be correct. Any number or report based on that data often cannot be checked, or at least is not normally questioned, especially in the case of big data. Our users rely on us. This puts a strong responsibility on us to ensure that the information we publish can be trusted.
On the IT side, we build in data edits and cross-checks, error reports, and valid value tables. But the organization as a whole has to understand its data sources and ensure the people entering the data are not just filling in fields to get through the screens. Just because someone picked a valid value does not mean it was the correct value for the transaction they were doing. Just because there is a second list of customers does not mean everyone on it can be considered new. Do the two lists overlap? The more data we keep, the more work we create for everyone.
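To make the IT-side checks concrete, here is a minimal sketch of two of them: a valid value table check on a single field, and a cross-check of a second customer list against an existing one. Everything here is a hypothetical illustration, not a specific product or method: the `Customer` record, the `VALID_REGIONS` table, and the choice to match customers on a normalized email address are all assumptions. Note what the sketch cannot do: it flags a region that is not in the table, but a valid region entered for the wrong customer passes straight through, which is exactly the gap the organization has to close.

```python
from dataclasses import dataclass

# Hypothetical valid value table for a "region" field (illustrative only).
VALID_REGIONS = {"NORTH", "SOUTH", "EAST", "WEST"}


@dataclass(frozen=True)
class Customer:
    """Hypothetical customer record used only for this example."""
    name: str
    email: str
    region: str


def invalid_region_errors(customers):
    """Return one error-report line per customer whose region is not in the table.

    This only proves the value is *valid*, not that it is *correct*.
    """
    return [
        f"{c.name}: region {c.region!r} is not a valid value"
        for c in customers
        if c.region not in VALID_REGIONS
    ]


def match_key(customer):
    """Assumed matching rule: compare emails ignoring case and surrounding spaces."""
    return customer.email.strip().lower()


def split_new_and_existing(existing, incoming):
    """Cross-check a second customer list against the first.

    Returns (new, duplicates): customers in `incoming` whose key is not
    already in `existing`, and customers that appear to be on file already.
    """
    known = {match_key(c) for c in existing}
    new, duplicates = [], []
    for c in incoming:
        if match_key(c) in known:
            duplicates.append(c)
        else:
            new.append(c)
    return new, duplicates


if __name__ == "__main__":
    existing = [Customer("Ann Lee", "ann@example.com", "NORTH")]
    incoming = [
        Customer("Ann Lee", " ANN@example.com", "NORTH"),  # same person, messier entry
        Customer("Bo Chan", "bo@example.com", "CENTRAL"),  # not in the valid value table
    ]
    new, duplicates = split_new_and_existing(existing, incoming)
    print([c.name for c in new])         # ['Bo Chan']
    print([c.name for c in duplicates])  # ['Ann Lee']
    for line in invalid_region_errors(incoming):
        print(line)                      # Bo Chan: region 'CENTRAL' is not a valid value
```

Matching on email alone is a deliberate simplification; real duplicate detection usually needs several fields and fuzzier comparison, and every extra rule is one more thing someone has to maintain.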
Anyone who has tried to reconcile two project reports with conflicting numbers has seen the problem. Data Administrators have been warning us about this for years. Now there are Data Cleansing Administrators, and there are companies selling tools to help organizations clean up the “17% of the data they hold to be inaccurate”.
Do you need an intervention? Someone to come and clean up your data? Avoid that by building good habits. Don’t store everything just in case you need it later. Identify what is needed and work to keep that data clean and organized. Let’s make sure the IT industry does not become known for hoarding data all over the world.