In a previous post, I explored the power of data to give you a competitive advantage. In this part of the series, I will cover the biggest blockers to leveraging your proprietary data to its fullest extent, how easily you can start down that path by taking an experimental approach, and the benefits you can expect once a data exploration program is in place.
Silos are blocking you from using this data
There are several reasons you can’t take full advantage of your data, but chief among them is that the valuable, unexplored data is siloed and sitting on legacy systems.
The fact that the systems are “legacy” means that, more often than not, a good understanding of the data is lacking. This happens primarily because of poor record keeping and know-how lost as the folks who built the systems retire. Ultimately, even the folks who own the data often have no idea what it means or, in a lot of cases, how it is being used across the enterprise.
Data scientists across the enterprise face an even more pronounced issue. They have even less of an idea of what might be available to them, and even when they do, the process of gaining access is cumbersome and often takes months to resolve, as lines of business (LOBs) try to figure out who has the authority to grant access, whether the requesting party has sufficient authority, and whether the intended work falls within the guidelines of acceptable use of their data. As a result, it becomes very difficult to discover data, play with it, and figure out whether it will add value to a model.
Another issue is the timeliness of the data. Even if the data can be explored and is useful in a certain type of model, it is often not updated in time, or frequently enough, to be useful, especially now that models are run so much more often.
Uncovering the value in this data is not as difficult as you might think
So how do you uncover that value in the data? For one, you need a really specific question or business problem. Data scientists are usually pretty good at solving problems, but without a clear direction they’ll find their own problem to solve, usually optimizing for coolness, which rarely intersects with a business need.
Second, you need a forum where data scientists can collaborate with business folks during the data exploration process. As I mentioned earlier, a lot of the most valuable insights come from cross-LOB or behavior-capturing variables. The know-how about the e-mail address decomposition I mentioned in Part 1 might lie with your fraud investigators, for example.
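To make that concrete, here is a minimal sketch of what an e-mail address decomposition along the lines of the one referenced from Part 1 might look like. The specific features, names and free-mail domain list are illustrative assumptions on my part, not the exact variables from that post.

```python
# A hypothetical decomposition of a raw e-mail address into simple,
# model-ready signals. Feature names and the free-mail list are assumptions.

import re

FREEMAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}

def decompose_email(email: str) -> dict:
    """Break an e-mail address into candidate fraud/credit features."""
    local, _, domain = email.lower().partition("@")
    digits = re.findall(r"\d", local)
    return {
        "local_length": len(local),                     # unusually short/long handles
        "digit_count": len(digits),                     # heavy digit use can be a risk signal
        "digit_ratio": len(digits) / max(len(local), 1),
        "has_separator": "." in local or "_" in local,  # name-like structure
        "is_freemail": domain in FREEMAIL_DOMAINS,
    }

print(decompose_email("jane.doe1987@gmail.com"))
```

The point is less the code than the conversation around it: your fraud investigators are often the ones who can tell you which of these signals actually matter.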
Third, you need a subtle shift in mindset around who conducts the analysis, the confidence we place in their results, and the relationship between what we are trying to predict and what we are using to predict it.
The person conducting an analysis or building a model no longer has to have a PhD in statistics or computer science. The low-code and no-code tools available in the marketplace – including those from SAS – have matured so much that you can have confidence in the results even a junior team member comes up with.
On top of that, because of the siloed nature of a lot of our business units, we tend to use mostly that business unit’s data to fulfil analytics requests. This approach still works fine, but there are many similarly powerful data points from outside the LOB that are correlated with the event we’re trying to predict or analyze. This is what we call “alternative data,” which a lot of us assume has to come from outside the institution. As we saw earlier, this is not the case; the examples we have already covered fall into this category to some extent, as they are not necessarily related to credit or fraud, or drawn from traditional sources of credit and fraud data.
What’s also needed is the right governance structure. This goes beyond data governance into process governance. As you’ve seen here, I’m advocating for more of a bottom-up approach, where you start with the question and work your way towards an answer, as opposed to setting up all the data first and then trying to figure out what questions it can answer. With the suggested approach, you need a way to resolve very quickly, in cases of ambiguity, who owns the data, who can access it and for what reasons. Mining your own LOB’s data works well, but to really get the biggest bang for the buck, you have to be able to explore other LOBs’ data, and nothing kills progress faster than taking six months to figure out who owns that data and whether access can be granted. As this approach is used to solve more and more problems, the amount of data available for modelling and analytics expands.
We also have to realize that modeling and analysis are not a one-shot deal. You are not going to get the perfect model right off the bat. This is an iterative process. New information will come to light that could wildly change how a model performs, and old data could lose its predictive power.
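One common, concrete way to watch for that loss of predictive power is a distribution-drift check such as the population stability index (PSI). The sketch below is just that – a sketch – assuming a simple numeric feature and using conventional rule-of-thumb thresholds rather than anything prescribed in this post.

```python
# A minimal PSI check: compare a feature's current distribution to the
# distribution it had when the model was developed. Bin count and the
# 0.25 "major drift" threshold are common conventions, not prescriptions.

import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population stability index between development-time and current data."""
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))
    cuts[0], cuts[-1] = -np.inf, np.inf               # cover the full range
    e_pct = np.histogram(expected, cuts)[0] / len(expected)
    a_pct = np.histogram(actual, cuts)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)                # guard against empty bins
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 10_000)       # feature at model-development time
current = rng.normal(0.3, 1.2, 10_000)    # the same feature today
print(f"PSI = {psi(baseline, current):.3f}")  # > 0.25 is often read as major drift
```

Features that drift past that kind of threshold are good candidates for the next iteration of the model, which is exactly why the process can never be a one-shot deal.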
Finally, I’d like to highlight that these suggestions do not alter your digital transformation journey. They fit neatly within it and enhance it.
You get to improve your operations by focusing on the data
With more data, more timely data, and better data, you get to:
- compete more effectively by offering innovative products like earned wage access, which relies on timely chequing account data to offer what are effectively micro-loans to users who might not otherwise qualify for a credit card or a larger, longer-term loan. This lets customers form a tighter relationship with your institution and gives you the ability to put a significant dent in predatory lending.
- improve your operations by more readily tackling issues like attrition and fraud.
- arm yourself against changing market conditions. Governments, through open banking legislation, are looking to give consumers much more visibility into, and ownership over, the data that is generated about them. At the extreme, think about a consumer being able to automatically dictate what data can be used and shared, and with whom. Generally, the entity collecting the data should be immune from those rules, but third-party data becomes less valuable because of the large holes that could develop in a privacy-conscious customer’s profile if they decline to allow third-party data providers to see their data.
All of this additional data is really meant to uncover hard-to-determine patterns in a more timely manner, to drive business outcomes like growth and loss mitigation. The insights you get from it give you a full, 360-degree view of a customer, their lifetime value, and competitive intelligence on what those customers might be doing with other banks.
Additionally, in Canada, the maximum term you can keep data is seven years. Similar rules exist in other countries. That is barely enough time to cover one business cycle, which means any older data you delete without first having analyzed it properly is money down the drain.
I hope you now have a sense of how much data you are sitting on, the untapped value of that data, and the fact that it is not that difficult to start making headway in realizing that value – so that you can give customers personalization and innovation, satisfy regulator demands, utilize talent more efficiently, reduce technical debt, provide shareholders with better performance and compete effectively with existing players and new entrants.