Businesses, governmental agencies, and individuals around the world are creating data at an overwhelming rate. In fact, some estimates hold that 90 percent of all the data in existence was created in just the past two years.
While this may not be news to those of us who plan for, manage or process this data deluge, questions remain about best practices when taking on infrastructure changes to address big data. Big data, without any sort of structure, is just noise: massive amounts of information derived from a large and growing pool of internal and third-party sources. What was once a question of how we get the data has become a question of how we manage, analyze and operationalize insights from it.
The value and potential uses of the vast amounts of data available are being undeniably and universally recognized. However, most organizations are playing catch-up in managing and using that data optimally. In the face of rising expectations from the business, it is increasingly important for technology leaders to get data management and analysis right in order to contribute to business success.
The sheer quantity and growing sources of data, along with dynamic, rapid changes in the business and competitive landscape, pose a huge challenge to technology leaders. Technology organizations can no longer operate in a traditional manner when delivering solutions to the business. Technology teams need to work closely in cross-functional groups to set clear business objectives and work toward them by first understanding data and data flows end-to-end. That understanding dictates guidelines for the best use of the data to meet business requirements. That should be cut and dried, right? So why isn't it?
Too often, analytics teams focus on extracting business value from whatever data is available without examining the impact of that data's quality and integrity. That data integrity or quality issues exist is taken as an inevitable given. In big data, however, small bad-data problems can have a significant negative impact, introducing flaws into the very analytical results that were painted as one of the virtues of big data.
TechRepublic stated that, “Outside of security, data management and analytics, big data usually represent the biggest IT cost for most organizations.” It is also true that data integration and data quality are the top challenges companies must overcome to launch successful analytics initiatives. For this reason, a focus on data integrity and quality should be an integral part of any analytics effort. But data quality concerns don't have to keep organizations up at night.
With automated, continuous data integrity checks and deductive analysis, organizations can be sure that data anomalies will be flagged and reconciled, and that the workflow process will route exceptions to the correct groups for timely management and decision-making.
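A minimal sketch of what such automated checks might look like in practice. The check rules, record fields, and team names here are illustrative assumptions, not a specific product's API: each rule pairs a predicate that detects an anomaly with the group the exception should be routed to.

```python
from dataclasses import dataclass

@dataclass
class Flag:
    """One flagged anomaly, routed to the team that owns the fix."""
    record_id: int
    check: str
    team: str

# Hypothetical check rules: (predicate that returns True on a problem, owning team).
CHECKS = {
    "missing_amount": (lambda r: r.get("amount") is None, "finance-ops"),
    "negative_amount": (lambda r: r.get("amount") is not None and r["amount"] < 0,
                        "finance-ops"),
    "missing_customer": (lambda r: not r.get("customer_id"), "data-stewards"),
}

def run_checks(records):
    """Scan every record, flag each failed check, and route it to its owner."""
    flags = []
    for i, rec in enumerate(records):
        for name, (predicate, team) in CHECKS.items():
            if predicate(rec):
                flags.append(Flag(record_id=i, check=name, team=team))
    return flags

# Illustrative records: one clean, one incomplete, one with a bad value.
records = [
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "",   "amount": None},
    {"customer_id": "C3", "amount": -5.0},
]
flags = run_checks(records)
```

Run on a schedule or a stream, a loop like this is what makes the monitoring "continuous": every new batch is checked the same way, and exceptions land in the right queue rather than silently propagating into analytics.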
This kind of automated data integrity and quality monitoring then gives analytics teams a basis for assessing and quantifying the impact of such issues on analytics outcomes. A cost-benefit analysis can determine the value of fixing each identified data integrity or quality issue. If the value-add is significant, fixing the issue increases the benefits of the analytics without sinking more cost or effort into the analytics itself. If the value-add is not significant, there is no need to fix the issue; in that scenario, the automated monitoring ensures that the impact doesn't vary over time. Another benefit of automated data integrity and quality monitoring is the ability to identify and manage sudden changes in integrity or quality, and hence in analytics outcomes, caused by external events (e.g., system upgrades or the introduction of a new process or product).
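The fix-or-monitor decision above can be reduced to a simple comparison. This sketch uses made-up figures and a hypothetical helper; the point is only that once monitoring quantifies an issue's impact, the remediation decision becomes an explicit calculation rather than a guess:

```python
def worth_fixing(value_uplift: float, remediation_cost: float,
                 margin: float = 1.0) -> bool:
    """Fix an issue only when the estimated analytics value it unlocks
    exceeds the remediation cost (scaled by a required margin)."""
    return value_uplift > remediation_cost * margin

# Illustrative example: an issue estimated to distort analytics outcomes
# worth $50,000/year, costing $20,000 to remediate -> worth fixing.
fix_it = worth_fixing(value_uplift=50_000, remediation_cost=20_000)

# A low-impact issue ($5,000 uplift vs. the same cost) is left to
# monitoring instead, which watches for the impact changing over time.
monitor_only = not worth_fixing(value_uplift=5_000, remediation_cost=20_000)
```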
Needless to say, there is a strong case for making data integrity and quality considerations part of any big data or analytics initiative.
Ravi Rao is the SVP of analytics at Infogix.
Edited by Ken Briodagh