Cover Story
September 08, 2017

How to Avoid Data Mining Disasters And Capture the Value You Want to Unearth

By Special Guest
Ryan Carlson, director of digital transformation services, Exosite

Over the past 15 years, I’ve watched organizations succeed and fail at bringing products and services to market that leverage machine-generated data. Of the 118 projects I’ve observed, roughly 70 percent never made it to market or failed within the first year, regardless of the company’s size, project budget, or engineering talent. These data-driven projects mostly failed for the same reasons, and those reasons were rarely technical.

Getting a connected product into the field has become a major differentiator for companies seeking a strong competitive advantage in the industrial and commercial space. Not all data-gathering initiatives are created equal, though, not by a long shot.

Data-gathering initiatives depend on both the quality and the volume of the data available for analysis. Successful initiatives collect the right data, which sets a company apart from the pack by driving results such as greater process efficiency, reduced repair costs, improved customer service, and faster R&D cycles. First-mover advantage in the Internet of Things leaves few obvious fast-followers, because the value of data collection is hard to replicate or copy. As a result, companies with no IoT experience will find themselves playing catch-up.

Deciding when to start catching up by kicking off an IoT initiative is a lot like deciding when to plant a tree: the right time was 20 years ago, or it is right now.

It Came from the Trade Show Floor
Many connected-product concepts win stakeholder buy-in on the trade show floor, right after a competitor announces a new IoT solution. These hasty IoT business plans tend to rest solely on collecting and exploiting machine-generated data, and they read like a bad internet meme (Step 1: collect data. Step 2: perform analytics. Step 3: …profit). That amounts to sanctioned corporate gambling: gambling company time, money, and resources on an unproven plan. Although this approach is prevalent, it is deeply flawed.

Driven by the fear of missing out or being left behind in the IoT gold rush, organizations find themselves scrambling to catch up with a plan to collect data now and understand it later. That is a costly recipe for failure.

Speaking from experience, I’ve baked those cupcakes of regret myself, back when I worked for an OEM in the commercial manufacturing space, building products with machine-generated data and internet technology. This approach to IoT represents a major misalignment between data and its value: it puts too much emphasis on what might be learned in the future and turns a blind eye to the value the collected data could be providing on day one. The end result is extraneous data-collection costs, unnecessary market noise, and inefficient data analysis.

Successfully Leveraging Machine-Generated Data in IoT
Experience from more than a hundred projects has taught me a valuable lesson about building a product and a business plan around machine-generated data.

First, consider whether you are gathering the right data. Next, determine whether that data can be used to validate your project outcomes.

Assuming you can get that data, you must then consider whether it is being leveraged in a way that is valuable to stakeholders. What immediate value does gathering the data provide, and what is the value of the data once it is aggregated? Answering questions like these addresses the why of IoT before your team becomes too focused on the how.

Data-Readiness Checklist
Let’s review some of the questions you should be asking as you formulate your IoT plan.

• Do you have a hypothesis?

• Are you gathering the right data to support your hypothesis? What do you hope to learn from the data?

• Can the data be used to build a test case, and is it formatted so that analysts can consume it?

• Who are the users of the data and who are the customers of the data (i.e., who pays for it)?

• What is the day-one value of the data, and what is the long-term value of aggregating that data over time?

• Who else within your organization might find value in the data you’re collecting?

Example: Data-Mining Misalignment
The following personal story, which was shared with me, concerns data-collection misalignment in the agriculture industry and illustrates a typical form of misalignment.

A major food supplier developed a means to monitor and sample soil quality, along with a number of other environmental factors, for a research project. After a year of data collection, the research team brought the newly acquired data to the company’s internal data-science team, like a bunch of proud parents off to visit family, hoping the analysts would uncover insights. To the research team’s dismay, the data scientists reported back that they had found nothing of significance. The research team was devastated and confronted the data scientists about their (lack of) results.

When asked about what the research team expected to find with the data, the lead researcher replied confidently, “We recently spoke with the landowner we worked with and they assumed that with any changes in [environmental variable], we might see the moisture level drop off, and we would see small gains in [crop performance variable].”

After a calming breath, the lead analyst diplomatically explained that had the researchers spoken with the data-science team up front, they would have been spared 12 months of collecting the wrong data. Knowing the hypothesis, the data scientists could have specified that they needed only 6 months of data, collected at a specific sample size, duration, and frequency, to test the hypothesis and further the research.

This story is not unique. Extracting value from data is hard, even for professional researchers and engineers.

It’s important to understand that data has many uses beyond mining for insights, which too often turns out to be fool’s gold. The real value of data comes from using it to understand user behavior, identify patterns, alert the right people about system-level performance, and help R&D and support teams diagnose and resolve issues in the field.

Ryan Carlson is director of digital transformation services at Exosite.

Edited by Ken Briodagh
