The Industrial Internet of Things FEATURE NEWS

Top 8 Questions to Ask When Choosing a Data Storage Strategy for Your IoT Solution

By Special Guest
Jay Srinivasan, Sr. Director - Engineering at infiswift
January 10, 2018

Internet of Things solutions have a lot of moving parts that need to work together to make something useful. Each architectural choice impacts performance, from the end-device hardware, to the protocol used to transmit data, to back-end infrastructure such as hosting and databases. Here we’ll look at how the choice of database affects an IoT solution, and at some of the key questions to ask to help make the right choice.

Let’s say you are the decision maker at your company who has been handed the unenviable responsibility of coming up with the right database strategy for your IoT solution. Finding answers is actually not that hard with all the information out there. What’s difficult is asking the right questions. In this article, we’ll focus on the questions an engineering director or manager should ask to ensure that they’ve considered all the options when selecting a database.

The first question is how much it costs, right? Wrong. This should really be the last question. Although it might be tempting to pin down cost first, that question is not easy to answer until you have answered several others. Here’s where I would start:

What type of database do you want?

There are SQL databases, NoSQL databases and, for IoT-specific workloads, time-series databases. It helps to understand the strengths and weaknesses of each and to decide which major direction you want to take. The next seven questions will help guide this highest-level decision.

What type of workloads do you have?

Do you have transactional workloads, analytics workloads or a combination of both? The underlying storage engine differs considerably between workloads, so a database can be great at one but not the other. For example, consider rowstore versus columnstore. Typically (but not always), rowstore-based databases are efficient at transactional queries but are not optimized for reading selected columns for analytical purposes. Columnstore-based databases are efficient at reading large volumes of data for analytics, but are not as good when you have to perform a transactional update.
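
To make the rowstore/columnstore distinction concrete, here is a minimal sketch in plain Python. It is only an illustration of the two layouts, not how any particular database engine is implemented; the sensor fields (`device_id`, `ts`, `temp_c`) and the helper names are hypothetical.

```python
from statistics import mean

# Hypothetical sensor readings; the field names are illustrative only.
readings = [
    {"device_id": "pump-1", "ts": 1, "temp_c": 20.5},
    {"device_id": "pump-2", "ts": 1, "temp_c": 22.0},
    {"device_id": "pump-1", "ts": 2, "temp_c": 21.5},
]

# Row-oriented layout: all fields of one record are kept together.
row_store = [dict(r) for r in readings]

# Column-oriented layout: each field is kept as its own array.
column_store = {field: [r[field] for r in readings] for field in readings[0]}


def update_row(store, index, field, value):
    """Transactional-style update: one record, found in one place."""
    store[index][field] = value


def append_to_columns(store, record):
    """Writing one record to the column layout touches every column array."""
    for field, value in record.items():
        store[field].append(value)


def average_from_rows(store, field):
    """Analytical scan over rows: every full record is visited."""
    return mean(r[field] for r in store)


def average_from_columns(store, field):
    """Analytical scan over columns: only the one array needed is read."""
    return mean(store[field])


# Apply the same correction to both layouts, then run the same aggregate.
update_row(row_store, 0, "temp_c", 21.0)
column_store["temp_c"][0] = 21.0
print(average_from_rows(row_store, "temp_c"))
print(average_from_columns(column_store, "temp_c"))
```

Both layouts give the same answer. What differs is how much data each operation has to touch, which is why the workload should drive the choice.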

How scalable does your solution need to be?

Are you connecting just tens or hundreds of devices, or are you dreaming in the millions? Most implementations start small, but it’s important to set reasonable expectations for the near future. Some databases may be great to start with (low cost, high performance) but won’t necessarily scale easily beyond a certain capacity. That is, unless you make changes to your application code, which brings us to the next question.

How dependent should your application be on your database?

Each application can be tied to a particular database based on the way it’s partitioned, scaled, etc. Tying an application to a database makes certain things easier and more streamlined, but can require lots of new code if a switch is needed down the line. Keeping an application as decoupled as possible from any particular database is very much desirable, so as not to lock yourself out of other options. It is also in keeping with the principle of programming to an interface, not to a specific implementation.
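
As a rough sketch of what programming to an interface can look like here, the example below defines a small storage interface and two interchangeable back ends. The names (`ReadingStore`, `InMemoryStore`, `SqliteStore`, `record_and_report`) and the table layout are assumptions made for illustration. SQLite stands in for whatever database you actually choose.

```python
import sqlite3
from typing import Optional, Protocol


class ReadingStore(Protocol):
    """The only storage surface the application code is allowed to see."""

    def save(self, device_id: str, ts: int, value: float) -> None: ...

    def latest(self, device_id: str) -> Optional[float]: ...


class InMemoryStore:
    """Keeps only the newest reading per device; handy for tests."""

    def __init__(self) -> None:
        self._latest = {}

    def save(self, device_id: str, ts: int, value: float) -> None:
        current = self._latest.get(device_id)
        if current is None or ts >= current[0]:
            self._latest[device_id] = (ts, value)

    def latest(self, device_id: str) -> Optional[float]:
        entry = self._latest.get(device_id)
        return entry[1] if entry else None


class SqliteStore:
    """Same interface, backed by SQLite from the standard library."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS readings "
            "(device_id TEXT, ts INTEGER, value REAL)"
        )

    def save(self, device_id: str, ts: int, value: float) -> None:
        self._conn.execute(
            "INSERT INTO readings VALUES (?, ?, ?)", (device_id, ts, value)
        )
        self._conn.commit()

    def latest(self, device_id: str) -> Optional[float]:
        row = self._conn.execute(
            "SELECT value FROM readings WHERE device_id = ? "
            "ORDER BY ts DESC LIMIT 1",
            (device_id,),
        ).fetchone()
        return row[0] if row else None


def record_and_report(store: ReadingStore, device_id: str) -> Optional[float]:
    """Application logic: depends on the interface, not on a database."""
    store.save(device_id, 1, 20.5)
    store.save(device_id, 2, 21.0)
    return store.latest(device_id)
```

Because `record_and_report` only knows about `ReadingStore`, swapping the back end means writing one new class rather than touching application logic.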

What’s the expertise of your current (and near future) team?

If all you have is a bunch of SQL developers, you’re going to ruin their productivity by forcing them to use a NoSQL database, and vice versa. So ask yourself how quickly your team can switch and ramp up. Do you have the in-house database administrator (DBA) expertise to manage the database and support the developers as needed? Also, are you taking into account the new additions to your team who are in your recruiting pipeline? Get the most out of your database by pairing it with the right team.

What database dependencies can you accept, and which can you not?

Some database products come with an option to host them yourself, while others are cloud-based services that are ready to use. Both have their pros and cons. In the former case, you’re dependent on an experienced DBA who can take responsibility for hosting and scaling your database as your business grows. In the latter, you’re dependent on an external cloud provider, which naturally influences where your applications will be deployed. For example, if you choose Google Cloud Spanner, it might be more performant to run your applications on Google Cloud as well, so that you avoid cross-network latency and data egress costs.

What other pieces do you need to complete the story?

Once the database is selected, there is still a lot to decide based on how you want to use your data. What about caching? Do you need other services on top, such as Memcached? How about visualizing the data in your database: do you need tools like Tableau or Dundas? If so, do they integrate well with the database of your choice? Are you going to store all the data in the database, or are you considering moving older data to cheaper, colder storage or a data warehouse such as Amazon Redshift or Google BigQuery?

And finally, the elephant in the room: what are the costs of licensing, developing and hosting?

A database with a license fee of $100,000 a year may appear outrageous compared to an open-source alternative, but if the open-source option requires a dedicated DBA, whose typical salary is around $150,000, is it really cheaper? Hosting costs also need to be taken into account, and they vary widely at different scales. For example, an option that’s cheaper for the first 10 TB may not be cheaper once you reach the 100 TB range.
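
The comparison above can be worked through with a few lines of arithmetic. In this sketch the $100,000 license fee and $150,000 DBA salary come from the example above. The per-terabyte hosting rates are purely hypothetical placeholders, chosen only to show how the cheaper option can flip as data grows. Substitute your own quotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    license_fee: int     # USD per year
    dba_salary: int      # USD per year, 0 if no dedicated DBA is needed
    hosting_per_tb: int  # USD per TB per year (hypothetical rate)

    def annual_cost(self, storage_tb: int) -> int:
        """Total yearly cost at a given stored volume."""
        return self.license_fee + self.dba_salary + self.hosting_per_tb * storage_tb


def cheaper(a: Option, b: Option, storage_tb: int) -> str:
    """Name of the option with the lower yearly cost at this volume."""
    return min((a, b), key=lambda o: o.annual_cost(storage_tb)).name


# Licensed product: pays the license fee, assumed not to need a dedicated DBA.
licensed = Option("licensed", license_fee=100_000, dba_salary=0, hosting_per_tb=2_000)
# Open-source product: no license fee, but a dedicated DBA is required.
open_source = Option("open-source", license_fee=0, dba_salary=150_000, hosting_per_tb=1_000)

for tb in (10, 100):
    print(tb, "TB:",
          licensed.annual_cost(tb), "vs", open_source.annual_cost(tb),
          "->", cheaper(licensed, open_source, tb))
```

With these assumed rates the two options break even at 50 TB, so neither is cheaper at every scale. A comparison like this is only possible once the earlier questions about workload and growth have been answered.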

By understanding the needs of the solution and how the database will be used, you can be more comfortable that there will be no surprises down the line. It’s important to think beyond the pure database technology and numbers to what you can expect in the future. Once your answers are clear and your databases are shortlisted, you can be confident in making the right choice!

Edited by Ken Briodagh
