The Industrial Internet of Things FEATURE NEWS

Top 8 Questions to Ask When Choosing a Data Storage Strategy for Your IoT Solution

By Special Guest
Jay Srinivasan, Sr. Director - Engineering at infiswift
January 10, 2018

Internet of Things solutions have a lot of moving parts that need to work together to make something useful. Each architectural choice, from the end-device hardware to the protocol used to transmit data to back-end infrastructure such as hosting and databases, affects performance. Here we’ll look at how the choice of database affects an IoT solution and at some of the key questions to ask to help make the right choice.

Let’s say you are the decision maker in your company who got the unenviable responsibility of coming up with the right database strategy for your IoT solution. Finding answers is actually not that hard, with all the information out there. What’s difficult is asking the right questions. In this article, we’ll focus on the questions an engineering director or manager should ask to ensure they’ve considered all options when selecting a database.

The first question is how much it costs, right? Wrong: this should really be the last question. Although it might be tempting to find out the cost first, that question is hard to answer until you’ve answered several others. Here’s where I would start:

What type of database do you want?

There are SQL databases, NoSQL databases and, for IoT-specific workloads, time-series databases. It helps to understand the strengths and weaknesses of each and to decide which major direction you want to take. The next seven questions will help guide this highest-level decision.

What type of workloads do you have?

Do you have transactional workloads, analytics workloads or a combination of both? Storage engines are built very differently for different workloads, so a database that is great at one may be poor at the other. A key example is rowstore versus columnstore. Typically (but not always), rowstore-based databases are efficient at transactional queries but are not optimized for reading selected columns for analytics. Columnstore-based databases read large volumes of data for analytics processing efficiently, but are not as good at transactional updates.
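To make the rowstore/columnstore distinction concrete, here is a toy Python sketch that holds the same three hypothetical sensor readings in a row-oriented layout (a list of records) and a column-oriented layout (one list per field). It is only a mental model under stated assumptions: real storage engines add on-disk pages, compression and indexes, and the field names, values and helper functions here are made up for illustration.

```python
from statistics import mean

# Row-oriented layout: each record is kept together, as in a rowstore.
# Field names and values are hypothetical.
rows = [
    {"device_id": "d1", "ts": 1000, "temp": 21.0, "humidity": 40.0},
    {"device_id": "d2", "ts": 1000, "temp": 19.0, "humidity": 55.0},
    {"device_id": "d1", "ts": 1060, "temp": 23.0, "humidity": 42.0},
]


def to_columns(records):
    """Pivot row-oriented records into a column-oriented layout (one list per field)."""
    fields = records[0].keys()
    return {f: [r[f] for r in records] for f in fields}


columns = to_columns(rows)


def insert_row(rows, record):
    # Transactional write in a rowstore: the whole record lands in one place.
    rows.append(record)


def insert_columnar(columns, record):
    # The same write in a columnstore touches every column separately.
    for field_name, values in columns.items():
        values.append(record[field_name])


def avg_temp_rows(rows):
    # Analytical scan over a rowstore: conceptually, every full record is visited
    # just to read one field.
    return mean(r["temp"] for r in rows)


def avg_temp_columns(columns):
    # The same scan over a columnstore reads only the "temp" column.
    return mean(columns["temp"])


print(avg_temp_rows(rows), avg_temp_columns(columns))
```

The trade-off shows up in which operations touch one place versus many: `insert_row` writes a single record while `insert_columnar` writes to every column, and the reverse holds when scanning a single field across all records.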

How scalable does your solution need to be?

Are you just connecting tens or even hundreds of devices, or are you dreaming in the millions? Most implementations start small, but it’s important to understand reasonable expectations for the near future. Some databases may be great to start with (low cost, high performance) but won’t necessarily scale easily beyond a certain capacity, unless you make changes to your application code, which brings us to the next question.
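A quick back-of-envelope calculation helps turn “tens versus millions” into numbers a database vendor can respond to. The sketch below assumes a hypothetical workload of one 200-byte reading per device per minute and estimates raw ingest only; replication, indexes and compression would change the real figures, and `estimate_ingest` is an illustrative helper, not part of any product.

```python
def estimate_ingest(devices: int, msgs_per_device_per_min: int, bytes_per_msg: int) -> dict:
    """Back-of-envelope raw ingest estimate for a device fleet (illustrative only)."""
    msgs_per_day = devices * msgs_per_device_per_min * 60 * 24
    bytes_per_day = msgs_per_day * bytes_per_msg
    return {
        "writes_per_sec": msgs_per_day / 86_400,
        "gb_per_day": bytes_per_day / 1e9,
        "tb_per_year": bytes_per_day * 365 / 1e12,
    }


# Hypothetical workload: one 200-byte reading per device per minute.
for fleet in (100, 10_000, 1_000_000):
    est = estimate_ingest(fleet, msgs_per_device_per_min=1, bytes_per_msg=200)
    print(f"{fleet:>9,} devices: {est['writes_per_sec']:>10,.1f} writes/s, "
          f"{est['gb_per_day']:>8,.2f} GB/day, {est['tb_per_year']:>7,.2f} TB/year")
```

Under these assumptions, 100 devices produce well under 1 GB a day, while a million devices produce about 16,700 writes per second and roughly 288 GB a day, or more than 100 TB of raw data a year.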

How dependent should your application be on your database?

Each application can become tied to a particular database through the way it’s partitioned, scaled, etc. Tying to a database makes certain things easier and more streamlined, but can require lots of new code if a switch is needed down the line. Keeping an application as decoupled as possible from a particular database is very desirable, so you don’t lock yourself out of other options. It also follows the principle of programming to an interface, not to a specific implementation.
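Programming to an interface can be as simple as the minimal Python sketch below. The `TelemetryStore` interface, the `Reading` record and the in-memory implementation are all hypothetical names chosen for illustration; the point is that `average_value` depends only on the interface, so a SQL, NoSQL or time-series adapter implementing the same two methods could be swapped in without touching the application logic.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Reading:
    device_id: str
    timestamp: int  # Unix epoch seconds
    value: float


class TelemetryStore(ABC):
    """Hypothetical storage interface that application code depends on."""

    @abstractmethod
    def write(self, readings: Iterable[Reading]) -> None:
        """Persist a batch of readings."""

    @abstractmethod
    def query(self, device_id: str, start: int, end: int) -> List[Reading]:
        """Return readings for one device with start <= timestamp < end."""


class InMemoryStore(TelemetryStore):
    """Toy implementation; a real database adapter would implement the same methods."""

    def __init__(self) -> None:
        self._readings: List[Reading] = []

    def write(self, readings: Iterable[Reading]) -> None:
        self._readings.extend(readings)

    def query(self, device_id: str, start: int, end: int) -> List[Reading]:
        matches = [r for r in self._readings
                   if r.device_id == device_id and start <= r.timestamp < end]
        return sorted(matches, key=lambda r: r.timestamp)


def average_value(store: TelemetryStore, device_id: str, start: int, end: int) -> Optional[float]:
    """Application logic: knows only the interface, not the database behind it."""
    readings = store.query(device_id, start, end)
    if not readings:
        return None
    return sum(r.value for r in readings) / len(readings)
```

The cost of this decoupling is that database-specific features must either be exposed through the interface or given up, which is exactly the trade-off described above.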

What’s the expertise of your current (and near future) team?

If all you have is a bunch of SQL developers, you’re going to hurt their productivity by forcing them to use a NoSQL database, and vice versa. So ask yourself how quickly your team can switch and ramp up. Do you have the in-house database administrator (DBA) expertise to manage the database and support the developers as needed? Also, are you taking into account the new team members in your recruiting pipeline? Get the most out of your database by pairing it with the right team.

What database dependencies can you accept, and which can you not accept?

Some database products can be self-hosted, while others are ready-to-use cloud services. Both have their pros and cons. With the former, you depend on an experienced DBA who can take responsibility for hosting and scaling your database as your business grows. With the latter, you depend on an external cloud provider, which naturally influences where your applications will be deployed. For example, if you choose Google Cloud Spanner, it may make sense to run your applications on Google Cloud as well, so you avoid the added latency and network egress costs of moving data between clouds.

What other pieces do you need to complete the story?

Once the database is selected, there is still a lot to decide based on how you want to use your data. What about caching? Do you need other services on top, such as Memcached? How about visualizing the data in your database: do you need services like Tableau or Dundas? If so, do they integrate well with the database you chose? Are you going to store all the data in the database, or are you considering moving older data to cheaper storage such as Amazon Redshift or Google BigQuery?
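Moving older data to a cheaper tier usually comes down to a retention rule. The Python sketch below shows one possible shape for that rule under stated assumptions: a hypothetical 30-day “hot” window, plain lists standing in for the primary database and the cheaper store, and a made-up `TieredStorage` class. A real pipeline would copy data to the cheaper tier and confirm the copy before deleting it from the primary database.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (Unix timestamp in seconds, sensor value)
Reading = Tuple[int, float]


@dataclass
class TieredStorage:
    """Hypothetical two-tier layout: recent readings stay in the primary ("hot")
    database, older readings move to a cheaper tier. Lists stand in for real systems."""
    retention_seconds: int
    hot: List[Reading] = field(default_factory=list)
    cold: List[Reading] = field(default_factory=list)

    def write(self, reading: Reading) -> None:
        self.hot.append(reading)

    def tier_down(self, now: int) -> int:
        """Move readings older than the retention window to the cheaper tier.
        Returns how many readings were moved."""
        cutoff = now - self.retention_seconds
        expired = [r for r in self.hot if r[0] < cutoff]
        self.hot = [r for r in self.hot if r[0] >= cutoff]
        self.cold.extend(expired)
        return len(expired)


DAY = 86_400
storage = TieredStorage(retention_seconds=30 * DAY)  # hypothetical 30-day hot window
now = 1_700_000_000
for age_days, value in [(45, 18.5), (10, 20.1), (0, 21.7)]:
    storage.write((now - age_days * DAY, value))
moved = storage.tier_down(now)
print(moved, len(storage.hot), len(storage.cold))
```

Only the 45-day-old reading falls outside the window, so one reading moves and two stay in the hot tier.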

And finally, the elephant in the room: what are the costs of licensing, developing and hosting?

A database with a license fee of $100,000 a year may look outrageous next to an open-source alternative, but if the open-source option requires a dedicated DBA whose typical salary is around $150,000, is it really cheaper? There are also hosting costs to take into account, and they can vary widely with scale. For example, an option that’s cheaper for the first 10 TB may not be cheaper once you reach the 100 TB range.
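The comparison above can be written as a simple yearly cost model. In this Python sketch, the $100,000 license fee and the $150,000 DBA salary come from the text; the per-TB hosting rates, and the assumption that the licensed option needs no dedicated DBA, are hypothetical and chosen only to show how the answer can flip with scale. `CostModel` and `cheapest` are illustrative names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CostModel:
    """Simplified yearly cost of one database option, in USD."""
    name: str
    license_fee: float     # yearly license fee
    staff_cost: float      # e.g. a dedicated DBA's salary
    hosting_per_tb: float  # yearly hosting cost per TB stored (hypothetical)

    def annual_cost(self, data_tb: float) -> float:
        return self.license_fee + self.staff_cost + self.hosting_per_tb * data_tb


def cheapest(options: Iterable[CostModel], data_tb: float) -> CostModel:
    return min(options, key=lambda o: o.annual_cost(data_tb))


# License fee and DBA salary are the figures from the text; the per-TB hosting
# rates and "no dedicated DBA" for the licensed option are hypothetical.
licensed = CostModel("licensed", license_fee=100_000, staff_cost=0, hosting_per_tb=4_000)
open_source = CostModel("open-source", license_fee=0, staff_cost=150_000, hosting_per_tb=1_000)

for tb in (10, 100):
    for option in (licensed, open_source):
        print(f"{tb:>4} TB  {option.name:<12} ${option.annual_cost(tb):>9,.0f}")
    print(f"{tb:>4} TB  cheapest: {cheapest((licensed, open_source), tb).name}")
```

With these made-up hosting rates, the licensed option costs $140,000 versus $160,000 at 10 TB, but $500,000 versus $250,000 at 100 TB. The crossover is where 100,000 + 4,000x = 150,000 + 1,000x, at roughly 16.7 TB, which is why it pays to run the numbers at the scale you actually expect.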

By understanding the needs of the solution and how a database will be used, you can reduce the risk of surprises down the line. It’s important to think beyond the pure database technology and numbers to what you can expect in the future. Once your answers are clear and you have a shortlist of databases, you can be confident in making the right choice!

Edited by Ken Briodagh
