Industrial processes are continuously generating data. Careful analysis of this data can lead to greater production, reduced downtime, lower maintenance costs, and reduced energy usage, as well as other desirable outcomes. However, to truly be valuable, data must be contextualized before it can be analyzed.
This means that to fully understand the data and to make use of it, we need not just the values, but also metadata like units, and the relationships between various pieces of data. We use the term "context" to describe the totality of this information that surrounds any single piece of data. Without this context, useful, reliable, and scalable data analysis is simply not possible. Context is key to making the data generated by your industrial systems useful.
We'll start by discussing the different types of context, then look at the consequences of a lack of context, and finally consider how to apply context correctly and preserve it from the data source onward.
Types of Context
We can break down context into the levels listed below, which are ordered from the simplest to the most complicated.
Descriptive
This type of context describes the data and can be captured in the tag name. For example, a tag with the descriptive name "16-05-064-13W4.100.PT-101" makes it pretty clear where the data is coming from, as long as the name matches our tag naming convention. We can know from our naming rules that it's a pressure value.
Descriptive context also includes the data type. For example, if we know that this value is a floating-point value, we can also be sure that the value is continuous.
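As a rough illustration of how descriptive context can be recovered from a name, the Python sketch below parses the example tag. The convention it assumes (a location, then an area code, then instrument letters and a loop number) is hypothetical, as is the small lookup table of instrument letters; substitute your own naming rules.

```python
import re
from typing import NamedTuple

# Hypothetical convention: <location>.<area>.<instrument letters>-<loop number>
TAG_PATTERN = re.compile(
    r"^(?P<location>[^.]+)\.(?P<area>[^.]+)\.(?P<instrument>[A-Z]+)-(?P<loop>\d+)$"
)

# Assumed lookup from instrument letters to the kind of measurement.
MEASUREMENTS = {"PT": "pressure", "TT": "temperature", "FT": "flow", "LT": "level"}


class TagInfo(NamedTuple):
    location: str
    area: str
    instrument: str
    loop: int
    measurement: str


def parse_tag(name: str) -> TagInfo:
    """Split a tag name into its parts, rejecting names that break the convention."""
    match = TAG_PATTERN.match(name)
    if match is None:
        raise ValueError(f"tag name does not follow the convention: {name!r}")
    instrument = match["instrument"]
    if instrument not in MEASUREMENTS:
        raise ValueError(f"unknown instrument letters: {instrument!r}")
    return TagInfo(
        match["location"],
        match["area"],
        instrument,
        int(match["loop"]),
        MEASUREMENTS[instrument],
    )


info = parse_tag("16-05-064-13W4.100.PT-101")
print(info.measurement)  # pressure
```

Rejecting names that do not match, rather than guessing, keeps a mislabeled tag from quietly entering the analysis.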
Metadata
Metadata typically includes units and quality, as well as other descriptive attributes such as location.
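To make this concrete, here is a minimal Python sketch of a value that carries its units and quality with it. The `Sample` class, the quality labels "good" and "bad", and the `to_kpa` helper are all assumptions made for illustration; they are not taken from any particular historian or protocol.

```python
from dataclasses import dataclass

KPA_PER_PSI = 6.894757  # 1 psi expressed in kPa


@dataclass(frozen=True)
class Sample:
    value: float
    units: str    # e.g. "kPa" or "psi"
    quality: str  # hypothetical labels: "good" or "bad"


def to_kpa(sample: Sample) -> float:
    """Return the value in kPa, refusing bad-quality or unknown-unit data."""
    if sample.quality != "good":
        raise ValueError("refusing to use a sample that is not good quality")
    if sample.units == "kPa":
        return sample.value
    if sample.units == "psi":
        return sample.value * KPA_PER_PSI
    raise ValueError(f"unknown units: {sample.units!r}")


print(round(to_kpa(Sample(100.0, "psi", "good")), 1))  # 689.5
```

Because the units travel with the value, the conversion can be checked instead of assumed.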
Hierarchy
Hierarchy describes how a piece of data is related to another piece of data. For example, a hierarchical model can make it clear whether two pieces of data are on the same well, or on the same well pad, or if they are completely unrelated. Tag names can imply hierarchy, but because tag names are limited in length, they cannot express the depth of a corporate hierarchy. Note that hierarchy is not fixed and can often be viewer dependent. Different types of users will use the same data in different ways and will want data to be organized in a way that meets their specific needs.
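One simple way to picture hierarchical context is as a path from the top of the organization down to the asset. The Python sketch below uses invented field, pad, and well names and shortened tag names to show how such paths answer the "same well, same pad, or unrelated" question; a real model would come from your own asset database.

```python
from typing import Optional

# Hypothetical asset paths, from the top of the hierarchy down to the well.
ASSET_PATHS = {
    "PT-101": ("North Field", "Pad A", "Well 1"),
    "TT-101": ("North Field", "Pad A", "Well 1"),
    "PT-201": ("North Field", "Pad A", "Well 2"),
    "PT-901": ("South Field", "Pad K", "Well 9"),
}
LEVELS = ("field", "pad", "well")


def shared_level(tag_a: str, tag_b: str) -> Optional[str]:
    """Return the deepest hierarchy level two tags share, or None if unrelated."""
    deepest = None
    for level, a, b in zip(LEVELS, ASSET_PATHS[tag_a], ASSET_PATHS[tag_b]):
        if a != b:
            break
        deepest = level
    return deepest


print(shared_level("PT-101", "TT-101"))  # well
print(shared_level("PT-101", "PT-201"))  # pad
print(shared_level("PT-101", "PT-901"))  # None
```

A different group of users could keep a second set of paths over the same tags, which is one way to support viewer-dependent hierarchies.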
Behavior
This is the most complex type of context and describes the behavior of our system. For example, when trying to optimize a process, we need to understand the operating state of the process before analyzing the data.
Lack of Context
When data has missing context or is contextualized improperly, the following issues arise:
Analyzing the Wrong Data
When the descriptive data is wrong, the wrong data is analyzed. This isn't a theoretical concern. Common industrial protocols like Modbus have no concept of a tag or a tag name, so a tag name is applied to a value when the value is retrieved from the controller. If the controller program is updated and a Modbus register now holds a different value than the one originally expected, the data at the analytics level can be incorrect without anyone noticing.
An incorrect tag name, or a mismatch between the tag name and the underlying value, is a critical error that makes the data worthless and analysis impossible.
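One possible safeguard against this kind of mismatch, sketched below in Python, is to record which controller program version a register map was written for and refuse to label values when the versions disagree. This assumes the controller can report a program version, which is not something Modbus itself provides; the register addresses, the second tag name, and the version numbers are hypothetical, and the raw values are a plain dictionary standing in for an actual Modbus read.

```python
# Hypothetical register map: the tag assigned to each holding register, plus
# the controller program version the map was written for.
REGISTER_MAP = {
    "program_version": 3,
    "registers": {
        40001: "16-05-064-13W4.100.PT-101",
        40002: "16-05-064-13W4.100.TT-101",
    },
}


def label_values(raw: dict, reported_version: int) -> dict:
    """Attach tag names to raw register values, but only if the versions agree."""
    expected = REGISTER_MAP["program_version"]
    if reported_version != expected:
        raise RuntimeError(
            f"register map was written for program version {expected}, "
            f"but the controller reports version {reported_version}"
        )
    return {REGISTER_MAP["registers"][addr]: value for addr, value in raw.items()}


print(label_values({40001: 1234}, reported_version=3))
```

Failing loudly here is a deliberate choice: missing data is easier to notice and fix than data filed under the wrong name.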
Missing or Incorrect Metadata
As with descriptive data, commonly used industrial protocols like Modbus have no capacity to attach units or quality to a value. The system must be explicitly programmed to make this data available, or metadata must be attached to values at some other level in the system.
As a result, units can be wrong and quality data may not exist, in which case the data cannot support any useful analysis.
Behavior
A specific example of behavioral context is knowing when a system is in a manual override state instead of a fully automatic state. Any analysis of this system needs to know the operating state so that data generated during specific operating states can be excluded. Without this behavioral context, analysis can lead to incorrect conclusions.
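The effect of ignoring operating state is easy to show with a few made-up numbers. In the Python sketch below, the readings, the mode labels "AUTO" and "MANUAL", and the `mean_in_mode` helper are all hypothetical.

```python
from statistics import mean

# Hypothetical readings: (flow rate, operating mode reported by the controller).
readings = [
    (10.0, "AUTO"),
    (12.0, "AUTO"),
    (40.0, "MANUAL"),  # operator override, not representative of normal control
    (11.0, "AUTO"),
]


def mean_in_mode(samples, mode):
    """Average only the values recorded while the system was in the given mode."""
    values = [value for value, sample_mode in samples if sample_mode == mode]
    if not values:
        raise ValueError(f"no samples recorded in mode {mode!r}")
    return mean(values)


print(mean(value for value, _ in readings))  # 18.25
print(mean_in_mode(readings, "AUTO"))        # 11.0
```

A single override reading is enough to pull the naive average far from what the process does under automatic control.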
Applying Context
We know what context is and why it's valuable. How can we best contextualize data without errors over the lifecycle of the system? Doing this correctly requires specific knowledge of the system, but here are some general guidelines to follow.
Minimize the number of systems that data passes through, especially if context is destroyed and must be recreated at each stage in the system. As much as possible, push data directly from the place where it's generated to analytical systems.
Prefer protocols that support sending context with data. For example, MQTT transport with Sparkplug B payload allows sending data values with units and quality. Tag names are preserved when data is sent. Compare this to polling data via Modbus, which only provides a raw value. A tag name, units, and quality must be added later, which can lead to errors.
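The difference between the two approaches can be sketched with plain Python dictionaries. The field names below are illustrative only and are not the Sparkplug B wire format, and the register address, units, and timestamp are invented.

```python
# What a Modbus poll gives you: an address and a number, nothing else.
modbus_reading = {"register": 40001, "value": 1234}

# Roughly the information a self-describing message carries with the value.
# Field names are illustrative, not the actual Sparkplug B encoding.
self_describing_reading = {
    "name": "16-05-064-13W4.100.PT-101",
    "value": 1234.0,
    "units": "kPa",
    "quality": "good",
    "timestamp_ms": 1700000000000,
}

REQUIRED_CONTEXT = ("name", "units", "quality")


def missing_context(reading: dict) -> list:
    """List the context fields that must be added later, outside the message."""
    return [key for key in REQUIRED_CONTEXT if key not in reading]


print(missing_context(modbus_reading))           # ['name', 'units', 'quality']
print(missing_context(self_describing_reading))  # []
```

Every field in the first list has to be supplied by some other system, and each of those steps is a place where an error can creep in.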
Standards are essential when trying to apply behavioral context. Using standardized programs across your system means that behavior is identical at all of your sites, which means that structured data analysis is possible without accounting for a bunch of corner cases and exceptions. Note that this doesn't mean that each site needs to be identical. Instead of modifying your automation programs directly for the specifics of each site, deploy configurable programs with known behavior. This allows for site-specific automation while keeping data standardized at the analytics level. Scalable systems require strict adherence to standards.
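As a loose illustration of a configurable program with known behavior, the Python sketch below uses one piece of pump logic at two sites and varies only the configuration. The `SiteConfig` fields, the setpoints, and the pump logic itself are hypothetical and are written in Python only for readability; real control logic would live in the controller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteConfig:
    """Hypothetical per-site settings for one standardized pump program."""
    start_level_pct: float
    stop_level_pct: float


def pump_command(level_pct: float, running: bool, config: SiteConfig) -> bool:
    """Same logic at every site: start at the high setpoint, stop at the low one."""
    if level_pct >= config.start_level_pct:
        return True
    if level_pct <= config.stop_level_pct:
        return False
    return running  # between the two setpoints, hold the current state


site_a = SiteConfig(start_level_pct=80.0, stop_level_pct=20.0)
site_b = SiteConfig(start_level_pct=60.0, stop_level_pct=30.0)

print(pump_command(70.0, running=False, config=site_a))  # False
print(pump_command(70.0, running=False, config=site_b))  # True
```

The two sites respond differently to the same level, but an analyst only needs to understand one program and read each site's configuration.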
Conclusion
Moving data from industrial systems to analytical systems is a solved problem. However, applying context to that data and making it useful in modern analytics systems requires additional work. The first step in the process is understanding what type of context each piece of data needs. Some of this is obvious, like units, whereas some is less obvious, like behavioral context. Applying context correctly and without errors over the lifecycle of the system is just as much work as collecting the data in the first place, but properly applied context unlocks the true value of data.