What is Data Aggregation?
Today we experience an abundance of data. The solutions we build on the Ignition platform contribute by polling and collecting vast amounts of data from numerous sources. But that does not necessarily make us any smarter! The difference between data and real knowledge is the story. Pure data is all noise, and it is challenging for human beings to understand without some supporting narrative.
This supporting narrative is precisely what data aggregation brings to the table. Aggregation is the storyteller of the data: it creates a compelling story. When a data set grows very large, multi-dimensional facts lie hidden underneath the pile of data. Aggregation presents these facts in a clear and understandable form.
Thus, aggregation is the mechanism to convert data into information understandable for humans, and it provides the foundation for any decision support system. In technical terms, data aggregation is a process in which data is gathered, searched and summarised into human-readable reports.
The example below illustrates a solution we created for a client. The illustration shows the hierarchy in the data: Ignition collects data from several sources, stores it in the Ignition Tag Historian and prepares it for aggregation. The raw data is collected from a vast number of tags at a resolution of a few seconds. The aggregation process turns this enormous amount of raw data into more accessible aggregations at meaningful time intervals: 1 minute, 5 minutes, 1 hour and 1 day.
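As a rough illustration of what such interval aggregation involves, here is a minimal Python sketch. It is not the client solution and it does not use the Ignition API: the `aggregate` function, the sample data and the choice of the average as the aggregate are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

def aggregate(samples, interval_s):
    """Average (timestamp_s, value) samples into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of the interval it falls in
        bucket_start = ts - (ts % interval_s)
        buckets[bucket_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}

# Hypothetical raw data: one reading every 5 seconds for 10 minutes
raw = [(t, float(t // 60)) for t in range(0, 600, 5)]

one_minute = aggregate(raw, 60)     # 120 raw samples become 10 values
five_minutes = aggregate(raw, 300)  # and then only 2 values
```

The same function can produce the 1-hour and 1-day levels by passing 3600 or 86400 as the interval.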
The Benefits of Data Aggregation
Less data to query – aggregation helps you to speed up your queries
The mantra these days is “as much data as possible,” which often means you sample or log data at a very high resolution. The general understanding is that storing data is not costly, and therefore “the more, the better.” But storing data in a database is one thing; retrieving it can be a completely different matter.
For example, suppose you sample and store data from a pump station in a historical database: say, 10 different data points (tags) at a 5-second resolution. Without data aggregation, your queries run directly on the raw data. That is not a problem if you ask for “all data for these 10 tags for the past 24 hours.” But if you ask for “all data for these 10 tags for the past year,” your query takes considerably longer.
This is especially true if the data storage has a well-defined partitioning policy. Partitioning keeps the size of each data table manageable, but tracking down the required information across a huge set of partitions becomes a complex problem. Here, aggregating the data at predefined intervals is a lifesaver.
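To put numbers on the pump station case, the arithmetic can be sketched as follows. It assumes one stored value per tag per sample and one stored value per tag per aggregation interval; a real historian may store fewer values, for instance if it only logs on change.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def stored_values(tags, resolution_s, days):
    """Values stored for `tags` tags sampled every `resolution_s` seconds."""
    return tags * (SECONDS_PER_DAY // resolution_s) * days

raw_day = stored_values(10, 5, 1)            # 172,800 values
raw_year = stored_values(10, 5, 365)         # 63,072,000 values
hourly_year = stored_values(10, 3600, 365)   # 87,600 values

print(raw_year // hourly_year)  # the yearly query touches 720 times fewer values
```

Under these assumptions, a one-year query against 1-hour aggregates reads fewer values than a single day of raw data.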
Trends – aggregation helps you to learn from your data
Aggregation makes it easier to identify patterns and trends in data that are not immediately visible. The aggregation process reduces random noise in the data. Random noise can be many things, such as logging mistakes or clear outliers. When you look directly at the raw data, such random noise makes it difficult to spot the more important underlying trends. Aggregated trends help you study the ecosystem of the data, including its variation and deviation.
A good example is a pump station:
You want to study the pump performance, and you have data for Amps (the current) for the pump. But the starting current – the Amps needed to get the pump running – is several times the operating current. Thus, every time the pump starts, a spike in Amps is registered. Depending on the Start/Stop pattern of your pump, these spikes will make it challenging to analyse real underlying trends in the data without some form of aggregation.
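A small sketch shows how the choice of aggregate affects such spikes. The readings are invented for illustration, and using the median as the aggregate is an assumption of this example, not a recommendation taken from a specific project.

```python
from statistics import mean, median

# Hypothetical current readings in amps for one 1-minute window (5 s resolution).
# The pump starts at the third sample and briefly draws several times
# its operating current.
window = [0.0, 0.0, 42.0, 7.1, 7.0, 6.9, 7.0, 7.1, 7.0, 6.9, 7.0, 7.0]

# Only look at samples where the pump is actually running
running = [amps for amps in window if amps > 0.0]

peak_amps = max(running)       # dominated by the start-up spike
mean_amps = mean(running)      # still pulled upwards by the spike
median_amps = median(running)  # stays at the operating current
```

Here the mean of the running samples is about 10.5 A, while the median is 7.0 A, which is much closer to what the pump draws in normal operation.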
KPI – aggregation helps you to work much smarter
KPIs (Key Performance Indicators) help you to measure performance. KPIs are used as quantifiable measurements to indicate long-term developments in an organisation, a project or any kind of technical system. They can be used to set measurable targets and to measure the progress towards them. You can have high-level KPIs, such as “performance of our organisation,” and more low-level KPIs, such as “performance of the pump in this particular exchanger station.”
Aggregation is the tool to shape and study KPIs, and thus acts as an accurate and objective storyteller. A dataset can have multiple dimensions; aggregation helps to identify these dimensions and, later, to define the KPIs of the system based on the selected and refined aspects.
Forecasting – aggregation helps you “see into the future”
Of course not literally, at least not yet. Søren Kierkegaard, the Danish philosopher, once said that life has to be lived forwards, but can best be understood backwards. Data aggregation on historical data can assist you with that. It provides valuable information about incidents and helps to track the details of any odd events. The process also helps to study the relationship between a data source, such as a tag, and its ecosystem.
To paraphrase Kierkegaard, by looking at historical data with the help of data aggregation, you can make forecasts and predictions of future performance. The Ignition platform has several built-in tools for this task, which can help you to open the magical window of data forecasting.
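To make the idea concrete, here is a deliberately simple sketch that fits a straight line to daily aggregates and extends it forward. It is plain Python, not one of the Ignition tools mentioned above, and the `linear_trend` helper and the daily values are hypothetical.

```python
def linear_trend(values):
    """Least-squares slope and intercept for values at x = 0, 1, 2, ..."""
    n = len(values)  # needs at least two values
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

# Hypothetical daily average pump current in amps, one value per day
daily_avg = [7.0, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6]

slope, intercept = linear_trend(daily_avg)
forecast_day_10 = intercept + slope * 10  # extrapolate to day index 10
```

A straight line is rarely the right model for real equipment, but the sketch shows why forecasting works better on aggregated values than on raw samples: the fit follows the daily trend rather than every individual spike.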
The 3 Most Important Pitfalls to Avoid
1. You start the aggregation process with insufficient details
When working with vast amounts of data, it is difficult to gather all possible information about the data. But try to grasp the main ingredients, such as the naming convention, the major contributors of data, the data sources and the data flows. If any of these significant ingredients change later, you will have to redesign the solution from scratch.
2. You work with too broad a base
The data aggregation process collects data from various sources. If you start with all data sources simultaneously, the variations in the data and the possible dimensions will be overwhelming, and designing the right solution for all of them will be cumbersome. It is a better idea to divide the data into different categories. A sound approach is to start with a sample of the data, create the aggregation and test the results, and then expand to the full data set.
We recently worked on a project that illustrates this problem. In the aggregation process, we started with all tags linked to the historian. The main issue was the massive number of tags, all with a wide range of variation in the tag path and other associated properties. We decided to design the solution for all these tags at once, and the result was composed of multiple layers. It was a nerve-testing task to combine all the possible variations at each stage.
A better approach, in this case, would have been to start with a limited set of tag sources and to develop the aggregation process in stages. Each stage, once completed and tested, feeds the next. By completing one stage before moving on, it is easier to identify the cause of any error, and the quality of the generated data improves.
3. Your aggregation objective is unclear (or at least ambiguous)
The primary objectives of the aggregation process must be defined and followed. Ambiguity in intentions or deviation in plans will delay the end product. Aggregation is a vast domain with unlimited potential, and it is easy to get lost in the process. In many cases, it presents very tempting alternative options.
A wiser approach is to stick with the plan, keep the alternative options on a wish list, and use them after the main objective is achieved.
In a recent project we faced this challenge. The aggregation solution was complex to design and implement. During the implementation, we were asked to change some parts of the overall solution. The requested changes were relevant, and it was so tempting that we started on them before finalising the testing of the previous job. The result was divided attention and effort. In the end, we accomplished the task and resolved the challenge, but the diversion from the plan delayed the final product.
Our Best Piece of Advice on Data Aggregation
Stay focused on your objective and do the development in stages!
What are you trying to achieve? Remember the definition of data aggregation: The storyteller of data. Data aggregation provides a compelling narrative for humans to make sense of an abundance of data. Raw data is nothing but random noise, and aggregation helps you make sense of the noise.
Be very clear in your narrative. Who are the users of the aggregation, and what are they supposed to do with it? What kind of story or narrative do you want to put in front of them?
The first and by far the most essential point for any aggregation project: Stay focused on the objective, and do not get lured by the abundant options available to you.
The aggregation process offers many options for studying data, such as average, min, max and count, and you can apply them to any dimension of the data. Before you know it, you have painted yourself into a corner with a far too complex model. Before doing anything else, you must decide on the aggregation pattern and make a list of the dimensions you want to include in the analysis.
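The sketch below shows how quickly the options multiply: the same four functions applied to two dimensions already give eight series to maintain and explain. The `summarise` helper, the row layout and the dimension names `station` and `hour` are assumptions for illustration only.

```python
from collections import defaultdict

def summarise(rows, dimension):
    """Group rows by `dimension` and compute count, min, max and average."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row["value"])
    return {
        key: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
        for key, values in groups.items()
    }

# Hypothetical pump current readings (amps) with two dimensions
rows = [
    {"station": "A", "hour": 0, "value": 7.0},
    {"station": "A", "hour": 1, "value": 9.0},
    {"station": "B", "hour": 0, "value": 6.0},
    {"station": "B", "hour": 1, "value": 8.0},
]

by_station = summarise(rows, "station")
by_hour = summarise(rows, "hour")
```

Writing down the list of dimensions first, as suggested above, amounts to deciding which of these calls you actually need before you build any of them.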
The second part of this advice comes naturally when working according to agile principles: Make your development in stages. What is the simplest aggregation you can create to see if your idea makes sense to the users? Think about it as a prototype, which you can test and discuss with the future users.
Any aggregation solution must start with modelling the data source. When you have a good working model, it is easier to expand it horizontally as well as vertically. Horizontally, we can include more data sources and more complex data variation. Vertically, we can add multiple stages to achieve the desired results. The right approach is to complete one level of aggregation and study the effects of this stage before diving into the next.
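Vertical expansion can be sketched as each stage being built from the output of the previous one rather than from the raw data. This is a minimal illustration under assumed names and data; storing a sum and a sample count per bucket is a design choice of the sketch, not something prescribed by Ignition.

```python
def roll_up(stage, interval_s):
    """Combine finer buckets {start_s: (total, count)} into coarser buckets."""
    coarser = {}
    for start, (total, count) in stage.items():
        bucket = start - (start % interval_s)
        prev_total, prev_count = coarser.get(bucket, (0.0, 0))
        coarser[bucket] = (prev_total + total, prev_count + count)
    return coarser

def averages(stage):
    """Turn (total, count) pairs into one average per bucket."""
    return {start: total / count for start, (total, count) in stage.items()}

# Hypothetical, already tested 1-minute stage:
# bucket start in seconds -> (sum of values, number of samples)
one_minute = {
    0: (84.0, 12), 60: (42.0, 6), 120: (84.0, 12),
    180: (84.0, 12), 240: (84.0, 12), 300: (96.0, 12),
}

five_minutes = roll_up(one_minute, 300)  # stage 2 is fed by stage 1
hourly = roll_up(five_minutes, 3600)     # stage 3 is fed by stage 2
```

Carrying the sum and the count forward, instead of averaging the averages, keeps buckets with missing samples correctly weighted at every later stage. Each stage can also be checked in isolation before the next one is added.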
– The Enuda Team