Big Data is a recently coined term for the technologies, techniques and analysis methods dedicated to data collections so large that traditional processes cannot handle them effectively and efficiently.
Very often the data collected automatically are heterogeneous because they come from different sources, and in some cases they are also unstructured: they come both from integrated electronic sensors and from user interactions with software systems, such as typed entries or the press of a button. The goal is to extract from this large amount of generic data information that is useful for decision making.
A collection of data can come to be classified as Big Data because of its volume, its variety, or its velocity. These three characteristics are called the three Vs.
Each of these three elements makes the flow of data grow, so we need to understand how to store the data, which values matter for monitoring our system and which matter for planning the company's future strategies. It is important to remember that when all three Vs are very high, only technologies and methods designed for Big Data can successfully extract knowledge. The adoption of Big Data techniques in manufacturing production processes has exploded in recent years, after their initial adoption in services and the tertiary sector, and is expanding into every sector that can benefit greatly from it thanks to the:
However, an ill-considered approach to data acquisition hides a dangerous pitfall: collection alone does not guarantee that you have all the necessary information, let alone that you can extract knowledge from it. The real value lies in the knowledge that the data can produce. In many cases, doubling the amount of data collected can quadruple the effort required to obtain information useful for decision making, and the analysis of a monstrous amount of data can take too long to produce useful results. That's why it's important to find the right balance between the data collected and the data processed, so that you identify the information that is really needed.
At this point, let's look at the most common path to turning data into our allies for making thoughtful decisions.
1. The starting point is the acquisition of data from all available sources, such as PLCs, industrial sensors, work centers and operator panels, preferably in an automatic and continuous way.
2. Then we move on to software that deals with extraction, i.e. all the operations necessary to bring the values into an electronic format that software systems can manage automatically. These solutions also apply procedures to clean the plant data: during this phase, the solution must identify and handle incorrect or out-of-scale values coming from sensors, and it must also handle cases of missing data, for example because a sensor has been disconnected.
3. Once you have filtered the data down to what needs to be saved, you come to the central part of the process: the storage, archiving, and long-term preservation of the measured values. There are many different kinds of software for saving data, including relational databases, which store data in tables, and time series databases, which are organized around the precise moment each value was measured. The choice should be driven primarily by the needs of the next steps and by the characteristics and strengths of each database, ideally with the advice of experts in the field. Care must be taken to ensure that all data is stored correctly, that it is kept long enough for the analysis to be carried out, and that the storage format suits the processing planned for the next phase.
4. The fourth step requires a model, i.e. an abstract representation of the information that is consistent with the process being analyzed. Its purpose is to represent all the information in a form that other software can process automatically for the intended purpose. Given the amount of information, models are often statistical, built on mathematical equations that make it possible to process and analyze all the available data adequately. With a mathematical formula, it becomes possible to recognize anomalous data and make predictions about future trends.
For example, in the case of an industrial furnace, the temperature curve during the heating phase is a known function that can be derived from the data, even in the presence of noise or small measurement errors. When the complete equation that corresponds to a particular furnace is known, it is possible to use the new data that is collected to verify that the furnace is still heating efficiently and is not slowing down due to a damaged burner or inadequate refractory lining. You can also estimate with good accuracy the time required to reach a particular temperature and predict the time of completion.
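The furnace example can be sketched in a few lines of Python. The article does not say which function describes the heating phase, so this sketch assumes a simple first-order model, T(t) = temp_max - (temp_max - temp_start) * exp(-t / tau), with a known plateau temperature temp_max. The function names, the 20% tolerance and the synthetic numbers are all hypothetical choices made for illustration, not values taken from a real furnace.

```python
import math

def fit_heating_curve(times, temps, temp_max):
    """Estimate tau and the starting temperature of an assumed first-order curve.

    Taking logs of T(t) = temp_max - (temp_max - temp_start) * exp(-t / tau)
    gives ln(temp_max - T) = ln(temp_max - temp_start) - t / tau, a straight
    line in t, so ordinary least squares recovers both parameters.
    """
    # The logarithm is only defined for readings strictly below the plateau.
    pts = [(t, math.log(temp_max - temp))
           for t, temp in zip(times, temps) if temp < temp_max]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two readings below temp_max")
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    if sxx == 0 or sxy >= 0:
        raise ValueError("readings do not show heating toward temp_max")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    tau = -1.0 / slope
    temp_start = temp_max - math.exp(intercept)
    return tau, temp_start

def time_to_reach(target, tau, temp_start, temp_max):
    """Seconds needed to reach `target` according to the fitted curve."""
    if not temp_start <= target < temp_max:
        raise ValueError("target must lie between temp_start and temp_max")
    return -tau * math.log((temp_max - target) / (temp_max - temp_start))

def is_heating_slowly(tau, reference_tau, tolerance=0.2):
    """Flag a furnace whose time constant exceeds the reference by more than `tolerance`."""
    return tau > reference_tau * (1 + tolerance)

if __name__ == "__main__":
    # Synthetic, noise-free readings: one per minute for half an hour.
    times = list(range(0, 1800, 60))
    temps = [900 - 880 * math.exp(-t / 600) for t in times]
    tau, temp_start = fit_heating_curve(times, temps, temp_max=900)
    print("estimated tau (s):", round(tau, 1))
    print("seconds to reach 800 degrees:", round(time_to_reach(800, tau, temp_start, 900)))
    print("slower than reference:", is_heating_slowly(tau, reference_tau=600))
```

With real, noisy measurements the logarithm amplifies errors near the plateau, so a nonlinear fit or a model supplied by the furnace manufacturer may be a better choice than this linearized version.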
5. Only at the fifth and final step can you finally see the information required to make decisions and intervene accordingly on the analyzed process. Throughout all these steps, technologies dedicated to Big Data are essential to support the cycle and the passage of data between the points we have described. With dedicated software, this process becomes automatic and can be very fast: while a manual analysis of furnace temperature curves by an employee can take days, new technologies can deliver the first results within minutes (provided, of course, that the whole data cycle has been designed and implemented correctly). In addition, in many phases it is worth integrating related technologies that can aid the process and bring maximum benefit, such as machine learning techniques that automatically detect data that are related and influence one another.
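The cleaning described in step 2 can be illustrated with a minimal Python sketch. It assumes each reading is a timestamp plus a value, that a missing value arrives as None, and that the valid range and the longest acceptable gap between readings are known for the sensor. The Reading class, the clean function and the thresholds are hypothetical names and numbers introduced only for this example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Reading:
    timestamp: float          # seconds since the start of acquisition
    value: Optional[float]    # None models a reading the sensor never delivered

def clean(readings, low, high, max_gap):
    """Separate usable readings from detected problems.

    Returns (valid, problems), where problems is a list of
    (timestamp, description) pairs for later inspection.
    """
    valid, problems = [], []
    last_ts = None  # timestamp of the last valid reading seen so far
    for r in readings:
        if r.value is None:
            problems.append((r.timestamp, "missing value"))
            continue
        if not (low <= r.value <= high):
            problems.append((r.timestamp, "out of scale"))
            continue
        # A long silence between two valid readings may mean a disconnected sensor.
        if last_ts is not None and r.timestamp - last_ts > max_gap:
            problems.append((r.timestamp, "gap before this reading"))
        valid.append(r)
        last_ts = r.timestamp
    return valid, problems

if __name__ == "__main__":
    raw = [Reading(0, 20.0), Reading(10, None), Reading(20, 5000.0),
           Reading(30, 25.0), Reading(100, 30.0)]
    valid, problems = clean(raw, low=0.0, high=1500.0, max_gap=60.0)
    print([r.value for r in valid])
    print(problems)
```

Keeping the problems in a separate list, rather than silently dropping them, leaves a trace that can itself be monitored: a sensor that keeps producing gaps is information too.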
To avoid being overwhelmed by the sheer volume of data, it is important to make choices that allow the data to be tamed. The first and most important is deciding which data and values should be collected. Anything that is not useful for the purposes of our analysis clogs the system and risks slowing down modeling for no benefit. In many cases, this decision can be made before performing the analysis, solely on the basis of the business objectives you want to achieve. In other contexts, the signals can be selected only after the collected data have been analyzed with the model, discarding those that prove irrelevant.
Once the data to be collected have been chosen, it is equally important to select, for each signal, a sampling frequency that is adequate for the purpose. The internal temperature of a furnace may need to be measured only once every ten seconds, while the rotational speed of a fan may need to be read every second or more often. Of course, we could also select a very high sampling rate and then adopt a smarter strategy: keep and store only the data that show a significant variation from the normal value. For example, one can measure temperature once per second but, if the variation is less than a tenth of a degree, treat it as not significant and not store it, because it adds nothing new compared to the previous measurement.
Collecting data is the central point around which the concept of Big Data revolves. Even the best collection in the world, however, is merely a means, and the ends it serves can vary. More generally, we can say that Big Data meets four needs:
1. Faster decision making - daily data collection gives the entrepreneur a new perspective on production processes and a clearer picture of how they are working. Real-time plant monitoring allows timely action where needed, and the more advanced the analysis, the greater the competitive advantage.
2. Increase efficiency - as mentioned above, Big Data is useful only if it is used in the right way. The data must be analyzed correctly in order to simplify processes, reduce costs and optimize production.
3. Increase flexibility - thanks to the information obtained from the analysis of Big Data, an SME can quickly modify its strategies to adapt to market, production and quality needs, thus becoming more competitive.
4. Safeguard the plant - thanks to the amount of information coming from machinery, predictive maintenance can be used to reduce downtime and keep the machine fleet healthy, extending its life.
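The change-based storage strategy described earlier (sample the temperature once per second, but store a value only when it has moved by at least a tenth of a degree) can be sketched in Python as follows. This is a minimal illustration under one stated assumption: each new sample is compared with the last stored value rather than with the immediately preceding sample. The function name and the sample values are hypothetical.

```python
def deadband_filter(samples, threshold):
    """Yield only the (timestamp, value) pairs worth storing.

    A sample is kept when it differs from the last *stored* value by at
    least `threshold`; the very first sample is always kept.
    """
    last_stored = None
    for timestamp, value in samples:
        if last_stored is None or abs(value - last_stored) >= threshold:
            last_stored = value
            yield timestamp, value

if __name__ == "__main__":
    # One temperature sample per second (made-up values).
    samples = [(0, 850.00), (1, 850.04), (2, 850.08),
               (3, 850.15), (4, 850.16), (5, 849.90)]
    for stored in deadband_filter(samples, threshold=0.1):
        print(stored)
```

Comparing with the last stored value means that a slow drift is still recorded once it adds up to the threshold; comparing each sample only with its immediate predecessor would discard a drift made of many small steps.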
As we've seen, the data available to an entrepreneur are vast and constantly growing, and it is not possible to manage them alone. To use them in the best way, you have to choose the right tool for your business needs and rely on a partner who knows how to exploit this mine of information, supporting the entrepreneur and the team on this path towards the future of the industry.