Predictive maintenance is a maintenance approach based on estimating the future values of quantities that characterize a system (typically a machine, a plant, or a manufacturing process) with mathematical models, in order to identify anomalies and potential failures in advance.

The standard framework of predictive maintenance is as follows:

- measuring physical quantities in real time
- estimating parameters (measurable or not) at time t + dt
- identifying the system states that we consider anomalies or failures
- planning preventive and corrective activities BEFORE the system reaches a critical condition.

Some examples of predictive maintenance:

- Vibrations of a machine component can signal degradation or deformation of the bearings of particular mechanical parts.
- The temperature of an electric motor and its current consumption may indicate that friction and possible mechanical jamming are degrading the functional characteristics of the rotating parts.
- The measurement of particles in a lubricant can indicate the degradation of parts that rub against each other. With appropriate sensors, we can measure the composition of the lubricating oil and check the health of the machine.

I have already written about the measurement of physical quantities in previous posts. Today I want to focus on **parameter estimation**.

Predictive maintenance rests, without doubt, on the ability to make reliable predictions (forecasts). If the forecasting algorithms produce incorrect estimates or confidence intervals that are too wide, it will be difficult to identify anomalies and make the right maintenance and correction decisions.

In broad terms, forecasting can be of two main types:

- Cross-Sectional Forecasting
- Time Series Forecasting

**Cross-sectional forecasting** is the estimation of quantities for which you have no measurements, using measurements of other quantities that have been observed.

For example, having measured in laboratory tests the life of electronic components as a function of the electrical current flowing through them, you want to predict the life of an electronic component used in particular conditions.

**Time series forecasting** is the estimation of quantities that change over time: they have been measured up to time t, and you want to predict their value at time t + dt. Usually you can obtain measurements of the signal at regular intervals and try to predict its future values.

The simplest example is the estimate of the minutes of charge remaining in our mobile phones, which is based on the energy we have consumed so far and on the way we use the phone.

In a time series it is usual to identify:

- trend: a long-term increase (or decrease) in the values.
- seasonal phenomena: changes in the values that repeat over a period of fixed duration.
- cyclical phenomena: increases and decreases in the values with fluctuations that do not always have the same length, i.e. they are not periodic.

One of the most important things to understand when analyzing data is the type of relationship between the measured quantities. Scatter plots are very useful here because they help show how the data are related to each other (correlated), if they are at all. The following figure is an example of a scatter plot:

In particular, I would like to explore some aspects of the simplest and most widely used estimation method: **Linear Regression**. The basic hypothesis of linear regression is that the phenomenon to be estimated behaves linearly. With the **Least Squares Method** we calculate the coefficients m and q of the line that we will use to estimate future values of the quantity we want to forecast.
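As a reminder, the least squares coefficients of a straight line have a standard closed form. The symbols below are assumptions introduced for illustration: N measurements $y_i$ taken at times $t_i$, with $\bar{t}$ and $\bar{y}$ their averages, and $m'$, $q'$ the estimates of m and q.

$$
m' = \frac{\sum_{i=1}^{N} (t_i - \bar{t})(y_i - \bar{y})}{\sum_{i=1}^{N} (t_i - \bar{t})^2},
\qquad
q' = \bar{y} - m'\,\bar{t}
$$

The forecast at a future time $t + dt$ is then $m'(t + dt) + q'$, which is only as good as the linearity hypothesis behind it.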

When we implement this very common algorithm, we have to keep in mind a very important question: *based on the values that we measured in the past, how do we know that the signal is truly linear?* A simple way to check whether the signal we want to predict is linear, or contains information not captured by our forecasting model, is to analyse the residuals.

The residuals are the differences between the measured values of the quantity and the fitted values obtained from the straight line, and are calculated in this way:

$$e_i = y_i - y'_i$$

where $e_i$ is the residual of the i-th measurement, $y_i$ is the i-th measured value, and $y'_i = m' t_i + q'$ is the estimated value at time $t_i$.
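The fit and the residuals can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the function names `fit_line` and `residuals` are hypothetical, and the synthetic signal uses illustrative values (slope 0.5, intercept 7.5, Gaussian noise with variance 1, 50 samples). The sampling times t = 0..49 and the fixed random seed are assumptions made only so that the example runs.

```python
import random


def fit_line(t, y):
    """Least squares estimates (m, q) of the line y = m * t + q."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    m = sxy / sxx
    q = y_mean - m * t_mean
    return m, q


def residuals(t, y, m, q):
    """Residuals e_i = y_i - (m * t_i + q)."""
    return [yi - (m * ti + q) for ti, yi in zip(t, y)]


# Synthetic linear signal plus white noise (all values are illustrative).
rng = random.Random(0)   # fixed seed: an arbitrary choice for repeatability
t = list(range(50))      # assumed unit sampling interval
y = [0.5 * ti + 7.5 + rng.gauss(0.0, 1.0) for ti in t]

m_est, q_est = fit_line(t, y)
e = residuals(t, y, m_est, q_est)
print(f"m' = {m_est:.2f}, q' = {q_est:.2f}")
```

With an intercept in the model, the residuals of a least squares line always sum to (numerically) zero, so their average says nothing about the quality of the fit. What matters is their structure.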

The chart above shows 50 samples of a linear source signal with m = 0.5 and q = 7.5, to which I added white noise with variance 1. With the least squares method I obtain m' = 0.49 and q' = 7.4, very close to the values of the source signal. The residuals are shown in the following graph:

A practical way to test whether the estimated line represents the signal correctly is **analyzing the Autocorrelation of Residuals**. First of all, we calculate the autocorrelation of the residuals obtained in the previous step. The result is shown in the following figure:

Now we count the autocorrelation values that fall within the range

$$\pm \frac{2}{\sqrt{N}}$$

where N is the number of measurements. In the case that I have proposed, N = 50 and the threshold is approximately ±0.28.

In the previous example, 98% of the autocorrelation values are within the range ±0.28, so we can be quite sure that the residuals are not correlated with each other and that the noise on the signal can be considered **White Noise**. I suggest periodically recalculating the autocorrelation of the residuals, even for signals that were correctly estimated with a straight line. The measured phenomena may change over time, for example because of a malfunction in some part of the system under observation, and such malfunctions show up as additive signals on the measured quantity.
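For reference, a common definition of the sample autocorrelation of the residuals at lag k is sketched below. The symbols $r_k$ and $\bar{e}$ (the average residual) are introduced here only for illustration, and I am assuming that the ±0.28 range corresponds to the usual approximate 95% bound for white noise, ±2/√N, which matches N = 50.

$$
r_k = \frac{\sum_{i=1}^{N-k} (e_i - \bar{e})(e_{i+k} - \bar{e})}{\sum_{i=1}^{N} (e_i - \bar{e})^2},
\qquad
\frac{2}{\sqrt{N}} = \frac{2}{\sqrt{50}} \approx 0.283
$$

For white noise, about 95% of the values $r_k$ (for k ≥ 1) are expected to fall inside this range, which is where the 95% criterion comes from.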

The following example illustrates how residual analysis allows us to detect a change in the source signal. Suppose that, at a certain point, something changes in our system and a sinusoidal signal with amplitude 0.5, caused by a malfunction in some part of the system under observation, is added to the measured signal. The measurements we would get are shown in the following diagram:

These are the collected measurements. I challenge you to recognize the presence of the sine wave buried in white noise! The residuals are shown in the following chart:

Applying the Least Squares Method to the new set of measurements, I calculated m' = 0.53 and q' = 7.46, values that do not differ greatly from the initial case. Again we calculate the autocorrelation of the residuals, reported in the following diagram, and we find that only 87% of the values are within the range ±0.28.

Commonly, we use a 95% threshold to classify noise as white, so a program can automatically detect that something has changed in the measured signal and trigger an intervention of some kind, such as a check by a technician.
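Such an automatic check could look roughly like the sketch below. It is written under several assumptions: the names `autocorrelation` and `looks_white` are hypothetical, the bound is taken as ±2/√N, all lags from 1 to N − 1 are counted (the number of lags to use is a free choice), and the demonstration uses a noise-free sine wave with an arbitrary period of 10 samples as a simplified stand-in for residuals that contain a periodic component.

```python
import math


def autocorrelation(x, max_lag):
    """Sample autocorrelation of x for lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [xi - mean for xi in x]
    c0 = sum(di * di for di in d)
    if c0 == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    return [
        sum(d[i] * d[i + k] for i in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]


def looks_white(residuals, min_fraction=0.95, max_lag=None):
    """True if at least min_fraction of the autocorrelation values
    lie within the assumed bound of +/- 2 / sqrt(N)."""
    n = len(residuals)
    if max_lag is None:
        max_lag = n - 1  # assumption: use every available lag
    bound = 2.0 / math.sqrt(n)
    r = autocorrelation(residuals, max_lag)
    inside = sum(1 for rk in r if abs(rk) <= bound)
    return inside / len(r) >= min_fraction


# Simplified stand-in: residuals made only of a sine wave, amplitude 0.5,
# with a period of 10 samples (the period is an arbitrary assumption).
sine = [0.5 * math.sin(2 * math.pi * i / 10) for i in range(50)]

if not looks_white(sine):
    print("residuals are not white: request a check by a technician")
```

In a real monitoring loop the function would be called on the residuals of the most recent window of measurements, and the window length, the number of lags, and the 95% threshold would all need tuning for the signal at hand.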

In the case of Linear Regression using the Least Squares Method, it is important to define an analysis of the goodness of fit that allows us to understand whether the system under observation has changed. It is not sufficient to check whether the estimates m' and q' change over time: the case discussed above has shown that, with a sine wave added to the measured signal, the LSM yields coefficients very similar to those calculated when the sine wave was not present. We used the autocorrelation of the residuals to check how many of its values fell within the threshold.

We found that this percentage dropped from 98% in the case of White Noise to 87% in the presence of the sine wave. When fewer than 95% of the values fall within the calculated range, it means that the noise is not white, and our model probably does not represent all the information contained in the measured signal. This approach makes it possible to build algorithms that use linear regression to predict signals, are useful for predictive maintenance, and can detect changes in the system under observation automatically and in real time. Residual analysis is also useful for figuring out whether linear regression is the most suitable choice for the prediction model we want to build.