Performance Assessment of Customized LSTM based Deep Learning Model for Predictive Maintenance of Transformer


The transformer is the most crucial and expensive component of a power system. Its failure can cause significant economic losses due to system outages and increased maintenance costs. The transformer's health and lifespan gradually deteriorate due to ineffective cooling and heavy equipment loading. As a result, regular inspections are carried out to maintain its health, and its components are continuously monitored for anomalies or faults. Predictive maintenance is a rising alternative to conventional techniques such as breakdown (corrective) maintenance for transformers. The dissolved gas analysis (DGA) technique is used to monitor the transformer's insulation oil, an important component, for identifying faults. However, the data on the concentration of dissolved gases in defective transformer oil is limited to small gas data samples carrying sparse information. Because of this limitation, the dissolved gas content in transformer oil cannot be predicted well using existing prediction techniques. Data-driven techniques outperform model-based predictive maintenance approaches because they aim to automatically generate predictive models from the data, making them applicable to a variety of such issues. Relevant techniques are thus required to increase the precision of predictions. Machine learning [1] and deep learning techniques are used to analyze the DGA data and predict maintenance needs for the transformer.
Several machine learning techniques are considered, viz., Decision Trees, Neural Networks and SVMs [2][3][4][5]. These methods improve fault diagnosis accuracy to some extent, although they typically have shortcomings. For instance, while decision trees can be a useful tool for diagnosing transformer faults, they face several difficulties in achieving greater diagnostic accuracy, including model complexity, complex fault patterns and limited features. In ANNs [6][7][8], the selection of several parameters has a significant impact on the diagnostic accuracy, which is greatly impacted by the model's inability to use the most relevant parameters. Overfitting is another issue with ANNs. The diagnostic accuracy of an SVM is determined by the selection of the kernel function and associated parameters, such as the cost parameter, slack variables, and the fault feature on the margin of the hyperplane, and that selection is a difficult task.
Deep learning, also referred to as a deep neural network (DNN), has received a lot of attention recently in the field of machine learning. Recurrent neural networks (RNNs) are a subclass of ANNs that are distinguished by their internal loop structure. An LSTM network, in contrast to conventional RNNs, is well suited to learn from experience to categorize, analyze, and predict time series when there are very long time lags of unknown size between significant events [9,10]. This is one of the primary reasons why, in many cases, LSTM outperforms rival RNNs and other sequence learning approaches. However, the LSTM network shows a higher classification error for small DGA datasets [28] compared to baseline machine learning methods. To address this issue, a customized LSTM (C-LSTM) model is created. The effectiveness of the C-LSTM model is evaluated using precision, recall, F1-score, validation accuracy, and test accuracy metrics, in comparison with Decision Trees, Support Vector Machines, and Artificial Neural Networks, which are commonly used for predictive maintenance [10][11][12][13][14][15][30] of transformers.
The remainder of this paper is arranged as follows: section 2 presents the DGA dataset, section 3 describes machine learning techniques for predictive maintenance of the transformer, section 4 explains the deep learning approach for predictive maintenance of the transformer, section 5 presents results and discussion, and section 6 presents the conclusion.

░ 2. DGA DATASET
In order to verify the effectiveness of the proposed prediction model, the study uses the online monitoring of a 500 kVA, 11000/430 V transformer as an illustration to analyze time series data [31]. In this analysis, a preprocessed dataset of one thousand fault cases is used, which was obtained from a large and unique DGA dataset [16] of a test transformer available at CITD, Balanagar, Hyderabad. The fault codes 1 to 7 are assigned to fault scenarios such as low-energy discharges (D1), high-energy discharges (D2), thermal and electrical faults (DT), partial discharges (PD), and low, medium and high-thermal faults (T1, T2, and T3) identified by Duval's triangle [17]. Figure 1 presents the data preparation process, including the sample dataset, test dataset, and training dataset. Decision Trees, neural networks, and support vector machines are trained using 700 data points, whereas testing is done using 350 data points.

░ 3. MACHINE LEARNING TECHNIQUES FOR PREDICTIVE MAINTENANCE OF TRANSFORMER
3.1 Decision Trees (DT)
The decision tree method [18] separates the provided DGA observations into various branches, which can be observed in figure 1. Although it has a topology resembling a neural network, this method for handling non-linear problems is less complex and faster. The root node is positioned at the top of the decision tree structure and includes the entire dataset. According to figure 2, which is built on the boundaries of the fault zones based on the percentage composition of three dissolved gases outlined in figure 3, the overall hierarchical structure is expressed as a tree, and the segments are known as nodes. The leaves of a binary tree are the nodes at the very end. The technique is effective for predictive maintenance due to its capability of managing large amounts of data, simple structure, and minimal data preparation requirements. Overfitting problems are prevented by growing simpler trees and controlling the depth with the maximum number of splits setting. When the branches finally reach a leaf node in transformer fault prediction, the data is categorized as a certain transformer fault type. The decision tree predicts the class from three predictors X1, X2, and X3, i.e., the concentrations of methane (CH4), ethylene (C2H4) and acetylene (C2H2), in percentage.
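As an illustration of the three predictors described above, the following minimal sketch converts gas concentrations into the relative percentages used by Duval's triangle. It assumes the raw readings are given in ppm; the function name `duval_percentages` and the sample values are hypothetical and are not taken from the dataset used in this study.

```python
def duval_percentages(ch4_ppm, c2h4_ppm, c2h2_ppm):
    """Return (X1, X2, X3): relative percentages of CH4, C2H4 and C2H2.

    Assumption: inputs are gas concentrations in ppm. Each gas is expressed
    as a share of the sum of the three gases, so the outputs add up to 100.
    """
    total = ch4_ppm + c2h4_ppm + c2h2_ppm
    if total <= 0:
        raise ValueError("at least one gas concentration must be positive")
    return (
        100.0 * ch4_ppm / total,
        100.0 * c2h4_ppm / total,
        100.0 * c2h2_ppm / total,
    )


# Hypothetical reading, for illustration only.
x1, x2, x3 = duval_percentages(60.0, 30.0, 10.0)
print(x1, x2, x3)
```

Because only relative proportions are retained, two oil samples with the same gas ratios map to the same point in the triangle regardless of their absolute concentrations.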

3.2 Neural Networks (NN)
The use of Artificial Neural Networks (ANN) [1,10,29] in predictive maintenance of transformers has gained significant recognition due to their robust fault tolerance and exceptional self-learning ability. In this approach, the dissolved gas analysis dataset is given as an input to the ANN, and the output is the transformer's identified fault type. There are two stages in transformer prognostic maintenance using an ANN. The NN develops a prediction function from the training data, which is then used to classify new validation or testing data. Cross-validation is carried out with a K-fold value of 5, from which validation predictions and accuracy are obtained.
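The 5-fold cross-validation mentioned above can be sketched as follows. This is a generic illustration in plain Python and not the routine of the simulation software used in the study; the function name `k_fold_indices` and the fixed random seed are assumptions, while the sample count of 700 follows the training set size stated in section 2.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Split sample indices into k (train, validation) pairs.

    Each sample appears in exactly one validation fold and in the
    training part of the remaining k - 1 folds.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((training, validation))
    return splits


splits = k_fold_indices(700, k=5)
for fold_number, (training, validation) in enumerate(splits, start=1):
    print(fold_number, len(training), len(validation))
```

The validation accuracy is then the average of the accuracies obtained on the five held-out folds.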

3.3 Support Vector Machines (SVM)
A Support Vector Machine (SVM) [4] is a supervised machine learning model that learns decision boundaries separating different groups. It determines an optimal separating hyperplane [19] for both linearly and non-linearly separable datasets by maximizing the margin between the distinct data points. These hyperplanes are used as decision boundaries for classifying data. A function called a kernel is utilized to transform a low-dimensional space into a higher-dimensional space. By creating decision boundaries or hyperplanes, this supervised machine learning method identifies transformer faults, which in turn aids the classification process. The DGA data set is used for training an SVM classifier with parameters such as kernel function, polynomial order, kernel scale and class names. The kernel function is linear for Linear SVM (LSVM) [20], polynomial for Quadratic & Cubic SVMs (QSVM, CSVM) and Gaussian for Fine Gaussian, Medium Gaussian, Coarse Gaussian and Optimizable SVMs (FGSVM, MGSVM, CGSVM & OptSVM). The polynomial order is 1 for LSVM, Gaussian SVMs & OptSVM, 2 for QSVM and 3 for CSVM. The kernel scale is Auto for LSVM, QSVM & CSVM, 0.43 for FGSVM, 1.7 for MGSVM, 6.9 for CGSVM and 0.32 for OptSVM. The kernel scale is chosen around sqrt(P)/4, sqrt(P), and sqrt(P)*4 [26] for Fine, Medium, and Coarse Gaussian SVMs, respectively. P is the number of predictors, which is 3 in this case (the percentages of CH4, C2H4 and C2H2); this is consistent with the scales 0.43, 1.7 and 6.9 listed above. The SVM develops a prediction function from the training data, which is then used to classify new validation or testing data. Cross-validation is carried out with a K-fold value of 5, from which validation predictions and accuracy are obtained.
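The kernel scale heuristic quoted above can be checked with a few lines of Python. The sketch below is only an arithmetic illustration; the function name `gaussian_kernel_scales` is hypothetical. Evaluating it with three predictors (the CH4, C2H4 and C2H2 percentages) gives values that round to the reported scales of 0.43, 1.7 and 6.9.

```python
import math


def gaussian_kernel_scales(num_predictors):
    """Kernel scales sqrt(P)/4, sqrt(P) and sqrt(P)*4 for P predictors."""
    root = math.sqrt(num_predictors)
    return {"fine": root / 4, "medium": root, "coarse": root * 4}


scales = gaussian_kernel_scales(3)
for name, value in scales.items():
    print(f"{name}: {value:.2f}")
```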

░ 4. DEEP LEARNING APPROACH FOR PREDICTIVE MAINTENANCE OF TRANSFORMER
4.1 Long Short-Term Memory (LSTM)
The LSTM technique is a type of Recurrent Neural Network (RNN) that has been proposed to outperform other neural networks in processing time series data [21]. LSTM addresses the limitation of RNNs that have only short-term memory and require lengthy training periods. In contrast to RNNs, LSTMs deploy an auxiliary memory unit to determine the significance of data with a more intricate stochastic model. The standard architecture contains a "long-time memory function" that, by including a gating cell, gives it the ability to handle long-term nonlinear sequential prediction challenges. Additionally, LSTM deals with the problem of vanishing and exploding gradients [22] during extended training sequences. Unlike conventional RNNs, LSTM substitutes the neurons in the hidden layer with memory cells that have gating mechanisms. The output gate adjusts the value that is transferred from the present state to the output, and the input gate adjusts the current input before it is added to the next state. The forget gate chooses the elements of the current state to be carried forward. Figure 5 illustrates the fundamental structure of LSTM memory cells [23].

Figure 5: LSTM memory cells' fundamental structure
The memory cell is a crucial component of the LSTM architecture, where the cell input consists of $x_t$ (sequence input), $h_{t-1}$ (hidden-layer cell state) and $c_{t-1}$ (memory cell state) at times $t$, $t-1$ and $t-1$, respectively. At time $t$, the LSTM model produces two outputs: the memory cell state, represented by $c_t$, which stores long-term information, and the hidden layer cell state, represented by $h_t$, which stores short-term information. The information transfer between memory cells can be achieved by modifying and reading from the memory cells using the three gates discussed earlier. The following expressions [24] are used to formulate this process.
$$f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \tag{1}$$
$$i_t = \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \tag{2}$$
$$o_t = \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \tag{3}$$
In this context, the computational outcomes for the forget, input, and output gates are denoted as $f_t$, $i_t$ and $o_t$, respectively, and $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input. The offset terms and weight matrices of these gates are designated as $b_f$, $b_i$, $b_o$ and $W_f$, $W_i$, $W_o$. Also, $\sigma$ indicates the sigmoid activation function. The structure shown in figure 5 has two outputs, the memory cell state $c_t$ and the hidden layer state $h_t$ at time $t$, which are given by the following expressions [24].
$$\tilde{c}_t = \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) \tag{4}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{5}$$
$$h_t = o_t \odot \tanh(c_t) \tag{6}$$
In this context, the candidate memory cell content at time $t$ is denoted as $\tilde{c}_t$ and the hyperbolic tangent activation is represented by $\tanh$. The offset term is referred to as $b_c$ and the weight matrix as $W_c$. The element-wise multiplication operation is represented by $\odot$. The LSTM network outperforms established techniques in predicting the presence of gases dissolved in transformer oil. It is also proficient in managing the challenging job of forecasting nonlinear patterns.
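The gate computations of a single LSTM memory cell can be sketched in plain Python as below. This is a minimal illustration under the assumption of the common formulation in which each gate applies one weight matrix to the concatenation of the previous hidden state and the current input; the function and parameter names (`lstm_step`, `W_f`, `b_f`, and so on) and the all-zero weights are illustrative and do not represent the trained network of this study.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, z, b):
    """Return W z + b, where W is a list of rows and z, b are lists."""
    return [sum(w * zj for w, zj in zip(row, z)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of an LSTM memory cell; returns (h_t, c_t)."""
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(params["W_f"], z, params["b_f"])]  # forget gate
    i = [sigmoid(v) for v in affine(params["W_i"], z, params["b_i"])]  # input gate
    o = [sigmoid(v) for v in affine(params["W_o"], z, params["b_o"])]  # output gate
    c_tilde = [math.tanh(v) for v in affine(params["W_c"], z, params["b_c"])]
    # New memory cell state: keep part of the old state, add gated candidate.
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_tilde)]
    # New hidden state: gated, squashed memory cell state.
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


# Toy example: 3 inputs, 2 hidden units, all weights and offsets set to zero.
hidden, n_in = 2, 3
zero_W = [[0.0] * (hidden + n_in) for _ in range(hidden)]
zero_b = [0.0] * hidden
params = {"W_f": zero_W, "W_i": zero_W, "W_o": zero_W, "W_c": zero_W,
          "b_f": zero_b, "b_i": zero_b, "b_o": zero_b, "b_c": zero_b}
h_t, c_t = lstm_step([0.2, 0.5, 0.3], [0.0, 0.0], [1.0, -2.0], params)
print(h_t, c_t)
```

With all weights at zero every gate evaluates to 0.5 and the candidate content is zero, so the memory cell state is simply halved; this makes the role of the forget gate easy to see before any training takes place.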

4.2 Predictive Maintenance of Transformer using Standard-LSTM with Small DGA Dataset
The memory cell is the fundamental constituent of the LSTM network, and its design is portrayed in figure 5. For the restricted dataset, the input layer has a size of 3, and there are 7 classes. The network comprises 3 hidden layers, designed with a 1st layer unit size of 3, a 2nd layer unit size of 10, and a 3rd layer unit size of 7. With a mini-batch size of 16 and a dropout of 0.2, the training process entails 700 epochs.
The LSTM network's algorithm is outlined as follows:
Step 1: Choose the faulty DGA data for training and testing purposes.
Step 2: Configure the LSTM network to start the training process.
Step 3: Train the LSTM network using the faulty training data.
Step 4: Use the faulty-test data to assess the LSTM network's performance.
Step 5: Configure the training options for both the testing and training data.
Step 6: Use the LSTM network to predict the type of fault in the test data.
Step 7: Determine the accuracy of the validation results.
Step 8: Evaluate the effectiveness of the categorization network by using the confusion matrix [25].

4.3 Proposed Model C-LSTM for Predictive Maintenance of Transformer with Small DGA Dataset
The proposed C-LSTM model is a deep learning approach designed to facilitate learning from limited data. Its structure comprises sequence input, LSTM, dropout, fully connected and output layers, as shown in figure 6. The output layer comprises softmax and classification layers. The sequence input layer accepts time-sequential data. Zero-center normalization is applied before the data is fed to the subsequent LSTM layer. In each of the 128 neurons of the LSTM layer, the data is subjected to activation and regression. With a probability of 0.2, the dropout layer randomly sets input elements to zero, which helps to limit overfitting. The fully connected layer is responsible for regression and activation. The softmax layer receives the data and propagates it to the final classification layer, which serves as the prediction stage.
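To give a sense of the size of the layer stack described above, the following sketch counts the learnable parameters of an LSTM layer with 128 units followed by a fully connected layer. It assumes an input size of 3 and 7 output classes, carried over from the standard-LSTM description, and the usual LSTM parameterization with four weight blocks (three gates plus the candidate content); the helper names are hypothetical and the figures are not reported in the original study.

```python
def lstm_param_count(input_size, hidden_units):
    """Input weights, recurrent weights and offsets for the 4 LSTM blocks."""
    return 4 * hidden_units * (input_size + hidden_units + 1)


def dense_param_count(in_features, out_features):
    """Weights plus offsets of a fully connected layer."""
    return in_features * out_features + out_features


# Assumed sizes: 3 input features, 128 LSTM units, 7 fault classes.
layers = [
    ("sequence input (zero-center)", 0),
    ("LSTM, 128 units", lstm_param_count(3, 128)),
    ("dropout, p = 0.2", 0),
    ("fully connected, 7 outputs", dense_param_count(128, 7)),
    ("softmax", 0),
    ("classification", 0),
]
total_params = sum(count for _, count in layers)
for name, count in layers:
    print(f"{name}: {count}")
print("total:", total_params)
```

Under these assumptions almost all learnable parameters sit in the LSTM layer, which is one reason dropout is placed directly after it when only a small dataset is available.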

░ 5. RESULTS AND DISCUSSION
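The effectiveness of the machine learning and deep learning techniques for prognostic maintenance of the transformer is assessed using a set of metrics including Validation Accuracy (VA), Test Accuracy (TA), Precision (PPV), Recall (TPR), and F1-score (harmonic mean, HM).
The R2021a version of the MATLAB simulation software is used to run the simulations. The validation and testing results are evaluated using precision, recall, and F1-score, which can be calculated with the following expressions using the respective confusion matrices, where $TP$, $FP$ and $FN$ denote the numbers of true positives, false positives and false negatives for a given fault class.
$$\text{Precision (PPV)} = \frac{TP}{TP + FP} \tag{7}$$
$$\text{Recall (TPR)} = \frac{TP}{TP + FN} \tag{8}$$
$$\text{F1-score (HM)} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{9}$$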
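The calculation of precision, recall and F1-score from a confusion matrix can be sketched as follows. The example is a generic illustration: the function names and the 3-class confusion matrix are hypothetical and do not reproduce results of this study, and rows are assumed to hold true classes while columns hold predicted classes.

```python
def per_class_metrics(cm):
    """Return a list of (precision, recall, f1) tuples, one per class.

    Assumption: cm[i][j] is the number of samples of true class i
    that were predicted as class j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


# Hypothetical 3-class confusion matrix, for illustration only.
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 0, 10]]
metrics = per_class_metrics(cm)
acc = accuracy(cm)
print(metrics, acc)
```

Returning 0.0 when a denominator is zero is a design choice of this sketch for classes that are never predicted or never occur; other conventions (such as reporting the value as undefined) are equally possible.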

5.1 Fault Diagnosis using Decision Trees
The dataset shown in figure 1 is used to conduct fault analysis using various decision tree methods, including Fine tree (FT), Medium tree (MT), Coarse tree (CT), and Optimized tree (OpT). The comparison of the performance metrics of the decision tree techniques, for validation and testing, is shown in figure 7. These metrics are calculated by using the expressions given in equations 7 to 9. The results indicate that the FT achieved a test accuracy of 75.7%, and the OpT achieved a test accuracy of 99.4% & validation accuracy of 99.1%. It is apparent that the OpT performs well, while the CT shows poor performance compared to the other methods.

Figure 8: Fault diagnosis capability of Decision trees
The faulty cases for each fault are diagnosed using the different tree methods, and the percentage of PPV is calculated from the respective confusion matrices, as presented in figure 8. From the above analysis, it can be inferred that the coarse tree method is not useful in detecting DT and PD faults. Furthermore, the results suggest that, among all decision tree methods, the optimized tree method shows outstanding performance in categorizing transformer faults.

5.2 Neural Network-Based Fault Diagnosis
The dataset shown in figure 1 is utilized to assess the ability of the different neural networks described in section 3.2 to classify transformer faults. The comparison of the performance metrics of each NN technique, for validation and testing, is shown in figure 9. The faulty cases for each fault are diagnosed using the different NN methods, and the percentage of PPV is calculated from the respective confusion matrices, as presented in figure 10. The performance of the neural networks in diagnosing transformer faults is assessed using the above analysis, and it is determined that the Trilayered NN exhibits superior capability in this regard.

5.3 SVM-Based Fault Diagnosis
The dataset shown in figure 1 is utilized to evaluate the capability of the several SVM techniques described in section 3.3 to classify transformer faults. The comparison of the performance metrics of each SVM technique, for validation and testing, is shown in figure 11. These metrics are calculated by using the expressions given in equations 7 to 9.
The results show that the C-LSTM model outperforms the other approaches when working with limited datasets. This indicates that the C-LSTM model is a better choice for predictive maintenance of transformers, especially when dealing with limited datasets.

░ 6. CONCLUSION
In this paper, the efficacy of various machine learning algorithms for transformer predictive maintenance is evaluated and tested. The proposed work uses a DGA dataset as its input. Initially, four different types of Decision tree algorithms are compared for their performance, and it is noted that the Optimizable tree is comparatively better than the other Decision tree techniques in classifying transformer faults, with a validation accuracy of 99.1% and a test accuracy of 99.4%. On analysis of several NN approaches, it is observed that the Trilayered NN performs well in accurate determination of transformer faults, with an attained test accuracy of 100% & validation accuracy of 98.1%. In the case of SVM classifiers, the Fine Gaussian SVM has attained a test accuracy of 99.7% & validation accuracy of 99.3%. According to many researchers, the standard-LSTM model shows a higher classification error for small DGA datasets compared to the baseline machine learning techniques that are considered. This is also observed in the limited DGA dataset used in this study. A customized LSTM model is created to address this limitation. The devised C-LSTM model has shown greater validation and test accuracies of 100% and 98.57%, respectively, for predictive maintenance of transformers as compared to baseline machine learning techniques.

░ Acknowledgment
I want to thank Dr. G Sanath Kumar, Deputy Director of CITD, Balanagar, Hyderabad, for allowing me to use the most up-to-date computing resources at CITD's MSME-Tool Room. I would like to take this opportunity to express my gratitude to Dr. MC Ajay Kumar and Dr. P Vinay for their unwavering support, inspiration, and assistance in completing this work.

░ Conflict of Interest
Regarding the publishing of this paper, the authors affirm that there are no conflicts of interest.

Figure 2: Structure of Decision tree based on fault zone limits

Figure 3: Representation of Fault Zones Boundary
The DGA data set is used for training a Decision Tree classifier with parameters such as split criterion, number of splits and class names. The number of splits considered for the Fine Tree, Coarse Tree, Medium Tree and Optimizable Tree are 100, 4, 20 and 75, respectively. The Decision Tree develops a prediction function from the training data, which is then used to classify new validation or testing data. Cross-

Figure 4: Structure of different NN types
During the learning phase, the testing data along with the DGA dataset is served as input. The Rectified Linear Unit (ReLU) activation function is utilized in this work, and the training process consists of 1000 iterations. During the working phase, the analysis results of the main module are conveyed to the hidden layer, and the activation function assists in propagating the analysis outcomes to the output layer nodes. The structure of the several categories of NN is depicted in figure 4. The DGA data set is used for training an NN classifier with parameters such as layer sizes, activation function, iteration limit and class names. The Narrow neural network (NNN) has a first layer size of 8 and only one fully connected layer. The Medium neural network (MNN) has a first layer size of 22 and also only one fully connected layer. The Wide neural network (WNN) has a first layer size of 100 and only one fully connected layer. The Bilayered neural network (BLNN) has two fully connected layers with a first layer size of 10 and a second layer size of 10. The Trilayered neural network (TLNN) has three fully connected layers with a first layer size of 14 and second and third layer sizes of 10.

Figure 7: Contrast among performance metrics of Decision trees

Figure 9: Contrast among performance metrics for NN techniques

Figure 10: Neural Networks fault diagnosis
The results indicate that the Narrow NN achieved a test accuracy of 99.7% & validation accuracy of 97.7%, the Medium NN achieved a test accuracy of 99.4% & validation accuracy of 98.4%, the Wide NN achieved a test accuracy of 99.7% & validation accuracy of 98.7%, the Bilayered NN achieved a test

Figure 12: SVMs fault diagnosis
The results indicate that the Linear SVM achieved a test accuracy of 94% & validation accuracy of 94.3%, the Quadratic SVM achieved a test accuracy of 99.4% & validation accuracy of 97.4%, the Cubic SVM achieved a test accuracy of 99.7% & validation accuracy of 98.6%, the Fine Gaussian SVM achieved a test accuracy of 99.7% & validation accuracy of 99.3%, the Medium Gaussian SVM achieved a test accuracy of 98.3% & validation accuracy of 97.1%, the Coarse Gaussian SVM achieved a test accuracy of 87.7% & validation accuracy of

5.4 Fault Diagnosis using Long Short-Term Memory (LSTM)
The deep learning techniques used in the proposed predictive maintenance model include standard-LSTM and C-LSTM. The study compares the performance of both techniques for classifying transformer faults using a small DGA dataset. The performance metrics concerning validation & testing are presented for the respective LSTMs as shown in figure 15, and the validation accuracy and test accuracy are calculated from these results.
5.4.1 Standard-LSTM
Figure 13 displays the validation accuracy and loss curves of the conventional LSTM for a small DGA dataset. It is noted that the used training and test data sets yielded a validation accuracy of 64.29% and a test accuracy of 67.14%.

Figure 13: Standard-LSTM's validation accuracy and Loss curves for a small DGA dataset
5.4.2 Customized LSTM
Figures 14 a & b display C-LSTM's validation accuracy and loss curves for a small DGA dataset. It is noted that the used

Figure 14 a: C-LSTM's Validation accuracy curve for a small DGA dataset

Figure 14 b: C-LSTM's Loss curve for a small DGA dataset

Figure 16: Assessment of performance metrics of standard-LSTM and C-LSTM
Figure 17 also shows a contrast between C-LSTM and other methods, such as Decision Trees, NNs, SVMs, and the traditional LSTM, based on validation and test accuracies.

Figure 19: Comparison between C-LSTM and top-performing DT, NN & SVM for testing