VMD and CNN-Based Classification Model for Infrasound Signal

Infrasound signal classification is vital in geological hazard monitoring systems. The traditional approach extracts features manually and then classifies the infrasound events; because it depends on manual feature extraction, its classification performance is unsatisfactory. To deal with this problem, this paper presents a classification model based on variational mode decomposition (VMD) and a convolutional neural network (CNN). First, the infrasound signal is processed by VMD to eliminate the noise. Then the fast Fourier transform (FFT) is applied to convert the reconstructed signal into a frequency domain image. Finally, a CNN model is established to automatically extract the features and classify the infrasound signals. The experimental results show that the classification accuracy of the proposed model is nearly 5% higher than that of the FFT-CNN model without VMD denoising. The proposed approach is therefore robust in noisy environments and has considerable potential in geophysical monitoring.


Introduction
Infrasound (≤20 Hz) is generated by natural disasters and human activities, including earthquakes, tsunamis, mudslides, tornadoes, volcanic eruptions, nuclear explosions, missile launches, and ship navigation. It can propagate through the atmosphere over thousands of kilometers with little attenuation: the loss is less than a few thousandths (Gi, Brown, 2017; De Angelis et al., 2019). Consequently, infrasound can serve as a monitoring indicator for geological hazards.
Infrasound has been widely applied in geological hazard monitoring in recent years, and many scholars have paid attention to this topic. Leng et al. (2017) presented a debris-flow monitoring approach that relied on the characteristics of the infrasound signal. To address the limitations of manual feature extraction, a novel method based on variational mode decomposition (VMD) and convolutional neural network (CNN) is proposed for infrasound signal classification. First, the infrasound signal is processed by VMD to eliminate the noise, and the fast Fourier transform (FFT) is employed to convert the reconstructed signal into a frequency domain image. Then the obtained frequency domain image is used as the input of a CNN model, which automatically extracts the features and classifies the infrasound signals.
The rest of this paper is organized as follows. In Sec. 2, the basic theory of VMD and CNN used in this paper is briefly described. Section 3 compares the performance of the described methods in an experiment. The experimental results are then analyzed for the different methods in Sec. 4. Finally, conclusions are drawn in Sec. 5.

VMD
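VMD decomposes an input signal $f$ into a set of discrete modes $u_k$, each of which is compact around a center frequency $\omega_k$ (Dragomiretskiy, Zosso, 2014). The constrained variational model is shown in Eq. (1):

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t), \tag{1}$$

where $\{u_k\} = \{u_1, \cdots, u_K\}$ are the $K$ mode components obtained by decomposition, $\{\omega_k\} = \{\omega_1, \cdots, \omega_K\}$ are the center frequencies of the modes, $\delta(t)$ is the Dirac delta function, and $*$ denotes convolution. The augmented Lagrange function is introduced in Eq. (2), and the solution of Eq. (1) is obtained by the alternating direction method of multipliers:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle, \tag{2}$$

where the quadratic penalty term $\alpha$ ensures the signal reconstruction accuracy under Gaussian noise, $\lambda$ is the Lagrange multiplier that enforces the constraint, and $\langle \cdot, \cdot \rangle$ denotes the inner product.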
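The VMD algorithm proceeds as follows:
Step 1. Set the number of decomposition modes $K$. Initialize the frequency-domain modes $\{\hat{u}_k^1\}$, the center frequencies $\{\omega_k^1\}$, and the Lagrange multiplier $\hat{\lambda}^1$, and set $n \leftarrow 0$.
Step 2. Set $n \leftarrow n + 1$ and loop over the modes $k = 1, \cdots, K$, updating $\hat{u}_k$ and $\omega_k$ for all $\omega \geq 0$; the loop ends when $k = K$. The update formulas of the narrow-band component and the corresponding center frequency are Eq. (3) and Eq. (4):

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i<k} \hat{u}_i^{n+1}(\omega) - \sum_{i>k} \hat{u}_i^{n}(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha\,(\omega - \omega_k^{n})^2}, \tag{3}$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega\, |\hat{u}_k^{n+1}(\omega)|^2 \, d\omega}{\int_0^{\infty} |\hat{u}_k^{n+1}(\omega)|^2 \, d\omega}. \tag{4}$$

Step 3. Update $\lambda$ according to Eq. (5), where $\tau$ is the update step of the multiplier:

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right). \tag{5}$$

Step 4. Return to Step 2 and repeat the above process until the convergence condition of Eq. (6) is met, where $\varepsilon$ is set to $10^{-6}$. A series of narrow-band mode component signals is then obtained:

$$\sum_{k=1}^{K} \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon. \tag{6}$$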
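The four steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the function name `vmd_sketch` is hypothetical, the center frequencies are initialized uniformly, the signal is not mirrored at its boundaries, and frequencies are normalized to cycles per sample. The default values of `alpha`, `tau`, and `tol` follow the values quoted in the text (2000, 0.1, and 10^-6).

```python
import numpy as np

def vmd_sketch(f, K, alpha=2000.0, tau=0.1, tol=1e-6, max_iter=500):
    """Simplified VMD via ADMM on the one-sided spectrum.

    Returns (modes, omega): modes has shape (K, len(f)); omega holds the
    center frequencies in cycles per sample (0 to 0.5), sorted ascending.
    """
    f = np.asarray(f, dtype=float)
    T = f.size
    freqs = np.fft.rfftfreq(T)            # non-negative frequencies only
    f_hat = np.fft.rfft(f)
    u_hat = np.zeros((K, freqs.size), dtype=complex)
    omega = 0.5 * np.arange(K) / K        # assumed uniform initialization
    lam = np.zeros(freqs.size, dtype=complex)
    eps = np.finfo(float).eps

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Mode update: Wiener-filter the residual left by the other modes
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (f_hat - others + lam / 2) / (
                1 + 2 * alpha * (freqs - omega[k]) ** 2
            )
            # Center-frequency update: center of gravity of the power spectrum
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + eps)
        # Dual ascent on the Lagrange multiplier
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        # Stop when the summed relative change of the modes is below tol
        change = sum(
            np.sum(np.abs(u_hat[k] - u_prev[k]) ** 2)
            / (np.sum(np.abs(u_prev[k]) ** 2) + eps)
            for k in range(K)
        )
        if change < tol:
            break

    modes = np.fft.irfft(u_hat, n=T, axis=1)  # back to real time-domain modes
    order = np.argsort(omega)
    return modes[order], omega[order]
```

Applied to a synthetic two-tone signal, the sketch should return one mode per tone, with center frequencies close to the tone frequencies. Multiplying a normalized center frequency by the sampling frequency converts it to Hz.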

CNN
CNN is an important class of deep neural networks (Lawrence et al., 1997). It is composed of a trainable multilevel structure. Owing to its strong feature extraction ability, CNN has been widely used in the field of signal processing. A CNN generally consists of convolution layers, pooling layers, a fully connected layer, and a softmax layer. Features are extracted by alternating convolution and pooling operations, and the infrasound signal is then classified by the fully connected layer and the softmax classifier. The structure of the CNN is shown in Fig. 1.

Convolution layer
The convolution layer convolves the input data with convolution kernels to obtain feature maps. Each convolution kernel outputs one feature map, which corresponds to one class of extracted features. The mathematical expression of the convolution is as follows:

$$x_j^{l} = f\left( \sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l} \right),$$

where $l$ is the index of the convolution layer, $x_j^{l}$ is the output of the $l$-th layer, $x_i^{l-1}$ is the input of the $l$-th layer, $k_{ij}^{l}$ is the weight matrix, $b_j^{l}$ is the bias term, $M_j$ is the $j$-th convolutional region of the $(l-1)$-th feature map, and $f(\cdot)$ is the activation function. In the CNN model, the activation function is usually the ReLU function, which is represented as:

$$f(x) = \max(0, x).$$
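As an illustration of the convolution and ReLU expressions above, the following NumPy sketch computes one convolution layer with explicit loops. The function names, array shapes, stride of 1, and absence of padding are assumptions made for the example. As in most deep learning frameworks, the kernel is applied without flipping (cross-correlation).

```python
import numpy as np

def relu(x):
    """ReLU activation: f(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def conv_layer(inputs, kernels, biases):
    """One convolution layer with stride 1 and no padding.

    inputs:  (C_in, H, W) input feature maps
    kernels: (C_out, C_in, kH, kW) weight matrices
    biases:  (C_out,) bias terms
    Returns (C_out, H - kH + 1, W - kW + 1) output feature maps.
    """
    c_out, _, kh, kw = kernels.shape
    _, h, w = inputs.shape
    out = np.zeros((c_out, h - kh + 1, w - kw + 1))
    for j in range(c_out):
        for r in range(out.shape[1]):
            for c in range(out.shape[2]):
                # Sum over all input maps inside the current region
                patch = inputs[:, r:r + kh, c:c + kw]
                out[j, r, c] = np.sum(patch * kernels[j]) + biases[j]
    return relu(out)
```

The loops are written for readability; a practical model would rely on a deep learning framework for speed.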

Pooling layer
After the convolution layer finishes the convolution, the pooling layer downsamples the input feature maps with pooling kernels. This reduces the dimension of the data and further highlights the extracted features. Generally, the pooling operations are divided into two types: max pooling and average pooling. The pooling is expressed as:

$$x_j = f\left( \beta \, \mathrm{down}(x_i) + b \right),$$

where $x_i$ is the input, $x_j$ is the pooled output, $\mathrm{down}(\cdot)$ is the pooling function, $\beta$ is the multiplicative bias, $b$ is the additive bias, and $f(\cdot)$ is the activation function.
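The two pooling types can be sketched in NumPy as below. The function name `pooling_layer`, the non-overlapping square window, and the identity default for the activation are assumptions made for the example; `beta` and `b` play the roles of the multiplicative and additive biases in the pooling expression.

```python
import numpy as np

def pooling_layer(x, size=2, mode="max", beta=1.0, b=0.0, activation=None):
    """Downsample a 2-D feature map with non-overlapping size x size windows."""
    h, w = x.shape
    h2, w2 = h // size, w // size
    # Drop rows and columns that do not fill a whole window, then split into blocks
    blocks = x[:h2 * size, :w2 * size].reshape(h2, size, w2, size)
    if mode == "max":
        down = blocks.max(axis=(1, 3))
    elif mode == "average":
        down = blocks.mean(axis=(1, 3))
    else:
        raise ValueError("mode must be 'max' or 'average'")
    out = beta * down + b
    # Identity activation unless one is supplied
    return out if activation is None else activation(out)
```

With a 4 x 4 input and `size=2`, each output value summarizes one 2 x 2 block, so the feature map shrinks to 2 x 2.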

Fully connected layer and softmax layer
The fully connected layer and softmax layer are applied in the classification stage of the CNN. The fully connected layer connects the feature maps obtained after a series of convolution and pooling operations into a one-dimensional feature vector, and the classification results are given by the softmax layer. The mathematical model of the fully connected layer and softmax layer can be described as:

$$y^{k} = f\left( w^{k} x^{k-1} + b^{k} \right),$$

where $x^{k-1}$ is the input of the fully connected layer, $y^{k}$ is the output of the fully connected layer, $w^{k}$ is the weight coefficient, $b^{k}$ is the additive bias, $k$ is the index of the network layer, and $f(\cdot)$ is the activation function.
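A minimal NumPy sketch of this classification stage is given below. The function names and the example weights are hypothetical, and the softmax is written in its usual numerically stable form, which the text does not spell out.

```python
import numpy as np

def fully_connected(x, w, b):
    """Affine map of the flattened feature vector: w @ x + b."""
    return w @ x + b

def softmax(z):
    """Convert class scores into probabilities that sum to 1."""
    shifted = z - np.max(z)          # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / np.sum(e)

def classify(feature_maps, w, b):
    """Flatten the feature maps, apply the fully connected layer, then softmax."""
    x = np.ravel(feature_maps)
    probs = softmax(fully_connected(x, w, b))
    return int(np.argmax(probs)), probs
```

The predicted class is the index of the largest probability; for a 3-class catalog, `w` would have three rows, one per event type.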

The proposed approach
Infrasound signals collected from the International Monitoring System (IMS) usually exhibit non-linear and non-stationary characteristics, which makes feature extraction difficult and the classification performance unsatisfactory (Mayer et al., 2020). This paper proposes an intelligent infrasound signal classification method based on VMD and CNN. The flowchart of the proposed method is shown in Fig. 2. The general procedure is summarized as follows:
Step 1. Collect the infrasound signal with sensors.
Step 2. Decompose the collected infrasound signal into U modes using VMD to eliminate the noise.
Step 3. Apply FFT to convert the reconstructed signal into a frequency domain image.
Step 4. Separate the preprocessed data into training and testing samples. The CNN extracts deep features from the training samples, and the trained model is deployed for infrasound signal classification.
Step 5. Employ t-distributed stochastic neighbor embedding (t-SNE) to visualize the features in the softmax layer (Van Der Maaten, Hinton, 2008).
Step 6. Present the classification results to compare the performance of different methods.

Data set
The data used in this study come from the IMS and were obtained with the help of the Comprehensive Nuclear-Test-Ban Treaty Beijing National Data Center. Three categories of infrasound events are classified: earthquake, tsunami, and volcano. The data were collected from six infrasound sensor arrays at different locations around the world. This study uses 611 sets of data, all recorded at a sampling frequency of 20 Hz. Table 1 shows the details of the infrasound data collected from the different areas, and the map of the infrasound stations is shown in Fig. 3.

Experiments setup
According to the description of CNN in Subsec. 2.2, the main parameters of the CNN are summarized in Table 2. The infrasound signal data described above are used to evaluate the feature learning performance of the proposed CNN model. Every infrasound signal contains 10 400 data points. The data sets are divided into training samples and testing samples. The size of the input

Data preprocessing
The VMD is employed to decompose the infrasound signal. The center frequencies of the modes depend on the mode number U; this relationship is depicted in Fig. 4. In the earthquake, tsunami, and volcano decomposition results, once U reaches 6 the center frequencies of some modes become close to each other, which indicates over-decomposition. Hence, U is set to 5 in the test. Following common VMD practice, the balancing parameter of the data-fidelity constraint takes the default value of 2000, and the time step of the dual ascent is 0.1. Figure 5 shows the original and reconstructed signals. Compared with the original signal, the reconstructed signal eliminates the noise. Then, FFT is employed to convert the reconstructed signal into the frequency domain.
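The FFT step can be sketched as follows. This is an illustrative example only: the function name `amplitude_spectrum` and the single-sided amplitude scaling are assumptions, and the rendering of the spectrum as an image for the CNN is not covered because the text does not specify it. The default sampling frequency of 20 Hz matches the data set description.

```python
import numpy as np

def amplitude_spectrum(signal, fs=20.0):
    """Single-sided amplitude spectrum of a real signal.

    Returns (freqs, amp): freqs in Hz from 0 to fs / 2, amp scaled so that a
    unit-amplitude sinusoid at a bin frequency gives a peak of about 1.
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    amp = np.abs(spec) / n
    # Double every bin that has a mirrored negative-frequency twin
    if n % 2 == 0:
        amp[1:-1] *= 2     # the DC and Nyquist bins have no twin
    else:
        amp[1:] *= 2       # only the DC bin has no twin
    return freqs, amp
```

For a recording of 10 400 samples at 20 Hz, the spectrum has 5201 bins with a spacing of 20/10 400 Hz (about 0.0019 Hz) and extends to the Nyquist frequency of 10 Hz.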

Model application results and analysis
In this study, the CNN structure contains five convolutional layers, three pooling layers, a flatten layer, a fully connected layer, and a softmax layer. The parameters of each layer are presented in Table 2; they were determined based on comparative trials and experience. The CNN model is written in Python 3.5 and runs on 64-bit Windows with a Core(TM) i5-8250U CPU and 8 GB of RAM. The classification accuracy of the 2-class catalog after VMD-FFT-CNN is presented in Fig. 6, and Fig. 7 shows the classification accuracy of the 3-class catalog after VMD-FFT-CNN. To better illustrate the feature learning process of the proposed model, the t-SNE technique is applied to visualize the output of the softmax layer. t-SNE is a machine learning algorithm that visualizes high-dimensional data through non-linear dimensionality reduction. The feature visualizations of the 2-class catalog and 3-class catalog after VMD-FFT-CNN are shown in Figs. 8 and 9, respectively. Points of the same color are closely grouped and easy to distinguish.
To better analyze the classification performance of the proposed method, the infrasound signals are also processed by FFT-CNN, that is, without VMD denoising. The classification accuracy of the 2-class catalog after FFT-CNN is presented in Fig. 10, and Fig. 11 shows the classification accuracy of the 3-class catalog after FFT-CNN. The feature visualizations of the 2-class catalog and 3-class catalog after FFT-CNN are shown in Figs. 12 and 13, respectively.
To verify the stability of the proposed method, the proposed model is tested ten times to derive the final classification result. The classification accuracies on the 2-class catalogs consisting of earthquake and tsunami (1), earthquake and volcano (2), and tsunami and volcano (3) are shown in Fig. 14. The classification accuracies of the two architectures on the 3-class catalog consisting of signals from earthquake, tsunami, and volcano (4) are also presented in Fig. 14. As shown in Fig. 14, the classification accuracy of VMD-FFT-CNN is nearly 5% higher than that of the FFT-CNN model, which shows that the VMD denoising process is effective and that VMD-FFT-CNN has good classification performance.

Discussion
VMD-FFT-CNN outperforms the FFT-CNN approach because of the VMD denoising process. It achieves classification accuracies of 97.75% and 81% on the 2-class and 3-class catalogs, respectively. The proposed approach shows higher classification accuracy than other methods and good robustness in noisy environments. For example, the classification accuracy of VMD-FFT-CNN on the 2-class catalog consisting of earthquakes and volcanoes is 23.25% higher than that of CNN (Albert, Linville, 2020). This result demonstrates that the model presented in this paper has good accuracy for infrasound signal classification. As shown in Fig. 3, the source locations are widespread but their number is small. Because of this data limitation, the proposed approach may not generalize to global hazard monitoring.

Conclusion and future work
This paper proposed an effective classification and identification method for the infrasound signals of disasters. The infrasound signal was processed by VMD to eliminate the noise. FFT was used to convert the reconstructed signal into a frequency domain image. A CNN model was constructed to automatically extract the features and classify the infrasound signals. The experimental results show that the proposed approach improves the accuracy of geophysical monitoring.
Owing to the limitations of the existing conditions, the tests could only use a small number of samples and a few infrasound types, which affects the reliability of the test results. To obtain more accurate results, more infrasound data and infrasonic event types should be analyzed. In future work, real-time infrasound signal classification will be carried out and further study on infrasound types will be performed. Deep learning should be developed for global infrasound signal classification.