A Symmetric Approach in the Three-Dimensional Digital Waveguide Modeling of the Vocal Tract

Simulating wave propagation in three-dimensional (3D) models of the vocal tract has shown significant promise for improving the accuracy of speech production modeling. Recent 3D waveguide models of the vocal tract achieve better accuracy but are computationally demanding, which motivates work on reducing the computational cost while retaining accuracy and performance. In the current work, we divide the geometry of the vocal tract into four equal symmetric parts by introducing two perpendicular axial planes, and the simulation is performed on only one part. A novel strategy is defined to implement symmetric conditions in the mesh. The complete standard 3D digital waveguide model is taken as the benchmark model. The proposed model is compared with the benchmark model in terms of formant frequencies and efficiency. For the demonstration, the vowels /O/, /i/, /E/, /A/, and /u/ have been selected for the simulations. According to the results, the benchmark and current models are nearly identical in terms of frequency profiles and formant frequencies, while the current model is about three times more efficient than the benchmark model.


Introduction
Three-dimensional (3D) acoustic simulations of the vocal tract show tremendous promise for research in speech acoustics (Gully, Tucker, 2019). Many studies have focused on simulating the vocal tract because of its significance in the speech production system. The most common method for analyzing vocal tract acoustics is the finite element method (FEM). However, its long computation time limits its use in real-time applications of wave propagation. Waveguide modeling is also widely used in computational vocal tract modeling. In digital waveguide modeling, each node in the vocal tract grid is assumed to act as a scattering junction, and neighboring junctions are connected by unit waveguide elements, which are realized as delay lines in the simulation (Speed et al., 2013a; Treyssède, 2021). Digital waveguide models are efficient and realistic computational implementations of the waveguide model, which makes high-quality sound generation in real time possible (Mullen et al., 2007; Speed et al., 2013a). The Kelly-Lochbaum model is based on cylindrical segments of equal length, and a few attempts have been made to extend it by incorporating fractional delays to account for the elongation of the vocal tract (Mathur et al., 2006; Qureshi, Syed, 2019).
In the literature, representing the vocal tract as concatenated segments with various cross-sectional areas has proven successful for one-dimensional waveguide modeling (Mullen et al., 2003; 2006), whereas conical segments have been used to obtain a better approximation of the vocal tract shape (Makarov, 2009; Strube, 2003). The multidimensional digital waveguide model offers more precision than the one-dimensional waveguide model but has a higher computational cost. The restriction to a uniform structured mesh is a major limitation of digital waveguide models (Speed et al., 2013a; Van Duyne, Smith, 1995). The complexity and computational cost of the three-dimensional digital waveguide model for vocal tract modeling have limited researchers' interest in it. When accuracy is the priority, the three-dimensional model is a suitable candidate on modern computers (Speed et al., 2013a; 2013b). In recent works, three-dimensional digital waveguide modeling of the vocal tract has been adopted for the uniform linear grid as well as the non-uniform rectilinear grid.
For the modeling of the vocal tract, the current work focuses on implementing a symmetric 3D digital waveguide model. For this purpose, the 3D geometry of the vocal tract is divided into four equal symmetric parts by introducing two perpendicular axial planes. The modeling of a single symmetric part is referred to as the 3D symmetric digital waveguide model of the vocal tract. A novel strategy has been devised to implement symmetric conditions on the two perpendicular axial sides of the mesh.
The proposed approach reduces the high computational cost of the simulation while preserving the model's accuracy. To validate the approach, we choose the complete standard three-dimensional waveguide model as the benchmark model. To demonstrate the model, we consider the cross-sectional areas of the vowels /O/, /i/, /E/, /A/, and /u/ from Story et al. (1996). The developed model is compared with the benchmark model in terms of formant frequencies and efficiency. The simulations show that the symmetric model matches the benchmark model in accuracy and is about three times more efficient.

Three-dimensional symmetric waveguide modeling of the vocal tract
For realistic modeling of the vocal tract, the accuracy of wave propagation obtained with 3D waveguide modeling has made it popular in the field of speech production. However, a significant restriction on the application of the three-dimensional waveguide model is its computational cost, and few works in the literature address reducing this cost for the three-dimensional waveguide model of the vocal tract. The current work therefore focuses on reducing the computational cost of vocal tract modeling. The cost is greatly reduced by implementing symmetric three-dimensional waveguide modeling of the vocal tract. Using two perpendicular axial planes, the 3D geometry of the vocal tract is divided into four equal symmetric parts, and the modeling of a single symmetric part is referred to as the symmetric three-dimensional digital waveguide model of the vocal tract. The symmetric conditions are implemented on the symmetric sides of the mesh. The approach is visualized in Figs. 1 and 2. Figure 1 shows the three-dimensional geometry and orientation of the vocal tract for the vowel /O/, where the length of the vocal tract lies along the x-axis. Figure 2 shows in green the symmetric part of the vocal tract of Fig. 1 that is simulated. The remaining three symmetric parts, shown in blue, are not considered in the simulation of wave propagation in the vocal tract.
In a one-dimensional waveguide model, the vocal tract is a cylindrical tube of varying cross-sectional area. The total vocal tract length is divided into multiple cylindrical tube segments with different cross-sectional areas. The wave equation is used to formulate the relationship between velocity and pressure in this tube (Markel, Gray, 1976; Rabiner, Schafer, 1978).
D'Alembert's solution expresses the solution of the one-dimensional wave equation as a sum of left-going and right-going wave components. Applying the continuity and momentum equations at the junction between the k-th and (k + 1)-th cylinders (Karjalainen, Erkut, 2004) gives the reflection coefficient $R_k$, where $A_k$ is the cross-sectional area of the k-th cylinder and $A_{k+1}$ is the cross-sectional area of the (k + 1)-th cylinder:
$$R_k = \frac{A_k - A_{k+1}}{A_k + A_{k+1}}. \qquad (1)$$
The accuracy is increased by moving to a three-dimensional structure in the waveguide model, which improves the spatio-temporal connectivity of the sample grid. In this case, the size of the waveguide mesh grows with the expansion of the spatio-temporal connections in the three-dimensional sample grid (Beeson, Murphy, 2004; Mullen et al., 2006; Van Duyne, Smith, 1993). In the standard three-dimensional waveguide model, every node is called a scattering junction, where each wave is modified and scattered in several directions. The scattering junctions are separated by equal-length delay lines in the grid, and each scattering junction has six neighboring scattering junctions at right angles to one another, as illustrated in Fig. 3, except at the walls of the vocal tract. The interconnection of multiple junctions is shown in a three-dimensional view in Fig. 4. Multiple grid topologies are used in the literature for three-dimensional digital waveguide modeling of the vocal tract. We selected a three-dimensional uniform rectilinear grid topology in this work because of its simplicity. At each junction J, the incoming wave pressures are denoted by $p^{+}_{J,1}$, $p^{+}_{J,2}$, $p^{+}_{J,3}$, $p^{+}_{J,4}$, $p^{+}_{J,5}$, and $p^{+}_{J,6}$, and the outgoing pressures by $p^{-}_{J,1}$, $p^{-}_{J,2}$, $p^{-}_{J,3}$, $p^{-}_{J,4}$, $p^{-}_{J,5}$, and $p^{-}_{J,6}$, respectively.
Finally, the pressure $p_{J,i}$ on port i of each scattering junction J can be formulated as (Mullen, 2006; Van Duyne, Smith, 1993):
$$p_{J,i} = p^{+}_{J,i} + p^{-}_{J,i}. \qquad (2)$$
For a junction J with N neighboring junctions (ports), the total pressure $p_J$ at the junction can be written as (Mullen, 2006; Van Duyne, Smith, 1993):
$$p_J = \frac{2 \sum_{i=1}^{N} p^{+}_{J,i} / Z_i}{\sum_{i=1}^{N} 1 / Z_i}, \qquad (3)$$
where $Z_i$ denotes the impedance of the waveguide connected to port i. For a three-dimensional case with uniform impedance (N = 6), the aforementioned equation can be rewritten as:
$$p_J = \frac{1}{3} \sum_{i=1}^{6} p^{+}_{J,i}. \qquad (4)$$
The outgoing pressure components are then:
$$p^{-}_{J,i} = p_J - p^{+}_{J,i}. \qquad (5)$$
The current work focuses on the symmetric three-dimensional waveguide mesh. To increase the efficiency of the three-dimensional waveguide model, we impose symmetric conditions on the symmetric sides of the waveguide mesh. In the rectilinear mesh configuration of Fig. 3, each junction has six neighboring junctions at right angles to one another. Junctions 1 and 2 lie along the x-axis, junctions 3 and 4 along the y-axis, and junctions 5 and 6 along the z-axis. The length of the three-dimensional vocal tract lies along the x-direction, while the cross-sectional area lies in the yz-plane. In our work, we consider the part of the vocal tract that lies in the first quadrant of the coordinate axes. To obtain a symmetric model, we impose symmetric boundary conditions on the xy- and xz-planes.
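Before the symmetric conditions are introduced, the ordinary scattering step at a single interior junction described above can be sketched as follows. This is a minimal illustration, assuming equal impedance on all ports; the function name `scatter` is hypothetical and not taken from the authors' code.

```python
import numpy as np

def scatter(p_in):
    """Scatter incoming waves at one lossless junction with equal-impedance ports.

    p_in: incoming pressures, one per port (six for the rectilinear 3D mesh).
    Returns the junction pressure and the outgoing pressure on every port.
    """
    p_in = np.asarray(p_in, dtype=float)
    # Junction pressure: 2/N times the sum of incoming waves (1/3 of the sum for N = 6).
    p_j = 2.0 * p_in.sum() / p_in.size
    # Outgoing wave on each port: junction pressure minus the incoming wave on that port.
    p_out = p_j - p_in
    return p_j, p_out

# A unit wave arriving on port 1 only.
p_j, p_out = scatter([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
print(p_j, p_out)
```

For six ports the junction is lossless in the sense that the sum of squared outgoing pressures equals the sum of squared incoming pressures, which is a convenient sanity check when debugging a mesh implementation.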
For the symmetry on the xy-plane, the neighboring nodes 5 and 6 lie on opposite sides of the line of symmetry along the y-axis. The outgoing wave from junction J to port 6 leaves the simulated part and is omitted, while the incoming wave from port 6 to junction J is required and is retained. To enforce symmetry, this incoming wave must be the same as the outgoing wave from J to port 5. Substituting this condition into Eq. (4) gives the pressure at junction J on the line of symmetry, Eq. (6), and the outgoing traveling-wave components at the node follow as Eq. (7).
For the symmetry on the xz-plane, the neighboring nodes 3 and 4 lie on opposite sides of the line of symmetry along the z-axis. The outgoing wave from junction J to port 4 leaves the simulated part and is omitted, while the incoming wave from port 4 to junction J is required and is retained. To enforce symmetry, this incoming wave must be the same as the outgoing wave from J to port 3. Substituting this condition into Eq. (4) gives the pressure at junction J on the line of symmetry, Eq. (8), and the outgoing traveling-wave components at the node follow as Eq. (9).
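The effect of a symmetry plane can be checked numerically on a toy mesh. The sketch below is not a transcription of Eqs. (6)-(9); it assumes a mirror reading of the symmetric condition, in which the wave arriving from the omitted neighbor is set equal to the wave arriving from its mirror-image neighbor. It also assumes a rectangular duct instead of a vocal tract shape, a single on-axis source junction, and a simplified boundary in which a wave leaving the mesh returns on the same port one step later, scaled by a reflection coefficient. The function `simulate` and all of its parameters are hypothetical.

```python
import numpy as np

def simulate(shape, src, probe, steps, sym_y=False, sym_z=False,
             r_g=0.97, r_l=-0.9, r_w=1.0):
    """Run a rectilinear 3D waveguide mesh and record the pressure at `probe`.

    Port order: 0:-x, 1:+x, 2:-y, 3:+y, 4:-z, 5:+z.
    sym_y / sym_z: treat the y = 0 / z = 0 face as a mirror plane instead of a wall.
    """
    pin = np.zeros((6,) + tuple(shape))      # incoming wave on every port
    pin[(0,) + tuple(src)] = 1.0             # unit impulse entering the source junction
    out = np.empty(steps)
    for n in range(steps):
        # Step 1: scattering at every junction (equal impedances, six ports).
        p = pin.sum(axis=0) / 3.0
        pout = p - pin
        out[n] = p[probe]
        # Step 2: pass outgoing waves through unit delays to the neighbors.
        new = np.empty_like(pin)
        new[0, 1:] = pout[1, :-1]
        new[0, 0] = r_g * pout[0, 0]                    # inlet end
        new[1, :-1] = pout[0, 1:]
        new[1, -1] = r_l * pout[1, -1]                  # outlet end
        new[3, :, :-1] = pout[2, :, 1:]
        new[3, :, -1] = r_w * pout[3, :, -1]            # wall at +y
        new[2, :, 1:] = pout[3, :, :-1]
        # Mirror plane: the wave from the omitted -y neighbor equals the one from +y.
        new[2, :, 0] = new[3, :, 0] if sym_y else r_w * pout[2, :, 0]
        new[5, :, :, :-1] = pout[4, :, :, 1:]
        new[5, :, :, -1] = r_w * pout[5, :, :, -1]      # wall at +z
        new[4, :, :, 1:] = pout[5, :, :, :-1]
        new[4, :, :, 0] = new[5, :, :, 0] if sym_z else r_w * pout[4, :, :, 0]
        pin = new
    return out

# Full duct (5 x 5 cross-section) versus its quarter (3 x 3) with two mirror planes.
full = simulate((8, 5, 5), src=(0, 2, 2), probe=(7, 2, 2), steps=200)
quarter = simulate((8, 3, 3), src=(0, 0, 0), probe=(7, 0, 0), steps=200,
                   sym_y=True, sym_z=True)
print(np.max(np.abs(full - quarter)))
```

In this toy setting the quarter mesh holds 9 junctions per cross-section instead of 25, which illustrates where the saving comes from; the junctions on the symmetry planes are kept, so the reduction is somewhat less than a factor of four.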

Numerical simulation
A series of cross-sectional areas is acquired from the particular geometric configurations of the vocal tract. In the proposed work, we consider the different configurations of the vocal tract for the vowels /O/, /i/, /E/, /A/, and /u/. The cross-sectional areas of these vowels are taken from Story et al. (1996), with vocal tract lengths of 17.46, 15.88, 16.67, 17.46, and 18.25 cm, respectively. To produce the three-dimensional mesh, the cross-sectional areas are interpolated into the smooth shape of the three-dimensional vocal tract using a piecewise third-order spline.
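The mesh-generation step described above can be sketched as follows. This is an illustration under several assumptions that are not stated in the text: circular cross-sections centered on the x-axis, equally spaced area samples along the tract, a grid spacing `dx_cm` chosen freely, and SciPy's `CubicSpline` standing in for the piecewise third-order spline. The area values are placeholders, not the data of Story et al. (1996), and `quarter_mask` is a hypothetical helper.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def quarter_mask(areas_cm2, length_cm, dx_cm):
    """Boolean mask of junctions inside the first-quadrant quarter of the tract.

    Axis 0 runs along the tract (x); axes 1 and 2 are y >= 0 and z >= 0.
    """
    areas = np.asarray(areas_cm2, dtype=float)
    # Positions of the given area samples, assumed equally spaced along the tract.
    x_sec = np.linspace(0.0, length_cm, len(areas))
    spline = CubicSpline(x_sec, areas)
    # Mesh positions along x with spacing dx_cm, not exceeding the tract length.
    nx = int(length_cm / dx_cm) + 1
    x = np.arange(nx) * dx_cm
    area = np.maximum(spline(x), 0.0)            # guard against spline undershoot
    radius = np.sqrt(area / np.pi)               # circular cross-section assumption
    m = int(np.ceil(radius.max() / dx_cm)) + 1
    y = np.arange(m) * dx_cm                     # index 0 lies on the symmetry plane
    yy, zz = np.meshgrid(y, y, indexing="ij")
    return (yy ** 2 + zz ** 2)[None, :, :] <= radius[:, None, None] ** 2

# Placeholder areas (cm^2); 17.46 cm is the tract length quoted for /O/.
mask = quarter_mask([0.5, 1.0, 2.0, 3.5, 4.0, 3.0, 1.5, 1.0], 17.46, 0.25)
print(mask.shape, int(mask.sum()))
```

The junctions at index 0 along the second and third axes lie on the xz- and xy-planes, respectively, which is where the symmetric conditions would be applied.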
For the best resolution of the graphs, the length of the impulse response is chosen as 25 000 input samples for every vowel considered in this work. All input samples are zero except the first, which has a value of 1. The numerical solution of the current vocal tract model comprises two basic steps in each iteration. The first step computes the scattering of the waves at each junction of the mesh, while in the second step the outgoing waves of each junction are passed through the delay lines to the neighboring junctions. In the first step, we apply the glottal boundary condition (Speed et al., 2013a) at each junction that lies on the inlet of the vocal tract. For the scattering of the wave, Eq. (4) is used to calculate the total pressure at each interior junction of the mesh, and the outgoing pressures in all six directions are computed with Eq. (5). The scattering of the waves at the outlet junctions (Speed et al., 2013a) is also computed. For the junctions that lie on the symmetric xy-plane, Eqs. (6) and (7) are employed to calculate the total and outgoing pressures, respectively. Similarly, for the symmetric xz-plane, Eqs. (8) and (9) are used. In the second step, all delayed wave components of each junction are passed to its neighboring junctions. We iterate this process over the total number of input samples to obtain the output samples. The frequency response is then obtained from the transfer function by applying the fast Fourier transform (FFT) to the output samples and taking the natural logarithm. The simulations of the current work are implemented in MATLAB 2020. MATLAB also supports the use of C++-coded files to increase the efficiency of the simulations, and we developed a C++-coded file, called from MATLAB, to speed up the computationally expensive parts of the 3D symmetric mesh.
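The post-processing step (FFT, natural logarithm, formant reading) can be sketched as follows. This is a hedged illustration rather than the authors' MATLAB code: the sampling frequency `fs`, the peak-picking rule, and the function names are assumptions, and a synthetic two-resonance signal stands in for the simulated output samples.

```python
import numpy as np

def log_spectrum(h, fs):
    """Natural-log magnitude spectrum of an impulse response h sampled at fs (Hz)."""
    spectrum = np.fft.rfft(h)
    freqs = np.fft.rfftfreq(len(h), d=1.0 / fs)
    # A small offset avoids log(0) in empty bins.
    return freqs, np.log(np.abs(spectrum) + 1e-12)

def pick_formants(freqs, logmag, count, fmax):
    """Return the `count` strongest local maxima below fmax, sorted by frequency."""
    band = freqs <= fmax
    f, m = freqs[band], logmag[band]
    is_peak = (m[1:-1] > m[:-2]) & (m[1:-1] > m[2:])
    peaks = np.flatnonzero(is_peak) + 1
    strongest = peaks[np.argsort(m[peaks])[::-1][:count]]
    return np.sort(f[strongest])

# Synthetic stand-in for the output samples: two damped resonances.
fs = 44100.0                      # assumed sampling frequency, not from the paper
t = np.arange(25000) / fs         # 25 000 samples, as in the simulation setup
h = np.exp(-50.0 * t) * (np.sin(2 * np.pi * 500.0 * t) + np.sin(2 * np.pi * 1500.0 * t))
freqs, logmag = log_spectrum(h, fs)
print(pick_formants(freqs, logmag, count=2, fmax=6000.0))
```

With 25 000 samples the frequency bins are spaced `fs / 25000` apart, so a longer impulse response gives finer resolution when reading formant peaks from the profile.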
The reflection coefficient of the glottis $r_G$, the reflection coefficient of the wall $r_W$, and the reflection coefficient of the lips $r_L$ are assigned values of 0.97, 1.0, and −0.9, respectively.

Results and discussion
In this section, we demonstrate the current approach by running the symmetric three-dimensional digital waveguide model of the vocal tract. The benchmark model is used to compare and validate the symmetric model, and the comparison between the two is based on accuracy and efficiency. The three lowest formant frequencies are known to be sufficient for a vowel to be recognized. In the current work, we take six formant frequencies from the benchmark and symmetric models and compare them with each other to obtain a better picture of the relative error and to check the efficiency of the symmetric model. The six formant frequencies are denoted by $f_1$, $f_2$, $f_3$, $f_4$, $f_5$, and $f_6$, which are listed in the first column of all tables.
For the comparison, the sample delay $d_s$ is the same for both models. In the current work, the vowels /O/, /i/, /E/, /A/, and /u/ are taken with different vocal tract lengths. The efficiency of the current model is compared with that of the benchmark model in terms of elapsed time, and its accuracy in terms of frequency profiles and formant frequencies. At least three formant frequencies are required for each vowel; in our work, we consider the first six formant frequencies produced by the benchmark model and the proposed model (Arnela et al., 2016b). Figure 5 shows the frequency profiles generated by the transfer function of the vocal tract for the vowel /O/ up to 6000 Hz. The frequency profiles of the benchmark and symmetric models overlap, with no visible difference between them. The numerically calculated values are given in Table 1. In this table, column 4 shows the relative error between the benchmark and symmetric models, which is zero for all six formant frequencies of the vowel /O/. Column 7 shows that the proposed model is about 3 times more efficient than the benchmark model, because only one of the four symmetric parts is simulated.
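The two quantities reported in the tables can be computed as sketched below. The definitions are assumptions, since the text does not spell them out: the relative error is taken as a percentage of the benchmark formant, and the efficiency gain as the ratio of elapsed times. The function names and the numbers in the usage lines are hypothetical placeholders, not values from Tables 1 or 2.

```python
def relative_error_percent(f_benchmark, f_symmetric):
    """Relative error (%) of each symmetric-model formant against the benchmark."""
    return [abs(s - b) / b * 100.0 for b, s in zip(f_benchmark, f_symmetric)]

def speedup(t_benchmark, t_symmetric):
    """Ratio of elapsed times; a value of 3 means three times faster."""
    return t_benchmark / t_symmetric

# Placeholder inputs for illustration only (Hz and seconds).
print(relative_error_percent([500.0, 1500.0], [500.0, 1515.0]))
print(speedup(90.0, 30.0))
```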
The frequency profiles of the vocal tract for the vowel /i/, obtained from its transfer function, are shown in Fig. 6. The frequency profile of the symmetric model again matches that of the benchmark model. The measured formant frequencies are presented in Table 2. Column 4 of this table shows zero relative error in all six formant frequencies of the vowel /i/. Column 7 shows that the proposed model is about 3 times more efficient than the benchmark model.
Fig. 7. Comparison between the frequency profiles of the benchmark and symmetric models in the vowel /u/ case.

Conclusions
In the proposed work, a symmetric approach was used for three-dimensional waveguide modeling of the vocal tract. The computational cost was greatly reduced by employing this symmetric approach. The simulation was performed on the vowels /O/, /i/, /E/, /A/, and /u/. From the tables and figures, we draw the following conclusions:
- the symmetric approach was successfully implemented in the three-dimensional waveguide modeling of the vocal tract;
- the formant frequencies of the symmetric model are the same as those of the benchmark model;
- the frequency profiles of the current model overlap with those of the benchmark model;
- the symmetric model is more efficient than the benchmark model.
In all cases, the proposed model is about 3 times more efficient than the benchmark model. We conclude that the symmetric model is a highly efficient and accurate three-dimensional waveguide model of the vocal tract. The proposed model provides an opportunity to investigate the vocal tract's frequency response for speech production efficiently. We believe the symmetric model may serve as a useful vocal tract model in speech synthesizers and provide a new direction for further investigations.