Voice Analysis
The Gong system includes an interface for simple speech analysis. Because this interface is intended for advanced users, it is hidden in the voice analysis window. To show it, click the horizontal bar at the top of the window.
Analysis Shown
The Spectrogram
The spectrogram is an analysis of the voice recording in the frequency domain. It is produced by applying a Fourier transform to the voice data. The transform can be computed with different sizes, depending on the computing power of the machine.
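As an illustration of how a spectrogram is built from Fourier transforms, here is a minimal short-time Fourier transform sketch. It is not Gong's implementation: the function name `spectrogram`, the Hann window, and the default 50% overlap between frames are assumptions chosen for the example. The recording is cut into overlapping frames of `n_fft` samples, each frame is windowed, and the magnitude of its transform becomes one column of the spectrogram.

```python
from typing import Optional

import numpy as np


def spectrogram(samples: np.ndarray, n_fft: int = 256, hop: Optional[int] = None) -> np.ndarray:
    """Hypothetical helper: magnitude spectrogram of shape (n_frames, n_fft // 2 + 1)."""
    if hop is None:
        hop = n_fft // 2  # assumption: 50% overlap between consecutive frames
    if len(samples) < n_fft:
        raise ValueError("recording is shorter than one transform frame")
    window = np.hanning(n_fft)  # assumption: Hann window to reduce spectral leakage
    n_frames = 1 + (len(samples) - n_fft) // hop
    frames = np.stack([samples[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    # rfft keeps only the n_fft // 2 + 1 non-negative frequency bins of each frame
    return np.abs(np.fft.rfft(frames, axis=1))


# Usage: one second of a 440 Hz tone sampled at 8000 Hz (bins are 8000 / 256 = 31.25 Hz apart)
sr = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
spec = spectrogram(tone, n_fft=256)
print(spec.shape)                            # (61, 129)
print(int(np.argmax(spec.mean(axis=0))))     # 14, since 440 / 31.25 is about 14.1
```

A larger `n_fft` means more work per frame, which is consistent with the note above that the transform size can be chosen according to the machine's computing power.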
The images below show two spectrograms computed with different bin sizes. The first one uses a bin size of 32.
The second one shows the same voice recording with a bin size of 256.
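Reading "bin size" as the number of points in each transform (an assumption, since the text does not define it), the two settings trade time detail against frequency detail. The short calculation below makes this concrete. The 16 kHz sample rate is a hypothetical value for illustration only; the text does not state Gong's sample rate. The helper name `resolution` is also hypothetical.

```python
from typing import Tuple


def resolution(n_fft: int, sample_rate: int) -> Tuple[float, float]:
    """Hypothetical helper: (spacing between frequency bins in Hz, frame length in ms)."""
    return sample_rate / n_fft, 1000.0 * n_fft / sample_rate


# Assumed sample rate of 16 kHz, for illustration only
for n in (32, 256):
    df, dt = resolution(n, 16000)
    print(f"N={n}: {df:.1f} Hz per bin, {dt:.1f} ms per frame")
# N=32: 500.0 Hz per bin, 2.0 ms per frame
# N=256: 62.5 Hz per bin, 16.0 ms per frame
```

Under this assumption, the smaller transform follows rapid changes over time but blurs nearby frequencies together, while the larger transform separates frequencies more finely but averages over a longer stretch of speech.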
The Waveform Display
The waveform display plots the time-domain samples of the voice recording directly, so it is much simpler than the spectrogram. The audio samples are drawn against time using the selected time scale.
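One common way to draw a waveform at a chosen time scale is to map each pixel column to a run of samples and draw a vertical line from that run's minimum to its maximum. The sketch below shows this technique. The text does not describe how Gong renders its waveform, so the function `waveform_columns` and its parameters (`start_s`, `seconds_per_pixel`, `width_px`) are hypothetical.

```python
import numpy as np


def waveform_columns(samples: np.ndarray, sample_rate: int, start_s: float,
                     seconds_per_pixel: float, width_px: int):
    """Hypothetical helper: (mins, maxs), one pair per pixel column of the visible window."""
    per_px = max(1, int(round(seconds_per_pixel * sample_rate)))  # samples per pixel column
    start = max(0, int(round(start_s * sample_rate)))             # first visible sample
    mins = np.zeros(width_px)
    maxs = np.zeros(width_px)
    for x in range(width_px):
        chunk = samples[start + x * per_px : start + (x + 1) * per_px]
        if chunk.size:  # columns past the end of the recording stay at 0
            mins[x] = chunk.min()
            maxs[x] = chunk.max()
    return mins, maxs


# Usage: 100 samples at 10 Hz shown at 1 second per pixel across 12 columns
mins, maxs = waveform_columns(np.arange(100, dtype=float), 10, 0.0, 1.0, 12)
print(mins[0], maxs[0])  # 0.0 9.0
```

Changing `seconds_per_pixel` is what a time-scale control would adjust: a smaller value shows fewer samples per column and therefore more detail.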
Word Alignments Information
The main feature of the analysis window is the word alignment information, which links the voice recording to the text of the message. The word alignments are displayed in the top bar, as shown in the three images above. They can also be overlaid on the spectrogram or the waveform display.
The image on the right shows the word alignment in the middle of a voice recording: the word "THE" is spoken at 38.59 seconds and the word "PATHFINDER" at 38.97 seconds into the speech.
The markers also give the position of each word in the sequence. For example, the image on the left shows that "THE" is word number 116 of the speech and "PATHFINDER" is word number 117.
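The alignment information described above pairs each word with its position in the speech and its start time. A minimal sketch of such a record, and of looking up which words fall inside the visible part of the recording, is shown below. The class `WordAlignment` and the functions `words_in_window` and `word_at` are hypothetical, not Gong's data format. `word_at` additionally assumes that each word lasts until the next one starts, which the text does not state. The two sample entries are the ones given in the text.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WordAlignment:
    index: int      # position of the word in the speech, e.g. 116
    word: str
    start_s: float  # time at which the word is spoken, in seconds


def words_in_window(alignments: List[WordAlignment], t0: float, t1: float) -> List[WordAlignment]:
    """Words whose start time lies in [t0, t1]; alignments must be sorted by start_s."""
    starts = [a.start_s for a in alignments]
    return alignments[bisect_left(starts, t0) : bisect_right(starts, t1)]


def word_at(alignments: List[WordAlignment], t: float) -> Optional[WordAlignment]:
    """Word being spoken at time t, assuming each word lasts until the next one starts."""
    starts = [a.start_s for a in alignments]
    i = bisect_right(starts, t) - 1
    return alignments[i] if i >= 0 else None


alignments = [WordAlignment(116, "THE", 38.59), WordAlignment(117, "PATHFINDER", 38.97)]
print([a.word for a in words_in_window(alignments, 38.5, 39.0)])  # ['THE', 'PATHFINDER']
print(word_at(alignments, 38.8).index)                             # 116
```

Keeping the list sorted by start time lets both lookups use binary search, so drawing the markers for a visible window does not require scanning the whole speech.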
The Gong Project