Voice Analysis

The Gong system includes an interface for simple speech analysis. Because it is intended for advanced users, this interface is hidden by default in the voice analysis window. To show it, click the horizontal bar at the top of the window.

The Analysis Window with the Interface Hidden
The Same Window with Analysis Shown

The Interface for a Simple Voice Analysis
This window, shown on the right, can show three different views of the voice recording: the spectrogram, the waveform display, and the word alignments. The word alignment information is an important feature because it is used by the selective word/phrase playback function.

The Spectrogram

The spectrogram is an analysis of the voice recording in the frequency domain. It is produced by applying a Fourier transform to the voice data. The size of the transform can be chosen to suit the computing power of the machine.

The images below show two spectrograms computed with two different bin sizes. The first one is a spectrogram with a bin size of 32.

A Spectrogram of a Voice Recording with a Bin Size of 32

The next one is a spectrogram of the same voice recording with a bin size of 256.

A Spectrogram of a Voice Recording with a Bin Size of 256

The Waveform Display

The waveform display is a plain display of the voice recording in the time domain, and it is much simpler than the spectrogram. The audio samples are drawn using the selected time scale.

A Waveform Display of a Voice Recording
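
One common way to draw audio samples at a chosen time scale is to reduce each screen column to the minimum and maximum sample it covers. The sketch below shows that idea only. The function name `waveform_columns` and the `seconds_per_pixel` parameter are hypothetical, and nothing here is taken from the Gong source.

```python
def waveform_columns(samples, sample_rate, seconds_per_pixel):
    """Reduce samples to one (min, max) pair per screen column.

    `seconds_per_pixel` plays the role of the selected time scale
    (a hypothetical parameter, not a documented Gong setting).
    """
    per_column = max(1, round(sample_rate * seconds_per_pixel))
    columns = []
    for start in range(0, len(samples), per_column):
        chunk = samples[start:start + per_column]
        # A vertical line from min to max represents this column.
        columns.append((min(chunk), max(chunk)))
    return columns


# Eight made-up samples at 4 samples per second, half a second per pixel.
demo = [0, 1, -1, 2, -2, 3, -3, 4]
print(waveform_columns(demo, sample_rate=4, seconds_per_pixel=0.5))
```

A larger `seconds_per_pixel` packs more samples into each column, so the same recording is drawn in fewer columns.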

Word Alignments Information

The main feature of the analysis window is the word alignment information, which relates the voice recording to the text of the message. The word alignments are displayed in the top bar, as shown in the three images above. They can also be shown on top of the spectrogram or waveform display.

The Top Bar with Word Alignment Information and Timing

The image on the right shows the word alignments in the middle of a voice recording. It shows that the word "THE" is spoken at 38.59 seconds into the speech and the word "PATHFINDER" at 38.97 seconds.

The Waveform Display with Word Markers

The markers give the position of each word in the sequence of spoken words. For example, the image on the left shows that the word "THE" is word number 116 in the speech and the word "PATHFINDER" is word number 117.
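
To make the relationship between word numbers, start times and selective playback concrete, here is a minimal sketch of how such alignment data could be represented and queried. The `WordMark` type, the helper functions and the `recording_end` value are hypothetical. Only the two words, their numbers and their start times come from the example above.

```python
from bisect import bisect_right
from typing import NamedTuple


class WordMark(NamedTuple):
    number: int   # position of the word in the speech
    word: str
    start: float  # start time in seconds


def word_at(marks, t):
    """Return the mark whose word is being spoken at time t, or None before the first mark."""
    starts = [m.start for m in marks]
    i = bisect_right(starts, t) - 1
    return marks[i] if i >= 0 else None


def playback_span(marks, number, recording_end):
    """Return (start, end) in seconds for the word with the given number.

    A word is assumed to last until the next word starts, or until
    `recording_end` for the last mark in the list.
    """
    for i, mark in enumerate(marks):
        if mark.number == number:
            end = marks[i + 1].start if i + 1 < len(marks) else recording_end
            return (mark.start, end)
    raise KeyError(number)


# The two marks from the example above, sorted by start time.
marks = [WordMark(116, "THE", 38.59), WordMark(117, "PATHFINDER", 38.97)]

print(word_at(marks, 38.70).word)
# recording_end=40.0 is a made-up value for illustration.
print(playback_span(marks, 116, recording_end=40.0))
```

With these two marks, a time of 38.70 seconds falls inside "THE", and the span for word 116 runs from 38.59 to 38.97 seconds. A selective playback function would need exactly this kind of start and end pair for the chosen word or phrase.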