
Word Recognition: Frequency Domain Techniques

We tried three pre-processing approaches to maximize the accuracy of our classifier: the averaged spectrogram, the discrete Fourier transform (DFT), and the periodogram.
To evaluate each approach, we used a data set of four people saying the ten digits 0-9 fifty times. Each recording was processed with one of the techniques, and the processed data was split into training and verification sets for a k-nearest-neighbors (KNN) classifier. For the periodogram and the DFT, we zero-padded each sound sample so that every feature vector had the same length, which the classifier requires. For the averaged spectrogram, we got the same effect by fixing the number of frequencies the spectrogram returns and then averaging across time. In every case, data from the first three people was used to train the classifier and data from the fourth person was used to verify it. The averaged spectrogram gave the best results of the three.
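The three pre-processing steps can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the code we ran: the function names, the frame length `n_fft=256`, the hop of 128 samples, the Hann window, and the |X[k]|²/N periodogram normalization are all hypothetical choices. For the DFT and periodogram, `n_fft` is assumed to be at least as long as the longest clip, so `np.fft.rfft` only ever zero-pads.

```python
import numpy as np

def dft_features(x, n_fft):
    """DFT magnitude after zero-padding clip x to n_fft samples (hypothetical helper)."""
    # rfft zero-pads x up to n_fft, so every clip gives n_fft // 2 + 1 bins.
    return np.abs(np.fft.rfft(x, n=n_fft))

def periodogram_features(x, n_fft):
    """Periodogram |X[k]|^2 / N, where N is the original clip length (assumed scaling)."""
    X = np.fft.rfft(x, n=n_fft)
    return np.abs(X) ** 2 / len(x)

def averaged_spectrogram_features(x, n_fft=256, hop=128):
    """Magnitude spectrogram averaged over time: always n_fft // 2 + 1 values."""
    # Pad very short clips so at least one full frame exists.
    if len(x) < n_fft:
        x = np.pad(x, (0, n_fft - len(x)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))  # shape (n_frames, n_fft // 2 + 1)
    return spec.mean(axis=0)  # average across time
```

Fixing `n_fft` in the spectrogram is what keeps the feature length constant regardless of clip duration: a longer clip only adds frames, and averaging over frames removes that dimension. For example, a 3000-sample clip gives `dft_features(clip, 4096).shape == (2049,)` and `averaged_spectrogram_features(clip).shape == (129,)`.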
The following graphs are confusion matrices. The vertical axis is the true label of the digit and the horizontal axis is the label our classifier predicted, so entries on the diagonal are correct predictions. As the matrices show, all three approaches produced many misclassifications.
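The evaluation described above (train on three speakers, verify on the fourth, tabulate a confusion matrix) can be sketched as follows. This is a hypothetical NumPy-only sketch: the value `k=3`, the Euclidean distance, the speaker numbering 0-3, and the helper names are assumptions, since the number of neighbors we used is not stated here.

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k=3, n_classes=10):
    """k-nearest neighbors with Euclidean distance and majority vote (k is assumed)."""
    # Squared distances via |a|^2 - 2ab + |b|^2, shape (n_test, n_train);
    # this avoids building a large (n_test, n_train, n_features) array.
    d = ((test_X ** 2).sum(axis=1)[:, None]
         - 2 * test_X @ train_X.T
         + (train_X ** 2).sum(axis=1)[None, :])
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = train_y[nearest]  # labels of the k neighbors, shape (n_test, k)
    # bincount(...).argmax() breaks ties toward the smaller digit label.
    return np.array([np.bincount(v, minlength=n_classes).argmax() for v in votes])

def confusion_matrix(true, pred, n_classes=10):
    """Rows are true digits, columns are predicted digits; the diagonal is correct."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true, pred):
        cm[t, p] += 1
    return cm

def evaluate(X, digits, speakers, test_speaker=3, k=3):
    """Train on every speaker except test_speaker, then verify on test_speaker."""
    train = speakers != test_speaker
    pred = knn_predict(X[train], digits[train], X[~train], k=k)
    cm = confusion_matrix(digits[~train], pred)
    accuracy = np.trace(cm) / cm.sum()
    return cm, accuracy
```

Here `X` holds one feature vector per recording (for example, the output of any of the three pre-processing steps), `digits` holds the spoken digit, and `speakers` holds the speaker index. Calling `cm, acc = evaluate(X, digits, speakers)` returns a matrix laid out the same way as the graphs: true label on the vertical axis, predicted label on the horizontal axis.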
