Second Iteration: Word Recognition
Details
To train the KNN classifier on the MFCCs, we treated each window of the MFCC as a separate training example. To classify a word, the classifier predicted a score for each window; we then summed the scores of all windows belonging to that word and chose the label with the highest total score as the most likely label. We also changed how we divided the data into training and testing sets: for the following confusion matrix we trained on 40 samples of each digit from each speaker and tested on 10 samples of each digit from each speaker. The results compare favorably with the averaged spectrogram. While the spectrogram achieves high accuracy for some digits, it falls short for others, such as zero, which it identifies correctly only about 60% of the time; using MFCCs for feature extraction yields 97.5% accuracy even in the worst case.
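The per-window voting scheme described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the choice of k, and the use of one vote per neighbor (rather than a distance-weighted score) are assumptions.

```python
import numpy as np

def knn_window_scores(train_X, train_y, windows, k=5, n_classes=10):
    """Score each MFCC window with KNN; every window contributes
    votes toward every class, accumulated into one score vector."""
    scores = np.zeros(n_classes)
    for w in windows:
        dists = np.linalg.norm(train_X - w, axis=1)   # distance to all training windows
        nearest = train_y[np.argsort(dists)[:k]]      # labels of the k nearest windows
        for label in nearest:
            scores[label] += 1                        # one vote per neighbor
    return scores

def classify_word(train_X, train_y, windows, k=5, n_classes=10):
    # sum the window votes and pick the label with the highest total
    return int(np.argmax(knn_window_scores(train_X, train_y, windows, k, n_classes)))
```

Summing votes across windows makes the word-level decision robust to a few misclassified windows, since a single outlier window cannot outvote the rest of the word.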
Mel-Frequency Cepstral Coefficients (MFCC)
Mel-frequency cepstral coefficients are calculated in the following steps:
1. Separate the signal into windows.
2. For each window, compute the discrete Fourier transform and take its magnitude.
3. Pass these magnitudes through a triangular filter bank with 20-40 filters.
4. Take the logarithm of the resulting filter energies.
5. Take the discrete cosine transform (similar to the DFT, except cosines are used instead of complex exponentials).
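The steps above can be sketched in a few lines of numpy. This is an illustrative sketch, not the project's implementation: the frame length, hop size, Hamming window, log epsilon, and the choice to keep the first 13 coefficients are all assumptions, and a precomputed triangular mel filter bank matrix `fbank` of shape `(n_filters, frame_len // 2 + 1)` is assumed to be supplied.

```python
import numpy as np

def dct_ii(x):
    """Type-II DCT along the last axis (the cosine-transform step)."""
    n = x.shape[-1]
    i = np.arange(n)
    basis = np.cos(np.pi * (i[:, None] + 0.5) * i[None, :] / n)
    return x @ basis

def mfcc_frames(signal, fbank, frame_len=256, hop=128, n_coeffs=13):
    """One row of MFCCs per window of the signal."""
    # 1. separate the signal into overlapping windows
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    # 2. DFT magnitude of each (Hamming-weighted, an assumed choice) window
    mags = np.abs(np.fft.rfft(frames * np.hamming(frame_len), axis=1))
    # 3. triangular mel filter bank energies
    energies = mags @ fbank.T
    # 4. logarithm (small epsilon avoids log(0))
    log_e = np.log(energies + 1e-10)
    # 5. cosine transform; keep the first few coefficients
    return dct_ii(log_e)[:, :n_coeffs]
```

Each row of the returned array is one MFCC window, i.e. one data entry for the KNN classifier described above.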
The m-th filter is defined as follows:

$$
H_m(k) =
\begin{cases}
0 & k < f(m-1) \\
\dfrac{k - f(m-1)}{f(m) - f(m-1)} & f(m-1) \le k \le f(m) \\
\dfrac{f(m+1) - k}{f(m+1) - f(m)} & f(m) \le k \le f(m+1) \\
0 & k > f(m+1)
\end{cases}
$$

where f(0), ..., f(M+1) is the list of M + 2 mel-spaced frequencies for M filters.
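A filter bank of such triangular filters can be constructed as sketched below. This is an illustration under assumed parameters (the standard 2595·log10(1 + f/700) mel formula, boundary frequencies spanning 0 to the Nyquist frequency, and rounding boundaries to FFT bins); none of these specifics are stated in the report.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters rising from f(m-1) to f(m), falling to f(m+1)."""
    # n_filters + 2 mel-spaced boundary frequencies f(0) .. f(M+1)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):                 # rising slope
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):                # falling slope
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank
```

Each filter peaks at 1 at its center frequency f(m) and overlaps its neighbors, so every frequency bin between f(0) and f(M+1) contributes to at least one filter energy.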