Scanning Classification Details

This method takes a segment of the full audio signal, preferably the same length as the censored word, and feeds it into the classifier. The classifier returns a score, which is closely related to a probability, for each window and each trained word. The scores are stored, and the window is then moved forward by a small amount of time; in our program, it moves 10 milliseconds. The scores of all classified windows are then plotted. We use a similar threshold detection step to find the censored word: if a window's score is above the threshold, the window is considered to contain the desired word and is censored. The key tools used in this iteration are windowing, a digital signal processing tool from class, and the posterior probability from kNN classification.
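The scanning loop described above can be sketched as follows. This is a minimal illustration in Python rather than the project's MATLAB code. The function name `scan_scores`, its parameters, and the placeholder scorer `mean_abs` are assumptions made for the example; in the real program the scorer would be the trained kNN classifier.

```python
from typing import Callable, List, Sequence, Tuple


def scan_scores(
    signal: Sequence[float],
    sample_rate: int,
    window_ms: float,
    hop_ms: float,
    score_window: Callable[[Sequence[float]], float],
) -> List[Tuple[float, float]]:
    """Slide a fixed-length window over the signal.

    Returns one (window_start_in_seconds, score) pair per window position.
    """
    window_len = int(round(sample_rate * window_ms / 1000.0))
    hop_len = max(1, int(round(sample_rate * hop_ms / 1000.0)))
    if window_len <= 0:
        raise ValueError("window_ms is too short for this sample rate")

    results = []
    start = 0
    # Stop once a full window no longer fits in the signal.
    while start + window_len <= len(signal):
        window = signal[start:start + window_len]
        results.append((start / sample_rate, score_window(window)))
        start += hop_len
    return results


def mean_abs(window: Sequence[float]) -> float:
    # Placeholder scorer for illustration only; NOT the kNN classifier.
    return sum(abs(x) for x in window) / len(window)


# Toy signal: 1 second at 1000 Hz, silent except for 0.4 s to 0.6 s.
sample_rate = 1000
signal = [0.0] * 400 + [1.0] * 200 + [0.0] * 400

# 200 ms window (the assumed word length) advanced in 10 ms steps.
scores = scan_scores(signal, sample_rate, window_ms=200, hop_ms=10,
                     score_window=mean_abs)
best_time, best_score = max(scores, key=lambda pair: pair[1])
```

In this toy case the score peaks when the window lines up exactly with the active segment, which is the behavior the threshold step relies on.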

The score is based on the posterior probability. It takes into account not only the categories of the k nearest neighboring points, but also their relative distances from each other and from the test vector. The first figure below is taken from the MATLAB documentation for “predict”.

nbd = the k nearest neighbors
W(i) = the weight of neighbor i, that is, its proximity to the test point relative to the rest of the k nearest neighbors
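With these definitions, the posterior score for a category can be read as the sum of W(i) over the neighbors in nbd that belong to that category, divided by the sum of W(i) over all of nbd. The sketch below illustrates that calculation in Python. The helper names and the choice of inverse-distance weights are assumptions made for the example; the weighting actually used in the project is not stated here.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def inverse_distance_weights(distances: Sequence[float],
                             eps: float = 1e-12) -> List[float]:
    # Assumed weighting: closer neighbors get larger weights.
    # eps guards against division by zero for an exact match.
    return [1.0 / max(d, eps) for d in distances]


def knn_posterior(
    neighbors: Sequence[Tuple[Hashable, float]],
) -> Dict[Hashable, float]:
    """Weighted vote over the k nearest neighbors (nbd).

    neighbors holds one (category, W(i)) pair per neighbor.
    Returns the share of total weight held by each category.
    """
    total = sum(weight for _, weight in neighbors)
    if total <= 0:
        raise ValueError("neighbor weights must sum to a positive value")

    per_category = defaultdict(float)
    for category, weight in neighbors:
        per_category[category] += weight
    return {category: w / total for category, w in per_category.items()}


# Three nearest neighbors: two labeled "target", one labeled "other".
labels = ["target", "target", "other"]
distances = [0.5, 1.0, 2.0]
weights = inverse_distance_weights(distances)  # [2.0, 1.0, 0.5]
posterior = knn_posterior(list(zip(labels, weights)))
```

Here the "target" category holds 3.0 of the 3.5 total weight, so its score is 6/7 even though a plain majority vote would only report two votes out of three.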

The graphic below plots the score for each window. The image can be confusing because consecutive windows overlap heavily as the window moves on each iteration; however, it does show that the score increases sharply as the censored word begins to enter the window. This increase is large enough to detect and censor the desired word.
The scanning method requires a few parameters. The first is the length of the word; this sets the window length and affects the quality of the input the classifier receives. The second is the number of milliseconds to advance the window after each iteration: smaller steps can improve accuracy, but they also make the program slower. The third is the threshold above which an audio clip is considered to be the censored word. We did not find a mathematical model for this threshold, so we chose the current value from experimental data. We did observe a trend: the potential match score was inversely correlated with the number of categories used to train the classifier, though possibly not linearly.
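The threshold step can be sketched as follows: every window whose score exceeds the threshold is marked for censoring, and overlapping marked windows are merged into one time span. The function name `censor_intervals`, the example scores, and the threshold of 0.8 are illustrative assumptions, not values from the project. The sketch also assumes the scores are sorted by window start time.

```python
from typing import List, Sequence, Tuple


def censor_intervals(
    scores: Sequence[Tuple[float, float]],
    window_s: float,
    threshold: float,
) -> List[Tuple[float, float]]:
    """Turn per-window scores into merged (start, end) spans in seconds.

    scores holds (window_start_in_seconds, score) pairs, sorted by start.
    """
    intervals: List[Tuple[float, float]] = []
    for start, score in scores:
        if score <= threshold:
            continue  # below or at the threshold: not the censored word
        end = start + window_s
        if intervals and start <= intervals[-1][1]:
            # Overlaps the previous flagged span, so extend it.
            previous_start, previous_end = intervals[-1]
            intervals[-1] = (previous_start, max(previous_end, end))
        else:
            intervals.append((start, end))
    return intervals


# Made-up scores for a 200 ms window; the first five use a 10 ms step.
example_scores = [
    (0.00, 0.10),
    (0.01, 0.20),
    (0.02, 0.90),
    (0.03, 0.95),
    (0.04, 0.30),
    (0.50, 0.85),
]
spans = censor_intervals(example_scores, window_s=0.2, threshold=0.8)
```

Merging matters because a small step size means the same spoken word is flagged by several consecutive windows; without it, the same audio would be censored repeatedly.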
By eliminating the word separation program, we were able to censor audio clips with much faster and more colloquial speech. As speech becomes quicker and more natural, words become more difficult to separate in the time domain and are often blended together within a sentence (amplitude or power thresholding will not detect a difference).
