
I'm trying to implement some code from here

And I have trained the HMM with my coefficients but do not understand how the Viterbi Decoder algorithm works, for example:

 viterbi_decode(MFCC, M, model, q);
 where MFCC = coefficients
 M = size of MFCC
 model = HMM model trained on the MFCC coefficients
 q = unknown (believed to be the output path)

But here is what I do not understand: I am attempting to compare two speech signals (training and sample) to find the closest possible match. With the DTW algorithm, for example, a single integer is returned, so I can easily find the closest match; this algorithm, however, returns an int* array, which makes the comparison difficult.

Here is how the current program works:

// Extract MFCC features from the raw audio
vector<DIMENSIONS_2> MFCC = mfcc.transform(rawData, sample_rate);

int N = MFCC.size();   // number of frames
int M = 13;            // coefficients per frame

// Convert the feature vectors to the double** layout the HMM expects
double** mfcc_setup = setupHMM(MFCC, N, M);

// Initialise an HMM with 10 hidden states and train it
model_t* model = hmm_init(mfcc_setup, N, M, 10);

hmm_train(mfcc_setup, N, model);

// Most likely hidden-state sequence, one entry per frame
int* q = new int[N];

viterbi_decode(mfcc_setup, M, model, q);

Could anyone please tell me how the Viterbi decoder works for the problem of identifying the best path from the training data to the input? I've tried both the Euclidean distance and the Hamming distance on the decoded path (q), but with no luck.

Any help would be greatly appreciated
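For context, the Viterbi recursion that produces a path like q can be sketched as follows. This is a generic log-space dynamic-programming version, not the library's actual `viterbi_decode`; the names `A`, `B`, and `pi` are assumptions for the transition, emission, and initial log-probabilities:

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Minimal Viterbi sketch (hypothetical layout, not the library's API):
//   A[i][j]  = log P(state j | state i)
//   B[t][j]  = log P(observation t | state j)
//   pi[j]    = log P(start in state j)
// Returns the most likely hidden-state sequence, one state per frame.
std::vector<int> viterbi(const std::vector<std::vector<double>>& A,
                         const std::vector<std::vector<double>>& B,
                         const std::vector<double>& pi) {
    const int T = B.size();   // number of frames
    const int S = pi.size();  // number of hidden states
    std::vector<std::vector<double>> delta(T, std::vector<double>(S));
    std::vector<std::vector<int>> psi(T, std::vector<int>(S, 0));

    // Initialisation: best score for each state at t = 0.
    for (int j = 0; j < S; ++j) delta[0][j] = pi[j] + B[0][j];

    // Recursion: extend the best path into each state at each frame.
    for (int t = 1; t < T; ++t) {
        for (int j = 0; j < S; ++j) {
            double best = -std::numeric_limits<double>::infinity();
            for (int i = 0; i < S; ++i) {
                double v = delta[t - 1][i] + A[i][j];
                if (v > best) { best = v; psi[t][j] = i; }
            }
            delta[t][j] = best + B[t][j];
        }
    }

    // Backtrack from the best final state to recover the path.
    std::vector<int> q(T);
    int best_last = 0;
    for (int j = 1; j < S; ++j)
        if (delta[T - 1][j] > delta[T - 1][best_last]) best_last = j;
    q[T - 1] = best_last;
    for (int t = T - 2; t >= 0; --t) q[t] = psi[t + 1][q[t + 1]];
    return q;
}
```

So q is not a distance at all: each entry is the index of the most likely hidden state for that frame.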

  • Are the training and sample signals the same length? If so, the int* array may contain the distances between the MFCC arrays of the training and sample signals. Recall that MFCC extraction usually chunks the audio into segments and then extracts ~13 coefficients (mel features) from each segment, so the output of MFCC is a 2D array. The difference between two samples (two 2D arrays) is therefore a 1D array, where each entry is the difference between the corresponding rows of the two arrays. Commented Mar 14, 2013 at 7:39
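The framewise comparison the comment describes could be sketched like this, assuming the two MFCC matrices have the same number of frames (`frame_distances` is a hypothetical helper, not part of the library):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Per-frame Euclidean distance between two MFCC matrices of equal length
// (N frames x M coefficients). Entry n is the distance between frame n of
// each signal, matching the 1D-array-of-row-differences idea above.
std::vector<double> frame_distances(const std::vector<std::vector<double>>& a,
                                    const std::vector<std::vector<double>>& b) {
    std::vector<double> d(a.size());
    for (std::size_t n = 0; n < a.size(); ++n) {
        double s = 0.0;
        for (std::size_t m = 0; m < a[n].size(); ++m) {
            double diff = a[n][m] - b[n][m];
            s += diff * diff;
        }
        d[n] = std::sqrt(s);
    }
    return d;
}
```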

1 Answer


In this example it seems to me that q is the hidden-state sequence, i.e. a list of numbers from 0 to 9. If you have two audio samples, say test and train, and you generate two sequences q_test and q_train, then thinking about |q_test - q_train| (where the norm is component-wise distance) is not useful, because it does not represent a proper notion of distance: the hidden-state labels in an HMM can be arbitrary.

A more natural way to think about distance may be the following: given q_train, you are interested in the probability that your test sample took that same path, which you can compute once you have the transition matrix and the emission probabilities.
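In log space that score is just a sum along the fixed path. A minimal sketch, assuming the same hypothetical log-space matrices as before (`A` for transitions, `B` for the test sample's emission log-likelihoods, `pi` for initial probabilities):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Log-probability that an observation sequence follows a *fixed* state
// path q, given log-space transition matrix A, emission log-likelihoods B
// (B[t][j] = log P(observation t | state j)), and initial log-probs pi.
// These names are assumptions for illustration, not the library's API.
double path_log_prob(const std::vector<int>& q,
                     const std::vector<std::vector<double>>& A,
                     const std::vector<std::vector<double>>& B,
                     const std::vector<double>& pi) {
    double lp = pi[q[0]] + B[0][q[0]];
    for (std::size_t t = 1; t < q.size(); ++t)
        lp += A[q[t - 1]][q[t]] + B[t][q[t]];
    return lp;
}
```

A higher (less negative) value means the test sample is more likely to have followed the training path, so this single number can play the role the DTW score played in the question.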

Please let me know if I am misunderstanding your question.
