
I'm trying to implement some code from here

And I have trained the HMM with my coefficients but do not understand how the Viterbi Decoder algorithm works, for example:

 viterbi_decode(MFCC, M, model, q);
 where MFCC = coefficients
 M = size of MFCC
 model = HMM model trained on the MFCC coefficients
 q = unknown (believed to be the output path)

But here is what I do not understand: I am attempting to compare two speech signals (training and sample) to find the closest possible match. With the DTW algorithm, for example, a single integer is returned, so I can easily find the closest match; this algorithm, however, returns an int* array, which makes the comparison difficult.

Here is how the current program works:

// Extract MFCC features from the raw audio
vector<DIMENSIONS_2> MFCC = mfcc.transform(rawData, sample_rate);

int N = MFCC.size();   // number of frames
int M = 13;            // coefficients per frame

// Convert the feature vectors to the double** layout the HMM expects
double** mfcc_setup = setupHMM(MFCC, N, M);

// Initialise an HMM with 10 hidden states and train it
model_t* model = hmm_init(mfcc_setup, N, M, 10);

hmm_train(mfcc_setup, N, model);

// Most likely hidden-state sequence, one entry per frame
int* q = new int[N];

viterbi_decode(mfcc_setup, M, model, q);

Could anyone please tell me how the Viterbi decoder works for the problem of identifying the best path from the training data to the input? I've tried both the Euclidean distance and the Hamming distance on the decoded path (q), but with no luck.

Any help would be greatly appreciated
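For context, the Viterbi recursion that produces a path like q can be sketched as follows. This is a generic log-space dynamic-programming version, not the library's actual `viterbi_decode`; the names `A`, `B`, and `pi` are assumptions for the transition, emission, and initial log-probabilities:

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Minimal Viterbi sketch (hypothetical layout, not the library's API):
//   A[i][j]  = log P(state j | state i)
//   B[t][j]  = log P(observation t | state j)
//   pi[j]    = log P(start in state j)
// Returns the most likely hidden-state sequence, one state per frame.
std::vector<int> viterbi(const std::vector<std::vector<double>>& A,
                         const std::vector<std::vector<double>>& B,
                         const std::vector<double>& pi) {
    const int T = B.size();   // number of frames
    const int S = pi.size();  // number of hidden states
    std::vector<std::vector<double>> delta(T, std::vector<double>(S));
    std::vector<std::vector<int>> psi(T, std::vector<int>(S, 0));

    // Initialisation: best score for each state at t = 0.
    for (int j = 0; j < S; ++j) delta[0][j] = pi[j] + B[0][j];

    // Recursion: extend the best path into each state at each frame.
    for (int t = 1; t < T; ++t) {
        for (int j = 0; j < S; ++j) {
            double best = -std::numeric_limits<double>::infinity();
            for (int i = 0; i < S; ++i) {
                double v = delta[t - 1][i] + A[i][j];
                if (v > best) { best = v; psi[t][j] = i; }
            }
            delta[t][j] = best + B[t][j];
        }
    }

    // Backtrack from the best final state to recover the path.
    std::vector<int> q(T);
    int best_last = 0;
    for (int j = 1; j < S; ++j)
        if (delta[T - 1][j] > delta[T - 1][best_last]) best_last = j;
    q[T - 1] = best_last;
    for (int t = T - 2; t >= 0; --t) q[t] = psi[t + 1][q[t + 1]];
    return q;
}
```

So q is not a distance at all: each entry is the index of the most likely hidden state for that frame.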

  • Are the training and sample signals the same length? If so, the int* array may contain the distances between the MFCC arrays of the training and sample signals. Recall that MFCC extraction usually chunks the audio into segments and then extracts ~13 coefficients (mel features) from each segment, so the output of MFCC is a 2D array. The difference between two samples (two 2D arrays) is therefore a 1D array, where each entry is the difference between the corresponding rows of the two arrays. Commented Mar 14, 2013 at 7:39
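The framewise comparison the comment describes could be sketched like this, assuming the two MFCC matrices have the same number of frames (`frame_distances` is a hypothetical helper, not part of the library):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Per-frame Euclidean distance between two MFCC matrices of equal length
// (N frames x M coefficients). Entry n is the distance between frame n of
// each signal, matching the 1D-array-of-row-differences idea above.
std::vector<double> frame_distances(const std::vector<std::vector<double>>& a,
                                    const std::vector<std::vector<double>>& b) {
    std::vector<double> d(a.size());
    for (std::size_t n = 0; n < a.size(); ++n) {
        double s = 0.0;
        for (std::size_t m = 0; m < a[n].size(); ++m) {
            double diff = a[n][m] - b[n][m];
            s += diff * diff;
        }
        d[n] = std::sqrt(s);
    }
    return d;
}
```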

1 Answer


In this example it seems to me that q is the hidden-state sequence, i.e. a list of numbers from 0 to 9. If you have two audio samples, say test and train, and you generate two sequences q_test and q_train, then thinking about |q_test - q_train| (where the norm is component-wise distance) is not useful, because it does not represent a proper notion of distance: the hidden-state labels in an HMM can be arbitrary.

A more natural way to think about distance may be the following: given q_train, you are interested in the probability that your test sample took that same path, which you can compute once you have the transition matrix and the emission probabilities.
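In log space that score is just a sum along the fixed path. A minimal sketch, assuming the same hypothetical log-space matrices as before (`A` for transitions, `B` for the test sample's emission log-likelihoods, `pi` for initial probabilities):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Log-probability that an observation sequence follows a *fixed* state
// path q, given log-space transition matrix A, emission log-likelihoods B
// (B[t][j] = log P(observation t | state j)), and initial log-probs pi.
// These names are assumptions for illustration, not the library's API.
double path_log_prob(const std::vector<int>& q,
                     const std::vector<std::vector<double>>& A,
                     const std::vector<std::vector<double>>& B,
                     const std::vector<double>& pi) {
    double lp = pi[q[0]] + B[0][q[0]];
    for (std::size_t t = 1; t < q.size(); ++t)
        lp += A[q[t - 1]][q[t]] + B[t][q[t]];
    return lp;
}
```

A higher (less negative) value means the test sample is more likely to have followed the training path, so this single number can play the role the DTW score played in the question.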

Please let me know if I am misunderstanding your question.
