Learning Hidden Markov Models with Simple Examples [Part 2]
In the previous part, we covered the first task of the presented example. In this part, I’ll present the solution to the second task of the problem and try to explain the intuition behind it.
Problem Recap
Consider an HMM with the following properties:
Hidden state set: Q = {1, 2, 3}
Observable state set: V = {a, b}
Parameters: λ = (π, A, B)
Where the matrices are defined as follows:
Initial distribution π:
π = (0.2, 0.4, 0.4)
Transition matrix A (probability of transitioning from state i to state j; row i, column j):
A =
| 0.5  0.2  0.3 |
| 0.3  0.5  0.2 |
| 0.2  0.3  0.5 |
Emission matrix B (probability of observing a symbol given the hidden state; rows are states 1–3, columns are the observations a and b):
B =
| 0.5  0.5 |
| 0.4  0.6 |
| 0.7  0.3 |
We consider a short horizon of T = 3 time steps, with the observed sequence O = (a, b, a).
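The model can be written out in plain Python like this (the list encoding and the 0-based indexing of the observations a and b are my own choices, not part of the original problem):

```python
# Model parameters from the problem statement.
pi = [0.2, 0.4, 0.4]  # initial distribution over hidden states 1..3

A = [                  # A[i][j] = P(next state = j+1 | current state = i+1)
    [0.5, 0.2, 0.3],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],
]

B = [                  # B[i][k] = P(observe symbol k | hidden state = i+1)
    [0.5, 0.5],        # state 1: P(a), P(b)
    [0.4, 0.6],        # state 2
    [0.7, 0.3],        # state 3
]

obs = [0, 1, 0]        # the observed sequence a -> b -> a (a = 0, b = 1)
```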
The Tasks
Use the Forward algorithm to calculate P(O∣λ). → DONE in Part 1
Find the optimal hidden state sequence I* = (i₁*, i₂*, i₃*). Which 3 hidden states have the highest probability of being behind the sequence “a → b → a”?
Solution to Part 2: The Viterbi Algorithm
To find the most probable hidden state at each time step, we need to maximize the product of the following probabilities:
Probability of having the current observed state given the current hidden state
Probability of having the current hidden state given the last hidden state
Probability of having the last hidden state
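To make this product concrete, here is a small sketch (my own illustration, not from the original) that scores candidate hidden state 2 at time step 2, using the first-step values π_i · P(a | state i) as the “probability of the last hidden state”:

```python
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]

# Best probability of ending in each state at t=1:
# P(state) * P(observing 'a' | state).
delta1 = [pi[i] * B[i][0] for i in range(3)]   # [0.10, 0.16, 0.28]

# Score hidden state 2 (index 1) at t=2, where we observe 'b':
# for each possible previous state, multiply the three factors
# listed above, then keep only the best product.
j = 1
candidates = [delta1[i] * A[i][j] * B[j][1] for i in range(3)]
best = max(candidates)                          # 0.28 * 0.3 * 0.6 = 0.0504
```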
Important Aspects
We have evidence of the observed state at each time step, given in the task as O=(a,b,a).
We don’t have to look further back past the last hidden state, because the last hidden state “blocks” the influence of any previous evidence from the Markov chain.
If you’re not certain about the second aspect, be sure to check out the concept of conditional independence. The Markov property ensures that the future is independent of the past given the present state.
Mathematical Definition
Once this is established, we can give the mathematical definition of what needs to be calculated at each step:
δ_1(i) = π_i · b_i(o_1)
δ_t(j) = max over i of [ δ_{t−1}(i) · a_ij ] · b_j(o_t)
δ_t(i) represents the highest probability of any path ending in state i at time t, given the observations up to time t. Alongside each maximum we also store the arg max (the previous state that achieved it), so that the optimal path can be recovered by backtracking from the final step.
Then, it’s a matter of calculating the most probable hidden state for each time step.
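The recursion can be sketched in plain Python as follows (this minimal encoding is my own; observations are indexed a = 0, b = 1):

```python
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
obs = [0, 1, 0]   # a -> b -> a

# delta[t][i]: highest probability of any state path ending in
# state i+1 at time t, given the observations up to time t.
delta = [[pi[i] * B[i][obs[0]] for i in range(3)]]
for t in range(1, len(obs)):
    prev = delta[-1]
    delta.append([
        max(prev[i] * A[i][j] for i in range(3)) * B[j][obs[t]]
        for j in range(3)
    ])
```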
Step-by-Step Calculations
I’ll write out only the winning term of each max for readability, but the important thing to remember is that at every step, for every state, we’re computing a single scalar: the maximum product over all possible predecessor states. Carrying out the recursion for O = (a, b, a) gives:
δ_1 = (0.10, 0.16, 0.28)
δ_2 = (0.028, 0.0504, 0.042)
δ_3 = (0.00756, 0.01008, 0.0147)
Result: The most probable hidden state at the third time step is hidden state 3, with δ_3(3) = 0.0147.
Final Answer
Taking the arg max at the final time step gives i₃* = 3, and backtracking through the stored arg max pointers (which previous state produced each winning maximum) yields the optimal hidden state sequence:
I* = (i₁*, i₂*, i₃*) = (3, 3, 3)
In words: If you observe the sequence “a → b → a” during the first 3 time steps, the hidden state sequence with the highest probability of producing it is “3 → 3 → 3”. Note that naively taking the arg max of δ at each time step in isolation would give (3, 2, 3), but that is not the jointly optimal path: δ_2 peaks at state 2 (0.0504), yet the best complete path runs through state 3 at every step.
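The whole procedure, including the backtracking pass, can be sketched end to end in plain Python (variable names are my own):

```python
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
obs = [0, 1, 0]   # a -> b -> a

n = len(pi)
delta = [pi[i] * B[i][obs[0]] for i in range(n)]
psi = []  # psi[t][j]: best predecessor of state j at step t+1

for t in range(1, len(obs)):
    # scores[j][i]: probability of the best path that was in state i
    # and moves to state j at time t (before emitting the observation).
    scores = [[delta[i] * A[i][j] for i in range(n)] for j in range(n)]
    psi.append([max(range(n), key=lambda i: scores[j][i]) for j in range(n)])
    delta = [max(scores[j]) * B[j][obs[t]] for j in range(n)]

# Backtrack from the most probable final state through the pointers.
path = [max(range(n), key=lambda j: delta[j])]
for back in reversed(psi):
    path.append(back[path[-1]])
path = [s + 1 for s in reversed(path)]   # 1-based state labels
```

The backtracking pass is what makes this Viterbi rather than a greedy per-step decision: it maximizes the probability of the whole state sequence jointly, not each time step independently.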
Comments, corrections, and other remarks are always appreciated.