
Learning Hidden Markov Models with Simple Examples [Part 2]

Hidden Markov Model Diagram

In the previous part, we covered the first task of the presented example. In this part, I’ll present the solution to the second task of the problem and try to explain the intuition behind it.

Problem Recap

Consider an HMM with three hidden states and two observable symbols (“a” and “b”), defined by the following parameters:

Initial distribution $\pi$:

$$\pi = \begin{bmatrix} 0.2 \\ 0.4 \\ 0.4 \end{bmatrix}$$

Transition matrix $A$ (entry $a_{ij}$ is the probability of transitioning from state $i$ to state $j$):

$$A = \begin{bmatrix} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5 \end{bmatrix}$$

Emission matrix $B$ (row $i$ gives the probabilities of observing “a” and “b” from hidden state $i$):

$$B = \begin{bmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3 \end{bmatrix}$$

We consider a short horizon of $T = 3$ time steps with the observed sequence $O = (a, b, a)$.
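For readers who want to follow along in code, the parameters above can be written down directly. This is a minimal sketch in plain Python; the 0-based state indexing and the 0/1 encoding of “a”/“b” are my own conventions, not part of the problem statement:

```python
# HMM parameters from the problem statement (0-based state indices).
pi = [0.2, 0.4, 0.4]           # initial distribution over the 3 hidden states

A = [[0.5, 0.2, 0.3],          # A[i][j] = P(next state j | current state i)
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]

B = [[0.5, 0.5],               # B[i][k] = P(symbol k | state i); columns: "a", "b"
     [0.4, 0.6],
     [0.7, 0.3]]

O = [0, 1, 0]                  # observed sequence "a, b, a" encoded as 0/1
```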

The Tasks

  1. Use the Forward algorithm to calculate $P(O \mid \lambda)$. (Done in Part 1.)
  2. Find the optimal hidden state sequence $I^* = (i_1^*, i_2^*, i_3^*)$: which 3 hidden states are most likely to have produced the sequence “a → b → a”?

Solution to Part 2: The Viterbi Algorithm

To find the most probable hidden state sequence, we need, at each time step and for each state, the maximum product of three factors: the emission probability of the current observation, the transition probability from the previous state, and the best score carried over from the previous step. Two aspects make this recursion work:

Important Aspects

  1. We have evidence of the observed state at each time step, given in the task as O=(a,b,a)O = (a, b, a).
  2. We don’t have to look further back than the most recent hidden state, because conditioning on it “blocks” the influence of any earlier evidence in the Markov chain.

If you’re not certain about the second aspect, be sure to check out the concept of conditional independence. The Markov property ensures that the future is independent of the past given the present state.

Mathematical Definition

Once this is established, we can give the mathematical definition of what needs to be calculated at each step:

$$\mu_k(i_k) = \max_{i_{1:k-1}} P(i_{1:k}, O_{1:k}) = P(O_k \mid i_k) \cdot \max_{i_{k-1}} \left[ P(i_k \mid i_{k-1}) \cdot \mu_{k-1}(i_{k-1}) \right]$$

This is the highest joint probability of the observations up to time $k$ and any hidden state path that ends in state $i_k$ at time $k$.

Then it’s a matter of computing these values for every time step, while recording which previous state achieved each maximum so the best sequence can be read off at the end.
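The whole recursion, together with the bookkeeping of which predecessor achieved each maximum, fits in a few lines of Python. This is a sketch under my own naming (`viterbi`, `mu`, `back`); states are 0-based here, so state 3 in the text is index 2:

```python
# Viterbi algorithm for this example's 3-state, 2-symbol HMM.
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]  # columns: "a", "b"

def viterbi(O, pi, A, B):
    n = len(pi)
    # mu[t][j]: highest joint probability of observations O[0..t]
    # and any state path ending in state j at time t.
    mu = [[pi[j] * B[j][O[0]] for j in range(n)]]
    # back[t][j]: predecessor state that achieved mu[t][j].
    back = [[0] * n]
    for t in range(1, len(O)):
        row, ptr = [], []
        for j in range(n):
            i_best = max(range(n), key=lambda i: mu[t - 1][i] * A[i][j])
            row.append(mu[t - 1][i_best] * A[i_best][j] * B[j][O[t]])
            ptr.append(i_best)
        mu.append(row)
        back.append(ptr)
    # Backtrack from the best final state to recover the optimal path.
    path = [max(range(n), key=lambda j: mu[-1][j])]
    for t in range(len(O) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, max(mu[-1])

path, p = viterbi([0, 1, 0], pi, A, B)  # observations "a, b, a"
```

Note the backtracking step at the end: the optimal sequence is recovered by following the stored predecessors from the best final state, not by taking the argmax of `mu` at each time step independently.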

Step-by-Step Calculations

In the calculations below I evaluate all three $\mu$ values at once as a vector for readability, but the important thing to remember is that each $\mu_k(i_k)$ is a single scalar, and from the second step on it is a maximum over the previous step’s values.

First Time Step (Observation “a”)

For $t = 1$ there is nothing to maximize over yet, since there is no previous state; we simply weight the initial distribution by the emission probabilities of “a”:

$$\mu_1(i_1) = P(a \mid i_1) \cdot \pi_{i_1}: \quad \begin{bmatrix} 0.5 \\ 0.4 \\ 0.7 \end{bmatrix} \odot \begin{bmatrix} 0.2 \\ 0.4 \\ 0.4 \end{bmatrix} = \begin{bmatrix} 0.10 \\ 0.16 \\ 0.28 \end{bmatrix}$$

(Here $\odot$ denotes the elementwise product.)

Result: The highest value is $\mu_1(3) = 0.28$, i.e. hidden state 3 at 28%.
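This first step is just an elementwise product, which is easy to check numerically. A tiny sketch, where `B_a` is my shorthand for the “a” column of $B$:

```python
# First Viterbi step: weight the initial distribution by P(a | state).
pi = [0.2, 0.4, 0.4]
B_a = [0.5, 0.4, 0.7]  # P(a | state) for states 1..3
mu1 = [b * p for b, p in zip(B_a, pi)]
print([round(v, 2) for v in mu1])  # -> [0.1, 0.16, 0.28]
```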

Second Time Step (Observation “b”)

For $t = 2$:

For each possible current hidden state, we find the maximum over all possible previous states:

$$\mu_2(i_2) = \max_{i_1} \left[ P(b \mid i_2) \cdot P(i_2 \mid i_1) \cdot \mu_1(i_1) \right]$$

For hidden state 1:

$$\mu_2(1) = P(b \mid 1) \cdot \max_{i_1} \left[ P(1 \mid i_1) \cdot \mu_1(i_1) \right] = 0.5 \cdot \max\left( \begin{bmatrix} 0.5 \\ 0.3 \\ 0.2 \end{bmatrix} \odot \begin{bmatrix} 0.10 \\ 0.16 \\ 0.28 \end{bmatrix} \right) = 0.5 \cdot 0.056 = 0.028$$

For hidden state 2:

$$\mu_2(2) = P(b \mid 2) \cdot \max_{i_1} \left[ P(2 \mid i_1) \cdot \mu_1(i_1) \right] = 0.6 \cdot \max\left( \begin{bmatrix} 0.2 \\ 0.5 \\ 0.3 \end{bmatrix} \odot \begin{bmatrix} 0.10 \\ 0.16 \\ 0.28 \end{bmatrix} \right) = 0.6 \cdot 0.084 = 0.0504$$

For hidden state 3:

$$\mu_2(3) = P(b \mid 3) \cdot \max_{i_1} \left[ P(3 \mid i_1) \cdot \mu_1(i_1) \right] = 0.3 \cdot \max\left( \begin{bmatrix} 0.3 \\ 0.2 \\ 0.5 \end{bmatrix} \odot \begin{bmatrix} 0.10 \\ 0.16 \\ 0.28 \end{bmatrix} \right) = 0.3 \cdot 0.14 = 0.042$$

Result: The largest value at the second time step is $\mu_2(2) = 0.0504$. Note that for all three states, the maximizing predecessor is $i_1 = 3$; we record these backpointers for later.
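The same step can be checked in code, this time also recording which previous state achieved each maximum. A sketch, where `back2` holds the 1-based labels of the best predecessors:

```python
# Second Viterbi step: for each current state j, maximize over previous states i.
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B_b = [0.5, 0.6, 0.3]          # P(b | state) for states 1..3
mu1 = [0.10, 0.16, 0.28]
mu2, back2 = [], []
for j in range(3):
    i_best = max(range(3), key=lambda i: mu1[i] * A[i][j])
    mu2.append(B_b[j] * mu1[i_best] * A[i_best][j])
    back2.append(i_best + 1)   # 1-based state label of the best predecessor
print([round(v, 4) for v in mu2], back2)  # -> [0.028, 0.0504, 0.042] [3, 3, 3]
```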

Third Time Step (Observation “a”)

For $t = 3$:

For hidden state 1:

$$\mu_3(1) = P(a \mid 1) \cdot \max_{i_2} \left[ P(1 \mid i_2) \cdot \mu_2(i_2) \right] = 0.5 \cdot \max\left( \begin{bmatrix} 0.5 \\ 0.3 \\ 0.2 \end{bmatrix} \odot \begin{bmatrix} 0.028 \\ 0.0504 \\ 0.042 \end{bmatrix} \right) = 0.5 \cdot 0.01512 = 0.00756$$

For hidden state 2:

$$\mu_3(2) = P(a \mid 2) \cdot \max_{i_2} \left[ P(2 \mid i_2) \cdot \mu_2(i_2) \right] = 0.4 \cdot \max\left( \begin{bmatrix} 0.2 \\ 0.5 \\ 0.3 \end{bmatrix} \odot \begin{bmatrix} 0.028 \\ 0.0504 \\ 0.042 \end{bmatrix} \right) = 0.4 \cdot 0.0252 = 0.01008$$

For hidden state 3:

$$\mu_3(3) = P(a \mid 3) \cdot \max_{i_2} \left[ P(3 \mid i_2) \cdot \mu_2(i_2) \right] = 0.7 \cdot \max\left( \begin{bmatrix} 0.3 \\ 0.2 \\ 0.5 \end{bmatrix} \odot \begin{bmatrix} 0.028 \\ 0.0504 \\ 0.042 \end{bmatrix} \right) = 0.7 \cdot 0.021 = 0.0147$$

Collecting the three values:

$$\mu_3 = \begin{bmatrix} 0.00756 \\ 0.01008 \\ 0.0147 \end{bmatrix}$$

Result: The most probable hidden state at the third time step is hidden state 3 ($\mu_3(3) = 0.0147$), and its maximum was achieved with predecessor $i_2 = 3$.

Final Answer

To recover the optimal sequence, we take the argmax only at the final step and then follow the recorded backpointers: the best final state is $i_3^* = 3$, whose maximizing predecessor was $i_2 = 3$ (since $0.5 \cdot 0.042 = 0.021$ beats $0.2 \cdot 0.0504 = 0.01008$), and the maximizing predecessor of state 3 at $t = 2$ was $i_1 = 3$. This gives:

$$I^* = (i_1^*, i_2^*, i_3^*) = (3, 3, 3), \qquad P^* = \mu_3(3) = 0.0147$$

In words: if you observe the sequence “a → b → a” during the first 3 time steps, the single most probable hidden state sequence behind it is “3 → 3 → 3”. Note that naively taking the per-step argmax of $\mu$ would suggest (3, 2, 3), but that sequence has joint probability only $0.28 \cdot 0.3 \cdot 0.6 \cdot 0.2 \cdot 0.7 = 0.007056$, less than half of $P^*$. This is why the backpointers matter.
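As a sanity check, the joint probability of any fully specified path can be computed directly; comparing the backtracked path 3 → 3 → 3 with the per-step-argmax sequence 3 → 2 → 3 confirms that only the former is optimal. A sketch with 0-based indices, so state 3 is index 2:

```python
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
O = [0, 1, 0]  # "a, b, a"

def path_prob(path):
    # Joint probability P(path, O) of a fully specified state sequence.
    p = pi[path[0]] * B[path[0]][O[0]]
    for t in range(1, len(O)):
        p *= A[path[t - 1]][path[t]] * B[path[t]][O[t]]
    return p

print(round(path_prob([2, 2, 2]), 6))  # 3 -> 3 -> 3: 0.0147
print(round(path_prob([2, 1, 2]), 6))  # 3 -> 2 -> 3: 0.007056
```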


Comments, corrections, and other remarks are always appreciated.

