Learning Hidden Markov Models with Simple Examples [Part 1]

[Figure: Hidden Markov Model diagram]

I'm sharing some exercises that helped me a lot in understanding Hidden Markov Models (HMMs), in the hope that this article is useful to someone else trying to grasp the basic concepts of HMMs in a practical way. The information will be delivered in the most basic approach there is: a pen-and-paper one.

Building a Foundation

Whenever I start learning something new, my go-to method is to create a basic foundation that I fully comprehend and then start building on top of it to understand more complex concepts.

If you’re not sure whether you’ve got this “foundation” regarding HMMs, I’d recommend going through mathematicalmonk’s YouTube series on HMMs. It’s far better than any summary I’d be able to write here.

Now, assuming the main ideas of HMMs are familiar to you, let’s get to the first problem, which will hopefully help you understand HMMs better.

HMM the Old-Fashioned Way (Pen-and-Paper Part)

Consider an HMM $\lambda = (A, B, \pi)$ with three hidden states and two possible observation symbols, $a$ and $b$, where the matrices are defined as follows:

Initial distribution $\pi$:

$$\pi = \begin{bmatrix} 0.2 \\ 0.4 \\ 0.4 \end{bmatrix}$$

Transition matrix $A$ (probability of transitioning from state $i$ to state $j$):

$$A = \begin{bmatrix} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5 \end{bmatrix}$$

Emission matrix $B$ (probability of observing a symbol given the hidden state):

$$B = \begin{bmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3 \end{bmatrix}$$

We concern ourselves with a short sequence of $T = 3$ time steps, with the observed sequence being $O = (a, b, a)$.
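To make the numbers easy to check along the way, the model above can be written down in plain Python. This is just a sketch of my own; the variable names are mine, and indices are 0-based here, unlike the 1-based math notation:

```python
# HMM parameters from the text, as plain Python lists.
# States are indexed 0..2; observation symbols: 0 = "a", 1 = "b".
pi = [0.2, 0.4, 0.4]           # initial distribution

A = [[0.5, 0.2, 0.3],          # A[i][j] = P(next state j | current state i)
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]

B = [[0.5, 0.5],               # B[i][k] = P(symbol k | state i)
     [0.4, 0.6],
     [0.7, 0.3]]

obs = [0, 1, 0]                # the observed sequence "a -> b -> a"
```

Each row of $A$ and $B$ sums to 1, as any row of a stochastic matrix must.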

The Problems

Given this information, we’ll need to:

  1. Use the Forward algorithm to calculate $P(O \mid \lambda)$. What is the probability that the sequence “a → b → a” actually occurs?
  2. Find the optimal hidden state sequence $I^* = (i_1^*, i_2^*, i_3^*)$. Which are the 3 hidden states with the highest probability of being behind the sequence “a → b → a”?

Solution to Part 1: The Forward Algorithm

The approach here is to go step by step: at each time step, we calculate the joint probability of the observations so far and each possible current hidden state.

Step 1: Initialization (First Observation)

For the first time step, we calculate:

$$\alpha_1(i_1) = \pi \cdot P_i(a \mid i_1)$$

This is the (initial distribution probability) × (probability of observing symbol “a” given the hidden state), computed element-wise for each state: P(state i) · P(a | state i).

$$\alpha_1(i_1) = \pi \cdot P_i(a \mid i_1) = \begin{bmatrix} 0.2 \\ 0.4 \\ 0.4 \end{bmatrix} \cdot \begin{bmatrix} 0.5 \\ 0.4 \\ 0.7 \end{bmatrix} = \begin{bmatrix} 0.10 \\ 0.16 \\ 0.28 \end{bmatrix}$$
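In code, the initialization step is a single element-wise product. A minimal sketch, using plain Python lists for $\pi$ and $B$ (variable names are mine):

```python
pi = [0.2, 0.4, 0.4]                       # initial distribution
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]   # B[i][k] = P(symbol k | state i)

# alpha_1(i) = pi(i) * B[i][a], element-wise; symbol "a" has index 0
alpha1 = [pi[i] * B[i][0] for i in range(3)]
# -> approximately [0.10, 0.16, 0.28]
```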

Step 2: Recursion (Second Observation)

Unlike the calculation for the first time step, here we have to consider the result from the previous step ($\alpha_1$) and include it in the calculations. This results in a sum of products:

$$\alpha_2(i_2) = \sum_{j=1}^{3}\left[\alpha_1(j) \cdot P_i(b \mid i_2) \cdot P_i(i_2 \mid j)\right]$$

Notice that one factor, $P_i(b \mid i_2)$, is independent of the summation index $j$, so it can be pulled out of the sum:

$$\alpha_2(i_2) = P_i(b \mid i_2) \cdot \sum_{j=1}^{3}\left[\alpha_1(j) \cdot P_i(i_2 \mid j)\right]$$

Computing for observation “b”:

$$\alpha_2(i_2) = \begin{bmatrix} 0.5 \\ 0.6 \\ 0.3 \end{bmatrix} \cdot \left( 0.10 \cdot \begin{bmatrix} 0.5 \\ 0.2 \\ 0.3 \end{bmatrix} + 0.16 \cdot \begin{bmatrix} 0.3 \\ 0.5 \\ 0.2 \end{bmatrix} + 0.28 \cdot \begin{bmatrix} 0.2 \\ 0.3 \\ 0.5 \end{bmatrix} \right) = \begin{bmatrix} 0.0770 \\ 0.1104 \\ 0.0606 \end{bmatrix}$$

If you feel lost in this explanation (I tried my best, but I wouldn’t blame you), compare the vectors in the formula with the columns of $B$ and the rows of $A$, and you’ll get a feel for what exactly is going on.
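The same recursion step can be written out directly. A self-contained sketch (my own variable names) for observation “b”, where the inner sum runs over the previous states $j$:

```python
pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]

# Step 1: alpha_1(i) = pi(i) * B[i][a], observation "a" (index 0)
alpha1 = [pi[i] * B[i][0] for i in range(3)]

# Step 2: alpha_2(i) = B[i][b] * sum_j alpha_1(j) * A[j][i],
# observation "b" (index 1)
alpha2 = [B[i][1] * sum(alpha1[j] * A[j][i] for j in range(3))
          for i in range(3)]
# -> approximately [0.0770, 0.1104, 0.0606]
```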

Step 3: Recursion (Third Observation)

The process is identical for all subsequent time steps; only the numbers change. For the third observation “a”:

$$\alpha_3(i_3) = \sum_{j=1}^{3}\left[\alpha_2(j) \cdot P_i(a \mid i_3) \cdot P_i(i_3 \mid j)\right]$$

Similarly, we can simplify to:

$$\alpha_3(i_3) = P_i(a \mid i_3) \cdot \sum_{j=1}^{3}\left[\alpha_2(j) \cdot P_i(i_3 \mid j)\right]$$

$$\alpha_3(i_3) = \begin{bmatrix} 0.5 \\ 0.4 \\ 0.7 \end{bmatrix} \cdot \left( 0.0770 \cdot \begin{bmatrix} 0.5 \\ 0.2 \\ 0.3 \end{bmatrix} + 0.1104 \cdot \begin{bmatrix} 0.3 \\ 0.5 \\ 0.2 \end{bmatrix} + 0.0606 \cdot \begin{bmatrix} 0.2 \\ 0.3 \\ 0.5 \end{bmatrix} \right) = \begin{bmatrix} 0.041870 \\ 0.035512 \\ 0.052836 \end{bmatrix}$$

Interpreting the Result

We’ve gone through each observation at the 3 time steps. The final vector $\alpha_3$ holds, for each hidden state, the probability of having observed the sequence “a → b → a” and ending up in that state.

Final Answer

Summing all three probabilities gives us the total probability of observing O=(a,b,a)O = (a, b, a) regardless of which hidden state we end up in:

$$P(O \mid \lambda) = \sum_{i=1}^{3} \alpha_3(i) = 0.041870 + 0.035512 + 0.052836 \approx 0.13 = 13\%$$

The probability of observing the sequence “a → b → a” is approximately 13%.
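Putting the three steps together, a minimal forward-algorithm sketch in pure Python (the helper name `forward` is mine) reproduces the hand computation:

```python
def forward(pi, A, B, obs):
    """Forward algorithm: returns P(O | lambda) for an observation sequence."""
    n = len(pi)
    # Initialization: alpha_1(i) = pi(i) * B[i][o_1]
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Recursion: alpha_t(i) = B[i][o_t] * sum_j alpha_{t-1}(j) * A[j][i]
    for o in obs[1:]:
        alpha = [B[i][o] * sum(alpha[j] * A[j][i] for j in range(n))
                 for i in range(n)]
    # Termination: sum over the final alpha vector
    return sum(alpha)

pi = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]

p = forward(pi, A, B, [0, 1, 0])   # "a -> b -> a"
# -> approximately 0.130218
```

Note that this naive version can underflow for long sequences; real implementations rescale $\alpha$ at each step or work in log space.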

What’s Next

In the next part, we’ll solve the second problem: finding the optimal hidden state sequence using the Viterbi algorithm. Of course, nobody does this by hand in practice, so feel free to implement these algorithms in MATLAB or another language of your choice.


Comments, corrections, and other remarks are always appreciated.

