
I’m sharing some exercises that helped me a lot in understanding Hidden Markov Models (HMMs), in the hope that they’ll help someone else trying to grasp the basic concepts of HMMs in a practical way. The material is delivered in the most basic approach possible: a pen-and-paper one.
Building a Foundation
Whenever I start learning something new, my go-to method is to create a basic foundation that I fully comprehend and then start building on top of it to understand more complex concepts.
If you’re not sure whether you’ve got this “foundation” regarding HMMs, I’d recommend going through mathematicalmonk’s YouTube series on HMMs. It’s far better than any summary I’d be able to write here.
Now, assuming the main ideas of HMMs are familiar to you, let’s get to the first problem, which will hopefully help you understand HMMs better.
HMM the Old-Fashioned Way (Pen-and-Paper Part)
Consider an HMM with the following properties:
- Hidden state set: $S = \{s_1, s_2, s_3\}$
- Observable state set: $V = \{a, b\}$
- Parameter: $\lambda = (A, B, \pi)$
Where the matrices are defined as follows:
- Initial distribution $\pi$, where $\pi_i = P(x_1 = s_i)$
- Transition matrix $A$, where $a_{ij}$ is the probability of transitioning from state $s_i$ to state $s_j$
- Emission matrix $B$, where $b_j(o)$ is the probability of observing symbol $o$ given hidden state $s_j$
We concern ourselves with a small number of time steps, $T = 3$, with the observed sequence being $O = (a, b, a)$.
The Problems
Given this information, we’ll need to:
- Use the Forward algorithm to calculate $P(O \mid \lambda)$. What is the probability that the sequence “a → b → a” actually occurs?
- Find the optimal hidden state sequence $X^* = (x_1^*, x_2^*, x_3^*)$. Which are the 3 hidden states with the highest probability of being behind the sequence “a → b → a”?
Solution to Part 1: The Forward Algorithm
The approach here is to go step by step: at each time step $t$, we compute the forward variable $\alpha_t(i) = P(o_1, \dots, o_t, x_t = s_i \mid \lambda)$, the joint probability of the observations so far and being in hidden state $s_i$ at time $t$.
Step 1: Initialization (First Observation)
For the first time step, we calculate:
$\alpha_1(i) = \pi_i \, b_i(a)$
This is the (initial distribution probability) × (probability of observing symbol “a” given hidden state $s_i$), i.e. $P(x_1 = s_i) \cdot P(a \mid x_1 = s_i)$.
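As a quick sanity check, the initialization step can be sketched in NumPy. Note that the `pi` and `B` values below are made-up placeholders for illustration, not the matrices from this exercise:

```python
import numpy as np

# Hypothetical example parameters (NOT this exercise's values):
# pi[i]   = P(x_1 = s_i)
# B[i, o] = P(observing symbol o | hidden state s_i)
pi = np.array([0.5, 0.3, 0.2])
B = np.array([[0.7, 0.3],   # emissions from s_1: P(a|s_1), P(b|s_1)
              [0.4, 0.6],   # emissions from s_2
              [0.1, 0.9]])  # emissions from s_3

a_idx = 0  # column index of symbol "a" in B

# Initialization: alpha_1(i) = pi_i * b_i(a), computed for all i at once
alpha_1 = pi * B[:, a_idx]
```

The result is a vector with one entry per hidden state, exactly mirroring the formula above.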
Step 2: Recursion (Second Observation)
Unlike the calculation for the first time step, here we have to take the results from the previous step ($\alpha_1$) and include them in the calculation. This results in a sum over the product of:
- the probability of being in each previous state ($\alpha_1(i)$)
- the transition probability from that state to the current state ($a_{ij}$)
- the emission probability for the current observation ($b_j(b)$)
$\alpha_2(j) = \sum_i \alpha_1(i)\, a_{ij}\, b_j(b)$
You can notice that the emission term $b_j(b)$ does not depend on the summation index $i$, so the equation can be simplified to:
$\alpha_2(j) = \left[\sum_i \alpha_1(i)\, a_{ij}\right] b_j(b)$
Plugging in the concrete values of $\alpha_1$, $A$, and $B$ then gives $\alpha_2$ for observation “b”.
If you feel lost in this explanation (I tried my best, but I wouldn’t blame you), just compare the vectors in the formula to the columns of $B$ and the rows of $A$, and you’ll get a feel for what exactly is going on.
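The recursion step can likewise be sketched as a single matrix–vector product. Again, the `A`, `B`, and `alpha_prev` values below are hypothetical, not the exercise’s:

```python
import numpy as np

# Hypothetical example parameters (NOT this exercise's values):
A = np.array([[0.6, 0.3, 0.1],   # transition probs out of s_1
              [0.2, 0.5, 0.3],   # transition probs out of s_2
              [0.1, 0.2, 0.7]])  # transition probs out of s_3
B = np.array([[0.7, 0.3],
              [0.4, 0.6],
              [0.1, 0.9]])
alpha_prev = np.array([0.35, 0.12, 0.02])  # alpha_1 from the previous step

b_idx = 1  # column index of symbol "b" in B

# Recursion: alpha_2(j) = (sum_i alpha_1(i) * a_ij) * b_j(b).
# alpha_prev @ A computes the inner sum for every j at once;
# the elementwise product then applies the emission term.
alpha_2 = (alpha_prev @ A) * B[:, b_idx]
```

This is the same factoring as in the simplified equation: the sum over previous states happens in the matrix product, and the emission probability multiplies each entry afterwards.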
Step 3: Recursion (Third Observation)
The process is identical for all future time steps; only the numbers change. For the third observation “a”:
$\alpha_3(j) = \sum_i \alpha_2(i)\, a_{ij}\, b_j(a)$
Similarly, we can pull the emission term out of the sum:
$\alpha_3(j) = \left[\sum_i \alpha_2(i)\, a_{ij}\right] b_j(a)$
Interpreting the Result
We’ve gone through each observed symbol at the 3 time steps. The final vector $\alpha_3$ represents the probabilities of having observed the sequence “a → b → a” and ending up in each hidden state:
- Hidden state 1: 4.187%
- Hidden state 2: 3.5512%
- Hidden state 3: 5.2836%
Final Answer
Summing all three probabilities gives us the total probability of observing the sequence, regardless of which hidden state we end up in:
$P(O \mid \lambda) = \sum_{i=1}^{3} \alpha_3(i) \approx 0.04187 + 0.035512 + 0.052836 \approx 0.1302$
The probability of observing the sequence “a → b → a” is approximately 13%.
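Putting the initialization, recursion, and final summation together, here is a minimal forward-algorithm sketch in Python/NumPy (the article suggests Matlab, but the same few lines work in either). The parameter values below are hypothetical placeholders, so the result will not match the 13% from this exercise:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: returns P(obs | model) = sum_i alpha_T(i).

    pi[i]   - initial distribution
    A[i, j] - transition probabilities
    B[i, o] - emission probabilities
    obs     - observed sequence as symbol indices into B's columns
    """
    alpha = pi * B[:, obs[0]]          # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # recursion
    return alpha.sum()                 # termination: sum over final states

# Hypothetical example parameters (NOT this exercise's values):
pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])
B = np.array([[0.7, 0.3],
              [0.4, 0.6],
              [0.1, 0.9]])

# With "a" -> index 0 and "b" -> index 1, the sequence "a, b, a" is:
p = forward(pi, A, B, [0, 1, 0])
```

Swapping in the exercise’s actual $\pi$, $A$, and $B$ would reproduce the ≈13% computed above by hand.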
What’s Next
In the next part, we’ll solve the second problem: finding the optimal hidden state sequence using the Viterbi algorithm. Of course, nobody does this by hand in practice, so feel free to implement these algorithms in Matlab instead.
Comments, corrections, and other remarks are always appreciated.