Template-Based Graph Models

Hidden Markov Models


Hidden Markov Models (HMMs) are probabilistic graphical models used to model sequences of observations, where the underlying states generating the observations are hidden or unobserved. HMMs assume that the states form a Markov chain, meaning the probability of transitioning to a new state depends only on the current state. HMMs are widely used for tasks like speech recognition, part-of-speech tagging, and gene prediction.

Let us understand this with an example.
Suppose Darshan is either sad or happy depending on the weather. We do not live in his town, but we can ask Darshan about his mood over a phone call. So the mood is an observed variable, and the weather states are hidden variables to us.



Here the hidden variables (the weather), not the moods, form the Markov chain: today's mood depends only on today's weather, not on yesterday's, while today's weather does depend on yesterday's weather.
Therefore, Hidden Markov Model = hidden Markov chain + observed variables.

The probabilities of today's weather given yesterday's weather are collected in a matrix called the transition matrix.

The probabilities of the observed states (the moods) given the current weather are collected in a matrix called the emission matrix.
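To make these two matrices concrete, here is a minimal NumPy sketch. The numbers below are illustrative assumptions (the original figures are not reproduced in the text); what matters is the shape of each matrix and that every row is a probability distribution summing to 1.

```python
import numpy as np

# Hidden states (weather) and observed states (mood)
states = ["Sunny", "Cloudy", "Rainy"]
moods = ["Sad", "Happy"]

# Transition matrix A: A[i, j] = P(weather j today | weather i yesterday)
A = np.array([
    [0.5, 0.3, 0.2],   # Sunny  -> Sunny, Cloudy, Rainy
    [0.4, 0.2, 0.4],   # Cloudy -> Sunny, Cloudy, Rainy
    [0.0, 0.3, 0.7],   # Rainy  -> Sunny, Cloudy, Rainy
])

# Emission matrix B: B[i, k] = P(mood k | weather i)
B = np.array([
    [0.1, 0.9],        # Sunny  -> Sad, Happy
    [0.4, 0.6],        # Cloudy -> Sad, Happy
    [0.7, 0.3],        # Rainy  -> Sad, Happy
])

# Each row is a probability distribution, so it must sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```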



Now let us look at some problems that an HMM can solve.

What is the probability of observing the mood sequence “Happy, Happy, Sad” while the weather on the respective days is “Sunny, Cloudy, Sunny”?
Here we make use of the transition matrix, the emission matrix, and a new quantity: the stationary distribution.

The stationary distribution π is obtained from the equation πA = π, where A is the transition matrix.

In this case, π = [0.218, 0.273, 0.509].

We need the stationary distribution because, in a Markov model, predicting the next state requires the previous state, but there is no "previous day" before the first day. The stationary distribution stands in for that missing information by giving the probability of each weather state on day one.
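Both steps can be sketched in a few lines of NumPy, continuing with the illustrative matrices above. The stationary distribution is the left eigenvector of A for eigenvalue 1, i.e. the π that solves πA = π and sums to 1; the joint probability of the example sequences is then a chain of π, emission, and transition terms. With the assumed transition matrix this reproduces the π quoted above; the final probability depends on the assumed emission values.

```python
import numpy as np

# Illustrative transition and emission matrices from the earlier sketch
A = np.array([[0.5, 0.3, 0.2],
              [0.4, 0.2, 0.4],
              [0.0, 0.3, 0.7]])
B = np.array([[0.1, 0.9],
              [0.4, 0.6],
              [0.7, 0.3]])

# Stationary distribution: left eigenvector of A with eigenvalue 1, normalised to sum to 1
eigvals, eigvecs = np.linalg.eig(A.T)
pi = np.real(eigvecs[:, np.isclose(eigvals, 1.0)][:, 0])
pi = pi / pi.sum()
print(np.round(pi, 3))          # [0.218 0.273 0.509]

# Joint probability of weather (Sunny, Cloudy, Sunny) and mood (Happy, Happy, Sad),
# using pi in place of the unknown weather before day one
SUNNY, CLOUDY, RAINY = 0, 1, 2
SAD, HAPPY = 0, 1
weather = [SUNNY, CLOUDY, SUNNY]
mood = [HAPPY, HAPPY, SAD]

p = pi[weather[0]] * B[weather[0], mood[0]]
for t in range(1, len(weather)):
    p *= A[weather[t - 1], weather[t]] * B[weather[t], mood[t]]
print(p)
```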



Let us consider another example.
If the states are hidden, what is the most likely weather sequence for a given mood sequence?

The issue here is that there are many possible weather sequences, one for every combination of weather states across the days.


To find the most probable sequence, the naive approach is to compute the probability of every possible sequence and pick the largest.

This brute-force process is computationally expensive, so instead we use dynamic programming. Two closely related algorithms come up here:
a. Forward Algorithm: computes the total probability of an observed mood sequence.
b. Viterbi Algorithm: finds the most likely weather sequence for an observed mood sequence (a short sketch follows).
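Below is a minimal sketch of the Viterbi algorithm for this decoding problem, again using the illustrative matrices and the stationary distribution as the initial state probabilities. Instead of scoring every sequence, it keeps, for each day and each weather state, only the best path that ends in that state, then backtracks to recover the most likely weather sequence.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely hidden-state sequence for an observation sequence.

    pi  : initial state probabilities, shape (N,)
    A   : transition matrix, A[i, j] = P(state j | state i), shape (N, N)
    B   : emission matrix, B[i, k] = P(observation k | state i), shape (N, M)
    obs : list of observation indices
    """
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))            # best path probability ending in state i at time t
    psi = np.zeros((T, N), dtype=int)   # back-pointers to the best previous state

    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] * B[j, obs[t]]

    # Backtrack from the best final state
    path = [int(np.argmax(delta[T - 1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta[T - 1].max())

# Illustrative values (assumptions, not taken from the text)
A = np.array([[0.5, 0.3, 0.2], [0.4, 0.2, 0.4], [0.0, 0.3, 0.7]])
B = np.array([[0.1, 0.9], [0.4, 0.6], [0.7, 0.3]])
pi = np.array([0.218, 0.273, 0.509])

SAD, HAPPY = 0, 1
path, prob = viterbi(pi, A, B, [HAPPY, HAPPY, SAD])
print(path, prob)                       # state indices: 0=Sunny, 1=Cloudy, 2=Rainy
```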

Forward Algorithm


Consider the same situation as before, but with only two weather conditions instead of three.
Let Sad be Y0 and Happy be Y1.
What is the probability that Darshan has the mood sequence (Sad, Sad, Happy)?
With two hidden states over three days, there are 2³ = 8 possible weather sequences.

Computing the probability of all of these sequences repeats many of the same calculations.
The Forward Algorithm reduces the time complexity by storing the results of these repeated sub-expressions and reusing them (dynamic programming).
Here’s a video that explains how to solve this problem using the algorithm. https://www.youtube.com/watch?v=9-sPm4CfcD0
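For completeness, here is a minimal sketch of the forward algorithm for this two-state example, with a brute-force sum over all 8 weather sequences included as a check (both must give the same value). The two-state transition, emission, and initial probabilities are made-up values, since the original figures are not reproduced in the text.

```python
import numpy as np
from itertools import product

def forward(pi, A, B, obs):
    """Total probability of an observation sequence, P(obs), via the forward algorithm.

    alpha[t, i] = P(obs[0..t], state at time t = i); summing the last row gives P(obs).
    """
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum()

# Two hidden weather states (0 and 1) and two moods (Sad=0, Happy=1); values are assumptions
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])
pi = np.array([0.5, 0.5])

SAD, HAPPY = 0, 1
obs = [SAD, SAD, HAPPY]

# Brute-force check: sum the joint probability over all 2^3 = 8 weather sequences
brute = 0.0
for seq in product(range(2), repeat=len(obs)):
    p = pi[seq[0]] * B[seq[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[seq[t - 1], seq[t]] * B[seq[t], obs[t]]
    brute += p

print(forward(pi, A, B, obs))   # same value as the brute-force sum
print(brute)
```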

Advantages of HMM


- The underlying theoretical basis is sound, elegant, and easy to understand.
- It is easy to implement and analyze.
- HMM taggers are very simple to train (just need to compile counts from the training corpus).
- Performs relatively well (over 90% performance on named entities).
- Statisticians are comfortable with the theoretical base of HMM.
- Liberty to manipulate the training and verification processes.
- Mathematical / theoretical analysis of the results and processes.
- Incorporates prior knowledge into the architecture with good design.
- The model can be initialized close to something believed to be correct.
- It does not suffer from the label bias problem.
- It has also proven effective for a number of other tasks, such as speech recognition, handwriting recognition, and sign language recognition.
- Because each HMM uses only positive data, HMMs scale well: new words can be added without affecting previously learnt HMMs.

Disadvantages of HMM


- The strong independence assumptions mean each observation depends only on the current hidden state, so overlapping or long-range features of the input are hard to incorporate.
- The first-order Markov assumption limits the model to short-range dependencies between hidden states.
- As a generative model, an HMM models the joint distribution of states and observations, even though prediction only needs the conditional distribution of states given the observations.
- The number of parameters grows quickly with the number of states and observation symbols, so a large training corpus is needed to estimate them reliably.
- Unsupervised training (Baum-Welch / EM) converges only to a local optimum and is sensitive to initialization.