Ryan Eveloff, Denghui Chen
CλHMML (aka cahmml, pronounced camel) is a lightweight library meant to simplify complex Hidden Markov Models. We provide two abstract classes, Observation and State, which when implemented can run seamlessly in a parallelized HMM structure built on NumPy matrices.
During our research into multimodal genetic HMMs, we found that the majority of plug and play HMMs available require the input of a single transition matrix
Install cahmml from PyPI using the following command:
pip3 install cahmml
from cahmml import hmm
If necessary, you can also import the utilities for CλHMML via cahmml.util, though this is rarely needed.
An implementation of hmm.State requires two methods to be completed:
# State class
from typing import Iterable

import numpy as np

class MyState(hmm.State):
    def emission_probability(self, obs: Iterable[hmm.Observation], t: int, hyperparameters: dict = {}) -> np.ndarray:
        # Return P(obs | self, t, hyperparameters)
        ...

    def transition_probability(self, next: "hmm.State", obs: Iterable[hmm.Observation], t: int, hyperparameters: dict = {}) -> np.ndarray:
        # Return P(next | self, obs, t, hyperparameters)
        ...
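As an illustration of what the two methods might compute, here is a minimal sketch of a two-state "fair vs. loaded coin" model. It deliberately does not import cahmml: CoinState is a hypothetical stand-in that only mirrors the method names above. The assumptions are that obs is the observation sequence of a single sample, that t indexes into it, and that each method returns a single probability wrapped in a NumPy array. The array shapes cahmml actually expects may differ.

```python
import numpy as np


class CoinState:
    """Hypothetical stand-in for an hmm.State subclass (does not use cahmml)."""

    def __init__(self, name, p_heads, p_stay):
        self.name = name
        self.p_heads = p_heads  # P(observe "H" | this state)
        self.p_stay = p_stay    # P(next state is this same state)

    def emission_probability(self, obs, t, hyperparameters=None):
        # Assumed semantics: probability of the t-th observation given this state.
        p = self.p_heads if obs[t] == "H" else 1.0 - self.p_heads
        return np.asarray(p)

    def transition_probability(self, next, obs, t, hyperparameters=None):
        # Assumed semantics: probability of moving from this state to `next`.
        # With only two states, the remaining mass goes to the other state.
        p = self.p_stay if next is self else 1.0 - self.p_stay
        return np.asarray(p)


fair = CoinState("fair", p_heads=0.5, p_stay=0.9)
loaded = CoinState("loaded", p_heads=0.8, p_stay=0.7)
obs = ["H", "T", "H"]

p_emit = float(loaded.emission_probability(obs, 1))           # P("T" | loaded)
p_move = float(fair.transition_probability(loaded, obs, 0))   # P(fair -> loaded)
```

Because both methods receive obs and t, a real implementation could make either probability depend on the position in the sequence rather than on a single fixed matrix.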
An implementation of hmm.Observation requires nothing to be completed and serves as a customizable passthrough class for hmm.State. You can even use built-in classes like int or str! In the case below, we wrap a simple str:
# Observation Class
class myObservation(hmm.Observation):
    def __init__(self, value: str):
        self.v = value
Pass in a sample_id and an iterable of hmm.Observation to create a sample.
# Given list[Observation] obs
myFirstSample = hmm.Sample("first sample!",obs)
Assuming you've already implemented hmm.State and hmm.Observation, running Viterbi on your HMM with a given input is convenient and fast!
# Given list[hmm.State] states, list[hmm.Sample] samples, and list[float] initial_probs
model = hmm.HMM(states)
pred_states = model.viterbi()
Note: Advanced users can specify hyperparameters for each function via e_hparams and t_hparams.
This code will yield an array corresponding to the Viterbi-predicted state of each sample at each observation.
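For readers unfamiliar with the algorithm, the following is a minimal sketch of textbook log-space Viterbi for a single sample, written so that each step may use its own transition matrix. It is not cahmml's implementation; the function name viterbi and the array layouts (log_init, log_trans, log_emit) are assumptions chosen for this example.

```python
import numpy as np


def viterbi(log_init, log_trans, log_emit):
    """Most likely state path for one sample.

    log_init:  (S,)        log P(state at t=0)
    log_trans: (T-1, S, S) log P(state j at t+1 | state i at t), one matrix per step
    log_emit:  (T, S)      log P(observation at t | state)
    """
    T, S = log_emit.shape
    score = log_init + log_emit[0]        # best log-prob of any path ending in each state
    back = np.zeros((T, S), dtype=int)    # best predecessor of each state at each step
    for t in range(1, T):
        cand = score[:, None] + log_trans[t - 1]   # rows: from-state, columns: to-state
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = score.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


# Two states, three observations; the transition matrix changes between steps.
init = np.log([0.6, 0.4])
trans = np.log([
    [[0.9, 0.1], [0.1, 0.9]],   # step 0 -> 1: sticky
    [[0.5, 0.5], [0.5, 0.5]],   # step 1 -> 2: uniform
])
emit = np.log([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]])

path = viterbi(init, trans, emit)   # [0, 0, 1]
```

The inner update is a single broadcasted add followed by a max over the from-state axis, so the only Python-level loop runs over observations.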
Filling transition_probability and emission_probability. NumPy parallelization allows Viterbi runtime to scale linearly with the number of observations, or
Space complexity has been reduced to
More anecdotally, we expect a run of 100 states, 100 samples, 1,000,000 observations, and constant time
Coverage reports are available in our test branch. For simple HMM testing, we validated output against hmmlearn by scikit-learn. For complex HMM testing, we used small, hand-reproducible examples.
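One way to produce a hand-reproducible reference for a tiny model is to enumerate every possible state path and keep the most probable one. The sketch below is not taken from the project's test suite; the function name brute_force_best_path and the nested-list layout are assumptions made for illustration. It visits S^T paths, so it is only practical for a handful of states and observations.

```python
from itertools import product


def brute_force_best_path(init, trans, emit):
    """Enumerate every state path and return (best_path, best_probability).

    init:  init[s]        = P(state s at t=0)
    trans: trans[t][i][j] = P(state j at t+1 | state i at t)
    emit:  emit[t][s]     = P(observation at t | state s)
    """
    n_steps, n_states = len(emit), len(init)
    best_path, best_prob = None, -1.0
    for path in product(range(n_states), repeat=n_steps):
        prob = init[path[0]] * emit[0][path[0]]
        for t in range(1, n_steps):
            prob *= trans[t - 1][path[t - 1]][path[t]] * emit[t][path[t]]
        if prob > best_prob:
            best_path, best_prob = path, prob
    return best_path, best_prob


# Two states, three observations, a different transition matrix per step.
init = [0.6, 0.4]
trans = [
    [[0.9, 0.1], [0.1, 0.9]],   # step 0 -> 1
    [[0.5, 0.5], [0.5, 0.5]],   # step 1 -> 2
]
emit = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]

best_path, best_prob = brute_force_best_path(init, trans, emit)
# best_path is (0, 0, 1): 0.6*0.9 * 0.9*0.9 * 0.5*0.9 = 0.19683
```

A Viterbi implementation run on the same inputs should return the same path, which makes this a convenient cross-check for models too irregular to express in a fixed-matrix library.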
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.