Hidden Markov Model Part-of-Speech Tagger

This repository contains the framework for a Hidden Markov Model (HMM) implementation of an English part-of-speech tagger. The training data is a segment of the Wall Street Journal corpus.

Extracting probabilities

Run the following command to extract bigram emission and transition probabilities:

cat training_data | create_2gram_hmm.sh output_2gram_hmm
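
As a rough illustration of what the bigram probabilities mean, the sketch below computes maximum-likelihood transition and emission estimates from tagged sentences. It is not the code in create_2gram_hmm.py. The `word/TAG` token format, the `<s>` and `</s>` boundary tags, and the function names `parse_line` and `estimate_bigram_hmm` are all assumptions made for this example.

```python
from collections import Counter


def parse_line(line):
    """Split a line of 'word/TAG' tokens into (word, tag) pairs (assumed format)."""
    # rsplit on the last slash so that words containing '/' stay intact.
    return [tuple(token.rsplit("/", 1)) for token in line.split()]


def estimate_bigram_hmm(sentences):
    """Return (transitions, emissions) as maximum-likelihood estimates.

    transitions[(prev_tag, tag)] = P(tag | prev_tag)
    emissions[(tag, word)]       = P(word | tag)
    """
    transition_counts = Counter()
    context_counts = Counter()
    emission_counts = Counter()
    tag_counts = Counter()

    for sentence in sentences:
        # Hypothetical boundary tags marking sentence start and end.
        tags = ["<s>"] + [tag for _, tag in sentence] + ["</s>"]
        for prev_tag, tag in zip(tags, tags[1:]):
            transition_counts[(prev_tag, tag)] += 1
            context_counts[prev_tag] += 1
        for word, tag in sentence:
            emission_counts[(tag, word)] += 1
            tag_counts[tag] += 1

    transitions = {
        (prev_tag, tag): count / context_counts[prev_tag]
        for (prev_tag, tag), count in transition_counts.items()
    }
    emissions = {
        (tag, word): count / tag_counts[tag]
        for (tag, word), count in emission_counts.items()
    }
    return transitions, emissions


if __name__ == "__main__":
    data = [parse_line("the/DT dog/NN barks/VBZ"), parse_line("the/DT cat/NN")]
    transitions, emissions = estimate_bigram_hmm(data)
    print(transitions[("NN", "VBZ")], emissions[("NN", "dog")])
```

These unsmoothed estimates assign zero probability to any tag pair or word that does not appear in the training data, which is why the trigram model adds smoothing and unknown-word handling.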

Run the following command to extract trigram emission and transition probabilities:

cat training_data | create_3gram_hmm.sh output_3gram_hmm l1 l2 l3 unk_prob_file

A sample training file, toy_input, is provided under examples/toy.
Smoothing uses linear interpolation; l1, l2, and l3 are the lambda (interpolation weight) values.
The probabilities in unk_prob_file account for unknown words. A sample unk_prob_file, toy_unk, is provided under examples/toy.
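
A minimal sketch of linear interpolation for the trigram transition probability follows. It assumes that l1 weights the unigram estimate, l2 the bigram estimate, and l3 the trigram estimate, and that the three weights sum to 1; the repository may order them differently. The function name and the dictionary layout are hypothetical.

```python
import math


def interpolate_trigram(t1, t2, t3, unigram, bigram, trigram, l1, l2, l3):
    """Smoothed P(t3 | t1, t2) as a weighted sum of three estimates.

    unigram[t3]            = P(t3)
    bigram[(t2, t3)]       = P(t3 | t2)
    trigram[(t1, t2, t3)]  = P(t3 | t1, t2)
    Missing entries are treated as probability 0.
    """
    # Assumption: the lambdas form a convex combination.
    if not math.isclose(l1 + l2 + l3, 1.0):
        raise ValueError("lambda values are expected to sum to 1")
    p1 = unigram.get(t3, 0.0)
    p2 = bigram.get((t2, t3), 0.0)
    p3 = trigram.get((t1, t2, t3), 0.0)
    return l1 * p1 + l2 * p2 + l3 * p3


if __name__ == "__main__":
    unigram = {"NN": 0.4}
    bigram = {("DT", "NN"): 0.8}
    trigram = {}  # the trigram (VB, DT, NN) was never seen in training
    print(interpolate_trigram("VB", "DT", "NN", unigram, bigram, trigram, 0.1, 0.3, 0.6))
```

Because the lower-order estimates still contribute, an unseen trigram receives a nonzero probability as long as its bigram or unigram was observed.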

Train and test sets

The data used to train the tagger is wsj_sec0.word_pos.
The unk_prob_file used during training is unk_prob_sec22.
The test set used to evaluate the tagger is wsj_sec22.word_pos.

All of these files can be found under examples.
