Implementing the FOFE method on character level as a layer in Pytorch and optimizing for best parameter configuration and comparing performance

anbestCL/Project_FOFE

Latest commit

f92aba9 · Jul 10, 2019

FOFE Character Encoding

Project

This project's first aim is to implement a PyTorch layer that applies the character-level FOFE (fixed-size ordinally-forgetting encoding) method described in Zhang et al. (2015) to embed words. The output of this layer is then fed into a bidirectional GRU architecture. In a second step, the new FOFE layer is compared with a classical, randomly initialised embedding layer. Both architectures are tested on the English ATIS dataset and on parts of the German Tiger corpus.
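As a rough illustration of the encoding itself, FOFE builds a fixed-size vector from a variable-length sequence with the recurrence z_t = alpha * z_(t-1) + e_t, where e_t is the one-hot vector of the t-th character and alpha is the forgetting factor. The sketch below uses plain Python lists rather than the repository's PyTorch layer; the function name `fofe_encode`, the toy alphabet and the value alpha = 0.5 are assumptions made for the example.

```python
def fofe_encode(word, char_to_index, alpha=0.5):
    """Return the FOFE code of `word`: one float per character in the alphabet."""
    code = [0.0] * len(char_to_index)
    for ch in word:
        # z_t = alpha * z_{t-1} + e_t: decay everything seen so far ...
        code = [alpha * value for value in code]
        # ... then add the one-hot vector of the current character.
        code[char_to_index[ch]] += 1.0
    return code


# Toy alphabet, chosen only for illustration.
alphabet = {"a": 0, "b": 1, "c": 2}

print(fofe_encode("abc", alphabet))  # [0.25, 0.5, 1.0]
print(fofe_encode("aba", alphabet))  # [1.25, 0.5, 0.0]
```

Characters closer to the end of the word contribute more, so the vector keeps order information while its length depends only on the alphabet size. Characters missing from `char_to_index` raise a `KeyError` in this sketch; how the repository handles unknown characters is not stated here.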

Repository

The source folder contains Python and Bash scripts for the different configurations. A main tagger program uses the FOFE_ or the Classic, depending on the model to be trained. Data preparation for both corpora is done in advance. To test different parameter configurations, a wrapper class for the tagger module can be used for hyperparameter optimisation.

Implementation

Settings of Neural Network

  • size of embedding layer = 50 (only for classic model)
  • drop-out rate = 0.5
  • size of hidden layer in GRU = 50
  • optimiser = Adam with default learning rate (lr = 0.001) and no weight decay
  • loss function = cross entropy loss
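The settings above could be wired together in PyTorch roughly as follows. This is a minimal sketch of the classic (embedding-based) variant, not the repository's code: the class name `ClassicTagger`, the vocabulary size, the number of tags, the batch shape and the places where dropout is applied are all assumptions.

```python
import torch
import torch.nn as nn


class ClassicTagger(nn.Module):
    """Embedding -> dropout -> bidirectional GRU -> dropout -> linear tag scores."""

    def __init__(self, vocab_size, num_tags, emb_size=50, hidden_size=50, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)  # randomly initialised
        self.dropout = nn.Dropout(dropout)
        self.gru = nn.GRU(emb_size, hidden_size, batch_first=True, bidirectional=True)
        # A bidirectional GRU concatenates forward and backward states.
        self.out = nn.Linear(2 * hidden_size, num_tags)

    def forward(self, word_ids):
        embedded = self.dropout(self.embedding(word_ids))
        states, _ = self.gru(embedded)
        return self.out(self.dropout(states))


# Hypothetical sizes, for illustration only.
model = ClassicTagger(vocab_size=1000, num_tags=20)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0)
loss_fn = nn.CrossEntropyLoss()

# One training step on random data: 4 sentences of 7 tokens each.
word_ids = torch.randint(0, 1000, (4, 7))
tags = torch.randint(0, 20, (4, 7))
logits = model(word_ids)  # shape (4, 7, 20)
loss = loss_fn(logits.reshape(-1, 20), tags.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

For the FOFE variant, the embedding layer would presumably be replaced by a layer producing character-level FOFE vectors, with the GRU input size set to the alphabet size.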

Results

Data set | train loss     | dev loss       | test loss      | accuracy       | weighted F1    | forgetting factor
         | FOFE   Classic | FOFE   Classic | FOFE   Classic | FOFE   Classic | FOFE   Classic |
ATIS     | 0.28   0.04    | 0.34   0.08    | 0.47   0.19    | 0.91   0.98    | 0.48   0.74    |
Tiger    | 0.94   0.11    | 0.92   0.38    | 0.99   0.49    | 0.71   0.91    | 0.5    0.78    |

More details including visualisations can be found in the written report.
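For readers unfamiliar with the two evaluation metrics in the table, the sketch below shows one common way to compute token-level accuracy and weighted F1 (per-label F1 averaged with each label's gold frequency as weight). It is an illustration in plain Python, not the evaluation code used for the reported numbers; the function names and the toy tag sequences are assumptions.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of positions where the predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def weighted_f1(gold, pred):
    """Per-label F1, averaged with each label's gold frequency as its weight."""
    support = Counter(gold)
    total = 0.0
    for label, count in support.items():
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = count - tp
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        total += count * f1
    return total / len(gold)


# Toy example with made-up tags.
gold = ["O", "O", "O", "B"]
pred = ["O", "O", "B", "B"]
print(accuracy(gold, pred))     # 0.75
print(weighted_f1(gold, pred))  # roughly 0.767
```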

Conclusion

On both data sets, using the FOFE method as an alternative embedding layer for tagging tasks does not improve performance: the classic embedding model reaches lower loss and higher accuracy and weighted F1 on ATIS and Tiger alike. Different parameter settings might produce better results, which could be tested efficiently with hyperparameter optimisation.
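A hyperparameter search of this kind could look like the following grid-search sketch. It is not the repository's wrapper class: `grid_search`, the parameter names and the grid values are assumptions, and `dummy_evaluate` is a stand-in with an arbitrary score so that the example runs. A real run would train the tagger with each configuration and return a dev-set score.

```python
import itertools


def grid_search(evaluate, grid):
    """Try every combination in `grid`; return the best config and its score."""
    best_score, best_config = float("-inf"), None
    keys = sorted(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        config = dict(zip(keys, values))
        score = evaluate(config)  # higher is better
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score


def dummy_evaluate(config):
    # Placeholder objective, NOT a real result: it only exists so the
    # example runs without training a model.
    return -abs(config["forgetting_factor"] - 0.7) - abs(config["hidden_size"] - 50) / 1000


# Hypothetical search space.
grid = {
    "forgetting_factor": [0.3, 0.5, 0.7, 0.9],
    "hidden_size": [25, 50, 100],
}
best_config, best_score = grid_search(dummy_evaluate, grid)
print(best_config)
```

Grid search grows multiplicatively with the number of parameters, so random search over the same space may be a cheaper alternative when training runs are slow.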

Future work

  • Rerun hyperparameter optimisation on all four configurations
  • Implement early stopping (at least for Tiger corpus)
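Early stopping, as listed above, is commonly implemented by tracking the dev loss and halting once it has not improved for a set number of epochs. The sketch below is one such patience-based variant; the class name `EarlyStopping`, the `patience` and `min_delta` parameters and the example loss values are assumptions, not part of the repository.

```python
class EarlyStopping:
    """Signal a stop once dev loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, dev_loss):
        """Record one epoch's dev loss; return True when training should stop."""
        if dev_loss < self.best - self.min_delta:
            self.best = dev_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up dev losses standing in for a real training loop.
stopper = EarlyStopping(patience=2)
dev_losses = [0.90, 0.80, 0.82, 0.81, 0.79]
stopped_at = None
for epoch, dev_loss in enumerate(dev_losses, start=1):
    if stopper.step(dev_loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best)  # 4 0.8
```

In a real training loop, the model weights from the best epoch would usually be saved whenever `best` is updated, so they can be restored after stopping.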
