Note: I gathered the entire literature review in an Excel sheet and will share it here once the project is near completion.
- Text extraction could not be solved by a single type of algorithm.
- We created an entirely new classification system that sorts resumes into types based on their templates, so that each type can be handled differently.
- Most resumes fall into the complex types (e.g., those that contain tables, partitions, etc.).
- For such complex types, we decided to use Optical Character Recognition (OCR) with deep NLP algorithms on top to extract text.
- The hard way would have been to build deep learning models for OCR and NLP from scratch; the smart way was to leverage open source and deploy an off-the-shelf model for the task.
- We can now extract text accurately from about 98% of simple resumes and 90% of complex ones.
- The corpus cannot stay up to date (new companies appear every day).
- The same word can have different meanings.
- Apply a deep learning model to named entity recognition (NER), using BIOES tagging.
- Model architecture: LSTMs, which take into account the context of a word in a sentence.
- Curate a dataset for model training and evaluation (we started with unlabelled training data and searched online for annotation tools to support manual labelling within the team).
- Hired annotators to label the data, then started the training process.
- The model crossed our benchmark of 80.0.
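The template-based routing described in the notes can be sketched as a small classifier that sends simple layouts to direct text extraction and complex ones (tables, partitions) to OCR plus NLP. This is a minimal sketch under assumptions: the `ResumeLayout` features, the rules in `classify_template`, and the strategy names are hypothetical and are not the article's actual classification system.

```python
from dataclasses import dataclass


@dataclass
class ResumeLayout:
    """Hypothetical layout features; a real system would derive them from the PDF/DOCX."""
    has_tables: bool
    column_count: int  # >1 approximates "partitions" (multi-column sections)


def classify_template(layout: ResumeLayout) -> str:
    """Assign a coarse template type (hypothetical rules for illustration only)."""
    if layout.has_tables or layout.column_count > 1:
        return "complex"
    return "simple"


def choose_extractor(layout: ResumeLayout) -> str:
    """Route each template type to its own extraction strategy."""
    if classify_template(layout) == "simple":
        return "direct_text"      # plain text layer is enough
    return "ocr_plus_nlp"         # OCR first, then NLP post-processing


print(choose_extractor(ResumeLayout(has_tables=True, column_count=1)))  # ocr_plus_nlp
```

In practice the layout features would come from a document parser; the point is only that classification happens first and each type gets its own extraction path.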
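The two corpus problems in the notes (new companies every day, and one word with several meanings) show up immediately in a naive dictionary lookup. A minimal sketch, assuming a hypothetical static set `KNOWN_COMPANIES`: a newly founded company is missed, and "Oracle" is tagged as a company even when it is listed as a skill.

```python
# Hypothetical static corpus, frozen at the time it was built.
KNOWN_COMPANIES = {"google", "oracle", "amazon"}


def lookup_companies(tokens):
    """Return tokens that match the static company corpus (case-insensitive)."""
    found = []
    for token in tokens:
        word = token.strip(",.:;()")  # drop surrounding punctuation
        if word.lower() in KNOWN_COMPANIES:
            found.append(word)
    return found


# A company founded after the corpus was built is missed entirely.
print(lookup_companies("Joined NovaLabs in 2023".split()))  # []
# "Oracle" here is a database skill, but the lookup still calls it a company.
print(lookup_companies("Skills: Oracle, SQL".split()))  # ['Oracle']
```

A context-aware model avoids both problems because it decides from the surrounding words rather than from a fixed list.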
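BIOES tagging marks each token as the Beginning, Inside, or End of a multi-token entity, a Single-token entity, or Outside any entity. A minimal sketch of converting annotated spans into BIOES tags, assuming hypothetical labels `DESIG` (designation) and `ORG` (organization) and end-exclusive span offsets:

```python
def spans_to_bioes(num_tokens, spans):
    """Convert (start, end_exclusive, label) spans into one BIOES tag per token."""
    tags = ["O"] * num_tokens  # everything starts as Outside
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"          # single-token entity
        else:
            tags[start] = f"B-{label}"          # first token
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"          # middle tokens
            tags[end - 1] = f"E-{label}"        # last token
    return tags


tokens = ["Senior", "Data", "Scientist", "at", "Acme", "since", "2019"]
spans = [(0, 3, "DESIG"), (4, 5, "ORG")]
print(spans_to_bioes(len(tokens), spans))
# ['B-DESIG', 'I-DESIG', 'E-DESIG', 'O', 'S-ORG', 'O', 'O']
```

Compared with plain BIO, the extra E and S tags tell the model explicitly where an entity ends and when it is only one token long.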
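To illustrate why an LSTM captures the context of a word, here is a minimal pure-Python forward pass of a single LSTM layer with the standard input, forget, and output gates. It is a sketch only: the weights are random rather than trained, the one-hot inputs and three-word vocabulary are hypothetical, and a production NER tagger would use a deep learning framework (and often one LSTM per direction). The hidden state for the same final word differs depending on the word before it, which is the context effect the notes refer to.

```python
import math
import random

GATES = ("input", "forget", "output", "candidate")


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def init_params(input_dim, hidden_dim, seed=0):
    """Random (untrained) weights W, U and bias b for each gate."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {gate: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
            for gate in GATES}


def lstm_step(x, h, c, params):
    """One LSTM time step; each gate's pre-activation is W x + U h + b."""
    pre = {}
    for gate in GATES:
        W, U, b = params[gate]
        pre[gate] = [wx + uh + bias for wx, uh, bias in zip(matvec(W, x), matvec(U, h), b)]
    i = [sigmoid(v) for v in pre["input"]]        # how much new information to write
    f = [sigmoid(v) for v in pre["forget"]]       # how much old cell state to keep
    o = [sigmoid(v) for v in pre["output"]]       # how much cell state to expose
    g = [math.tanh(v) for v in pre["candidate"]]  # candidate values to write
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def run_lstm(inputs, params, hidden_dim):
    """Run the LSTM left to right and return the hidden state after each token."""
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    states = []
    for x in inputs:
        h, c = lstm_step(x, h, c, params)
        states.append(h)
    return states


# Hypothetical one-hot "embeddings" for a 3-word vocabulary.
VOCAB = {"senior": [1, 0, 0], "junior": [0, 1, 0], "developer": [0, 0, 1]}
params = init_params(input_dim=3, hidden_dim=4)
a = run_lstm([VOCAB["senior"], VOCAB["developer"]], params, hidden_dim=4)
b = run_lstm([VOCAB["junior"], VOCAB["developer"]], params, hidden_dim=4)
# Same final word, different preceding word: the final hidden states are expected to differ.
print(a[-1] != b[-1])
```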
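For the dataset curation step, a minimal sketch, assuming annotations are exported in a CoNLL-style format (one token and tag per line, blank lines between sentences) and that each resume has a hypothetical string ID. Hashing the ID gives a deterministic train/evaluation split that keeps every sentence from one resume on the same side, so evaluation resumes never leak into training.

```python
import hashlib


def read_conll(text):
    """Parse 'token tag' lines into sentences; blank lines separate sentences."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.rsplit(maxsplit=1)
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


def split_train_eval(doc_ids, eval_percent=20):
    """Assign each resume ID to train or eval by a stable hash of the ID."""
    train, evaluation = [], []
    for doc_id in doc_ids:
        bucket = int(hashlib.sha256(doc_id.encode("utf-8")).hexdigest(), 16) % 100
        (evaluation if bucket < eval_percent else train).append(doc_id)
    return train, evaluation


sample = "Acme B-ORG\nLabs E-ORG\n\nPython S-SKILL\n"
sentences = read_conll(sample)
train_ids, eval_ids = split_train_eval([f"resume-{i}" for i in range(10)])
```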
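The notes do not say which metric the 80.0 benchmark refers to. Assuming it is an entity-level F1 score on a 0-100 scale, a common choice for NER, a minimal sketch of computing it from BIOES tag sequences (exact span and label match, micro-averaged) could look like this; the labels are hypothetical.

```python
def bioes_to_spans(tags):
    """Decode BIOES tags into a set of (start, end_exclusive, label) spans.
    Malformed sequences (e.g. an E- without a matching B-) are ignored in this sketch."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(tags):
        prefix, _, lab = tag.partition("-")
        if prefix == "S":
            spans.add((i, i + 1, lab))
            start = None
        elif prefix == "B":
            start, label = i, lab
        elif prefix == "I":
            if start is None or lab != label:
                start = None  # broken entity, drop it
        elif prefix == "E":
            if start is not None and lab == label:
                spans.add((start, i + 1, lab))
            start = None
        else:  # "O"
            start = None
    return spans


def entity_f1(gold_seqs, pred_seqs):
    """Micro-averaged entity-level precision, recall and F1."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bioes_to_spans(gold), bioes_to_spans(pred)
        tp += len(g & p)   # exact span and label match
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["B-DESIG", "E-DESIG", "O", "S-ORG"]]
pred = [["B-DESIG", "E-DESIG", "O", "O"]]
print(entity_f1(gold, pred))  # precision 1.0, recall 0.5, F1 about 0.667
```

Here the model finds the designation but misses the organization, so F1 is about 66.7 on a 0-100 scale, which would fall short of an 80.0 benchmark.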
Article URL: https://medium.com/skillate/smart-recruitment-cracking-resume-parsing-through-deep-learning-part-ii-563ff2dc800b