Releases: deepset-ai/FARM
Add multiprocessing to Datahandler, unify Processors
Besides fixing various smaller bugs, this release focuses on two major changes:
1. Speeding things up 🚀 :
- By adding multiprocessing to the data preprocessing, we reduced the execution time for many tasks from hours to minutes. Since the functionality is mostly hidden in the parent class, users don't have to implement anything themselves. However, this required changing the interface of the processor slightly: `_dict_to_samples` and `_sample_to_features` must now be classmethods, and all objects accessed by them must be class attributes.
- Multi-GPU support is now also available for the "building blocks mode"
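The classmethod requirement can be illustrated with a small, self-contained sketch. It is not FARM's implementation: `ToyProcessor`, `_featurize`, and `featurize_all` are hypothetical names, and the "features" are deliberately trivial. The sketch assumes that one reason for the interface change is that a classmethod is pickled by reference to its class, so worker processes can call it without receiving a copy of a processor instance.

```python
from multiprocessing import Pool


class ToyProcessor:
    """Hypothetical stand-in for a processor; not FARM's actual class."""

    # Everything the conversion methods need lives on the class itself,
    # so no instance has to be sent to the worker processes.
    label_list = ["OTHER", "OFFENSE"]
    max_seq_len = 8

    @classmethod
    def _dict_to_samples(cls, record):
        # Toy "tokenization": lowercase, split on whitespace, truncate.
        tokens = record["text"].lower().split()[: cls.max_seq_len]
        return {"tokens": tokens, "label": record["label"]}

    @classmethod
    def _sample_to_features(cls, sample):
        # Toy "features": token count and the index of the label.
        return {
            "n_tokens": len(sample["tokens"]),
            "label_id": cls.label_list.index(sample["label"]),
        }

    @classmethod
    def _featurize(cls, record):
        # Chain both steps so each worker handles one record end to end.
        return cls._sample_to_features(cls._dict_to_samples(record))

    @classmethod
    def featurize_all(cls, records, processes=2):
        # Fall back to a plain loop when no parallelism is requested.
        if processes <= 1:
            return [cls._featurize(r) for r in records]
        with Pool(processes=processes) as pool:
            return pool.map(cls._featurize, records)


if __name__ == "__main__":
    records = [
        {"text": "Have a nice day", "label": "OTHER"},
        {"text": "You are awful", "label": "OFFENSE"},
    ]
    print(ToyProcessor.featurize_all(records))
```

If `_featurize` were an ordinary instance method instead, `pool.map` would have to pickle the whole instance, including anything large attached to it, in order to send work to the workers.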
2. Making the processor more user-friendly 😊 :
- Instead of having one individual processor per dataset, we have implemented a more generic `TextClassificationProcessor` that you can instantiate easily for various predefined tasks (GNAD, GermEval ...) or your own dataset in CSV/TSV format:
    processor = TextClassificationProcessor(
        tokenizer=tokenizer,
        max_seq_len=128,
        data_dir="../data/germeval18",
        columns=["text", "label", "unused"],
        label_list=["OTHER", "OFFENSE"],
        metrics=["f1_macro"],
    )
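To make the `columns` and `label_list` arguments more concrete, here is a rough sketch of the kind of tab-separated input they appear to describe. It uses only the Python standard library and does not call FARM. The helper `read_tsv`, the sample rows, and the assumption that each line holds exactly one field per entry in `columns` are all illustrative, not taken from the release.

```python
import csv
import io

# Values copied from the processor call; the file layout is an assumption.
COLUMNS = ["text", "label", "unused"]
LABEL_LIST = ["OTHER", "OFFENSE"]


def read_tsv(handle, columns=COLUMNS, label_list=LABEL_LIST):
    """Parse tab-separated lines into dicts keyed by the column names."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for line_no, fields in enumerate(reader, start=1):
        if len(fields) != len(columns):
            raise ValueError(
                f"line {line_no}: expected {len(columns)} fields, got {len(fields)}"
            )
        row = dict(zip(columns, fields))
        if row["label"] not in label_list:
            raise ValueError(f"line {line_no}: unknown label {row['label']!r}")
        rows.append(row)
    return rows


# Made-up sample rows in the assumed layout: text, label, ignored third column.
sample = "You are great\tOTHER\tOTHER\nYou are awful\tOFFENSE\tINSULT\n"
rows = read_tsv(io.StringIO(sample))
print(rows[1]["label"])
```

Checking your own dataset this way before training can surface rows with missing fields or labels that are not listed in `label_list`.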
Thanks for contributing @brandenchan @tanaysoni @tholor @Timoeller @tripl3a @Seb0 @waldemarhahn !
Modeling:
- [bug] Accuracy metric in LM finetuning always zero #30
- [enhancement] Multi-GPU only enabled in experiment mode #57
- [bug] Wrong number of total steps for linear warmup schedule #46
Data Handling:
- [enhancement] Unify redundant `Processor`; add new `NERProcessor` and `TextClassificationProcessor`
- [enhancement] Add parallel dataprocessing #45
- [bug] `dev_size` param in run-by-config is being ignored #49
- [bug] output_dir parameter in run by config is being ignored #39
- [bug] Error when running by config with a list of batch sizes #38
Documentation:
- [bug] LM finetuning example missing data #47
- [bug] Colab Notebook referenced in readme does not work #27
Other:
- [enhancement] Proposition: improve dependency management with pipenv #35
Initial Release
First release of FARM package
Contributor list: @brandenchan @tanaysoni @tholor @Timoeller @tripl3a