Home
Table of Contents generated with DocToc
- TL;DR
- Introducing LIMA, The Libre Multilingual Analyzer, a Natural Language Processing (NLP) toolkit
The LIMA Python bindings are currently available under Linux only (x86_64).
Under Linux with Python >= 3.7 and < 4, and an upgraded pip:
# Upgrading pip is fundamental in order to obtain the correct LIMA version
$ pip install --upgrade pip
$ pip install aymara==0.5.0b6
$ lima_models.py -l eng
# Either simply use the lima command to produce an analysis of a file in CoNLLU format:
$ lima <path to the file to analyse>
# Or use the python API:
$ python
>>> import aymara.lima
>>> nlp = aymara.lima.Lima("ud-eng")
>>> doc = nlp('Hello, World!')
>>> print(doc[0].lemma)
hello
>>> print(repr(doc))
1 Hello hello INTJ _ _ 0 root _ Pos=0|Len=5
2 , , PUNCT _ _ 1 punct _ Pos=5|Len=1
3 World World PROPN _ Number:Sing 1 vocative _ Pos=7|Len=5
4 ! ! PUNCT _ _ 1 punct _ Pos=12|Len=1
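The CoNLL-U text printed above can also be post-processed with the standard library alone. The following is a minimal sketch, not part of the aymara API: `Token`, `parse_conllu` and `surface` are hypothetical helper names. It assumes that the columns are tab-separated, as in the CoNLL-U format, and that `Pos` and `Len` in the last column are a 0-based character offset and a length, as the sample output suggests.

```python
from typing import Dict, List, NamedTuple, Optional


class Token(NamedTuple):
    """Hypothetical record holding a subset of the ten CoNLL-U columns."""
    index: int
    form: str
    lemma: str
    upos: str
    head: Optional[int]
    deprel: str
    misc: Dict[str, str]


def parse_conllu(text: str) -> List[Token]:
    """Parse tab-separated CoNLL-U lines into Token records."""
    tokens: List[Token] = []
    for line in text.splitlines():
        # Skip blank lines (sentence separators) and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        # Skip multiword ranges ("1-2") and empty nodes ("1.1").
        if not cols[0].isdigit():
            continue
        # The last column holds "Key=Value" pairs separated by "|".
        misc = dict(item.split("=", 1) for item in cols[9].split("|") if "=" in item)
        head = int(cols[6]) if cols[6].isdigit() else None
        tokens.append(Token(int(cols[0]), cols[1], cols[2], cols[3], head, cols[7], misc))
    return tokens


def surface(text: str, token: Token) -> str:
    """Recover a token's surface string from the analysed text.

    Assumes Pos is a 0-based character offset and Len a character count.
    """
    start = int(token.misc["Pos"])
    return text[start:start + int(token.misc["Len"])]


# The analysis of 'Hello, World!' shown above, with explicit tab separators.
SAMPLE = "\n".join([
    "1\tHello\thello\tINTJ\t_\t_\t0\troot\t_\tPos=0|Len=5",
    "2\t,\t,\tPUNCT\t_\t_\t1\tpunct\t_\tPos=5|Len=1",
    "3\tWorld\tWorld\tPROPN\t_\tNumber:Sing\t1\tvocative\t_\tPos=7|Len=5",
    "4\t!\t!\tPUNCT\t_\t_\t1\tpunct\t_\tPos=12|Len=1",
])

if __name__ == "__main__":
    for tok in parse_conllu(SAMPLE):
        print(tok.index, tok.lemma, tok.upos, repr(surface("Hello, World!", tok)))
```

In practice, `SAMPLE` would be replaced by the text produced by `repr(doc)` or by the `lima` command.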
LIMA is a multilingual linguistic analyzer developed by the CEA LIST, LASTI laboratory (French acronym for Text and Image Semantic Analysis Laboratory). LIMA is Free Software, available under the MIT license.
LIMA achieves state-of-the-art performance for more than 60 languages thanks to its recent deep learning (neural network) based modules. It also includes a powerful rule-based mechanism called ModEx, which makes it possible to quickly extract information (entities, relations, events…) in new domains where annotated data does not exist.
A commercial version is available, complemented with modules useful to some CEA LIST industrial partners. It is available directly from CEA LIST through R&D partnerships, or through other partners whose offers include support and adaptation to one's needs.
We welcome external contributions in the form of comments, suggestions, bug reports, bug fixes, resources, etc. Note, however, that before merging your contributions we will ask you to sign a Copyright Assignment Agreement, in order to allow the proper functioning of the dual licensing model.
- performant and powerful C++ backend;
- easy-to-use native Python binding (see TL;DR above);
- easy-to-use simple GUI;
- tokenization;
- morphological analysis including:
    - full-form dictionaries;
    - hyphenated word splitting;
    - concatenated word splitting (we're,...);
    - idiomatic expression recognition;
- part-of-speech tagging (deep-learning based, with state-of-the-art performance). Two other taggers are available for some languages: the LIMA legacy one, which performs slightly less well but is very useful for resource development, and an SVMTool++-based one;
- Named Entity Recognition (standard rule-based and neural network-based);
- coreference resolution;
- parsing (neural network-based with state-of-the-art performance, and the old surface rule-based dependency parsing);
- semantic analysis (disambiguation and semantic role labeling);
- regression testing;
- evaluation tools.
The easiest way to use LIMA is through its native Python binding (see TL;DR above). We also provide a Docker container and packages for several GNU/Linux versions (as of 05/04/2024, Debian 12 and Ubuntu 22.04, but check what is available at the time of your download). Finally, there are instructions for building from the source code under GNU/Linux:
- Native python binding
- Docker container
- Packages for various Linux distributions
- Packages for MS Windows 64
- Building from source code
LIMA is known to work under macOS, but there is currently no working binary package available. A CircleCI build runs and produces a package, but the package does not work.
The Microsoft Windows build is currently broken.
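Given these platform limits, it can be useful to check the environment before running `pip install aymara`. The following is a minimal sketch that encodes only the constraints stated in the TL;DR (Linux on x86_64, Python >= 3.7 and < 4). The function name `lima_bindings_supported` is hypothetical, and the check is not something pip or LIMA performs in this form.

```python
import platform
import sys
from typing import Tuple


def lima_bindings_supported(system: str, machine: str, version: Tuple[int, int]) -> bool:
    """Return True if the platform matches the constraints stated in the TL;DR.

    Hypothetical helper: Linux only, x86_64 only, Python >= 3.7 and < 4.
    """
    return system == "Linux" and machine == "x86_64" and (3, 7) <= version < (4, 0)


if __name__ == "__main__":
    ok = lima_bindings_supported(
        platform.system(),       # e.g. "Linux", "Darwin", "Windows"
        platform.machine(),      # e.g. "x86_64", "aarch64", "AMD64"
        sys.version_info[:2],    # (major, minor) of the running interpreter
    )
    print("LIMA Python bindings: platform looks supported" if ok
          else "LIMA Python bindings: platform not covered by the stated requirements")
```

Keeping the check as a pure function of its three arguments makes it easy to test without depending on the machine it runs on.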
- The LIMA User Manual;
- The LIMA Python User Manual, for using the LIMA Python modules and API (including new libtorch-based models);
- Explanation on the Linguistic Processing Steps in LIMA;
- Explanation on Linguistic Processing Steps Not Included in the AGPL version of LIMA.
LIMA is available under the MIT license. A commercial version exists too.
LIMA uses several open source libraries and linguistic resources. See the COPYING file for details.
For any discussion, please open a GitHub issue.
You can also contact [the LIMA maintainer](mailto:gael DOT de-chalendar AT cea DOT fr) directly.