Welcome to the Text Data Labeling and Evaluation project! This project uses a large language model (LLM) to label text data from a predefined contract dataset with a set of entities. The entities, which include document name, party name, governing law, agreement date, effective date, and expiration date, can be customized in a settings file. Labeling works by sending engineered prompts to the LLM and then parsing its output to extract a value for each specified entity. The labeled data is then post-processed into a specific format and evaluated against defined Key Performance Indicators (KPIs).
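As a rough illustration of the prompt-and-parse step described above, here is a minimal sketch. Everything in it is an assumption made for illustration: the `ENTITIES` list stands in for the settings file, the names `build_prompt` and `parse_response` are hypothetical, and the `entity: value` line format is just one possible output convention, not necessarily the one this project uses. The call to the LLM itself is left out.

```python
import re

# Hypothetical entity list; in the project these come from the settings file.
ENTITIES = [
    "document name", "party name", "governing law",
    "agreement date", "effective date", "expiration date",
]


def build_prompt(contract_text, entities=ENTITIES):
    """Build a prompt asking for one 'entity: value' line per entity."""
    wanted = "\n".join(f"- {name}" for name in entities)
    return (
        "Extract the following entities from the contract below.\n"
        "Answer with one line per entity in the form 'entity: value', "
        "and write 'N/A' when an entity is absent.\n\n"
        f"Entities:\n{wanted}\n\nContract:\n{contract_text}"
    )


def parse_response(response, entities=ENTITIES):
    """Map each configured entity to the value found in the output, or None."""
    labels = {name: None for name in entities}
    for line in response.splitlines():
        # Accept 'entity: value' with an optional leading list dash.
        match = re.match(r"\s*-?\s*([^:]+?)\s*:\s*(.*\S)\s*$", line)
        if not match:
            continue
        key, value = match.group(1).lower(), match.group(2)
        # Ignore unknown keys and explicit 'N/A' answers.
        if key in labels and value.upper() != "N/A":
            labels[key] = value
    return labels
```

Returning `None` for missing or unparseable entities keeps every record in the same shape, which should make the later formatting and evaluation steps simpler.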
- Usage: Labels text data based on predefined entities.
- Configuration: Entities are defined in a settings file so they can be customized.
- Prompt Engineering: Uses crafted prompts that instruct the LLM to label each entity.
- Parsing: Extracts the labeled information for the defined entities from the LLM output.
- Post-processing: Ensures the labeled data conforms to the specific format required for evaluation.
- Metrics: Evaluates the labeled data against an extensive set of Key Performance Indicators (KPIs).
- Similarity Metric: Uses TF-IDF vectorization and embeddings to measure similarity.
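
To make the TF-IDF part of the similarity metric more concrete, the sketch below scores a predicted label against a reference label using TF-IDF weights and cosine similarity. It is written in plain Python so it runs without extra dependencies; the function names, the whitespace tokenizer, and the smoothed IDF formula are assumptions for illustration, and the project may well rely on a library implementation instead. The embedding-based similarity is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF so that terms shared by all documents keep a nonzero weight.
    idf = {
        term: math.log((1 + n_docs) / (1 + df)) + 1
        for term, df in doc_freq.items()
    }
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(predicted, reference):
    """TF-IDF cosine similarity between a predicted and a reference label."""
    vec_pred, vec_ref = tfidf_vectors([predicted, reference])
    return cosine(vec_pred, vec_ref)
```

A score near 1 means the predicted and reference labels share most of their weighted terms, while 0 means they share none, so partial matches such as "New York" against "State of New York" land in between.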