Design nlp submodule and skeleton of drift detection methods #152

Open
Anmol-Srivastava opened this issue Aug 3, 2023 · 5 comments · May be fixed by #163
Labels
enhancement New feature or request nlp Related to development of NLP capabilities

Comments

@Anmol-Srivastava
Contributor

Task

NLP-based drift detection algorithms do not always fit the data-drift or concept-drift definitions, so we should create a separate nlp submodule, along with a basic skeleton for a language- or text-based detection algorithm.

Impact

This will make it easier to implement specific algorithms later on.

@Anmol-Srivastava Anmol-Srivastava added this to the Initial NLP Support milestone Aug 3, 2023
@Anmol-Srivastava Anmol-Srivastava self-assigned this Aug 3, 2023
@Anmol-Srivastava Anmol-Srivastava added the enhancement New feature or request label Aug 3, 2023
@Anmol-Srivastava
Contributor Author

Worth thinking about returning to the pipeline idea:

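from toolz import pipe  # assumption: pipe(data, *funcs), as provided by toolz

class NLPMethod:
    def __init__(self, operators):
        self.operators = operators  # ordered callables, each taking and returning data

    def run(self, data):
        return pipe(data, *self.operators)

# the operator names below are placeholders, not real APIs
n = NLPMethod(operators=[sklearn.some_preprocessor, transformers.some_transformer, some_encoder, some_evaluator])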

@Anmol-Srivastava Anmol-Srivastava added the nlp Related to development of NLP capabilities label Aug 3, 2023
@Anmol-Srivastava
Contributor Author

Also worth exploring multi-threading / HPC / GPU compatibility here. If we adopt a pipeline approach, several operators may be applied to the same data at a given stage, which is a good opportunity to demonstrate potential performance gains. We can use MD3 as a starting point.
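A rough sketch of the "several operators applied to the same data at a given stage" idea, using only the standard library. The function `apply_stage` and the three toy operators are hypothetical names made up for illustration; nothing here is part of the library or of MD3. Note that threads in CPython mainly help when operators release the GIL (I/O, NumPy, tokenizers with native code); pure-Python operators like the ones below would likely need processes or a GPU backend to see real speedups.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_stage(data, operators, max_workers=None):
    """Apply every operator of one pipeline stage to the same input.

    Hypothetical helper: results come back in the same order as `operators`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(op, data) for op in operators]
        return [f.result() for f in futures]


# Toy stand-ins for real operators (e.g. divergence metrics on embeddings).
def token_count(texts):
    return sum(len(t.split()) for t in texts)


def mean_length(texts):
    return sum(len(t) for t in texts) / len(texts)


def vocabulary_size(texts):
    return len({w.lower() for t in texts for w in t.split()})


batch = ["the cat sat", "the dog ran away"]
results = apply_stage(batch, [token_count, mean_length, vocabulary_size])
```

Because each operator only reads the shared input, the stage has no ordering constraints, so swapping the executor for a process pool or a GPU-backed implementation would not change the calling code.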

@anmol-srivastava-mitre
Contributor

Also worth looking at iterators

@anmol-srivastava-mitre
Contributor

  • step below is roughly next(iter)

class FreeDetector:
    def step(self, inputs):
        data = pipe(inputs, *some_operators)  # some_operators: placeholder list of callables
        state = ...  # pipeline of operators applied to data, e.g. divergence metrics
        self.state = state

    def run(self):
        for inputs in self.data:
            self.step(inputs)

@anmol-srivastava-mitre
Contributor

The above can help simplify a joint interface for batch vs. stream data, and can be made relevant for NLP and other methods.
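To make the batch vs. stream point concrete, here is a minimal sketch, assuming the detector only ever sees its data through the iterator protocol. `IterativeDetector`, its constructor arguments, and `sentence_stream` are hypothetical names for illustration, and "state" here is just the output of the last step rather than a real drift statistic.

```python
from typing import Any, Callable, Iterable, Iterator, List


class IterativeDetector:
    """Hypothetical detector that consumes any iterable one item at a time."""

    def __init__(self, data: Iterable[Any], operators: List[Callable[[Any], Any]]):
        # A list (batch) and a generator (stream) both become plain iterators.
        self._iter: Iterator[Any] = iter(data)
        self.operators = operators
        self.state = None

    def step(self) -> bool:
        """Process one item; return False once the data is exhausted."""
        try:
            item = next(self._iter)
        except StopIteration:
            return False
        for op in self.operators:
            item = op(item)
        self.state = item
        return True

    def run(self):
        """Call step() until there is no data left, then return the state."""
        while self.step():
            pass
        return self.state


def sentence_stream():
    # Stand-in for a live source; items arrive one at a time.
    yield "drift happens"
    yield "language changes over time"


operators = [str.split, len]  # toy pipeline: tokenize, then count tokens
batch_result = IterativeDetector(
    ["drift happens", "language changes over time"], operators
).run()
stream_result = IterativeDetector(sentence_stream(), operators).run()
```

Both calls go through the same `step`/`run` code path; the only difference is what gets passed in as `data`, which is the simplification the iterator idea is after.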

@Anmol-Srivastava Anmol-Srivastava linked a pull request Oct 17, 2023 that will close this issue