add warning in docs
kjappelbaum committed Aug 25, 2024
1 parent cbe5fed commit be44bf4
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions docs/benchmarking.md
@@ -5,9 +5,9 @@ MatText provides pipelines for seamless pretraining([`pretrain`](api.md#mattext.

### Pretraining on Single MatText Representation

-!!! warning Deduplication
+???+ warning Deduplication

-The pretraining datasets we provide in MatText are only deduplicated based on the CIF string. That means that structures with slightly translated positions (e.g. conformers) might ocurr multiple times in the training set.
+The pretraining datasets we provide in MatText are only deduplicated based on the CIF string. That means that structures with slightly translated positions (e.g. conformers) might occur multiple times in the training set.

Depending on the use case, this can lead to problems including data leakage. Hence, you might need to use, for example, one of the other representations for deduplications.
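
As a rough illustration of the warning above, the following sketch shows why deduplicating on the raw CIF string can miss near-duplicates, and how keying on a different representation can catch them. It does not use the MatText API: the helper `deduplicate_by_field`, the field name `alt_repr`, and the toy strings are all hypothetical placeholders, and the right representation to key on depends on your use case.

```python
from typing import Dict, Iterable, List


def deduplicate_by_field(
    records: Iterable[Dict[str, str]], key_field: str
) -> List[Dict[str, str]]:
    """Keep the first record seen for each distinct value of `key_field`."""
    seen = set()
    unique = []
    for record in records:
        key = record[key_field]
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Toy, made-up records (not real dataset entries). The first two differ only
# by a small shift of the coordinates in the CIF-like string, so their strings
# differ, while the hypothetical translation-insensitive field `alt_repr` is
# identical for both.
records = [
    {"cif": "Na 0.00 0.00 0.00\nCl 0.50 0.50 0.50", "alt_repr": "Na Cl"},
    {"cif": "Na 0.01 0.00 0.00\nCl 0.51 0.50 0.50", "alt_repr": "Na Cl"},
    {"cif": "K 0.00 0.00 0.00\nCl 0.50 0.50 0.50", "alt_repr": "K Cl"},
]

by_cif = deduplicate_by_field(records, "cif")       # string-level deduplication
by_alt = deduplicate_by_field(records, "alt_repr")  # deduplication on another field

print(len(by_cif), len(by_alt))
```

In this toy case the string-based pass keeps all three records, while the pass keyed on `alt_repr` collapses the two shifted copies into one. The same keying idea can be applied across train and test splits to check for leakage.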
