Uuui class #568

Draft: wants to merge 76 commits into main

ccl-core (Contributor)

No description provided.

ccl-core and others added 30 commits February 20, 2024 11:01
- **Readability improvement**: we remove the function `last_operations`.

- **Correctness**: the operation graph has fewer bugs (see, e.g., "huggingface-c4" or "coco204-mini"):
<img width="1160" alt="image" src="https://github.com/mlcommons/croissant/assets/17081356/5ef1ec77-ee02-47df-ada6-95bde116d2c2">

- **Performance improvement**: we replace a Dijkstra search nested in a for-loop (`O(n^2)`) with a hashmap that stores the last operations (`O(1)` lookup).
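A minimal sketch of the hashmap idea, assuming a simplified operation graph in which each metadata node (for example a FileObject or a Field) accumulates a chain of operations. The `Operation` and `OperationGraph` classes and the `append` method are hypothetical illustrations, not mlcroissant's actual API. Keeping a dictionary keyed by node id means the previous operation is found in `O(1)`, so appending a new operation never has to search the graph.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Operation:
    """Hypothetical stand-in for one node of the operation graph."""

    name: str


@dataclass
class OperationGraph:
    """Hypothetical DAG that remembers the most recent operation per metadata node."""

    # Adjacency list: operation -> operations that run after it.
    edges: dict[Operation, list[Operation]] = field(default_factory=dict)
    # Hashmap from a metadata node id to its last operation (O(1) lookup).
    last_operation: dict[str, Operation] = field(default_factory=dict)

    def append(self, node_id: str, op: Operation) -> None:
        """Chain `op` after the last operation recorded for `node_id`."""
        self.edges.setdefault(op, [])
        previous = self.last_operation.get(node_id)
        if previous is not None:
            # Dictionary lookup instead of searching the graph for the tail.
            self.edges[previous].append(op)
        self.last_operation[node_id] = op


graph = OperationGraph()
download, extract, read = Operation("download"), Operation("extract"), Operation("read")
for op in (download, extract, read):
    graph.append("file_object", op)

print(graph.last_operation["file_object"].name)  # read
print([op.name for op in graph.edges[download]])  # ['extract']
```

Each `append` performs a constant number of dictionary operations, so building a chain of n operations costs `O(n)` overall instead of re-searching the graph for every new operation.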

Example of a dataset that used to time out and now loads:

```python
import mlcroissant as mlc

# Loading this Croissant metadata used to time out; it now completes.
mlc.Dataset("https://datasets-server.huggingface.co/croissant?dataset=gcaillaut/citeseer")
```

More than 700 Hugging Face datasets were similarly impacted (see the
[announcement](https://groups.google.com/a/mlcommons.org/g/croissant/c/EpbC0wkuF6g)).

Fixes #310 and #525.

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@marcenacp (Contributor)

@ccl-core Can you please rebase?
