Commit

Modifications for bigcode-the-stack dataset (#522)
ccl-core authored Feb 15, 2024
1 parent cd64e12 commit 65b67d1
Showing 5 changed files with 17 additions and 8 deletions.
6 changes: 3 additions & 3 deletions .devcontainer/postCreateCommand.sh
@@ -1,7 +1,7 @@
 #!/bin/bash
 
-sudo apt-get update && sudo apt-get install -y libgraphviz-dev git-lfs
+sudo apt-get update && sudo apt-get install -y graphviz libgraphviz-dev git-lfs
 
 # Install dependencies except mlcroissant itself
-cd ../python/mlcroissant
-pip install -e ../python/mlcroissant/.[dev]
+cd /workspaces/croissant/python/mlcroissant
+pip install -e .[dev]
5 changes: 1 addition & 4 deletions datasets/1.0/bigcode-the-stack/metadata.json
@@ -92,10 +92,7 @@
 "source": {
   "fileSet": "parquet-files",
   "extract": {
-    "fileProperty": "fullpath"
-  },
-  "transform": {
-    "regex": "^data\\/(\\d\\w\\+\\-)+\\/train-\\d\\d\\d\\d\\d-of-\\d\\d\\d\\d\\d\\.parquet$"
+    "column": "lang"
   }
 }
 },
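
The metadata.json hunk above swaps a regex applied to each file's full path for a direct read of the "lang" column. A minimal Python sketch of the difference follows. It is an illustration only, not mlcroissant's implementation: the sample path, the sample row, and both helper functions are hypothetical, and the actual file layout of the dataset is not shown in this commit. The removed pattern, once JSON-unescaped, requires each repetition of its capture group to be a digit, a word character, a literal "+", and a literal "-", so a plain directory name would not match it.

```python
import re

# The pattern removed by this commit, after JSON unescaping
# ("\\/" becomes "\/", "\\d" becomes "\d", and so on).
OLD_PATTERN = r"^data\/(\d\w\+\-)+\/train-\d\d\d\d\d-of-\d\d\d\d\d\.parquet$"

# Hypothetical path, invented for illustration; the real layout may differ.
SAMPLE_PATH = "data/python/train-00000-of-00206.parquet"


def extract_with_regex(path):
    """Old approach (sketch): capture a value from the file's full path."""
    match = re.match(OLD_PATTERN, path)
    return match.group(1) if match else None


def extract_from_column(row, column="lang"):
    """New approach (sketch): read the value from a column of the record."""
    return row[column]


# A plain directory name such as "python" does not satisfy (\d\w\+\-)+.
print(extract_with_regex(SAMPLE_PATH))

# Reading the column does not depend on how the files are named.
print(extract_from_column({"lang": "Python"}))
```

Under these assumptions, the column-based source does not depend on the directory naming at all, which is one plausible reading of why the commit drops the transform.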