NF: Create updated dataset-level extractor for BIDS datasets #104
you are killing my dream of using glorious datalad-fuse! ;-) and how do you know that those files would not be annexed?
Codecov Report
@@            Coverage Diff             @@
##           master     #104      +/-   ##
==========================================
+ Coverage   85.12%   85.82%   +0.69%
==========================================
  Files          21       23       +2
  Lines        1049     1206     +157
==========================================
+ Hits          893     1035     +142
- Misses        156      171      +15
Good point. I don't know that they won't be annexed. Something that I think is important with the use of metalad is to make a distinction between dataset-level and file-level extractors. With this update focusing on the dataset level, does it perhaps make sense to require a specific set of BIDS-compatible files that would be necessary for extraction of dataset-level metadata (e.g. [...])?
I realise that there could be any number of files in the BIDS dataset from which to extract file-level metadata and which, if combined/aggregated/derived, could turn into metadata that is interpretable on the dataset level. But the number or combination of such files is arbitrary and not easily definable on the dataset level. And I think even without that information, the currently proposed dataset-level extractor could still extract useful information.
from datalad.metadata.definitions import vocabulary_id
from datalad.utils import assure_unicode
from typing import Dict, List, Union
import json
Eventually this should see the impact of python -m isort -m3 -fgw 2 -tc datalad_neuroimaging/extractors/bids_dataset.py
TODO for @jsheunis:
The challenge of whether required files are annexed or not is addressed by correctly specifying the |
Ok, this PR has been open for a long time and I want to merge it. I'm going to finish whatever is remaining and easy to complete, and will then merge unless there is strong disagreement @datalad/developers
This initial statement is not correct anymore. Any annexed content that is necessary for the extraction process will be fetched before extraction starts via metalad's |
Remaining issue is the failing test, will sort that now.
Remaining test failure seems to be related to some dataset not containing gen4 metadata (probably outdated testdata?, or could be related to the metalad version bump?), while |
This reverts commit 4d43f0b.
This PR is in response to #94. It adds a dataset-level extractor for BIDS datasets, called bids_dataset, that:
- [...] datalad-metalad (and introduces this dependency, datalad-metalad>=0.3.1)
- [...] dataset_description.json and participants.tsv are available locally before proceeding with the extraction process
- [...] get where applicable)
- [...] pybids>=0.15.1 and BIDS v1.6.0
- [...] bids extractor in any way
- [...] bids extractor), including information about subjects, sessions, runs, tasks, entities, and variables.

Old PR comment:
This WIP PR, intended to address #94, introduces a new extractor bids_dataset that:
- [...] datalad-metalad functionality
- [...] getting annexed data
- [...] bids.py extractor

Main changes:
- [...] bids_dataset.py extractor
- [...] test_bids_dataset.py test
- [...] setup.cfg
- [...] setup.cfg
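
For readers skimming the thread, here is a minimal sketch of the local-availability check that the PR description mentions for dataset_description.json and participants.tsv. It is not the code from this PR: the actual extractor relies on datalad-metalad and pybids, whereas this sketch uses only the Python standard library. The names REQUIRED_FILES, missing_required_files and read_dataset_level_info are hypothetical and chosen for illustration. It also assumes the symlink-based annex layout, where a file whose content has not been fetched shows up as a dangling symlink.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List

# Hypothetical constant: the two files the PR description names as required.
REQUIRED_FILES = ("dataset_description.json", "participants.tsv")


def missing_required_files(dataset_root: Path) -> List[str]:
    """Return the required files whose content is not readable locally."""
    # Path.is_file() follows symlinks, so a dangling symlink (annexed file
    # without local content, under the assumption above) counts as missing.
    return [name for name in REQUIRED_FILES if not (dataset_root / name).is_file()]


def read_dataset_level_info(dataset_root: Path) -> Dict[str, object]:
    """Collect a few dataset-level facts once the required files are present."""
    missing = missing_required_files(dataset_root)
    if missing:
        raise FileNotFoundError(f"required BIDS files not available locally: {missing}")

    with open(dataset_root / "dataset_description.json", encoding="utf-8") as f:
        description = json.load(f)

    with open(dataset_root / "participants.tsv", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        rows = list(reader)
        # Column headers of participants.tsv, i.e. the participant variables.
        variables = list(reader.fieldnames or [])

    return {
        "name": description.get("Name"),
        "bids_version": description.get("BIDSVersion"),
        "participant_ids": [row["participant_id"] for row in rows],
        "variables": variables,
    }
```

Calling read_dataset_level_info(Path("/path/to/bids/dataset")) either returns a small dictionary or raises FileNotFoundError naming the files that would first have to be fetched.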