Version 2 Plans
-
RDF triplestore-based
-
Easy to customize
-
Optionally create BEL Framework 3.0 Namespace, Annotation and Equivalence files
-
Provide a changelog
-
Allow synonym-based close matches to be used for equivalencing (matching optionally restricted to a domain or to specific datasets)
-
Tests, specifically data access, format, and parsing tests, run with a test framework
-
Monitoring of generated datasets via statistical comparisons to detect pipeline processing issues
-
Data Parsers will be distributable components that can be added/removed/enabled/disabled
-
Data Parsers will incorporate the following components:
-
configuration
-
allow registration of remote code for parsers or of pre-processed RDF
-
allow enabling/disabling parsers (via tags?)
-
make it easy to test newly added datasets (e.g. run a single parser against pre-generated data)
-
optional: data freshness check (has data changed since last run?)
-
data access and localization
-
data parsing into RDF
-
logging
-
optional: data statistics generation
-
optional: tests
-
is data accessible?
-
has data format changed?
-
are current results significantly smaller than last set of results?
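The optional freshness check and the result-size test listed above could be sketched as follows. This is only an illustration under stated assumptions: the function names, the use of a SHA-256 digest to detect changed data, and the 10% drop threshold are all hypothetical and are not specified anywhere in these plans.

```python
import hashlib
from typing import Optional


def content_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest of the downloaded dataset bytes."""
    return hashlib.sha256(data).hexdigest()


def has_changed(data: bytes, previous_digest: Optional[str]) -> bool:
    """Freshness check: True if no earlier digest exists or the digest differs."""
    return previous_digest is None or content_digest(data) != previous_digest


def significantly_smaller(current: int, previous: int, max_drop: float = 0.10) -> bool:
    """Flag a run whose result count fell by more than max_drop (a fraction).

    The 10% default is an assumed threshold, chosen only for illustration.
    """
    if previous <= 0:
        return False  # no earlier results to compare against
    return (previous - current) / previous > max_drop
```

A parser could store the digest and the result count after each run, skip parsing when `has_changed` returns `False`, and raise a warning when `significantly_smaller` returns `True`.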
-
Pipeline framework will provide:
-
General configuration
-
Ability to pick up new Data Parsers and run them
-
Will provide a location to save original dataset downloads
-
Will provide a location to save generated RDF datasets
-
Will load RDF datasets into the triplestore
-
Will run RDF enhancements
-
Add the transitive closure of exact matches (and optionally of close matches that are synonym-based)
-
Will add identifiers.org URIs to BEL Entities where possible
-
Will create a changelog from a comparison of the current and previous Resources in the triplestore (via Named Graph comparisons)
-
Export from the triplestore into BEL Namespace, Annotation and Equivalence resource files (optional)
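Two of the pipeline steps above, closing exact matches and building a changelog from graph comparisons, can be illustrated with a small in-memory sketch. It assumes that exact matches are treated as symmetric as well as transitive, and it represents each named graph as a plain Python set of triples rather than querying a real triplestore. The function names and the identifiers in the usage note are hypothetical.

```python
from collections import defaultdict


def close_exact_matches(pairs):
    """Return every (a, b) pair, a != b, implied by symmetry and transitivity."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the equivalence groups of each asserted match.
    for a, b in pairs:
        parent[find(a)] = find(b)

    # Collect the members of each group under its root.
    groups = defaultdict(set)
    for term in list(parent):
        groups[find(term)].add(term)

    return {
        (a, b)
        for members in groups.values()
        for a in members
        for b in members
        if a != b
    }


def changelog(previous, current):
    """Compare two sets of triples (stand-ins for two named graphs)."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }
```

For example, given the asserted matches `("ns1:A", "ns2:B")` and `("ns2:B", "ns3:C")`, `close_exact_matches` also yields `("ns1:A", "ns3:C")` and its reverse. In a real pipeline the same results would more likely come from queries against the triplestore. The set-based version only shows the intended outcome.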