Merge branch 'main' into danieljohnson/doc_changes
Vedant1 authored Dec 15, 2024
2 parents db46520 + b60599b commit 6f84d29
Showing 95 changed files with 7,776 additions and 912 deletions.
35 changes: 35 additions & 0 deletions .github/workflows/test_file_reader.yml
@@ -0,0 +1,35 @@
name: file_reader test

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main



jobs:
  linux:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.11']

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install .
          pip install graphviz
      - name: Test reader
        run: |
          pip install pytest
          pytest dsi/plugins/tests/test_file_reader.py
36 changes: 36 additions & 0 deletions .github/workflows/test_file_writer.yml
@@ -0,0 +1,36 @@
name: file_writer test

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main


jobs:
  linux:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.11']

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          python -m pip install opencv-python
          pip install .
          pip install graphviz
          sudo apt-get install graphviz
      - name: Test reader
        run: |
          pip install pytest
          pytest dsi/plugins/tests/test_file_writer.py
34 changes: 34 additions & 0 deletions .github/workflows/test_sqlite.yml
@@ -0,0 +1,34 @@
name: sqlite.py test

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main


jobs:
  linux:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.11']

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install .
          pip install ipykernel
      - name: Test reader
        run: |
          pip install pytest
          pytest dsi/backends/tests/test_sqlite.py
15 changes: 14 additions & 1 deletion .gitignore
@@ -14,6 +14,8 @@ __pycache__/
# Distribution / packaging
.Python
build/
dist/
*.dist/
lib/
lib64/
parts/
@@ -23,6 +25,10 @@ wheels/
share/python-wheels/
MANIFEST

# Local environment
pcenv/
dsienv/

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
@@ -50,13 +56,20 @@ cover/

# Testing artifacts
*query.csv
dsi_parquet_driver_output.ipynb
dsi_parquet_backend_output.ipynb
*.ipynb_checkpoints
parquet.data

# Intermediate data types
# Unless manually specified in a git add
*.csv
*.cdb

# Force coverage svg include
!coverage.svg
# Misc
.vscode
docs/_build
dsi.egg-info/
*.egg-info/
.DS_Store
46 changes: 0 additions & 46 deletions .vscode/launch.json

This file was deleted.

3 changes: 0 additions & 3 deletions .vscode/settings.json

This file was deleted.

34 changes: 22 additions & 12 deletions README.rst
@@ -2,49 +2,59 @@
DSI
=============

.. image:: coverage.svg
   :target: https://lanl.github.io/dsi/htmlcov/index.html

The goal of the Data Science Infrastructure Project (DSI) is to provide a flexible, AI-ready metadata query capability which returns data subject to strict, POSIX-enforced file security. The data lifecycle for AI/ML requires seamless transitions from data-intensive/AI/ML research activity to long-term archiving and shared data repositories. DSI enables flexible, data-intensive scientific workflows that meet researcher needs.

DSI is implemented in three parts:

* Plugins
* Drivers
* Plugins (Readers and Writers)
* Backends
* Core middleware

Plugins curate metadata for query and data return. Plugins can have producer or consumer functionality. Plugins acting as consumers harvest data from files and streams. Plugins acting as producers execute containerized or baremetal applications to supplement queryable metadata and data. Plugins may be user contributed and a default set of plugins is available with usage examples in our `Core documentation <https://lanl.github.io/dsi/core.html>`_.
Plugins curate metadata for query and data return. Plugins can have read or write functionality, acting as Readers and Writers for DSI. Plugins acting as Readers harvest data from files and streams. Plugins acting as Writers execute containerized or bare-metal applications to supplement queryable metadata and data. Plugins may be user-contributed, and a default set of plugins is available with usage examples in our `Core documentation <https://lanl.github.io/dsi/core.html>`_.

Drivers are interfaces for the Core middleware. Drivers can have front-end or back-end functionalities. Driver front-ends are the interface between a DSI user and the Core middleware. Driver back-ends are the interface between the Core Middleware and a data store. Drivers may be user contributed and a default set of drivers is available with usage examples in our `Core documentation <https://lanl.github.io/dsi/core.html>`_.
Backends are interfaces for the Core middleware. Backends consist mostly of back-end/storage functionality and are the interface between the Core middleware and a data store. Backends may also have some front-end functionality, interfacing between a DSI user and the Core middleware. Backends may be user-contributed, and a default set of backends is available with usage examples in our `Core documentation <https://lanl.github.io/dsi/core.html>`_.

DSI Core middleware provides the user/machine interface. The Core middleware defines a Terminal object. An instantiated Core Terminal can load zero or more plugins and drivers. A Terminal object can be used in scripting workflows and program loops.
DSI Core middleware provides the user/machine interface. The Core middleware defines a Terminal object. An instantiated Core Terminal can load zero or more plugins and backends. A Terminal object can be used in scripting workflows and program loops.
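
The division of labor described above can be illustrated with a minimal, self-contained sketch. This is not DSI's API: the class names ``ToyCsvReader``, ``ToySqliteBackend``, and ``ToyTerminal``, and all of their methods, are hypothetical and exist only to show a reader plugin harvesting metadata, a backend storing it, and a terminal object tying the two together. See the Core documentation for the real interfaces.

```python
import csv
import io
import sqlite3


class ToyCsvReader:
    """Hypothetical reader plugin: harvests rows of metadata from CSV text."""

    def __init__(self, text):
        self.text = text

    def read(self):
        # Return a list of dicts, one per CSV row.
        return list(csv.DictReader(io.StringIO(self.text)))


class ToySqliteBackend:
    """Hypothetical backend: stores rows in an in-memory SQLite table."""

    def __init__(self):
        self.con = sqlite3.connect(":memory:")

    def put(self, table, rows):
        # Create the table from the first row's keys, then insert all rows.
        columns = list(rows[0].keys())
        col_sql = ", ".join(f'"{c}"' for c in columns)
        marks = ", ".join("?" for _ in columns)
        self.con.execute(f'CREATE TABLE "{table}" ({col_sql})')
        self.con.executemany(
            f'INSERT INTO "{table}" VALUES ({marks})',
            [tuple(row[c] for c in columns) for row in rows],
        )

    def query(self, sql):
        return self.con.execute(sql).fetchall()


class ToyTerminal:
    """Hypothetical middleware: loads zero or more readers and backends."""

    def __init__(self):
        self.readers = []
        self.backends = []

    def load(self, module):
        # Sort modules by capability: anything with read() is a reader.
        if hasattr(module, "read"):
            self.readers.append(module)
        else:
            self.backends.append(module)

    def ingest(self, table):
        # Move everything the readers harvested into every loaded backend.
        rows = [row for reader in self.readers for row in reader.read()]
        for backend in self.backends:
            backend.put(table, rows)


terminal = ToyTerminal()
backend = ToySqliteBackend()
terminal.load(ToyCsvReader("run,energy\na,1.5\nb,2.5\n"))
terminal.load(backend)
terminal.ingest("runs")
result = backend.query("SELECT run FROM runs ORDER BY run")
print(result)  # [('a',), ('b',)]
```

Because the terminal only depends on the ``read``, ``put``, and ``query`` methods, a user-contributed reader or backend could be swapped in without changing the workflow script; that is the property the plugin/backend split is meant to provide.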

=====================
DSI Core Requirements
=====================
* python3 (3.11 tested)
* Linux OS (RHEL- and Debian-based distributions tested)
* Cross-platform (Unix / macOS / Windows)
* Git
* Plugins and Drivers introduce further requirements
* Plugins and Backends introduce further requirements

===============
Getting Started
===============

DSI does not yet have a versioned release and should be considered pre-alpha. Project contributors are encouraged to prototype solutions which do not contain sensitive data at this time. Consequently a PyPA release is planned but incomplete. It is possible to install DSI locally instead.
DSI has several versioned releases; code cloned from 'main' should be considered an alpha version. Project contributors are encouraged to prototype solutions which do not contain sensitive data at this time. DSI can also be installed locally as follows.

We recommend Miniconda3 for managing virtual environments for DSI::

    . ~/miniconda3/bin/activate
    conda create -n dsi python=3.11
    conda activate dsi

Python virtual environments can also be used for DSI::

    python3 -m venv dsienv
    source dsienv/bin/activate
    pip install --upgrade pip

After activating your environment::

    git clone https://github.com/lanl/dsi.git
    cd dsi/
    python -m pip install .
    python3 -m pip install .

=====================
Release Versions
=====================

Release versions of DSI can be found on PyPI (https://pypi.org/project/dsi-workflow/). To install the latest release, try the following::

    python3 -m pip install dsi-workflow
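
After installing, a quick check that the package is visible in the active environment may be useful. The sketch below uses only the Python standard library and assumes the distribution is registered under the name ``dsi-workflow``, as the PyPI URL suggests; a local install from the clone may register a different name. The helper ``installed_version`` is a hypothetical name introduced here for illustration.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "dsi-workflow" is the PyPI distribution name given above (an assumption
# for local installs, which may use a different name).
version = installed_version("dsi-workflow")
print(version if version else "dsi-workflow is not installed in this environment")
```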

=====================
Copyright and License
21 changes: 0 additions & 21 deletions coverage.svg

This file was deleted.

19 changes: 10 additions & 9 deletions docs/Makefile
@@ -10,16 +10,16 @@ all: html
# Build HTML
html:
	python ./plugins/generate_plugin_class_hierarchy.py ../dsi/plugins/
	python ./drivers/generate_driver_class_hierarchy.py ../dsi/drivers/
	python ./backends/generate_backend_class_hierarchy.py ../dsi/backends/
	mv PluginClassHierarchy.gv.png PluginClassHierarchy.png
	mv DriverClassHierarchy.gv.png DriverClassHierarchy.png
	mv BackendClassHierarchy.gv.png BackendClassHierarchy.png
	rm -rf $(BUILDDIR)/html
	$(SPHINXBUILD) -b html . $(BUILDDIR)/html
	rm -rf $(BUILDDIR)/html/htmlcov
	pytest --cov --cov-report=html ../
	mv ./htmlcov $(BUILDDIR)/html
	# pytest --cov --cov-report=html ../
	# mv ./htmlcov $(BUILDDIR)/html
	rm -f ../coverage.svg
	coverage-badge -o ../coverage.svg
	git add ../coverage.svg
	# coverage-badge -o ../coverage.svg
	# git add ../coverage.svg
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

@@ -29,14 +29,15 @@ html:
# github for some reason runs jekyll automatically on gh-pages
# files, but we don't want that. 'touch .nojekyll' takes care
# of it.
# git add --force ./htmlcov && \
gh-pages: _build/html
	root="$$(git rev-parse --show-toplevel)" && \
	cd _build/html && \
	rm -rf .git && \
	touch .nojekyll && \
	git init && \
	git add . && \
	git add --force ./htmlcov && \
	git commit -m "DSI Documentation" && \
	git push -f $$root master:gh-pages && \
	rm -rf .git
@@ -60,4 +61,4 @@ help:
	@echo " publish upload documentation to github"

clean:
	-rm -rf $(BUILDDIR)/* *ClassHierarchy.gv *Hierarchy.png
	-rm -rf $(BUILDDIR)/* *ClassHierarchy.gv *Hierarchy.png