
Units tests changes #134

Merged · 34 commits · Jan 21, 2025
Commits
8c74f57 · updated plugins and sqlite backend with units stored as dict and rais… (Dec 15, 2024)
ac0382c · core raises error if mismatched units for same column and updated bac… (Dec 15, 2024)
82e8948 · updated documentation and added new github CI files with updated DSI … (Dec 15, 2024)
0eb9f32 · added h5 and metadata readers, included systemkernel tests in test_env (Dec 18, 2024)
101b4f2 · added import json in test_env.py (Vedant1, Dec 18, 2024)
b1094ae · uncommented a line in one test in test_env.py (Vedant1, Dec 18, 2024)
5f534a9 · added h5py to requirements.txt (Vedant1, Dec 18, 2024)
79f4e7f · changed output_collector key name (Vedant1, Dec 18, 2024)
5c2c213 · commented out a systemkernel test function with error (Vedant1, Dec 18, 2024)
3f8c96c · commented out another systemkernel test with error (Vedant1, Dec 18, 2024)
5c12f22 · specified interactive jupyter notebook FALSE (Vedant1, Dec 18, 2024)
36b3786 · updated inspect_artifacts call in test_sqlite.py (Vedant1, Dec 18, 2024)
1b71c08 · created CI file for sqlalchemy test (Vedant1, Dec 18, 2024)
dc108bb · removed unused imports in sqlalchemy.py (Vedant1, Dec 18, 2024)
f613c63 · updated all CI files with new requirements.extra.txt file (Vedant1, Dec 18, 2024)
2c5150f · requirements.txt just has base imports. extras has large imports whic… (Vedant1, Dec 18, 2024)
1e640b7 · removed h5 reader as it is not needed (Vedant1, Dec 18, 2024)
9769783 · add initial reader docs (Vedant1, Dec 19, 2024)
faf7409 · add type checking documentation stub for plugins (Vedant1, Dec 19, 2024)
ee74c66 · refine reader docs (Aug 2, 2023)
68ab7ca · formatting changes (Aug 2, 2023)
86c1a53 · added loading, contributing, etc. (Aug 8, 2023)
b67ac13 · Update readers documentation with alternate add_rows() (Vedant1, Dec 15, 2024)
593dfa8 · added exception for pragma (jpulidojr, Dec 12, 2024)
5541771 · wildfire core terminal 'get' example (jpulidojr, Dec 12, 2024)
11b24fe · Additional sqlalchemy tests to match the sqlite tests (hugegreenbug, Dec 5, 2024)
7257c7e · added wildfire query example to return table with col names (jpulidojr, Dec 17, 2024)
da92de6 · incrementing version for follow-up publish (jpulidojr, Dec 17, 2024)
d7dae73 · inlining git dependency (jpulidojr, Dec 17, 2024)
ad0a7a0 · inline git import (jpulidojr, Dec 17, 2024)
7558476 · Merge branch 'main' into units_tests_changes (Vedant1, Dec 19, 2024)
bac716e · deleted old example data files (Vedant1, Jan 13, 2025)
3d1e25f · updated metadata plugin reader (Vedant1, Jan 13, 2025)
9e86f0e · updated coreterminal to comment out metadata reader test (Vedant1, Jan 14, 2025)
34 changes: 34 additions & 0 deletions .github/workflows/test_core.yml
@@ -0,0 +1,34 @@
name: core.py test

on:
push:
branches:
- main
pull_request:
branches:
- main


jobs:
linux:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ['3.11']

steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -r requirements.extras.txt
pip install .
- name: Test reader
run: |
pip install pytest
pytest dsi/tests/test_core.py
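
Note: this new workflow mirrors the existing reader/writer CI jobs: install the base requirements, the extras, and the package itself, then run a single pytest target. The job can be reproduced locally with a short driver such as this sketch (it assumes the repository root as the working directory and that both requirements files are already installed):

    import pytest

    # Run the same test file the workflow targets, and exit with pytest's status code.
    raise SystemExit(pytest.main(["dsi/tests/test_core.py"]))
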
34 changes: 34 additions & 0 deletions .github/workflows/test_env.yml
@@ -0,0 +1,34 @@
name: env.py test

on:
push:
branches:
- main
pull_request:
branches:
- main


jobs:
linux:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ['3.11']

steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -r requirements.extras.txt
pip install .
- name: Test reader
run: |
pip install pytest
pytest dsi/plugins/tests/test_env.py
3 changes: 1 addition & 2 deletions .github/workflows/test_file_reader.yml
@@ -9,7 +9,6 @@ on:
- main



jobs:
linux:
runs-on: ubuntu-latest
@@ -27,8 +26,8 @@ jobs:
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -r requirements.extras.txt
pip install .
pip install graphviz
- name: Test reader
run: |
pip install pytest
3 changes: 1 addition & 2 deletions .github/workflows/test_file_writer.yml
@@ -26,9 +26,8 @@ jobs:
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
python -m pip install opencv-python
pip install -r requirements.extras.txt
pip install .
pip install graphviz
sudo apt-get install graphviz
- name: Test reader
run: |
34 changes: 34 additions & 0 deletions .github/workflows/test_sqlalchemy.yml
@@ -0,0 +1,34 @@
name: sqlalchemy.py test

on:
push:
branches:
- main
pull_request:
branches:
- main


jobs:
linux:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ['3.11']

steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -r requirements.extras.txt
pip install .
- name: Test reader
run: |
pip install pytest
pytest dsi/backends/tests/test_sqlalchemy.py
2 changes: 1 addition & 1 deletion .github/workflows/test_sqlite.yml
@@ -26,8 +26,8 @@ jobs:
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -r requirements.extras.txt
pip install .
pip install ipykernel
- name: Test reader
run: |
pip install pytest
1 change: 1 addition & 0 deletions docs/index.rst
@@ -12,6 +12,7 @@ The Data Science Infrastructure Project (DSI)

introduction
installation
contributing_readers
plugins
backends
core
2 changes: 1 addition & 1 deletion docs/introduction.rst
@@ -1,7 +1,7 @@



The goal of the Data Science Infrastructure Project (DSI) is to manage data through metadata capture and curation. DSI capabilities can be used to develop workflows to support management of simulation data, AI/ML approaches, ensemble data, and other sources of data typically found in scientific computing. DSI infrastructure is designed to be flexible and with these considerations in mind:
The goal of the Data Science Infrastructure Project (DSI) is to manage data through metadata capture and curation. DSI capabilities can be used to develop workflows to support management of simulation data, AI/ML approaches, ensemble data, and other sources of data typically found in scientific computing. DSI infrastructure is designed to be flexible and with these considerations in mind:

- Data management is subject to strict, POSIX-enforced, file security.
- DSI capabilities support a wide range of common metadata queries.
5 changes: 0 additions & 5 deletions dsi/backends/sqlalchemy.py
@@ -8,11 +8,6 @@
from sqlalchemy.orm import relationship
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
import csv
import json
import re
import yaml
import toml

from dsi.backends.filesystem import Filesystem

43 changes: 25 additions & 18 deletions dsi/backends/sqlite.py
@@ -170,13 +170,18 @@ def put_artifacts(self, collection, isVerbose=False):
self.cur.execute(create_query)
for tableName, tableData in artifacts["dsi_units"].items():
if len(tableData) > 0:
for col_unit_pair in tableData:
str_query = f'INSERT OR IGNORE INTO dsi_units VALUES ("{tableName}", "{col_unit_pair[0]}", "{col_unit_pair[1]}")'
try:
self.cur.execute(str_query)
except sqlite3.Error as e:
for col, unit in tableData.items():
str_query = f'INSERT INTO dsi_units VALUES ("{tableName}", "{col}", "{unit}")'
unit_result = self.cur.execute(f"SELECT unit FROM dsi_units WHERE column = '{col}';").fetchone()
if unit_result and unit_result[0] != unit:
self.con.rollback()
return e
return f"Cannot ingest different units for the column {col} in {tableName}"
elif not unit_result:
try:
self.cur.execute(str_query)
except sqlite3.Error as e:
self.con.rollback()
return e

try:
self.con.commit()
@@ -218,10 +223,11 @@ def get_artifacts(self, query, isVerbose=False, dict_return = False):
else:
return data

def inspect_artifacts(self, collection, interactive=False):
def inspect_artifacts(self, interactive=False):
import nbconvert as nbc
import nbformat as nbf
dsi_relations, dsi_units = None, None
collection = self.read_to_artifact(only_units_relations=True)
if "dsi_relations" in collection.keys():
dsi_relations = dict(collection["dsi_relations"])
if "dsi_units" in collection.keys():
@@ -319,7 +325,7 @@ def inspect_artifacts(self, collection, interactive=False):
fh.write(html_content)

# SQLITE READER FUNCTION
def read_to_artifact(self):
def read_to_artifact(self, only_units_relations = False):
artifact = OrderedDict()
artifact["dsi_relations"] = OrderedDict([("primary_key",[]), ("foreign_key", [])])

@@ -340,14 +346,15 @@ def read_to_artifact(self):
if colInfo[5] == 1:
pkList.append((tableName, colInfo[1]))

data = self.cur.execute(f"SELECT * FROM {tableName};").fetchall()
for row in data:
for colName, val in zip(colDict.keys(), row):
if val == "NULL":
colDict[colName].append(None)
else:
colDict[colName].append(val)
artifact[tableName] = colDict
if only_units_relations == False:
data = self.cur.execute(f"SELECT * FROM {tableName};").fetchall()
for row in data:
for colName, val in zip(colDict.keys(), row):
if val == "NULL":
colDict[colName].append(None)
else:
colDict[colName].append(val)
artifact[tableName] = colDict

fkData = self.cur.execute(f"PRAGMA foreign_key_list({tableName});").fetchall()
for row in fkData:
@@ -372,8 +379,8 @@ def read_units_helper(self):
for row in unitsTable:
tableName = row[0]
if tableName not in unitsDict.keys():
unitsDict[tableName] = []
unitsDict[tableName].append((row[1], row[2]))
unitsDict[tableName] = {}
unitsDict[tableName][row[1]] = row[2]
return unitsDict

# Closes connection to server
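The sqlite.py changes above rework how units are stored and validated: dsi_units now maps each table to a column-to-unit dictionary instead of a list of (column, unit) tuples, put_artifacts returns an error string when a column is re-ingested with a conflicting unit, and read_to_artifact gains an only_units_relations flag so inspect_artifacts can pull units and relations directly from the database. A minimal sketch of the new shape and the conflict rule (the table, columns, and units here are hypothetical):

    from collections import OrderedDict

    # Old shape: {"wildfire": [("temperature", "C"), ("wind_speed", "m/s")]}
    # New shape: one unit per column, keyed by column name.
    dsi_units = OrderedDict()
    dsi_units["wildfire"] = {"temperature": "C", "wind_speed": "m/s"}

    # Re-ingesting {"temperature": "F"} for the same table now returns
    # "Cannot ingest different units for the column temperature in wildfire"
    # rather than silently inserting a second, conflicting row.
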
2 changes: 1 addition & 1 deletion dsi/backends/tests/test_sqlite.py
@@ -67,7 +67,7 @@ def test_artifact_inspect():
os.remove(dbpath)
store = Sqlite(dbpath, run_table=False)
store.put_artifacts(valid_middleware_datastructure)
store.inspect_artifacts(valid_middleware_datastructure)
store.inspect_artifacts()
assert True

def test_artifact_read():
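Since inspect_artifacts now reads units and relations back from the database itself (via read_to_artifact(only_units_relations=True)), the test no longer passes the in-memory collection. A rough usage sketch under the new signature (the filename is a placeholder):

    from dsi.backends.sqlite import Sqlite

    store = Sqlite("example.db", run_table=False)        # placeholder database path
    store.put_artifacts(valid_middleware_datastructure)  # ingest data first
    store.inspect_artifacts()  # notebook is generated from the stored data
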
46 changes: 16 additions & 30 deletions dsi/core.py
@@ -24,7 +24,7 @@ class Terminal():
BACKEND_IMPLEMENTATIONS = ['gufi', 'sqlite', 'parquet']
PLUGIN_PREFIX = ['dsi.plugins']
PLUGIN_IMPLEMENTATIONS = ['env', 'file_reader', 'file_writer']
VALID_PLUGINS = ['Hostname', 'SystemKernel', 'GitInfo', 'Bueno', 'Csv', 'ER_Diagram', 'YAML1', 'TOML1', "Table_Plot", "Schema", "Csv_Writer"]
VALID_PLUGINS = ['Hostname', 'SystemKernel', 'GitInfo', 'Bueno', 'Csv', 'ER_Diagram', 'YAML1', 'TOML1', "Table_Plot", "Schema", "Csv_Writer", "MetadataReader1"]
VALID_BACKENDS = ['Gufi', 'Sqlite', 'Parquet']
VALID_MODULES = VALID_PLUGINS + VALID_BACKENDS
VALID_MODULE_FUNCTIONS = {'plugin': ['reader', 'writer'],
@@ -151,10 +151,14 @@ def load_module(self, mod_type, mod_name, mod_function, **kwargs):
for colName, colData in table_metadata.items():
if colName in self.active_metadata[table_name].keys() and table_name != "dsi_units":
self.active_metadata[table_name][colName] += colData
elif colName not in self.active_metadata[table_name].keys():# and table_name == "dsi_units":
elif colName in self.active_metadata[table_name].keys() and table_name == "dsi_units":
for key, col_unit in colData.items():
if key not in self.active_metadata[table_name][colName]:
self.active_metadata[table_name][colName][key] = col_unit
elif key in self.active_metadata[table_name][colName] and self.active_metadata[table_name][colName][key] != col_unit:
raise ValueError(f"Cannot have a different set of units for column {key} in {colName}")
elif colName not in self.active_metadata[table_name].keys():
self.active_metadata[table_name][colName] = colData
# elif colName not in self.active_metadata[table_name].keys() and table_name != "dsi_units":
# raise ValueError(f"Mismatched column input for table {table_name}")
elif mod_type == "backend":
if "run_table" in class_.__init__.__code__.co_varnames:
kwargs['run_table'] = self.runTable
@@ -207,7 +211,6 @@ def add_external_python_module(self, mod_type, mod_name, mod_path):

term = Terminal()
term.add_external_python_module('plugin', 'my_python_file',

'/the/path/to/my_python_file.py')

term.load_module('plugin', 'MyPlugin', 'reader')
@@ -270,7 +273,8 @@ def artifact_handler(self, interaction_type, query = None, **kwargs):

if interaction_type in ['put', 'set'] and module_type == 'back-write':
if self.backup_db_flag == True and os.path.getsize(obj.filename) > 100:
backup_file = obj.filename[:obj.filename.rfind('.')] + "_backup" + obj.filename[obj.filename.rfind('.'):]
formatted_datetime = datetime.now().strftime("%Y-%m-%d_%H:%M:%S")
backup_file = obj.filename[:obj.filename.rfind('.')] + "_backup_" + formatted_datetime + obj.filename[obj.filename.rfind('.'):]
shutil.copyfile(obj.filename, backup_file)
errorMessage = obj.put_artifacts(collection = self.active_metadata, **kwargs)
if errorMessage is not None:
@@ -284,28 +288,20 @@ def artifact_handler(self, interaction_type, query = None, **kwargs):
self.logger.info(f"Query to get data: {query}")
kwargs['query'] = query
get_artifact_data = obj.get_artifacts(**kwargs)
# else:
# #raise ValueError("Need to specify a query of the database to return data")
# # This is a valid use-case, may give a warning for now
# get_artifact_data = obj.get_artifacts(**kwargs)
operation_success = True

elif interaction_type == 'inspect':
# if module_type == 'back-write':
# errorMessage = obj.put_artifacts(
# collection=self.active_metadata, **kwargs)
# if errorMessage is not None:
# print("Error in ingesting data to db in inspect artifact handler. Generating Jupyter notebook with previous instance of db")
if not self.active_metadata:
raise ValueError("Error in inspect artifact handler: Need to ingest data to DSI abstraction before generating Jupyter notebook")
obj.inspect_artifacts(collection=self.active_metadata, **kwargs)
operation_success = True
if os.path.getsize(obj.filename) > 100:
obj.inspect_artifacts(**kwargs)
operation_success = True
else:
raise ValueError("Error in inspect artifact handler: Need to ingest data into a backend before generating Jupyter notebook")

elif interaction_type == "read" and module_type == 'back-read':
self.active_metadata = obj.read_to_artifact()
operation_success = True
elif interaction_type == "read" and module_type == 'back-write':
raise ValueError("Can only call read to artifact handler with a back-READ backend")
raise ValueError("Can only call read artifact handler with a back-READ backend")

end = datetime.now()
self.logger.info(f"Runtime: {end-start}")
@@ -332,16 +328,6 @@ def update_abstraction(self, table_name, table_data):
if not isinstance(table_data, OrderedDict):
raise ValueError("table_data needs to be in the form of an Ordered Dictionary")
self.active_metadata[table_name] = table_data

#allow more plugins to be loaded and can call transload again
# self.transload_lock = False

#need to unload all loaded plugins to prevent duplicate reading when transload called again
# mods = self.active_modules
# for obj in mods['reader']:
# self.unload_module('plugin', obj.__class__.__name__, "reader")
# for obj in mods['writer']:
# self.unload_module('plugin', obj.__class__.__name__, "writer")


class Sync():
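Two behavioral changes in core.py stand out. First, load_module now merges dsi_units from multiple readers column by column and raises a ValueError if the same column arrives with a different unit; second, artifact_handler timestamps backup files (name_backup_YYYY-MM-DD_HH:MM:SS.ext) instead of overwriting a single _backup copy. MetadataReader1 also joins VALID_PLUGINS, so it should be loadable with term.load_module('plugin', 'MetadataReader1', 'reader'). The merge rule, restated as a standalone sketch (table and column names are hypothetical):

    def merge_units(active, incoming, table_name):
        # Both dicts map column name -> unit for one table in dsi_units.
        for col, unit in incoming.items():
            if col not in active:
                active[col] = unit
            elif active[col] != unit:
                raise ValueError(
                    f"Cannot have a different set of units for column {col} in {table_name}")

    units = {"temperature": "C"}
    merge_units(units, {"temperature": "C", "wind_speed": "m/s"}, "wildfire")
    # merge_units(units, {"temperature": "F"}, "wildfire") would raise ValueError
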
2 changes: 1 addition & 1 deletion dsi/plugins/env.py
@@ -7,7 +7,7 @@

from dsi.plugins.metadata import StructuredMetadata
from dsi.plugins.plugin_models import (
GitInfoModel, HostnameModel, SystemKernelModel
EnvironmentModel, GitInfoModel, HostnameModel, SystemKernelModel, create_dynamic_model
)

