DM-43722 EFD Transform Implementation #18

Open
wants to merge 119 commits into
base: main
119 commits
57e1b52
Implement first version of EFD transformations
glaubervila May 27, 2024
83eadd3
Update missing values in the config file
rcboufleur May 27, 2024
07ea890
Fix exception message for unimplemented dialect
rcboufleur May 27, 2024
b7c30cc
Refactor Aggregate class to handle column-wise operations
rcboufleur May 27, 2024
5e587ab
Refactor Aggregate class to Summary for better clarity and consistency
rcboufleur May 27, 2024
7134953
Refactor ATAOS_correctionOffsets_w function to use mean
rcboufleur May 27, 2024
5cc2231
the ExposureEfd table schema was changed to be compatible with config…
glaubervila May 27, 2024
8732968
Added VisitEFD table and fixed empty values
glaubervila May 27, 2024
ead6f10
Upsert transactions now commit every 100 rows. Docstrings were added …
glaubervila Jun 3, 2024
3816d47
Refactor configuration file loading and validation
glaubervila Jun 3, 2024
e1a94ee
Changed lsst_efd_client -> InfluxDB API for topic queries
glaubervila Jun 24, 2024
d65a16f
fixed lint
glaubervila Jun 24, 2024
f512a97
Lint fixes
rcboufleur Jun 24, 2024
7225480
Additional parameters inserted in config yml
rcboufleur Jun 24, 2024
11eba2d
Lint fixes
rcboufleur Jun 24, 2024
6fd11b4
Added Dockerfile for Efd Transform
glaubervila Jun 27, 2024
70be00e
The main command was added to the dockerfile
glaubervila Jun 27, 2024
f663262
Implement retrieval of packed time series from InfluxDB using API que…
rcboufleur Aug 12, 2024
4366644
Lint fixes
rcboufleur Aug 12, 2024
92462c8
Fixed lint
glaubervila Aug 20, 2024
45559a5
Fixed lint
glaubervila Aug 20, 2024
21dc88e
Implements new config file formats and validation
rcboufleur Aug 21, 2024
fb66a51
Implements unique topic querying to avoid duplicate queries
rcboufleur Aug 21, 2024
7c73a43
Changed base image to w_2024_33
glaubervila Aug 21, 2024
880e782
Connection to usdf_efd api now uses environment variables
glaubervila Aug 21, 2024
4c7c5ad
Added copy of sqlite test database
glaubervila Aug 21, 2024
0643536
Config changes refactored into transformations
rcboufleur Aug 26, 2024
613cdce
Config changes refactored into transformations
rcboufleur Aug 26, 2024
9d65166
Removed template envs
glaubervila Aug 26, 2024
c437b71
minor changes in dockerfile
glaubervila Aug 26, 2024
1ce8920
All files related to transform_efd have been moved to the same folder…
glaubervila Aug 26, 2024
95e9d04
The Dockerfile has been changed to the new file structure.
glaubervila Aug 26, 2024
bb93200
Processing warnings due to computation errors were accounted for
rcboufleur Aug 26, 2024
4abc0ba
Lint error fix
rcboufleur Aug 26, 2024
ef65871
ESS accelerometer fields were added
rcboufleur Aug 26, 2024
058855c
Segmentation of queries with large number of fields is implemented
rcboufleur Aug 26, 2024
e86bc75
Typo fix in the configuration file
rcboufleur Aug 26, 2024
4db0f53
Start date and end date parameters are now optional with default valu…
glaubervila Sep 30, 2024
7ef80f5
Updated summary functions
rcboufleur Oct 14, 2024
4aead0b
Fix in time format definitions
rcboufleur Oct 14, 2024
7f43ea4
Changed config file path
glaubervila Oct 14, 2024
635a8b5
Test postgresql consdb connection
glaubervila Oct 15, 2024
e30acba
Fixed missing schema name in transform efd insert queries
glaubervila Oct 16, 2024
294d816
Added more logs and test access to consdb tables
glaubervila Oct 16, 2024
f1049fc
Fix sqlalchemy get table from metadata
glaubervila Oct 16, 2024
90005f5
Fix comparisons of time aware indexes
rcboufleur Oct 16, 2024
0bd52d1
Fixed sqlalchemy table from metadata with sqlite
glaubervila Oct 16, 2024
284b0df
Column and topic mapping refactored
rcboufleur Oct 17, 2024
6745196
Fixed lint with pre-commit
glaubervila Oct 18, 2024
33fc5b7
Fixed lint
glaubervila Oct 18, 2024
6c2b725
Conflicts resolved and merged.
rcboufleur Oct 24, 2024
2615fdb
Schema generator based on config files created
rcboufleur Oct 24, 2024
8ea745d
Temporary files update
rcboufleur Oct 24, 2024
c05ff0f
Column added to generate_schema.py
rcboufleur Oct 24, 2024
f46ce25
Column added to generate_schema
rcboufleur Oct 24, 2024
fd30ae8
A queue manager has been added to control execution by periods.
glaubervila Oct 24, 2024
9acbdab
minor fixes
glaubervila Oct 24, 2024
a4846da
Last minute value transformation included
rcboufleur Oct 24, 2024
7f80094
New config files generated
rcboufleur Oct 24, 2024
e4ce479
Updated instruments in testing files
rcboufleur Oct 25, 2024
8928f5c
Updated schema structures
rcboufleur Oct 25, 2024
c445d6c
Create new task after run all tasks
glaubervila Oct 25, 2024
ea09f56
Minor fix in create new tasks
glaubervila Oct 25, 2024
b66af91
Minor changes in run sh
glaubervila Nov 6, 2024
7a7f99d
Added permission
glaubervila Nov 6, 2024
df31088
fixed dockerfile
glaubervila Nov 6, 2024
9c38464
Different schemas by instrument
glaubervila Nov 7, 2024
e114cce
Update to properly handle schemas by instrument
rcboufleur Nov 7, 2024
f4b416d
yaml extensions fixed
rcboufleur Nov 7, 2024
626f148
Fixed hardcoded schema for Queue manager
glaubervila Nov 7, 2024
927b852
Added timewindow to queue manager table
glaubervila Nov 8, 2024
1d7556a
Timewindow column added to the yaml files
rcboufleur Nov 8, 2024
160d265
Schema generator yaml files updated
rcboufleur Nov 8, 2024
38a8589
Test files updated
rcboufleur Nov 8, 2024
0e53e5c
Update Dockerfile.efdtransform
rcboufleur Nov 22, 2024
4fd17e3
Update Dockerfile.efdtransform
rcboufleur Nov 22, 2024
b43bc79
Merge remote-tracking branch 'origin/main' into tickets/DM-43722
rcboufleur Dec 6, 2024
92a2a4b
Updates configuration files
rcboufleur Dec 8, 2024
0cd3d38
Updates schemas yaml files
rcboufleur Dec 8, 2024
8ff1413
Implements unpivoted array support for array like results
rcboufleur Dec 8, 2024
6bc51b8
Updates test files
rcboufleur Dec 8, 2024
8ecd421
Implements new unpivoted tables
rcboufleur Dec 8, 2024
7cc9885
Updates column name in unpivoted tables
rcboufleur Dec 8, 2024
c889ec8
Rerun pre-commit
rcboufleur Dec 8, 2024
9de2cc6
Fixes error on github actions
rcboufleur Dec 8, 2024
28a900c
Update summary.py
rcboufleur Dec 8, 2024
59d2e2a
Fix column name bug unpivoted data
rcboufleur Dec 8, 2024
36fae9c
Relocates testing files
rcboufleur Dec 8, 2024
38c12fd
Fixes pytest for config_model
rcboufleur Dec 8, 2024
8acfdd0
Fixes bugs in processing unpivoted results
rcboufleur Dec 8, 2024
a5c1e66
Updates temporary files
rcboufleur Dec 9, 2024
84f9295
Fix trailing spaces
rcboufleur Dec 9, 2024
cbf7755
Fixes configuration files and schemas
rcboufleur Dec 9, 2024
c15a890
Update Dockerfile.efdtransform
rcboufleur Dec 9, 2024
e5f3d69
Update import path for future compatibility and proper pytest funcion…
rcboufleur Jan 16, 2025
844c765
Update import path for future compatibility and proper pytest funcion…
rcboufleur Jan 16, 2025
55b748a
Update import path and fixes last minute method computation.
rcboufleur Jan 16, 2025
2e25430
Update import path for future compatibility and proper pytest funcion…
rcboufleur Jan 16, 2025
5aa6cdb
Update import path and fixes changes in summary methods and attributes.
rcboufleur Jan 16, 2025
db35f7e
Sets __init__ for dao module
rcboufleur Jan 16, 2025
c17fb15
Update import path for future compatibility and proper pytest funcion…
rcboufleur Jan 16, 2025
b514760
Update import path for future compatibility and proper pytest funcion…
rcboufleur Jan 16, 2025
b539b10
Update import path and updates changes to maintain compatibility with…
rcboufleur Jan 16, 2025
686e228
Update import path for future compatibility and proper pytest funcion…
rcboufleur Jan 16, 2025
5a7cccf
Update import path for future compatibility and proper pytest funcion…
rcboufleur Jan 16, 2025
13d1a42
Fixes typo in column name
rcboufleur Jan 16, 2025
ef1fbec
Fixes typo in column names
rcboufleur Jan 16, 2025
cb38275
Implements pytest for config_model
rcboufleur Jan 16, 2025
84bed6a
Implements pytest for generate_schema
rcboufleur Jan 16, 2025
fb05010
Implements pytest for queue_manager
rcboufleur Jan 16, 2025
35c3455
Implements pytest for summary methods
rcboufleur Jan 16, 2025
abd378b
Updates temporary files used in local testing
rcboufleur Jan 16, 2025
fc262f4
Updates Dockerfile configurations
rcboufleur Jan 16, 2025
f877ff3
Update PYTHONPATH in Dockerfile
rcboufleur Jan 16, 2025
21b08c4
Fix LATISS schema tables
rcboufleur Jan 16, 2025
1346e2a
Fix lint errors
rcboufleur Jan 16, 2025
13f1aa4
Updates temporary files (local run)
rcboufleur Jan 16, 2025
c4115fd
Fix all Ruff linting issues across the codebase
rcboufleur Jan 17, 2025
cc46130
Fix pre-commit issues across the transform_efd codebase
rcboufleur Jan 17, 2025
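
Note on the commit list: one commit above states that "Upsert transactions now commit every 100 rows". The code for that change is not part of the diff excerpt on this page, so the following is only a minimal sketch of the idea. It assumes the standard-library `sqlite3` module in place of the SQLAlchemy layer the commit messages refer to; the column names and the helper `upsert_rows` are hypothetical, and only the table name `exposure_efd` and the batch size of 100 come from the PR text.

```python
import sqlite3
from typing import Iterable, Mapping

BATCH_SIZE = 100  # commit interval quoted in the commit message


def upsert_rows(
    conn: sqlite3.Connection,
    rows: Iterable[Mapping[str, object]],
    batch_size: int = BATCH_SIZE,
) -> int:
    """Upsert rows into a hypothetical exposure_efd table.

    Commits once per `batch_size` rows and once more for any remainder.
    Returns the number of commits issued.
    """
    sql = (
        "INSERT INTO exposure_efd (exposure_id, mean_value) "
        "VALUES (:exposure_id, :mean_value) "
        "ON CONFLICT(exposure_id) DO UPDATE SET mean_value = excluded.mean_value"
    )
    commits = 0
    pending = 0
    for row in rows:
        conn.execute(sql, dict(row))
        pending += 1
        if pending == batch_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:  # flush the final partial batch
        conn.commit()
        commits += 1
    return commits


# Usage: 250 rows targeting 200 distinct ids, so 50 rows update existing ones.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exposure_efd (exposure_id INTEGER PRIMARY KEY, mean_value REAL)"
)
rows = [{"exposure_id": i % 200, "mean_value": float(i)} for i in range(250)]
n_commits = upsert_rows(conn, rows)
n_rows = conn.execute("SELECT COUNT(*) FROM exposure_efd").fetchone()[0]
first_value = conn.execute(
    "SELECT mean_value FROM exposure_efd WHERE exposure_id = 0"
).fetchone()[0]
```

Committing per batch bounds how much work is lost if a long run fails midway, at the cost of a few extra round trips compared with a single final commit.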
7 changes: 7 additions & 0 deletions .github/workflows/build.yaml
@@ -31,3 +31,10 @@ jobs:
           image: ${{ github.repository }}-pq
           github_token: ${{ secrets.GITHUB_TOKEN }}
           dockerfile: Dockerfile.pqserver
+
+      - name: Build efdtransform
+        uses: lsst-sqre/build-and-push-to-ghcr@v1
+        with:
+          image: ${{ github.repository }}-efdtransform
+          github_token: ${{ secrets.GITHUB_TOKEN }}
+          dockerfile: Dockerfile.efdtransform
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -15,7 +15,7 @@ repos:
         # supported by your project here, or alternatively use
         # pre-commit's default_language_version, see
         # https://pre-commit.com/#top_level-default_language_version
-        language_version: python3.12
+        language_version: python3.11
   - repo: https://github.com/pycqa/isort
     rev: 5.13.2
     hooks:
15 changes: 15 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,15 @@
{
    "editor.formatOnSave": true,
    "[python]": {
        "editor.tabSize": 4,
        "editor.rulers": [
            79,
            110
        ],
    },
    "python.analysis.extraPaths": [
        "./python",
        "./python",
        "./python/lsst"
    ]
}
46 changes: 46 additions & 0 deletions Dockerfile.efdtransform
@@ -0,0 +1,46 @@
# Define the LSST version
ARG OBS_LSST_VERSION=w_2024_47
FROM lsstsqre/centos:7-stack-lsst_distrib-${OBS_LSST_VERSION}

# Set user
USER lsst

# Install dependencies
RUN source loadLSST.bash && mamba install -y aiokafka httpx && \
    pip install \
    kafkit==0.2.1 \
    lsst_efd_client==0.12.0

# Copy the Python module
COPY --chown=lsst:lsst python /opt/lsst/software/stack/python

# Create and populate the data directory
RUN mkdir -p /opt/lsst/software/stack/data
COPY --chown=lsst:lsst tmp/efd_transform/*.db /opt/lsst/software/stack/data/

# Environment variables
ENV CONFIG_FILE="/opt/lsst/software/stack/python/lsst/consdb/efd_transform/config_LATISS.yml" \
    INSTRUMENT="LATISS" \
    BUTLER_REPO="s3://rubin-summit-users/butler.yaml" \
    S3_ENDPOINT_URL="https://s3dfrgw.slac.stanford.edu/" \
    LSST_RESOURCES_S3_PROFILE_embargo="https://sdfembs3.sdf.slac.stanford.edu" \
    PGUSER="rubin" \
    CONSDB_URL="sqlite:////opt/lsst/software/stack/data/test.db" \
    TIMEDELTA="5" \
    LOG_FILE="/opt/lsst/software/stack/data/transform.log" \
    PYTHONPATH="/opt/lsst/software/stack/python:$PYTHONPATH"

Check warning on line 22 in Dockerfile.efdtransform (GitHub Actions / push):
Variables should be defined before their use
UndefinedVar: Usage of undefined variable '$PYTHONPATH' More info: https://docs.docker.com/go/dockerfile/rule/undefined-var/

# Run the Python script
CMD ["bash", "-c", "source loadLSST.bash; setup lsst_distrib; python /opt/lsst/software/stack/python/lsst/consdb/efd_transform/transform_efd.py -c \"$CONFIG_FILE\" -i \"$INSTRUMENT\" -r \"$BUTLER_REPO\" -d \"$CONSDB_URL\" -E \"$EFD\" -t \"$TIMEDELTA\" -l \"$LOG_FILE\""]


# Example:
# docker run --rm -it \
#   --volume $PWD/data:/opt/lsst/software/stack/data \
#   -e CONFIG_FILE=/opt/lsst/software/stack/python/lsst/consdb/efd_transform/config_LATISS.yml \
#   -e INSTRUMENT=LATISS \
#   -e BUTLER_REPO=s3://rubin-summit-users/butler.yaml \
#   -e EFD=usdf_efd \
#   -e CONSDB_URL=sqlite:////opt/lsst/software/stack/data/test.db \
#   -e TIMEDELTA=5 \
#   consdb/efd_transform:latest
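
Note on the Dockerfile above: the `CMD` line forwards seven environment variables to `transform_efd.py` as the flags `-c -i -r -d -E -t -l`, but the `ENV` block sets no default for `EFD`, so `-E` receives an empty string unless the caller passes `-e EFD=...` as in the example comment. The script's real argument parser is not shown in this excerpt. The sketch below is therefore an assumption-laden illustration only: the `dest` names and the helpers `missing_vars`, `argv_from_env`, and `build_parser` are hypothetical, while the flag letters and default values are copied from the Dockerfile.

```python
import argparse
from typing import Dict, List

# Flag-to-variable mapping copied from the CMD line of Dockerfile.efdtransform.
FLAG_TO_ENV = {
    "-c": "CONFIG_FILE",
    "-i": "INSTRUMENT",
    "-r": "BUTLER_REPO",
    "-d": "CONSDB_URL",
    "-E": "EFD",
    "-t": "TIMEDELTA",
    "-l": "LOG_FILE",
}

# Defaults copied from the ENV block of the Dockerfile (there is no EFD entry).
IMAGE_DEFAULTS = {
    "CONFIG_FILE": "/opt/lsst/software/stack/python/lsst/consdb/efd_transform/config_LATISS.yml",
    "INSTRUMENT": "LATISS",
    "BUTLER_REPO": "s3://rubin-summit-users/butler.yaml",
    "CONSDB_URL": "sqlite:////opt/lsst/software/stack/data/test.db",
    "TIMEDELTA": "5",
    "LOG_FILE": "/opt/lsst/software/stack/data/transform.log",
}


def missing_vars(env: Dict[str, str]) -> List[str]:
    """Return the variables used by CMD that are unset or empty in `env`."""
    return [name for name in FLAG_TO_ENV.values() if not env.get(name)]


def argv_from_env(env: Dict[str, str]) -> List[str]:
    """Build the argument list the CMD line would pass, refusing empty values."""
    missing = missing_vars(env)
    if missing:
        raise ValueError(f"unset variables: {', '.join(missing)}")
    argv: List[str] = []
    for flag, name in FLAG_TO_ENV.items():
        argv += [flag, env[name]]
    return argv


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser accepting the same short flags as the CMD line."""
    parser = argparse.ArgumentParser(description="Sketch of the transform CLI")
    parser.add_argument("-c", dest="config_file", required=True)
    parser.add_argument("-i", dest="instrument", required=True)
    parser.add_argument("-r", dest="butler_repo", required=True)
    parser.add_argument("-d", dest="consdb_url", required=True)
    parser.add_argument("-E", dest="efd", required=True)
    parser.add_argument("-t", dest="timedelta", type=int, default=5)
    parser.add_argument("-l", dest="log_file", default=None)
    return parser


# With only the image defaults, EFD is reported as missing.
missing = missing_vars(IMAGE_DEFAULTS)

# Supplying EFD, as the docker run example does, yields a parseable command line.
args = build_parser().parse_args(argv_from_env({**IMAGE_DEFAULTS, "EFD": "usdf_efd"}))
```

A check of this kind at container start would turn a silently empty `-E ""` into an explicit error; whether the real script already validates the value cannot be told from this page.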
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -44,4 +44,4 @@ asyncio_mode = "auto"
 test = ["pytest"]
 dev = [
     "documenteer[guide] < 2",
-]
+]
39 changes: 39 additions & 0 deletions python/lsst/consdb/efd_transform/__init__.py
@@ -0,0 +1,39 @@
"""Framework for processing and transforming Engineering and Facilities Database (EFD) data.

Overview:
---------
This module offers tools for accessing, transforming, and managing EFD data, supporting
workflows and integration with the LSST ecosystem. It includes capabilities for
data retrieval, transformation pipelines, configuration management, and schema
generation.

Components:
-----------
- **Data Access**: The `dao` subpackage provides Data Access Objects (DAOs) to interact
  with specific database tables such as `exposure_efd` and `visit_efd`.
- **Data Transformation**: Includes utilities for applying structured transformations
  to EFD data, including summarization and restructuring.
- **Configuration Handling**: Supports loading and validating instrument configurations
  through YAML files for flexible setup and operation.
- **Schema Generation**: Automates the creation of database schemas based on predefined
  instrument configurations.
- **Queue Management**: Implements tools for task scheduling and queue-based workflows.

Submodules:
-----------
- `dao`: Contains DAOs for database interactions.
- `config_model`: Defines models for validating YAML configurations.
- `generate_schema`: Includes schema generation utilities.
- `summary`: Provides tools for summarizing EFD data.
- `transform`: Manages transformation pipelines.
- `transform_efd`: Contains specialized transformation methods.
- `queue_manager`: Handles task queue management.

Configuration Files:
--------------------
The module uses YAML files for instrument-specific configurations:
- `config_LATISS.yaml`: Configuration for LATISS.
- `config_LSSTComCam.yaml`: Configuration for LSSTComCam.
- `config_LSSTComCamSim.yaml`: Configuration for LSSTComCamSim.

"""
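
Note on the module docstring above: it lists "summarization" of EFD data as a core transformation, and the commit list mentions a `Summary` class and a fix for comparisons of time-aware indexes, but `summary.py` itself is not included in this excerpt. As a rough illustration of what summarizing a topic over an exposure window can look like, here is a minimal standard-library sketch. The function `summarize_window`, its `pre`/`post` padding parameters, and the returned keys are all hypothetical and are not taken from the PR.

```python
from datetime import datetime, timedelta, timezone
from statistics import fmean
from typing import Dict, Optional, Sequence, Tuple

Sample = Tuple[datetime, float]  # (timezone-aware timestamp, value)


def summarize_window(
    samples: Sequence[Sample],
    start: datetime,
    end: datetime,
    pre: timedelta = timedelta(0),
    post: timedelta = timedelta(0),
) -> Dict[str, Optional[float]]:
    """Summarize the samples whose timestamps fall in [start - pre, end + post].

    All datetimes must be timezone-aware; mixing naive and aware values
    raises TypeError on comparison.
    """
    lo, hi = start - pre, end + post
    values = [value for stamp, value in samples if lo <= stamp <= hi]
    if not values:
        # No telemetry in the window: report an empty summary, not an error.
        return {"mean": None, "min": None, "max": None, "count": 0}
    return {
        "mean": fmean(values),
        "min": min(values),
        "max": max(values),
        "count": len(values),
    }


# Usage: one sample every 10 s with values 0.0 .. 5.0.
t0 = datetime(2024, 11, 7, 0, 0, 0, tzinfo=timezone.utc)
samples = [(t0 + timedelta(seconds=10 * i), float(i)) for i in range(6)]

# A 20 s exposure starting 10 s after t0 covers the samples valued 1.0, 2.0, 3.0.
result = summarize_window(samples, t0 + timedelta(seconds=10), t0 + timedelta(seconds=30))

# A window after the last sample yields an empty summary.
empty = summarize_window(samples, t0 + timedelta(minutes=5), t0 + timedelta(minutes=6))
```

Returning `None` for an empty window keeps the row writable with NULL columns, which matches the spirit of the "fixed empty values" commit, though the PR's actual handling is not visible here.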