Change Postgres cron queries to BigQuery queries #1557

Closed
zqian opened this issue Dec 15, 2023 · 4 comments · Fixed by #1568
Labels: config change needed (changes are needed/included that may affect configuration files), 📅 cron

Comments

@zqian
Member

zqian commented Dec 15, 2023

Describe your problem or feature you'd like added

MyLA cron.py queries UDP context_store for Canvas data and UDP BigQuery for Caliper events.

Describe the solution you'd like

The UDP context_store tables can now be queried from UDP BigQuery. We can update the MyLA cron.py file to query them there and remove the SQLAlchemy and psycopg libraries.

@jonespm
Member

jonespm commented Jan 22, 2024

If/when we go to BigQuery, we could consider switching to accessing the CD2 (Canvas) tables directly. This could more directly fix issues like #1559, where the context_store doesn't keep pseudonyms for all users. It also raises the question of whether we should use the context_store or just go to the Canvas tables directly. Are there advantages for MyLA in using the context_store?

This would put us back closer to the UDW and, I think, eliminate some other bugs we had to try to work around with these UDP queries.

I think if we don't do a full switch for some tables, like the users table, we should use the data from the Canvas tables where possible, at least replacing things like entity.person_email.

@jonespm
Member

jonespm commented Mar 28, 2024

@zqian I'm not sure if you started on this but I started working on this today.

I'm wondering if there's value in leaving the Postgres code in there anymore, since this is all using UDP anyway. I think we could write code so both work, at least for now, and remove that later. But we'd be leaving in dead code, and probably nobody would be using it anyway. I feel like we're in the "Unizin 100% required" phase for this project now.

I think this will still need sqlalchemy for writing to the MySQL database; removing that, if we wanted to do it, would be a different task.

@zqian zqian removed their assignment Mar 29, 2024
@zqian
Member Author

zqian commented Mar 29, 2024

@jonespm I took my name off the assignee list. Please go ahead and update the title and description of this issue, since the current design is to use Canvas Data 2 hosted in BigQuery.

@zqian zqian added 📅 cron config change needed Changes are needed/included that may affect configuration files labels Apr 18, 2024
jonespm added a commit that referenced this issue May 6, 2024
* WIP: Changes to support switching to bigquery. The queries currently
aren't working but it feels like it might be making progress!

* Converting queries over to user context_store_entity and
context_store_keymap

Currently gets as far as assignment table.

* Getting up to submission now!

* Changes to get this through submission, up to resources now

* Fix up resources and report queries

* Removing commented out methods
Adding in tbyte calculation to the new bq method

* Some cleanup around the bytes informational displays

* Removing support for DATA_WAREHOUSE and postgres

* Fixing up codacy issues, removing unused values

* Resolving SQL injection warning

* Missing import for Optional

* Incorrect value in the return

* Fixing time for course date

* Adding more explicit support for DATE and DATETIME

* Removing helper utility functions and passing explicit QueryJobConfig

* Removing some additional references to DATA_WAREHOUSE

* Removing unused imports
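
For readers following the commit notes above, here is a minimal sketch of how several of those items could fit together: typed query parameters instead of string formatting (the SQL injection item), separate handling of DATE and DATETIME values, an explicit `QueryJobConfig`, and a terabyte figure for the bytes report. This is not the code from the linked pull request. The helper names (`bq_type_for`, `to_param_specs`, `bytes_to_tb`, `run_query`) are hypothetical, and the binary definition of a terabyte (1024^4 bytes) is an assumption. Only `run_query` needs the `google-cloud-bigquery` package, which it imports lazily.

```python
from datetime import date, datetime
from typing import Any, Dict, List, Tuple

# Assumption: "tbyte" means a binary terabyte (1024**4 bytes).
BYTES_PER_TB = 1024 ** 4


def bq_type_for(value: Any) -> str:
    """Map a Python value to a BigQuery scalar type name (hypothetical helper)."""
    # bool must be checked before int, and datetime before date,
    # because each is a subclass of the type that follows it.
    if isinstance(value, bool):
        return "BOOL"
    if isinstance(value, int):
        return "INT64"
    if isinstance(value, float):
        return "FLOAT64"
    if isinstance(value, datetime):
        return "DATETIME"
    if isinstance(value, date):
        return "DATE"
    if isinstance(value, str):
        return "STRING"
    raise TypeError(f"Unsupported parameter type: {type(value).__name__}")


def to_param_specs(params: Dict[str, Any]) -> List[Tuple[str, str, Any]]:
    """Turn {"name": value} into (name, bq_type, value) tuples."""
    return [(name, bq_type_for(value), value) for name, value in params.items()]


def bytes_to_tb(num_bytes: int) -> float:
    """Convert a byte count to terabytes for informational display."""
    return num_bytes / BYTES_PER_TB


def run_query(sql: str, params: Dict[str, Any]):
    """Run a parameterized query; returns (rows, terabytes billed)."""
    # Imported here so the helpers above work without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    # Values are bound as @name parameters, never formatted into the SQL text.
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(name, bq_type, value)
            for name, bq_type, value in to_param_specs(params)
        ]
    )
    job = client.query(sql, job_config=job_config)
    rows = list(job.result())
    return rows, bytes_to_tb(job.total_bytes_billed or 0)
```

With this shape, a query refers to its values as `@course_id` or `@start_date` in the SQL text and passes them in the `params` dict. The type checks are ordered so that a `datetime` value is never sent as a DATE.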
@jonespm jonespm closed this as completed May 6, 2024
@jonespm jonespm linked a pull request May 6, 2024 that will close this issue
16 tasks
@jonespm jonespm moved this to Review/QA - DEV in MyLA 2024.02.01 Aug 5, 2024
@zqian
Member Author

zqian commented Aug 5, 2024

@jonespm: Can the "DATA_WAREHOUSE" settings be removed from ConfigMap?

Projects
Status: Done
2 participants