Fixes #1601 Add initial support for Unizin Synthetic data #1600

Draft: wants to merge 8 commits into master
2 changes: 1 addition & 1 deletion config/cron_udp.hjson
@@ -207,7 +207,7 @@
(
select
distinct cse.person_id as user_id
from context_store_entity.course_section_enrollment cse
from context_store_entity.course_section_enrollment cse
left join context_store_entity.course_section cs
on cse.course_section_id = cs.course_section_id
left join context_store_keymap.course_offering co
11 changes: 9 additions & 2 deletions config/env_sample.hjson
@@ -90,7 +90,8 @@
"ROOT_PASSWORD": "student_dashboard_root_pw"
},
# Default Canvas Data id increment for course id, user id, etc
jxiao21 marked this conversation as resolved.
"CANVAS_DATA_ID_INCREMENT": 17700000000000000,
# for Unizin synthetic data, the value is 1000000000000
"CANVAS_DATA_ID_INCREMENT": 1000000000000,
# Canvas Configuration
"CANVAS_USER": "",
# strings for construct file download url
@@ -304,5 +305,11 @@
"COURSES_ENABLED": false,

# Path to the hjson file that contains cron queries
"CRON_QUERY_FILE": "config/cron_udp.hjson"
"CRON_QUERY_FILE": "config/cron_udp.hjson",

# Change the default Bigquery Project ID
"DEFAULT_PROJECT_ID": "udp-umich-prod",
Review comment (Member):
I would use a placeholder value here instead of "udp-umich-prod", like:

"DEFAULT_PROJECT_ID": "<UDP_institution_id>",

Review comment (Member):

I think this was just empty before. This should all work with the service account and the regular data if it's not set and I don't think it will work yet.

# Change the dataset project ID where queries are run against
"DATASET_PROJECT_ID": "unizin-shared"
Review comment (Member):

I will leave this line commented out by default, so a developer needs to enable DATASET_PROJECT_ID on purpose to connect to the Unizin synthetic data.

Review comment (Member):

Yeah, I don't think it would be needed otherwise. Though there could be a case where a default project is used but the datasets are in another project, like when/if we get true shared repositories.

Anyway, this is a temporary solution, and I believe we'll need a future issue to set this value on a per-course level in the admin.


}
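Putting the review suggestions together, the BigQuery entries in env_sample.hjson might end up looking like the sketch below. This is an illustration of the reviewers' suggestions (placeholder project ID, DATASET_PROJECT_ID commented out by default), not the state of the PR as submitted:

```hjson
# Change the default BigQuery project ID (this is typically the project
# that quotas are run against and logged into)
"DEFAULT_PROJECT_ID": "<UDP_institution_id>",

# Uncomment on purpose to run queries against the Unizin synthetic data
# "DATASET_PROJECT_ID": "unizin-shared"
```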
39 changes: 23 additions & 16 deletions dashboard/cron.py
@@ -52,7 +52,7 @@ def setup_queries(self):

def setup_bigquery(self):
# Instantiates a client
self.bigquery_client = bigquery.Client()
self.bigquery_client = bigquery.Client(project=settings.DEFAULT_PROJECT_ID)

# BQ Total Bytes Billed to report to status
self.total_bytes_billed = 0
@@ -99,24 +99,24 @@ def execute_bq_query(self, query: str, bq_job_config: Optional[bigquery.QueryJob
# Remove the newlines from the query
query = query.replace("\n", " ")

if bq_job_config:
try:
# Convert to bq schema object
query_job = self.bigquery_client.query(query, job_config=bq_job_config)
query_job_result = query_job.result()
# Create a new QueryJobConfig if none is provided
if bq_job_config is None:
bq_job_config = bigquery.QueryJobConfig()

self.total_bytes_billed += query_job.total_bytes_billed
logger.debug(f"This job had {query_job.total_bytes_billed} bytes. Total: {self.total_bytes_billed}")
return query_job_result
except Exception as e:
logger.error(f"Error ({str(e)}) in setting up schema for query {query}.")
raise Exception(e)
else:
query_job = self.bigquery_client.query(query)
# Add the dataset_project_id connection property to the job config
bq_job_config.connection_properties = [bigquery.ConnectionProperty("dataset_project_id", settings.DATASET_PROJECT_ID)]

try:
# Convert to bq schema object
query_job = self.bigquery_client.query(query, job_config=bq_job_config)
query_job_result = query_job.result()
self.total_bytes_billed += query_job.total_bytes_billed
logger.debug(f"This job had {query_job.total_bytes_billed} bytes. Total: {self.total_bytes_billed}")
return query_job_result
except Exception as e:
logger.error(f"Error ({str(e)}) in setting up schema for query {query}.")
raise Exception(e)

return query_job_result

# Execute a query against the MyLA database
def execute_myla_query(self, query: str, params: Optional[Dict] = None) -> ResultProxy:
@@ -226,7 +226,7 @@ def update_unizin_metadata(self):

logger.debug(metadata_sql)

status += self.util_function(metadata_sql, 'unizin_metadata')
try:
status += self.util_function(metadata_sql, 'unizin_metadata')
except Exception as e:
logger.warning("Could not directly access metadata; this is likely just an issue when using synthetic data.")

return status

@@ -313,6 +316,7 @@ def update_resource_access(self):
'canvas_event_urls', 'STRING', settings.CANVAS_EVENT_URLS))
job_config = bigquery.QueryJobConfig()
job_config.query_parameters = query_params
job_config.connection_properties = [bigquery.ConnectionProperty("dataset_project_id", settings.DATASET_PROJECT_ID)]

# Location must match that of the dataset(s) referenced in the query.
bq_job = self.bigquery_client.query(final_query, location='US', job_config=job_config)
@@ -651,8 +655,10 @@ def do(self) -> str:

# continue cron tasks

logger.info("** term")
status += self.update_term()

exception_in_run = False
if len(self.valid_locked_course_ids) == 0:
logger.info("Skipping course-related table updates...")
status += "Skipped course-related table updates.\n"
@@ -676,6 +682,7 @@ def do(self) -> str:
status += str(e)
exception_in_run = True

logger.info("** informational")
status += self.update_unizin_metadata()

all_str_course_ids = set(
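The guarded call added in update_unizin_metadata follows a simple pattern: run the update, but treat a failure as non-fatal, since the metadata table may not be reachable when pointing at synthetic data. A minimal stand-alone sketch of that pattern (safe_update and run_query are hypothetical names for illustration, not code from the PR):

```python
import logging

logger = logging.getLogger(__name__)

def safe_update(status: str, run_query, table_name: str) -> str:
    """Append the result of run_query to status, tolerating failures.

    Mirrors the try/except added around util_function in the PR:
    a failure here is expected, and harmless, with synthetic data.
    """
    try:
        status += run_query(table_name)
    except Exception:
        # Hypothetical wording, matching the PR's intent: log and keep going.
        logger.warning(
            "Could not directly access %s; likely harmless with synthetic data.",
            table_name,
        )
    return status
```

With this shape, a cron run keeps going and simply logs a warning when the metadata query fails, instead of aborting the whole job.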
6 changes: 6 additions & 0 deletions dashboard/settings.py
@@ -449,6 +449,12 @@ def apply_env_overrides(env: Dict[str, Any], environ: os._Environ) -> Dict[str,
# Only need view permission for exports
IMPORT_EXPORT_EXPORT_PERMISSION_CODE = 'view'

# Change the default project ID for BigQuery if needed (This is typically the one that quotas are run against and logged into)
DEFAULT_PROJECT_ID = ENV.get("DEFAULT_PROJECT_ID", None)

# Override the default project ID for BigQuery if needed, like to unizin-shared
DATASET_PROJECT_ID = ENV.get("DATASET_PROJECT_ID", None)
Review comment (Member):

Should the default value be DEFAULT_PROJECT_ID, instead of None?

Review comment (Member):

Maybe, but then we might need a new variable to indicate whether or not this is running on synthetic data.

Maybe it's better to make that explicit. I'm not sure yet; this still has some more work on it.
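The reviewer's suggestion of defaulting DATASET_PROJECT_ID to DEFAULT_PROJECT_ID could be sketched like this (resolve_project_ids is a hypothetical helper, not code from the PR; env stands in for the parsed hjson settings dict):

```python
from typing import Optional, Tuple

def resolve_project_ids(env: dict) -> Tuple[Optional[str], Optional[str]]:
    # Project the BigQuery client runs as (quotas and logging).
    default_project_id = env.get("DEFAULT_PROJECT_ID")
    # Per the review suggestion: fall back to the default project
    # instead of None when no dataset project is configured.
    dataset_project_id = env.get("DATASET_PROJECT_ID") or default_project_id
    return default_project_id, dataset_project_id
```

The open question in the thread is whether this implicit fallback hides the distinction between regular and synthetic data; an explicit flag may turn out to be clearer.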


# IMPORT LOCAL ENV
# =====================
try: