, any_value(destination_table).project_id as dst_project
, any_value(destination_table).dataset_id as dst_dataset
, any_value(normalized_dst_table) as dst_table
, any_value(ref).project_id as src_project
, any_value(ref).dataset_id as src_dataset
, any_value(normalized_ref_table) as src_table
, max(creation_time) as job_latest
, approx_top_sum(query, unix_seconds(creation_time), 10)[safe_offset(0)].value as query
, approx_count_distinct(user_email) as n_user
, approx_count_distinct(query) as n_queries
, approx_count_distinct(job_id) as n_job
, sum(total_bytes_processed) as total_bytes
, approx_quantiles(processed_time_ms, 10) as processed_time__quantiles
, approx_quantiles(wait_time_ms, 10) as wait_time__quantiles
, sum(total_slot_ms) as total_slots_ms
, approx_quantiles(total_slot_ms, 10) as total_slots_ms__quantiles
from job, unnest(referenced_tables) as ref
left join unnest([struct(
extract(millisecond from processed_time)
+ extract(second from processed_time) *1000
+ extract(minute from processed_time) *60*1000
+ extract(hour from processed_time) *60*60*1000
as processed_time_ms
, extract(millisecond from wait_time)
+ extract(second from wait_time) *1000
+ extract(minute from wait_time) *60*1000
+ extract(hour from wait_time) *60*60*1000
as wait_time_ms
, regexp_extract(ref.table_id, r'\d+$') as _src_suffix_number
, regexp_extract(destination_table.table_id, r'\d+$') as _dst_suffix_number
, destination_table = ref as is_self_reference
, starts_with(destination_table.dataset_id, '_') and char_length(destination_table.dataset_id) > 40 as is_temporary
, starts_with(destination_table.table_id, 'anon') as is_anonymous_query
)])
left join unnest([struct(
if(safe.parse_date('%Y%m%d', _src_suffix_number) is not null, regexp_replace(ref.table_id, r'\d+$', '*'), ref.table_id) as normalized_ref_table
, if(safe.parse_date('%Y%m%d', _dst_suffix_number) is not null, regexp_replace(destination_table.table_id, r'\d+$', '*'), destination_table.table_id) as normalized_dst_table
)])
where
not is_self_reference
and statement_type not in ('INSERT', 'DELETE', 'ALTER_TABLE', 'DROP_TABLE')
and statement_type is not null
group by unique_key
)
, user_query as (
select
format('%s.%s.%s', src_project, src_dataset, src_table) as destination
By combining the RECURSIVE syntax with INFORMATION_SCHEMA.JOBS_BY_*,
data lineage can be generated from the job execution history.
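Writing recursive SQL with a RECURSIVE CTE
The RECURSIVE syntax (WITH RECURSIVE) lets a CTE refer to its own result, so a chain of dependencies can be followed recursively within a single SQL query.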
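As a rough illustration of the idea, the sketch below walks upstream from one table through a tiny edge list. The table names and the `edges` and `lineage` CTEs are hypothetical and are not taken from deps.sql; only the shape of the recursion matters here.

```sql
with recursive edges as (
  -- Hypothetical edge list: each row means "dst is built from src".
  select 'raw.events' as src, 'stg.events' as dst
  union all select 'stg.events', 'mart.daily_events'
  union all select 'mart.daily_events', 'report.kpi'
)
, lineage as (
  -- Base term: direct sources of the table of interest.
  select dst as root, src, 1 as depth
  from edges
  where dst = 'report.kpi'
  union all
  -- Recursive term: follow each source one more hop upstream.
  select lineage.root, edges.src, lineage.depth + 1
  from lineage
  join edges on edges.dst = lineage.src
)
select root, src, depth
from lineage
order by depth
```

With this edge list the query should list the three upstream tables of report.kpi, one per hop. Real lineage graphs can contain cycles (for example a table rebuilt from itself), which is presumably why the full query filters out self references before recursing.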
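Retrieving referenced tables from INFORMATION_SCHEMA
BigQuery also provides the metadata views INFORMATION_SCHEMA.JOBS_BY_*, from which the query execution history can be retrieved.
These views include the referenced_tables column, which records the tables each query referenced when it ran.
Implementation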
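Before the full query, a minimal sketch of reading referenced_tables may help. It assumes the jobs ran in the US multi-region and that a 30-day window is enough; both the region qualifier and the window are illustrative choices, not something the original states.

```sql
-- One row per (job, referenced table) pair: the raw edges of the lineage graph.
select
  job_id
  , creation_time
  , format('%s.%s.%s', ref.project_id, ref.dataset_id, ref.table_id) as src_table
  , format('%s.%s.%s'
      , destination_table.project_id
      , destination_table.dataset_id
      , destination_table.table_id) as dst_table
from `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  , unnest(referenced_tables) as ref
where
  -- Assumed window; adjust to the history you need.
  creation_time >= timestamp_sub(current_timestamp(), interval 30 day)
  and job_type = 'QUERY'
  and state = 'DONE'
  and error_result is null
```

Each output row is one source-to-destination edge. Plain SELECT queries write to an anonymous destination table, so a filter on the destination, like the is_temporary and is_anonymous_query flags in deps.sql, is likely needed before treating these rows as table-to-table lineage.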
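The following implementation is an example that generates table lineage from the job execution history.
In addition to the dependencies between tables, it also records the queries that users ran against them.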
Building on this lineage should make a number of further uses possible.
Thoughts
WITH RECURSIVE cannot be wrapped in a function such as a TVF, which made it somewhat awkward to use.
estante/codes/bigquery/dashboard/deps.sql
Lines 1 to 116 in 2b483d6