Time Efficient:
Make use of `gcloud alpha storage` in the `open_local()` method in `sinks.py`.
Findings: using `gsutil` to download data from GCS to the local file system is 5 times slower than `gcloud alpha storage`.
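A minimal sketch of the proposed swap, assuming `open_local()` works by copying a GCS object to a temporary local file and yielding its path. `build_copy_command` and `open_local_sketch` are hypothetical names for illustration, not the existing code in `sinks.py`; the only real pieces are the `gcloud alpha storage cp` and `gsutil cp` CLI commands.

```python
import contextlib
import os
import subprocess
import tempfile
from typing import Iterator, List


def build_copy_command(uri: str, dest: str, use_gcloud: bool = True) -> List[str]:
    """Return the CLI invocation that copies a GCS object to a local path."""
    if use_gcloud:
        # Proposed path: reported as faster in the findings above.
        return ["gcloud", "alpha", "storage", "cp", uri, dest]
    # Previous path: gsutil, reported as 5 times slower.
    return ["gsutil", "cp", uri, dest]


@contextlib.contextmanager
def open_local_sketch(uri: str, use_gcloud: bool = True) -> Iterator[str]:
    """Hypothetical stand-in for open_local(): download `uri`, yield the local path."""
    fd, dest = tempfile.mkstemp()
    os.close(fd)  # the CLI writes to the path itself; we only need the name
    try:
        # check=True raises CalledProcessError if the copy fails.
        subprocess.run(build_copy_command(uri, dest, use_gcloud), check=True)
        yield dest
    finally:
        # Remove the temp file whether or not the download succeeded.
        os.remove(dest)
```

Keeping the command construction separate from the subprocess call makes it easy to flip back to `gsutil` (e.g. behind a flag) while benchmarking the two tools.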
Memory Efficient:
Every time we log `xr_dataset.nbytes`, the complete dataset is loaded into memory, which causes the OOM killer to be invoked.
TODO: Find a better way to log the dataset size.
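One possible direction for the TODO, sketched under the assumption that an estimate is good enough for logging: compute the size from each variable's metadata (`size` and `dtype.itemsize`) instead of calling `nbytes`. For lazily opened xarray datasets these attributes come from the variable's shape and dtype, so they should not require reading array values. `estimate_nbytes` and `log_dataset_size` are hypothetical helpers, and the result is the uncompressed in-memory estimate (object/string dtypes are counted only by pointer size).

```python
import logging
from typing import Any, Mapping

logger = logging.getLogger(__name__)


def estimate_nbytes(variables: Mapping[Any, Any]) -> int:
    """Estimate total bytes from metadata only.

    Each value needs only `.size` (element count) and `.dtype.itemsize`
    (bytes per element); no array data is touched here.
    """
    return sum(int(v.size) * v.dtype.itemsize for v in variables.values())


def log_dataset_size(ds: Any) -> int:
    """Log an estimated size for an xarray.Dataset without calling ds.nbytes."""
    size = estimate_nbytes(ds.variables)
    logger.info("Estimated dataset size: %.2f MiB", size / 2**20)
    return size
```

Usage would be `log_dataset_size(xr_dataset)` in place of logging `xr_dataset.nbytes`. Whether this fully avoids loading depends on the xarray version and backend, so it is worth confirming with a memory profile on a large file.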
Real-time data ingestion into BQ:
`beam.io.WriteToBigQuery()`: in a batch pipeline, data is not ingested into BQ in real time, because a batch pipeline processes all elements before writing to BigQuery.
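A sketch of one option, assuming the table already exists and that incremental visibility matters more than load-job cost: `WriteToBigQuery` defaults to `FILE_LOADS` in batch pipelines (rows are staged and loaded only when the load jobs run), while passing `method="STREAMING_INSERTS"` sends rows as bundles are processed. `bq_write_kwargs` and `build_bq_sink` are hypothetical helpers; the table name and dispositions are illustrative choices.

```python
from typing import Any, Dict


def bq_write_kwargs(realtime: bool) -> Dict[str, Any]:
    """Choose the write method for beam.io.WriteToBigQuery.

    FILE_LOADS (the batch default) makes rows visible only after load jobs
    complete; STREAMING_INSERTS writes rows as each bundle is processed.
    """
    if realtime:
        return {"method": "STREAMING_INSERTS"}
    return {"method": "FILE_LOADS"}


def build_bq_sink(table: str, realtime: bool = True):
    """Return a WriteToBigQuery transform (requires apache_beam[gcp])."""
    # Imported lazily so bq_write_kwargs has no Beam dependency.
    import apache_beam as beam

    return beam.io.WriteToBigQuery(
        table,
        # Assumes the destination table already exists.
        create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        **bq_write_kwargs(realtime),
    )
```

In a pipeline this would be applied as `rows | build_bq_sink("my-project:my_dataset.my_table")` (hypothetical table). Streaming inserts are billed differently from load jobs and are subject to streaming quotas, so the trade-off should be checked against the expected ingestion volume.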