I have thought for a while about a Spark-native way to generate the data, and I still don't have a clear picture of how it should be done. The pattern does look a lot like a Spark Streaming scenario, though: the TPC-DS tool keeps generating data (the data source), Spark receives it (the receiver), and this continues for some time (streaming).
nds_gen_data.py currently supports data generation locally and on HDFS. Add options to generate data on GCS, S3, and Azure (in that priority order).
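One possible starting point is to pick the storage backend from the URI scheme of the output path. The snippet below is only a sketch of that idea, not the script's actual code: `SCHEME_TO_BACKEND` and `resolve_backend` are hypothetical names, and the scheme list assumes the usual Hadoop connector prefixes (`gs://` for GCS, `s3://` or `s3a://` for S3, `abfs://`, `abfss://`, `wasb://` or `wasbs://` for Azure).

```python
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to storage backend.
# An empty scheme means a plain local path such as /data/nds.
SCHEME_TO_BACKEND = {
    "": "local",
    "file": "local",
    "hdfs": "hdfs",
    "gs": "gcs",
    "s3": "s3",
    "s3a": "s3",
    "abfs": "azure",
    "abfss": "azure",
    "wasb": "azure",
    "wasbs": "azure",
}


def resolve_backend(target: str) -> str:
    """Return the backend name for an output path, or raise ValueError."""
    scheme = urlparse(target).scheme.lower()
    if scheme not in SCHEME_TO_BACKEND:
        raise ValueError(f"Unsupported output scheme: {scheme!r}")
    return SCHEME_TO_BACKEND[scheme]
```

With something like this in place, the generator could keep its existing local and HDFS code paths and branch to a GCS, S3, or Azure upload step based on the returned backend name.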