Poll before the event #12
pl-marasco started this conversation in Ideas
I'm used to running a poll before the event, or right at the start of it, to learn a bit more about the attendees.
If you think this isn't necessary, please say so clearly; don't worry, someone else already has.
Here is my proposal:
For each question, please select the option that best represents your level of knowledge or experience.
1. What is your level of knowledge of Xarray? (see the sketch below)
a) Never heard of it
b) Heard of it, but never used
c) Basic knowledge (e.g., can create and manipulate DataArray and Dataset objects)
d) Intermediate knowledge (e.g., can perform operations like resampling, grouping, rolling)
e) Advanced knowledge (e.g., understand internal architecture, have made custom extensions)
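To make levels c) and d) concrete, here is a minimal, hypothetical sketch (the array shape, dimension names, and values are made up for illustration):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Basic level (c): create and manipulate DataArray / Dataset objects.
da = xr.DataArray(
    np.random.rand(365, 3),
    dims=("time", "station"),
    coords={"time": pd.date_range("2023-01-01", periods=365)},
    name="temperature",
)
ds = da.to_dataset()

# Intermediate level (d): resampling, grouping, rolling.
monthly = da.resample(time="1MS").mean()
seasonal = da.groupby("time.season").mean()
smoothed = da.rolling(time=7, center=True).mean()
```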
2. What is your level of knowledge of Dask? (see the sketch below)
a) Never heard of it
b) Heard of it, but never used
c) Basic knowledge (e.g., can create and manipulate Dask arrays and DataFrames)
d) Intermediate knowledge (e.g., can optimize computations, use Dask delayed)
e) Advanced knowledge (e.g., can configure and use Dask distributed clusters, understand internal architecture)
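Similarly, a rough sketch of what levels c) and d) involve (the chunk sizes and the delayed function are illustrative only):

```python
import dask
import dask.array as da

# Basic level (c): create and manipulate a chunked Dask array.
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
total = x.sum()           # lazy: builds a task graph, computes nothing yet
result = total.compute()  # triggers the actual computation

# Intermediate level (d): defer arbitrary Python functions with dask.delayed.
@dask.delayed
def preprocess(value):
    return value * 2

outputs = dask.compute(*[preprocess(v) for v in range(4)])
```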
3. How familiar are you with the Pangeo project?
a) Never heard of the Pangeo project
b) Have a general idea of its goals
c) Understand its main tools and their purpose
d) Regularly use tools within the Pangeo ecosystem
e) Actively contribute or have contributed to Pangeo project initiatives
4. What is your level of knowledge of satellite data?
a) No knowledge
b) Know the basics, such as what satellite data generally consists of
c) Can name and understand the purpose of popular satellite missions and their data products
d) Have experience analyzing or processing satellite data
e) Deep expertise, including knowledge of raw data, calibration, corrections, and advanced processing techniques
5. Where do you usually run your data analyses?
a) Always locally on my computer
b) Sometimes on local HPC clusters
c) Sometimes in the cloud
d) Frequently use both local HPC and cloud resources
e) Exclusively in the cloud
6. What do you know about the differences between cloud/HPC-based and local data analysis?
a) I don’t know the difference.
b) Cloud/HPC provides more computational power.
c) Cloud analysis is more scalable and flexible, but may come with additional costs.
d) Local analysis gives more control and privacy, but might be limited in scalability.
e) Both c and d.
7. How do you handle out-of-core computations (data larger than memory)? (see the sketch below)
a) I'm not familiar with out-of-core computations.
b) I break the data into chunks and process them serially.
c) I use tools like Dask to handle them seamlessly.
d) I rely on cloud/HPC resources to provide sufficient memory.
e) I use a combination of the above methods.
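As an illustration of option c), this is roughly what handling larger-than-memory data "seamlessly" looks like with Dask-backed Xarray (the file name, variable, and chunk size are hypothetical):

```python
import xarray as xr

# Opening with `chunks=` yields Dask-backed arrays, so the file is never
# loaded into memory as a whole (file and variable names are made up).
ds = xr.open_dataset("large_dataset.nc", chunks={"time": 100})

# Operations stay lazy and execute chunk by chunk on .compute().
climatology = ds["temperature"].groupby("time.month").mean()
climatology = climatology.compute()
```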
8. Which storage solutions do you primarily use for your data? (see the sketch below)
a) Local hard drives or SSDs
b) Network attached storage (NAS)
c) Object storage (e.g., Amazon S3, Google Cloud Storage)
d) Distributed file systems (e.g., HDFS, GlusterFS, Ceph)
e) Tape storage
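For reference, object storage such as option c) is usually reached from Python through fsspec/s3fs; a minimal sketch assuming a hypothetical public bucket (anonymous access only works for genuinely public data):

```python
import s3fs

# Anonymous access to a (made-up) public S3 bucket.
fs = s3fs.S3FileSystem(anon=True)
print(fs.ls("some-public-bucket/data/"))

# Remote files can be opened like local ones and handed to other libraries.
with fs.open("some-public-bucket/data/sample.csv", "rb") as f:
    header = f.readline()
```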
9. How do you typically access the datasets you work with? (see the sketch below)
a) Direct download from public repositories (e.g., CDSE, Copernicus services such as WEkEO, NASA)
b) Access through cloud-native datasets (e.g., data available directly on cloud platforms like AWS, GEE, Planetary Computer)
c) Physical media (e.g., external drives, DVDs)
d) Streamed or accessed through APIs
e) Generated in-house or through experiments/simulations
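As an example of options b) and d), cloud-native catalogs are commonly queried through a STAC API; a sketch using pystac-client against the Planetary Computer (the collection, bounding box, and dates are illustrative):

```python
from pystac_client import Client

# Open a public STAC API endpoint (Planetary Computer as an example).
catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

# Search Sentinel-2 L2A scenes over an illustrative area and period.
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[11.0, 46.0, 12.0, 47.0],
    datetime="2023-06-01/2023-06-30",
)
for item in search.items():
    print(item.id)
```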
10. How do you handle data versioning?
a) I don't specifically handle data versioning.
b) Use filesystem-based versioning (e.g., timestamped folders, naming conventions).
c) Use dedicated data versioning tools (e.g., DVC, Pachyderm).
d) Rely on cloud storage features for versioning (e.g., versioned buckets in S3).
e) Use databases with versioning capabilities.
11.
a) Frequently
b) Occasionally
c) Rarely
d) Never
12.
a) Frequently
b) Occasionally
c) Rarely
d) Never
13. Which programming language do you mainly use?
a) Python
b) R
c) Julia
d) C++, Rust, Java
e) Software/applications where I do not need to write programs