Poll before the event #12
pl-marasco started this conversation in Ideas
I'm used to running a poll before the event, or right at the start of it, to learn a bit more about the attendees.
If you think this isn't necessary, please say so clearly; don't worry, someone else already has.
Here is my proposal:
For each question, please select the option that best represents your level of knowledge or experience.
1. What is your level of knowledge of Xarray? (see the sketch below)
a) Never heard of it
b) Heard of it, but never used
c) Basic knowledge (e.g., can create and manipulate DataArray and Dataset objects)
d) Intermediate knowledge (e.g., can perform operations like resampling, grouping, rolling)
e) Advanced knowledge (e.g., understand internal architecture, have made custom extensions)
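To make levels c) and d) concrete, here is a minimal, hypothetical sketch (the array shape, dimension names, and values are made up for illustration):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Basic level (c): create and manipulate DataArray / Dataset objects.
da = xr.DataArray(
    np.random.rand(365, 3),
    dims=("time", "station"),
    coords={"time": pd.date_range("2023-01-01", periods=365)},
    name="temperature",
)
ds = da.to_dataset()

# Intermediate level (d): resampling, grouping, rolling.
monthly = da.resample(time="1MS").mean()
seasonal = da.groupby("time.season").mean()
smoothed = da.rolling(time=7, center=True).mean()
```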
2. What is your level of knowledge of Dask? (see the sketch below)
a) Never heard of it
b) Heard of it, but never used
c) Basic knowledge (e.g., can create and manipulate Dask arrays and DataFrames)
d) Intermediate knowledge (e.g., can optimize computations, use Dask delayed)
e) Advanced knowledge (e.g., can configure and use Dask distributed clusters, understand internal architecture)
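Similarly, a rough sketch of what levels c) and d) involve (the chunk sizes and the delayed function are illustrative only):

```python
import dask
import dask.array as da

# Basic level (c): create and manipulate a chunked Dask array.
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
total = x.sum()           # lazy: builds a task graph, computes nothing yet
result = total.compute()  # triggers the actual computation

# Intermediate level (d): defer arbitrary Python functions with dask.delayed.
@dask.delayed
def preprocess(value):
    return value * 2

outputs = dask.compute(*[preprocess(v) for v in range(4)])
```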
3. How familiar are you with the Pangeo project?
a) Never heard of the Pangeo project
b) Have a general idea of its goals
c) Understand its main tools and their purpose
d) Regularly use tools within the Pangeo ecosystem
e) Actively contribute or have contributed to Pangeo project initiatives
4. What is your level of knowledge of satellite data?
a) No knowledge
b) Know the basics, such as what satellite data generally consists of
c) Can name and understand the purpose of popular satellite missions and their data products
d) Have experience analyzing or processing satellite data
e) Deep expertise, including knowledge of raw data, calibration, corrections, and advanced processing techniques
5. Where do you usually run your data analyses?
a) Always locally on my computer
b) Sometimes on local HPC clusters
c) Sometimes in the cloud
d) Frequently use both local HPC and cloud resources
e) Exclusively in the cloud
6. What do you know about the differences between cloud/HPC-based and local data analysis?
a) I don’t know the difference.
b) Cloud/HPC provides more computational power.
c) Cloud analysis is more scalable and flexible, but may come with additional costs.
d) Local analysis gives more control and privacy, but might be limited in scalability.
e) Both c and d.
7. How do you handle out-of-core computations (data larger than memory)? (see the sketch below)
a) I'm not familiar with out-of-core computations.
b) I break the data into chunks and process them serially.
c) I use tools like Dask to handle them seamlessly.
d) I rely on cloud/HPC resources to provide sufficient memory.
e) I use a combination of the above methods.
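As an illustration of option c), this is roughly what handling larger-than-memory data "seamlessly" looks like with Dask-backed Xarray (the file name, variable, and chunk size are hypothetical):

```python
import xarray as xr

# Opening with `chunks=` yields Dask-backed arrays, so the file is never
# loaded into memory as a whole (file and variable names are made up).
ds = xr.open_dataset("large_dataset.nc", chunks={"time": 100})

# Operations stay lazy and execute chunk by chunk on .compute().
climatology = ds["temperature"].groupby("time.month").mean()
climatology = climatology.compute()
```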
8. Which storage solutions do you primarily use for your data? (see the sketch below)
a) Local hard drives or SSDs
b) Network attached storage (NAS)
c) Object storage (e.g., Amazon S3, Google Cloud Storage)
d) Distributed file systems (e.g., HDFS, GlusterFS, Ceph)
e) Tape storage
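For reference, object storage such as option c) is usually reached from Python through fsspec/s3fs; a minimal sketch assuming a hypothetical public bucket (anonymous access only works for genuinely public data):

```python
import s3fs

# Anonymous access to a (made-up) public S3 bucket.
fs = s3fs.S3FileSystem(anon=True)
print(fs.ls("some-public-bucket/data/"))

# Remote files can be opened like local ones and handed to other libraries.
with fs.open("some-public-bucket/data/sample.csv", "rb") as f:
    header = f.readline()
```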
9. How do you typically access the datasets you work with? (see the sketch below)
a) Direct download from public repositories (e.g., CDSE, Copernicus services such as WEkEO, NASA)
b) Access through cloud-native datasets (e.g., data available directly on cloud platforms like AWS, GEE, Planetary Computer)
c) Physical media (e.g., external drives, DVDs)
d) Streamed or accessed through APIs
e) Generated in-house or through experiments/simulations
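As an example of options b) and d), cloud-native catalogs are commonly queried through a STAC API; a sketch using pystac-client against the Planetary Computer (the collection, bounding box, and dates are illustrative):

```python
from pystac_client import Client

# Open a public STAC API endpoint (Planetary Computer as an example).
catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

# Search Sentinel-2 L2A scenes over an illustrative area and period.
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[11.0, 46.0, 12.0, 47.0],
    datetime="2023-06-01/2023-06-30",
)
for item in search.items():
    print(item.id)
```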
10. How do you handle data versioning?
a) I don't specifically handle data versioning.
b) Use filesystem-based versioning (e.g., timestamped folders, naming conventions).
c) Use dedicated data versioning tools (e.g., DVC, Pachyderm).
d) Rely on cloud storage features for versioning (e.g., versioned buckets in S3).
e) Use databases with versioning capabilities.
11.
a) Frequently
b) Occasionally
c) Rarely
d) Never
12.
a) Frequently
b) Occasionally
c) Rarely
d) Never
13. Which programming language do you mainly use?
a) Python
b) R
c) Julia
d) C++, Rust, Java
e) Software/applications where I do not need to write programs