Lots and lots of workflows for CoDSummit #26

Closed
Tracked by #2186
philwinder opened this issue Mar 17, 2023 · 2 comments

Comments

@philwinder
Contributor

philwinder commented Mar 17, 2023

Brain dump

  • IPFS data statistics workflow
  • Format conversion, e.g. CSV to Parquet or CSV to JSON
  • Validation
  • Descriptions
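As a rough illustration of the format-conversion item, a CSV-to-JSON conversion needs only the Python standard library. This is a minimal sketch, assuming the CSV has a header row; the function name `csv_to_json` is hypothetical, and all values are kept as strings. CSV-to-Parquet would need a third-party library (for example pyarrow) and is not shown.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects.

    Values are left as strings; no type inference is attempted.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)  # each row is a dict keyed by the header names
    return json.dumps(rows)


if __name__ == "__main__":
    print(csv_to_json("name,count\nalpha,1\nbeta,2\n"))
```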

Use Cases

IPFS Data Statistics

This is important to do first, because there's no point working on ${DataType} use cases if there are no ${DataTypes} in IPFS ({insert data type here}).

Metadata -> Push metadata to an external database (security concern; whitelist?)
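A minimal sketch of the statistics step, assuming an earlier metadata step (for example a type detector such as Tika) has already produced a `(cid, mime_type)` pair per file; the function name `mime_statistics` and that record shape are hypothetical. It counts files per full MIME type and per top-level type, which is enough to tell whether a given ${DataType} use case has any data to work on.

```python
from collections import Counter
from typing import Iterable, Tuple


def mime_statistics(records: Iterable[Tuple[str, str]]) -> Tuple[Counter, Counter]:
    """Tally (cid, mime_type) records by full MIME type and by top-level type.

    `records` is assumed to come from a prior metadata/type-detection step.
    """
    by_type: Counter = Counter()
    by_top_level: Counter = Counter()
    for _cid, mime in records:
        # Normalise case and drop parameters such as "; charset=utf-8".
        mime = mime.lower().split(";")[0].strip()
        by_type[mime] += 1
        by_top_level[mime.split("/")[0]] += 1
    return by_type, by_top_level


if __name__ == "__main__":
    sample = [("cid-1", "image/png"), ("cid-2", "image/jpeg"), ("cid-3", "text/csv")]
    by_type, by_top = mime_statistics(sample)
    print(by_top.most_common())
```

The placeholder CIDs in the usage example are illustrative only.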

Web-Focussed Image Compression

Metadata | predicate image/* -> Image Processing -> Merge
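The `Metadata | predicate ... -> Processing -> Merge` shape recurs across these use cases. A minimal sketch of it, assuming each metadata record is a dict with a `mime` field; the names `run_predicated_step` and `process` are hypothetical. Glob patterns are matched with `fnmatch`, where square brackets mean a character class, so an alternation like `[video|audio]/*` has to be written as the two patterns `video/*` and `audio/*`.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, List

Record = Dict[str, str]


def run_predicated_step(
    records: List[Record],
    patterns: List[str],
    process: Callable[[Record], Dict[str, str]],
) -> List[Record]:
    """Run `process` only on records whose MIME type matches one of `patterns`,
    then merge the returned fields into a copy of each matching record.

    Non-matching records are copied through unchanged; inputs are not mutated.
    """
    merged: List[Record] = []
    for record in records:
        mime = record.get("mime", "")
        if any(fnmatchcase(mime, p) for p in patterns):
            merged.append({**record, **process(record)})  # the "Merge" step
        else:
            merged.append(dict(record))
    return merged


if __name__ == "__main__":
    recs = [{"cid": "c1", "mime": "image/png"}, {"cid": "c2", "mime": "text/csv"}]
    # Hypothetical processing step standing in for real image compression.
    print(run_predicated_step(recs, ["image/*"], lambda r: {"thumb": r["cid"] + "-small"}))
```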

Web-Focussed Video Compression

Metadata | predicate video/* -> Video Processing -> Merge

Transcription of Video and Audio

Metadata | predicate [video|audio]/* -> Transcription model(s) -> Merge

CSV/Parquet Summary Statistics

Interesting one here: Parquet doesn't have a registered MIME type yet. I wonder whether Tika can parse it.
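If type detection can't identify Parquet, one fallback is a magic-byte check: per the Apache Parquet file-format specification, a Parquet file begins and ends with the four bytes `PAR1`. This is a heuristic sketch only; the function names and the `application/parquet` label (the one used in this issue, not a registered type) are assumptions, and the whole file is passed in as bytes for brevity, whereas a real check would read just the first and last four bytes.

```python
from typing import Optional

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(data: bytes) -> bool:
    """Heuristic: Parquet files start and end with the 4-byte magic "PAR1".

    The footer also holds a 4-byte metadata length, so anything shorter than
    12 bytes cannot be a valid file.
    """
    return len(data) >= 12 and data[:4] == PARQUET_MAGIC and data[-4:] == PARQUET_MAGIC


def fallback_label(data: bytes) -> Optional[str]:
    """Return the (unregistered) label used in this issue, or None."""
    return "application/parquet" if looks_like_parquet(data) else None
```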

Metadata | predicate [text/csv|application/parquet] -> Load and produce summaries of data -> Merge
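For the CSV half of this use case, the "produce summaries" step could look roughly like the following stdlib-only sketch. The function name `summarize_csv` and the choice of statistics (count, min, max, mean) are assumptions; columns with any non-numeric value are skipped, empty cells are ignored, and Parquet input is not handled.

```python
import csv
import io
import statistics
from typing import Dict, List


def summarize_csv(csv_text: str) -> Dict[str, Dict[str, float]]:
    """Return count/min/max/mean for every column whose non-empty values are all numeric."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns: Dict[str, List[float]] = {name: [] for name in (reader.fieldnames or [])}
    numeric = {name: True for name in columns}
    for row in reader:
        for name in columns:
            value = (row.get(name) or "").strip()
            if not value:
                continue  # ignore empty cells
            try:
                columns[name].append(float(value))
            except ValueError:
                numeric[name] = False  # any text value disqualifies the column
    summary: Dict[str, Dict[str, float]] = {}
    for name, values in columns.items():
        if numeric[name] and values:
            summary[name] = {
                "count": len(values),
                "min": min(values),
                "max": max(values),
                "mean": statistics.fmean(values),
            }
    return summary


if __name__ == "__main__":
    print(summarize_csv("city,temp,rain\nA,10,1\nB,20,\nC,30,3\n"))
```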

${DataType} Enrichment

For example, given a CSV file with latitude and longitude columns, run a job that converts each lat/long pair to a country and city, writes an output CSV with the same row order, and then merges it back with the original data.

Metadata | predicate ${DataType} -> Parse columnar data type | predicate ${ColumnDataType} -> Data Enrichment -> Merge
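The lat/long example can be sketched as two steps: an enrichment step that emits one `(city, country)` row per input row, and a merge step that appends those columns to the original rows. This is a toy illustration, assuming columns named `lat` and `long`; the three-entry `GAZETTEER` with approximate coordinates and the nearest-point lookup (great-circle distance via the haversine formula) are hypothetical stand-ins for a proper reverse-geocoding dataset or service.

```python
import csv
import io
import math
from typing import Dict, List, Tuple

# Hypothetical, approximate reference points: (city, country, lat, lon).
GAZETTEER: List[Tuple[str, str, float, float]] = [
    ("Boston", "United States", 42.36, -71.06),
    ("London", "United Kingdom", 51.51, -0.13),
    ("Tokyo", "Japan", 35.68, 139.69),
]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, using a mean Earth radius of 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_place(lat: float, lon: float) -> Tuple[str, str]:
    """Pick the gazetteer entry closest to (lat, lon)."""
    city, country, _, _ = min(GAZETTEER, key=lambda g: haversine_km(lat, lon, g[2], g[3]))
    return city, country


def enrich(rows: List[Dict[str, str]], lat_col: str, lon_col: str) -> List[Dict[str, str]]:
    """Enrichment step: one (city, country) row per input row, in the same order."""
    out = []
    for row in rows:
        city, country = nearest_place(float(row[lat_col]), float(row[lon_col]))
        out.append({"city": city, "country": country})
    return out


def merge_csv(csv_text: str, lat_col: str = "lat", lon_col: str = "long") -> str:
    """Merge step: append the enrichment columns to the original CSV rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    original = list(reader)
    extra = enrich(original, lat_col, lon_col)
    fieldnames = list(reader.fieldnames or []) + ["city", "country"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row, added in zip(original, extra):
        writer.writerow({**row, **added})
    return buf.getvalue()


if __name__ == "__main__":
    print(merge_csv("id,lat,long\n1,42.35,-71.05\n2,51.5,-0.1\n"))
```

Keeping enrichment and merge as separate functions mirrors the pipeline above, where the enrichment job produces its own output before being merged back.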

Image Dataset Analysis

https://cleanvision.readthedocs.io/

Metadata | predicate image/* -> Image analysis -> Merge

@davidgasquez

Adding a couple more:

@philwinder
Contributor Author

Finished all the workflows for Boston. More workflows can be added in the future.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants