YAML file describes DAG where each node can accept inputs, process data, and return outputs.
Nodes might include:
-
Literals (string, number)
-
Allow literal inputs to be inlined in YAML?
-
-
URI fetcher/file download
-
Archive extractor
-
Archive extractor should receive download URL, no point in loading entire packed archive into filestore
-
-
Get file by path from filetree
-
Minecraft specific:
-
Curse manifest → github:erisia/builder YAML manifest
-
github:erisia/builder YAML → .nix
-
-
Filetree filter
-
exclude list? include list? both?
-
-
File patcher
-
Git-style diffs?
-
Apply to file or filetree?
-
Each node runs in a separate thread (tokio?).
Nodes read input from/send output to broadcast SPMC channels.
Node threads block until all channels return output, then begin processing.
Inputs to nodes are cloned by channel, no interference between threads on data.
All datatypes passed must implement Clone.
Additional channels for nodes to stream status/logs to main thread.
(this feels like a golang style design, but I want the compile-time checks of Rust)
Bespoke DI container?
-
Holds all channels
-
Holds threadsafe references to useful globals
-
HTTP client for 429 handling
-
API secrets from env
-
Misc. config?
-
-
Pass
&ref
of container to node constructor, which then copies references of required components into self, trait for "has constructor accepting DI cont."
Crate enum_dispatch
perfect fit for node constructor trait
-
Create enum with variants matching structs that implement a trait
-
Annotate enum and trait
-
Each enum variant (which can be included in serde deserialize output) will have the functions defined in the trait, and will delegate to the correct variant statically
-
Nodes have unique names
-
Create a broadcast channel for each output
-
DI container owns all channels
-
Tokio broadcast channels are created by cloning/subscribing to the sender end, so container holds the sender sides
-
Receiver ends initally created by channel constructor could be dropped
-
DI container has separate methods for obtaining a sender or receiver channel
-
-
Channels are generic (enum by datatype)
-
Constructor
match
es each of its inputs to validate types -
Constructor returns Err if
match
fails
-
-
DI container holds "waker" broadcast channel
tokio::sync::broadcast::Sender<()>
-
Node threads obtain a rx channel from container, and block until a unit message is transmitted
-
DI container has method that sends wakeup message to all holders of the rx channel
Tokio broadcast receivers only get messages that were sent after they are created by calling tx.subscribe()
Node outputs should be immutable, no edit in place
Duplicated files in memory not feasible
Filetrees represented as list of file paths mapped to content hashes, no nested structure
File paths are represented as a list of each path component, with any trickery mangled by the FilePath struct’s constructor
Separate shared content addressible store holds files indexed by hash
-
Memory safe insert/retrieve
-
Pass file bytes to store, store returns hash
-
If store already has hash, do not overwrite file (or do? hopefully no collisions)
-
Pass hash to store, store returns file bytes
File store holds only file contents, file tree holds directory structure, filenames, metadata
File tree memory cost should be low enough to allow frequent cloning
Renaming, moving, deleting files from tree has low cost
File store can be abstract, with multiple backends for incremental development without changing interface
-
In-memory
-
DB
-
Filesystem
Initial implementation will use an in-memory filestore
Nodes can stream log messages/progress meter updates through channels
Build abstraction layer for reading messages
-
Get DAG structure (nodes/channels)
-
Get logs for a node
-
Get progress for a node
Build UI layers
-
CLI
-
Write log messages directly to stdout
-
-
TUI
-
Display DAG visualization with progress indicators
-
Log view?
-
-
Web
-
Display DAG visualization with progress indicators
-
Request/WS stream logs from backend
-
Should abstraction layer fetch data from node outputs? Should it expose this data?
Look into Tokio tracing for exposing log messages