What if we replaced nonce with cause(al) links #35

Open
Gozala opened this issue May 10, 2024 · 8 comments

Comments

@Gozala
Contributor

Gozala commented May 10, 2024

I've been finding that most times I need to invoke the same task repeatedly, I have two options:

  1. Use a nonce that captures (time-based) randomness
  2. Acknowledge the existence of a previous call via a causal relation

The downside of the first option is that it makes things non-deterministic and leads to code that is not referentially transparent. The second, less obvious option does not have the same issues and, better yet, encourages capturing (partial) order. It is also worth calling out that one could always encode an arbitrary nonce as a link by using the identity hash over a raw block, which adds a couple of bytes of prefix but is otherwise the same.

Which is why I'd like to propose replacing nonce with an optional cause: &any field (could be a different name), as I think it would encourage thinking about order and a more deterministic approach.
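
A minimal sketch of the "nonce as a link" encoding mentioned above, assuming the binary CIDv1 layout (version `0x01`, `raw` codec `0x55`) with the identity multihash (code `0x00`). The helper names `encode_varint` and `nonce_to_identity_cid` are made up for illustration and do not come from any UCAN library.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


CID_V1 = 0x01         # CID version 1
RAW_CODEC = 0x55      # multicodec code for raw bytes
IDENTITY_HASH = 0x00  # multihash code for the identity "hash"


def nonce_to_identity_cid(nonce: bytes) -> bytes:
    """Wrap nonce bytes in a binary CIDv1 whose digest is the nonce itself."""
    multihash = encode_varint(IDENTITY_HASH) + encode_varint(len(nonce)) + nonce
    return encode_varint(CID_V1) + encode_varint(RAW_CODEC) + multihash


cid = nonce_to_identity_cid(bytes.fromhex("deadbeef"))
print(cid.hex())  # 01550004deadbeef
```

For a short nonce the overhead is four bytes: version, codec, hash code, and digest length.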

@expede
Member

expede commented May 14, 2024

Hmm that's interesting (and a good point). There are three edge cases:

  1. It's an effectful invocation, and I haven't tracked the last time I ran it (e.g. 2 years ago)
  2. I'm firing a bunch of these in parallel, so they don't have a total order
  3. I want a coordination-free way to track calls via Task ID (i.e. you can't rely on hysteresis)

In all of these, a nonce is still useful. It's worth thinking on more, though! 🤔🤔🤔

@Gozala
Contributor Author

Gozala commented May 14, 2024

> Hmm that's interesting (and a good point). There are three edge cases:
>
> 1. It's an effectful invocation, and I haven't tracked the last time I ran it (e.g. 2 years ago)

My point is that in distributed systems you probably should have? At the very least, framing this as cause as opposed to nonce makes you pause to consider: do I need to be tracking this or not? If you don't need it, you can always treat it as a nonce.

I would also say that tracking each invocation is perhaps a bit too granular? You could instead use a hash of the vector clock, or perhaps drand.

> 1. I'm firing a bunch of these in parallel, so they don't have a total order

Unless they have the exact same task CID, it probably does not matter, right? But again, having to ask those questions upfront is kind of the goal of the proposed reframing.

> 1. I want a coordination-free way to track calls via Task ID (i.e. you can't rely on hysteresis)

If that is the case, then I suppose you really do want some randomness, and it's fine to just take a random value and link to it.

> In all of these, a nonce is still useful. It's worth thinking on more, though! 🤔🤔🤔

Perhaps calling it cause is a bit too prescriptive, and a more neutral name would be better: something like origin, base, or something along those lines?

@Gozala
Contributor Author

Gozala commented May 14, 2024

I think since may be a better name, as it better captures the intention, especially because it can be interpreted as:

  1. A reference to a particular event
  2. A reference to a particular time
  3. A reason for / cause of the event

@expede
Member

expede commented May 14, 2024

> My point is that in distributed systems you probably should have?

I don't think that's true 🤔 e.g. a non-malicious Byzantine node can definitely find itself in this situation

> I would also say that tracking each invocation is perhaps a bit too granular?

How so? Don't we need a way to distinguish between calls regardless of user, especially if they're effectful?

> having to ask those questions upfront is kind of the goal of the proposed reframing.

I'm all for making this stuff more salient, but that's largely above the protocol (in tooling), right?

> Perhaps calling it cause is a bit too prescriptive, and a more neutral name would be better: something like origin, base, or something along those lines?

The more I think on this, the more I'm sure that I can't make this work for IPVM. The topologies that result break semantic addressing.

@expede
Member

expede commented May 14, 2024

> I think since may be a better name, as it better captures the intention

We don't need to reinvent the wheel. cause and prev are common field names for this use case in other systems.

@Gozala
Contributor Author

Gozala commented May 14, 2024

> I don't think that's true 🤔 e.g. a non-malicious Byzantine node can definitely find itself in this situation

If it lost its state, I would imagine it should either:

  1. Recover it in some way
  2. Rotate its identifier in the vector clock to start from a fresh state

> How so? Don't we need a way to distinguish between calls regardless of user, especially if they're effectful?

I think we're losing the connection here. Specifically, I think of a nonce as a differentiating factor between two identical calls whose CIDs would match without it. I think the options right now are that the caller either generates a nonce randomly (current take) or chooses some deterministic way to differentiate (proposed take).

In many cases, linking to the invocation you intend to differ from seems like a reasonable choice, effectively saying "I'm intentionally invoking the exact same task as the one I invoked before". If you were not aware of the former invocation, you'll probably hit the cache and get the receipt from the first invocation. If you want to bypass the cache and invoke it once more, the logical solution is to take the CID of the first invocation and link to it (implying this one is happening afterwards).
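
A rough sketch of the cache behaviour described above. It assumes a toy content address (SHA-256 over canonical JSON) standing in for a real CID, and a hypothetical in-memory `Executor`. The `since` field name follows the naming discussed in this thread and is not part of any published schema.

```python
import hashlib
import json
from typing import Optional


def invocation_id(task: dict, since: Optional[str] = None) -> str:
    """Toy content address; a real system would use a CID over DAG-CBOR."""
    payload = {"task": task, "since": since}
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()


class Executor:
    """Hypothetical executor that memoizes receipts by invocation ID."""

    def __init__(self) -> None:
        self.receipts: dict = {}
        self.runs = 0

    def invoke(self, task: dict, since: Optional[str] = None):
        key = invocation_id(task, since)
        if key not in self.receipts:
            self.runs += 1  # the effect only happens on a cache miss
            self.receipts[key] = f"receipt-{self.runs}"
        return key, self.receipts[key]


executor = Executor()
task = {"cmd": "msg/send", "args": {"to": "alice@example.com", "body": "hi"}}

first_id, first_receipt = executor.invoke(task)
_, cached_receipt = executor.invoke(task)  # identical invocation: cache hit
second_id, second_receipt = executor.invoke(task, since=first_id)  # explicit re-run
```

Here the second call returns the first receipt without running anything, while the third call runs again because linking to `first_id` gives it a different ID.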

Tracking each individual invocation may not always be practical, however (even if it ends up acting like a local LRU cache). But then again, the caller can take a coarser approach and use a local vector clock (or a hash of it, to conceal order), which requires less state maintenance: just a node id and a local counter. If that state is lost, the node can regenerate its node id and carry on with its business.

In cases where a vector clock is too much, a node could simply stick to the wall clock, random values, or a synthesis of the two. At that point all determinism is lost, but it sounds like that was the best choice available given the constraints.

> I'm all for making this stuff more salient, but that's largely above the protocol (in tooling), right?

I'm not sure what you mean here. Changing from nonce: bytes to since: &any does not really affect the protocol (or rather the format); it simply uses a different term and type to impose (subjectively) better defaults. In other words, I hope it will make one consider what the right value to put here would be, as opposed to following the cowpath and pulling some random bytes.

> The more I think on this, the more I'm sure that I can't make this work for IPVM. The topologies that result break semantic addressing.

I'd like to understand why this is the case. Or isn't the invoker responsible for providing these to IPVM? I'm also not sure why IPVM could not just treat it as a nonce (in the worst-case scenario)?

> We don't need to reinvent the wheel. cause and prev are common field names for this use case in other systems.

Well, I think I have confused you by implying that there must be a causal relation. I meant to say that a causal link is a convenient nonce, but then again it does not need to capture a causal relation; it is just a way to discriminate one call from another that otherwise looks identical.

@expede
Member

expede commented May 15, 2024

> I think the options right now are that the caller either generates a nonce randomly (current take) or chooses some deterministic way to differentiate (proposed take).

This may be the core item: not all invocations are pure & deterministic. We probably want to avoid coordination, so running a nondeterministic job with the same nonce is a problem (at least for IPVM).

As an example, each email someone sends is a distinct event regardless of history. The following dependency graph doesn't work for tracking unique email messages since they have "exactly-once semantics":

flowchart TB
  b -->|after| a
  c -->|after| a

On the other hand, pushing CRDT updates via invocation would work with the above.
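
To make the graph above concrete, here is a small sketch under two assumptions that are not stated in the thread: `b` and `c` carry an identical payload, and IDs are derived by hashing the payload together with the `cause` link (SHA-256 over JSON standing in for a CID). The names are illustrative only.

```python
import hashlib
import json
from typing import Optional


def invocation_id(task: dict, cause: Optional[str]) -> str:
    """Toy content address standing in for a CID."""
    encoded = json.dumps({"task": task, "cause": cause}, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


email = {"cmd": "email/send", "to": "bob@example.com", "body": "ping"}

a = invocation_id(email, cause=None)
b = invocation_id(email, cause=a)  # b -->|after| a
c = invocation_id(email, cause=a)  # c -->|after| a, fired in parallel with b

# Both sends collapse into one ID, so an exactly-once effect cannot tell them apart.
collided = b == c

# A grow-only set (a simple CRDT) tolerates the duplicate: merging is idempotent.
state = {"x"}
update = {"y"}
merged_once = state | update
merged_twice = merged_once | update
```

The collision is harmless for the set merge, since applying the same update twice yields the same state, but it would drop one of two intended emails.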

The nice thing about a nonce is that you can generate it with whatever process you want, deterministic or not (e.g. monotone counter, Lamport clock, random number, etc.). It's just left unspecified by the protocol so that you don't depend on any particular mechanism system-wide. Randomness or a UUID is an easy default that works in most cases, but for things like pure Wasm execution in IPVM we always set the nonce to 0 since every run has the same meaning.
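
As an illustration of leaving the mechanism unspecified, the sketch below plugs three nonce strategies into the same toy task ID function. The function names, the 12-byte random length, and the SHA-256-over-JSON ID are assumptions for the example, not anything defined by UCAN or IPVM.

```python
import hashlib
import itertools
import json
import os


def task_id(task: dict, nonce: bytes) -> str:
    """Toy task ID: hash of the task together with its nonce."""
    payload = {"task": task, "nonce": nonce.hex()}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


_counter = itertools.count(1)


def counter_nonce() -> bytes:
    """Monotone counter: deterministic, but needs local state."""
    return next(_counter).to_bytes(8, "big")


def random_nonce() -> bytes:
    """Random bytes: the easy default, no state required."""
    return os.urandom(12)


def constant_nonce() -> bytes:
    """Constant zero: for pure tasks where every run means the same thing."""
    return b"\x00"


pure_task = {"cmd": "wasm/run", "args": {"module": "example-module", "input": [1, 2]}}

same_a = task_id(pure_task, constant_nonce())
same_b = task_id(pure_task, constant_nonce())
count_a = task_id(pure_task, counter_nonce())
count_b = task_id(pure_task, counter_nonce())
rand_a = task_id(pure_task, random_nonce())
rand_b = task_id(pure_task, random_nonce())
```

With the constant nonce, repeated runs share one ID and can share a cached result; the counter and random strategies each produce a fresh ID per call.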

I think having cause and nonce be separate fields is helpful even in the case where the nonce is a constant, since we still want the trace for observability reasons.

> If it lost its state, I would imagine it should either:
>
> 1. Recover it in some way
> 2. Rotate its identifier in the vector clock to start from a fresh state

Can you maybe go more into how this would work? I think that this gets stuck: in the case that it's lost, recovery is not an option. Rotating the identifier means that you lose the history pointer that you're proposing to otherwise depend on.

> since: &any

I don't understand the use case for arbitrary data in here?

> replacing nonce with an optional cause: &any

We already have a cause field that captures that information, right?

Screenshot 2024-05-14 at 18 07 32

My understanding is that you're saying "why have both fields when my use case only needs one?". I think maybe your use case operates under the assumption that repeat actions are okay (AKA "at-least-once semantics"), but there are a lot of exactly-once actions in the world.

> I'm also not sure why IPVM could not just treat it as a nonce (in the worst-case scenario)?

I guess you could use the identity CID to treat it as a nonce, but that always feels like a kludge to me. Mostly I'm confused what the benefit of the link is here when we have several other fields for capturing related information?

> it simply uses a different term and type to impose (subjectively) better defaults

I was trying to say that the DX for this ultimately lies in the tooling. People probably aren't going to build UCANs manually. At least at Fission we generally had tools to remove common tasks like finding the current DID, finding the expiration time, signing the token, etc.

@Gozala
Contributor Author

Gozala commented May 15, 2024

> Can you maybe go more into how this would work? I think that this gets stuck: in the case that it's lost, recovery is not an option. Rotating the identifier means that you lose the history pointer that you're proposing to otherwise depend on.

Maybe I'm making too many assumptions here, but let me call them out so they are clear:

  1. I assume the vector clock to be something like {id: PublicKey, n: Int}, or a hash of it if you want to conceal the details.
  2. If a node is unable to recover its clock state, it rotates its key. If it has no means of recovery, I would consider it a different node, hence a different key.
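
The two assumptions above could be sketched roughly as follows. Random bytes stand in for a public key, and `LocalClock`, `tick`, and `rotated` are hypothetical names chosen for this example.

```python
import hashlib
import os
from dataclasses import dataclass, field


def new_node_id() -> bytes:
    """Stand-in for a freshly generated public key."""
    return os.urandom(32)


@dataclass
class LocalClock:
    """Single-node clock shaped like {id: PublicKey, n: Int}."""

    id: bytes = field(default_factory=new_node_id)
    n: int = 0

    def tick(self) -> bytes:
        """Advance the counter and return a hash that conceals id and order."""
        self.n += 1
        return hashlib.sha256(self.id + self.n.to_bytes(8, "big")).digest()

    @classmethod
    def rotated(cls) -> "LocalClock":
        """State was lost: start over as a new node with a new id."""
        return cls()


clock = LocalClock()
since_1 = clock.tick()
since_2 = clock.tick()  # differs from since_1, so repeated tasks get distinct IDs

recovered = LocalClock.rotated()  # no link back to the old history
since_3 = recovered.tick()
```

After rotation the counter restarts at 1, but because the id changed the resulting hash does not repeat an earlier value.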

You are correct that this is no longer a linear history, but under the defined constraints I don't think the node has a better option.

But again, there are other mechanisms one could use, like https://drand.love/, to avoid having to maintain a clock yourself.

> I don't understand the use case for arbitrary data in here?

It is an IPLD link to anything. So it could be a link to an event like a vector clock, or it could be a link to the previous invocation of the otherwise identical task. It could also be a link to arbitrary bytes, which is effectively a nonce.

Which is my primary hypothesis: you really want to acknowledge the passage of time, which you could do by acknowledging some event (kind of a spin on Merkle clocks) or by acknowledging some random bytes, which in practice still acknowledges some event in the outside world.

Which is why I think it's just a reframing of the nonce to better signify its purpose.

> My understanding is that you're saying "why have both fields when my use case only needs one?". I think maybe your use case operates under the assumption that repeat actions are okay (AKA "at-least-once semantics"), but there are a lot of exactly-once actions in the world.

The way I'm thinking about it, if I'm sending an email with the exact same recipients, from the same address and with the same body, it is a noop and I would expect to just get a receipt. If my intention is to send it again despite having done so before, I should be explicit about it being different. I could do that either by:

  1. Acknowledging it is different by calling out the one it's different from.
  2. Acknowledging it's different this time by calling out something that has changed since, e.g. time (vector clock, wall clock or some derived randomness).

> We already have a cause field that captures that information, right?

I do not think the cause field is the same thing, although there is indeed some overlap: specifically, in chained tasks, the receipts leading back to the initial invocation would provide uniqueness.

> I guess you could use the identity CID to treat it as a nonce, but that always feels like a kludge to me. Mostly I'm confused what the benefit of the link is here when we have several other fields for capturing related information?

I think my hypothesis is that you need either a nonce or a cause; when you have the latter, you probably don't need the former. Ironically, this whole idea was inspired by what you were telling me about how you had considered making cause in rhizome an arbitrary CID as opposed to a link to another fact.

> I was trying to say that the DX for this ultimately lies in the tooling. People probably aren't going to build UCANs manually. At least at Fission we generally had tools to remove common tasks like finding the current DID, finding the expiration time, signing the token, etc.

Sure, but I was talking about spec implementers. I fear that most will just reach for putting random values in nonce (in fact, I have seen this happen already). I would very much prefer that people put more deliberation into this, and I hope the reframing would help.
