Swip responsible nbhood split #43
Having read through this SWIP, watched what is about to occur in the Sepolia testnet swarm, and monitored the pusher's behavior when errors occur, I have doubts about the wisdom of pausing new chunk acceptance by returning an error message to the pusher. The pusher tries really hard (and fast) to deliver deferred chunks. When an error occurs, it just keeps retrying the push until some other node accepts the chunk. This happens on errors as well as when a "shallow receipt depth" is detected. The latter is what I suspect would eventually happen if the target neighborhood rejects a push because it cannot split. This would then cause an outward ripple effect: the node(s) accepting the "shallow" chunk, having errored out all of their closer peers, would accept the chunk(s) into their own reserve, eventually filling it and causing yet another neighborhood to attempt a split and possibly pause. Rinse and repeat in an outward direction.

IMHO, it would be better for the nodes of an over-full neighborhood that cannot split to continue to accept chunks so that the swarm can continue to fully operate. New data can still be stored, and existing data would not be evicted until the newly split neighborhoods have sufficient peers to cover them. Then the reserve evictions can resume, knowing that the chunks have a new protected home.

I've often thought that nodes should have a pseudo-reserve, or secure cache, where they pull and retain chunks for their adjacent neighborhoods all the time. I call this a pseudo-reserve because these chunks would be stored WITH their stamps, unlike the stamp-less chunks in the cache. That way, the stamped chunks can be pulled back into the adjacent neighborhoods when/if new nodes appear to cover them. This provides better storage redundancy, and even ensures (somewhat) retrievability because Kademlia routing gets "close" to the target storage neighborhood. Retrieval requests would (hopefully, or eventually) be routed through the adjacent neighborhood's nodes, which would be able to satisfy the request from the pseudo-reserve. An extension to this is that the storage compensation Schelling game could actually be played in the pseudo-neighborhoods, because in theory they would be fully populated with all of the required chunks.
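To make the pseudo-reserve idea above concrete, here is a minimal sketch of how a node might decide where an incoming chunk belongs. It assumes that "adjacent neighborhood" means the sibling neighborhood, i.e. chunks whose proximity order to the node is exactly one less than the storage radius; that reading, and the names `proximity_order` and `classify_chunk`, are illustrative assumptions and not part of the SWIP or of any existing node implementation.

```python
def proximity_order(a: bytes, b: bytes) -> int:
    """Count the leading bits shared by two equal-length addresses."""
    for i, (x, y) in enumerate(zip(a, b)):
        diff = x ^ y
        if diff:
            # 8 - bit_length() is the number of leading zero bits in this byte
            return i * 8 + (8 - diff.bit_length())
    return len(a) * 8


def classify_chunk(chunk_addr: bytes, overlay: bytes, radius: int) -> str:
    """Hypothetical placement rule for the pseudo-reserve proposal.

    - "reserve":        chunk falls inside the node's own neighborhood
    - "pseudo-reserve": chunk falls inside the sibling neighborhood
                        (assumed meaning of "adjacent"); kept WITH its stamp
    - "cache":          everything else; stored without a stamp
    """
    po = proximity_order(chunk_addr, overlay)
    if po >= radius:
        return "reserve"
    if radius > 0 and po == radius - 1:
        return "pseudo-reserve"
    return "cache"
```

With a storage radius of 4, a chunk sharing at least four leading bits with the node's overlay goes to the reserve, one sharing exactly three goes to the pseudo-reserve, and anything further away is only cached.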
Keeping an extra "reserve" for accepting chunks could be problematic, as a node does not know in advance how many chunks it would need to accept. It could end up filling the hard drive, or it would have to stop at some point, where the mechanism described in the SWIP would again be needed. A node could also signal that a negative situation is arising well before it runs out of space. As for the situation described above, if I understand correctly, it should be solved in general, so that error messages do not overwhelm the network and the network adapts more appropriately. Adding @istae to the thread.
1. Chunk distribution is pretty uniform, so any file of reasonable size can be expected to push at least one chunk to every neighbourhood.
2. A file upload counts as a failure if any of its chunks fails to upload.
3. Swarm's operation with the required redundancy is incentivised by the pricing mechanism.
4. It is not in the direct interest of node operators to stop accepting chunks.
5. An erasure-coded content upload should not necessarily be considered a failure if chunk pushes to certain neighbourhoods fail, since downloaders can still reconstruct the content.
6. The new stewardship tool for repairing missing chunks of an erasure-coded file ensures that content can survive temporary upload failures, as long as the proportion of filled-up neighbourhoods does not exceed the chunk failure rate assumed by the respective erasure coding redundancy level.
1 and 2 imply that it is sufficient to indicate to users directly that the network may be saturated and that, because certain neighbourhoods are unable to split, file upload is problematic.
3 and 4 imply that the problem is already tackled by the incentive system, and also that any further measures as proposed in this SWIP are not incentive-aligned and are at best of dubious added value, given their complexity and the fact that their reliability depends on compliance.
5 and 6 imply that there may be no merit in rejecting uploads due to saturated neighbourhoods, and that there is in fact already a user-side mitigation that enables content to survive temporary non-storage, helped by so-called stewards who do have the incentive to keep files retrievable.
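Premises 1 and 2 can be illustrated with a back-of-the-envelope model. The sketch below assumes chunk addresses are independent and uniformly distributed over `2**radius` neighbourhoods, and that a chunk fails to upload exactly when it lands in a saturated neighbourhood; both assumptions and the function names are illustrative and do not come from the SWIP.

```python
def expected_neighbourhoods_hit(n_chunks: int, radius: int) -> float:
    """Expected number of distinct neighbourhoods a file's chunks land in."""
    n = 2 ** radius  # number of neighbourhoods at this storage radius
    # Each neighbourhood is missed by all chunks with probability (1 - 1/n)^n_chunks
    return n * (1 - (1 - 1 / n) ** n_chunks)


def plain_upload_success(n_chunks: int, saturated_fraction: float) -> float:
    """Probability that every chunk of a non-erasure-coded file is accepted.

    Assumes a chunk is rejected iff its neighbourhood is saturated
    (premise 2: a single failed chunk fails the whole upload).
    """
    return (1 - saturated_fraction) ** n_chunks
```

Under this model, the success probability of a plain upload decays exponentially with file size as soon as any fraction of neighbourhoods is saturated, which is why a direct warning to users is already informative.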
I think the observation in this SWIP that users may want to be informed about whether their newly uploaded content is retrievable from the network is correct, but it is equally valid for the retrievability of past data. Therefore I recommend implementing a warning, as part of network monitoring, that tracks neighbourhood health and suggests the appropriate level of erasure redundancy necessary for downloads and therefore required for uploads.
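As a rough illustration of how such a monitoring warning could translate neighbourhood health into a redundancy suggestion, the sketch below picks the smallest number of parity chunks that keeps a group of data chunks recoverable with a target probability. It assumes an MDS-style code (any `data` out of `data + parity` chunks suffice to reconstruct) and independent chunk failures at the observed rate; the function names, the `max_parity` bound, and the parameters are hypothetical and do not reflect the redundancy levels actually defined by Swarm.

```python
from math import comb
from typing import Optional


def recovery_probability(data: int, parity: int, failure_rate: float) -> float:
    """Probability that at most `parity` of the data + parity chunks are lost."""
    total = data + parity
    return sum(
        comb(total, lost) * failure_rate ** lost * (1 - failure_rate) ** (total - lost)
        for lost in range(parity + 1)
    )


def min_parity(data: int, failure_rate: float, target: float,
               max_parity: int = 64) -> Optional[int]:
    """Smallest parity count meeting the target, or None if none up to max_parity does."""
    for parity in range(max_parity + 1):
        if recovery_probability(data, parity, failure_rate) >= target:
            return parity
    return None
```

Here `failure_rate` would be fed by the monitored proportion of saturated neighbourhoods; a `None` result corresponds to the case where no supported redundancy level is enough and the user should simply be warned.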
The motivation for the SWIP is for nodes not to increase their radius if chunks would get lost in the process. It gives priority to data that is already stored over data that is yet to be uploaded. Sure, the uploads might not proceed, but existing data will not be discarded. (It does not solve the problem of a whole neighbourhood of nodes leaving the network, so it is a partial solution at best.) Any monitoring that instructs uploaders which EC level to use would also only cover future uploads. Past uploads with lesser EC guarantees might not meet the "current" requirements. It is my belief that it is in the interest of nodes (node operators) for data to persist on the network as expected by the uploaders, as the uploaders will otherwise stop using the network, which is not in the interest of nodes (node operators).
Describe how a responsible neighborhood split / storage radius increase should be handled by a node.