add web annotation JSON-LD validation #1

Open
snvfk1n opened this issue Oct 29, 2019 · 4 comments

snvfk1n commented Oct 29, 2019

JSON Schema validation of JSON-LD web annotations should be carried out a) to validate the hypermerge storage (?) and b) for all REST endpoints.

@snvfk1n snvfk1n self-assigned this Oct 29, 2019

snvfk1n commented Nov 4, 2019

ajv: https://github.com/epoberezkin/ajv

snvfk1n commented May 2, 2020

this stands or falls with the schema enforcement methods of hypermerge/automerge; if any client can add any data and all other hypercore/hypermerge peers accept it, the data model can be compromised easily. however, this has more to do with consensus principles among the peers of the swarm than with JSON validation.

if we're talking about private collaboration, the user shares a document with a selected group of peers. the "attack vectors" would concern any action that would add non-compliant data to the annotation collection. i imagine this being either faulty clients, undetermined accidents, or deliberate actions from users putting in bad data manually.

hypercore logs are append-only and thus we can't simply change history; however, they preserve history and would allow time-traveling back to particular states. a repair mechanism ("non-annotation data included? repair your notebook by clicking here!") could traverse the log and undo any bad changes, and/or just create a new, repaired log with a healthy history.
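
a minimal sketch of the "create a new, repaired log" variant, assuming the log can be read as a plain array of entries. `LogEntry`, `RepairResult`, and `repairLog` are hypothetical names for illustration; nothing here uses the actual hypercore/hypermerge API, and the validity check is passed in by the caller.

```typescript
// Hypothetical shape of one entry in an append-only annotation log.
interface LogEntry {
  seq: number;    // position in the log
  value: unknown; // payload as written by some peer
}

interface RepairResult {
  repaired: LogEntry[];  // fresh log containing only accepted entries
  droppedSeqs: number[]; // original positions of the rejected entries
}

// Build a new log from an existing one, keeping only the entries that the
// caller-supplied predicate accepts. The original log is never mutated,
// which matches the append-only constraint.
function repairLog(
  log: readonly LogEntry[],
  isValid: (value: unknown) => boolean,
): RepairResult {
  const repaired: LogEntry[] = [];
  const droppedSeqs: number[] = [];
  for (const entry of log) {
    if (isValid(entry.value)) {
      // Renumber so the new log has a gap-free history.
      repaired.push({ seq: repaired.length, value: entry.value });
    } else {
      droppedSeqs.push(entry.seq);
    }
  }
  return { repaired, droppedSeqs };
}
```

keeping `droppedSeqs` around would let a client show the user what was removed, since the old log still holds those entries.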

nevertheless, the server should validate PUT/POST bodies before appending them to the annotation log.
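
as a rough illustration of that last point, here is a hand-written check that gates appends. it is only a stand-in for real JSON schema validation (e.g. with ajv) and covers just three WADM requirements as i understand them: the anno `@context`, the `Annotation` type, and a `target`. `validateAnnotation` and `appendIfValid` are hypothetical names, and the log is modelled as a plain array.

```typescript
// JSON-LD context that a web annotation is expected to declare.
const ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld";

// JSON-LD allows a single value or an array in many places; normalize both.
function asArray(value: unknown): unknown[] {
  return Array.isArray(value) ? value : [value];
}

// Return a list of human-readable problems; an empty list means "accepted".
function validateAnnotation(body: unknown): string[] {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return ["body must be a JSON object"];
  }
  const anno = body as Record<string, unknown>;
  const errors: string[] = [];
  if (!asArray(anno["@context"]).some((c) => c === ANNO_CONTEXT)) {
    errors.push("@context must include " + ANNO_CONTEXT);
  }
  if (!asArray(anno["type"]).some((t) => t === "Annotation")) {
    errors.push('type must include "Annotation"');
  }
  if (anno["target"] === undefined || anno["target"] === null) {
    errors.push("target is required");
  }
  return errors;
}

// Hypothetical write path: append to the log only when validation passes.
function appendIfValid(
  log: unknown[],
  body: unknown,
): { accepted: boolean; errors: string[] } {
  const errors = validateAnnotation(body);
  if (errors.length === 0) {
    log.push(body);
  }
  return { accepted: errors.length === 0, errors };
}
```

a PUT/POST handler would call `appendIfValid` with the parsed request body and answer with an error status when `accepted` is false.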

snvfk1n commented May 15, 2020

in a recent discussion, @pvh addressed the issue of validation, local-first (ownership), and consensus. in a local-first system, proper ownership comes before consensus; inspired by physical ownership, as i understood it, a replica of a hypermerge document is first and foremost a copy, and the subsequent collaboration/replication/merging is just a feature.

this should support the above concept of treating annotation collections more defensively and not opting for some sort of consensus on a notebook data structure. the gateway server, however, plays an important role when connecting notebooks to thin clients on the web:

  1. it can guarantee that incoming data from web clients (create/update) is compliant with the WADM (JSON schema validation) and reject any invalid data.
  2. it can validate entire notebooks and guarantee WADM compliance to web clients. for example, the WAP endpoints would return only valid annotations (if there are any) and inform web clients about invalid data via extra HTTP headers such as x-invalid-entries: abc123,def456. fragmenting representations of data is not necessarily a good approach, but web clients would need to filter out invalid annotations either way.
  3. it could even offer straightforward repair mechanisms ("remove all invalid annotations"), but maybe that should be handled by actual local-first apps instead of thin clients on the web for the sake of security/privacy.
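
point 2 could be sketched roughly as follows. this assumes each stored entry carries an id and that the validity check is supplied from elsewhere; `StoredEntry`, `GatewayResponse`, and `buildCollectionResponse` are hypothetical names, and only the `x-invalid-entries` header comes from the idea above.

```typescript
// Hypothetical stored notebook entry: an id plus whatever a peer wrote.
interface StoredEntry {
  id: string;
  value: unknown;
}

// What the gateway would hand to its HTTP layer.
interface GatewayResponse {
  headers: Record<string, string>;
  items: unknown[]; // only the entries that passed validation
}

// Split a notebook into valid annotations (returned in the body) and
// invalid ones (reported by id in an extra response header).
function buildCollectionResponse(
  entries: readonly StoredEntry[],
  isValid: (value: unknown) => boolean,
): GatewayResponse {
  const items: unknown[] = [];
  const invalidIds: string[] = [];
  for (const entry of entries) {
    if (isValid(entry.value)) {
      items.push(entry.value);
    } else {
      invalidIds.push(entry.id);
    }
  }
  const headers: Record<string, string> = {};
  // Only send the header when there is something to report.
  if (invalidIds.length > 0) {
    headers["x-invalid-entries"] = invalidIds.join(",");
  }
  return { headers, items };
}
```

a web client could then ignore the header entirely or use it to tell the user that part of the notebook was withheld.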

pvh commented May 15, 2020

Thanks so much for listening to me natter on about this stuff, @falafeljan, so forgive me for clarifying my thoughts.

My view is that local-first software should, as much as possible, not be degraded by the loss of non-local infrastructure. That means embracing the cloud for responsibilities where it is required but avoiding it otherwise.

In this case, in my view, the decision of how to decode confusing or malformed data is something a client can make independently. If I want to collaborate with another client running a new or forked version of the code, why should a server interpose itself into that relationship? A client might choose to be more or less restrictive in what it accepts from a peer, just as it determines what to write.
