Replace JSON Lines with JSON to simplify implementation, improve processing speed, and enhance extensibility #160

filip26 · 2024-12-13T00:28:17Z

Hi,
I’d like to propose avoiding JSON Lines for the following reasons:

Added Complexity

Supporting JSON Lines requires additional implementation effort to handle both standard JSON and JSON Lines parsing.
Converting JSON Lines into standard JSON through pre-processing is inefficient, as it results in redundant parsing with no added value other than compatibility.

Limited Extensibility

JSON Lines does not allow adding metadata, such as positions or links to subsequent chunks, etc.

Inefficient Processing

Processing line-by-line in a streaming context is less efficient compared to handling chunks or pages.
JSON Lines enforces sequential, linear history processing. Standard JSON Object with embedded links enables non-linear history processing.

Using JSON improves adoption, speeds up processing, and supports extensibility.

Please consider the outcome. Thank you.

JSON Lines were likely intended to serve as a replacement for CSV

brianorwhatever · 2025-01-14T19:33:23Z

I ran some comparisons in Python, Go, and Node (results below).

Comparison Results

Python

--- Testing with 10000 items ---
Single JSON parse: 12ms
JSONL parse: 31ms

--- Testing with 100000 items ---
Single JSON parse: 126ms
JSONL parse: 326ms

--- Testing with 1000000 items ---
Single JSON parse: 1228ms
JSONL parse: 2834ms

GO

--- Testing with 10000 items ---
Single JSON parse: 20ms
JSONL parse: 27ms

--- Testing with 100000 items ---
Single JSON parse: 240ms
JSONL parse: 312ms

--- Testing with 1000000 items ---
Single JSON parse: 2294ms
JSONL parse: 3159ms

NODE JS

--- Testing with 10000 items ---
Single JSON parse: 7ms
JSONL parse: 20ms

--- Testing with 100000 items ---
Single JSON parse: 129ms
JSONL parse: 128ms

--- Testing with 1000000 items ---
Single JSON parse: 2218ms
JSONL parse: 1441ms

Python: Single JSON parse is roughly 2–3x faster than JSONL parsing at all tested sizes. For 1 million items, it finishes in 1228ms compared to JSONL’s 2834ms.

Go: Single JSON parse is consistently faster for all sizes. The difference ranges from 7ms at 10,000 items to about 865ms at 1 million items.

Node: Single JSON parse is faster for 10,000 items and effectively tied at 100,000 items (129ms vs. 128ms). At 1 million items, JSONL parse finishes sooner (1441ms) compared to single JSON parse (2218ms).

These results do show a slight performance preference for a single JSON array in Python and Go, with Node having some interesting behavior at larger sizes. Still, JSONL brings important benefits for our use case:

Streaming: We can process data line by line without loading everything into memory at once.
Incremental Processing: It's simpler to parse each entry independently, which can help when partial consumption is needed.
Flexibility: Adding new records becomes easier—just append another line.
Line-Based Handling: Tools and standard Unix utilities can read and write line-delimited entries naturally.

Given these advantages, the small performance tradeoff is acceptable for scenarios where streaming and incremental processing matter. I have opened digitalbazaar/cel-spec#12 as well to attempt to move that specification in at least an array data model direction.
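The benchmark code itself is not included in this thread. As a rough illustration only, a comparison of this kind could be sketched in Python as below. The synthetic records, the function names (`make_entries`, `time_single_json`, `time_jsonl`), and the timing method are assumptions for the sketch, not the code that produced the numbers above, so its timings will not match them.

```python
import json
import time


def make_entries(n):
    # Synthetic records (an assumption); real log entries are larger and nested.
    return [{"versionId": str(i), "value": i} for i in range(n)]


def time_single_json(entries):
    # Serialize once as a JSON array, then time a single json.loads call.
    text = json.dumps(entries)
    start = time.perf_counter()
    parsed = json.loads(text)
    return parsed, (time.perf_counter() - start) * 1000


def time_jsonl(entries):
    # Serialize one entry per line, then time one json.loads call per line.
    text = "\n".join(json.dumps(e) for e in entries)
    start = time.perf_counter()
    parsed = [json.loads(line) for line in text.split("\n")]
    return parsed, (time.perf_counter() - start) * 1000


if __name__ == "__main__":
    for n in (10_000, 100_000):
        entries = make_entries(n)
        _, single_ms = time_single_json(entries)
        _, jsonl_ms = time_jsonl(entries)
        print(f"--- Testing with {n} items ---")
        print(f"Single JSON parse: {single_ms:.0f}ms")
        print(f"JSONL parse: {jsonl_ms:.0f}ms")
```

Both paths hold the full text in memory here, so the sketch measures parse time only and says nothing about the memory or streaming behavior discussed above.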

filip26 · 2025-01-14T19:44:17Z

Sorry, @brianorwhatever , but your comparison of parsing speeds between JSON objects and JSON Lines isn't relevant to the issue I raised.

There is no valid justification for maintaining both JSON and JSON Lines, as it only serves to introduce additional complexity.

brianorwhatever · 2025-01-14T19:48:01Z

@filip26 I have listed 4 reasons that justify JSON Lines. The comparison of parsing speeds is in response to "improve processing speed" in the issue title.

filip26 · 2025-01-14T19:50:46Z

@brianorwhatever Clearly, we’re not on the same page when it comes to computer science and engineering. I interpret your argument as an attempt to justify introducing it, though I’m not sure why. I raised this issue to improve webvh - take it or leave it.

andrewwhitehead · 2025-01-14T20:11:58Z

@brianorwhatever I think native JSON parsing in Python is still notoriously slow, it might be more fair to use an additional library like orjson.

brianorwhatever · 2025-01-14T21:14:40Z

FWIW others both artificial and human are on the same page

filip26 · 2025-01-14T21:29:53Z

@brianorwhatever You're comparing JSON objects and JSON Lines, and that’s exactly my point. Choose one - JSON or JSON Lines - not both, to keep things simple. Having both adds unnecessary complexity without providing any real value.

If you dislike JSON objects, consider using JSON arrays (just add the comma) and stick with JSON. That approach would be far better than forcing everyone to adopt and maintain a relatively obscure technology designed for a completely different purpose.

I’ve also noticed - though perhaps I’m mistaken - a tendency to view the world through the lens of a single programming language. Let’s also consider portability; after all, there are approximately 900 active programming languages to account for.

brianorwhatever · 2025-01-14T21:55:27Z

OK, now we are getting somewhere, although I'm not sure I understand. JSON Lines is exactly that: lines of JSON. You can't have JSON Lines without JSON. Further, we are using Data Integrity and specifically eddsa-jcs-2022, which I don't see changing.

So I think what you are arguing for is changing the file from something that looks like this (with a .jsonl extension)

{"versionId":"1-QmNt8Q34JdjyfshJkoLZhJdx725QfBeekiNJhwZ7KUP4N2","versionTime":"2025-01-10T19:27:34Z","parameters":{"method":"did:webvh:0.5","scid":"QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP","updateKeys":["z6Mkt8ZfdufKQWY1svrZenBeXsTvWjukWWTNbDAUytNgTDR8"],"portable":false,"nextKeyHashes":[],"witnesses":[],"witnessThreshold":0,"deactivated":false},"state":{"@context":["https://www.w3.org/ns/did/v1","https://w3id.org/security/multikey/v1"],"id":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com","controller":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com","assertionMethod":["did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com#ytNgTDR8","did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com#mRpKhRQf"],"verificationMethod":[{"id":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com#ytNgTDR8","controller":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com","type":"Multikey","publicKeyMultibase":"z6Mkt8ZfdufKQWY1svrZenBeXsTvWjukWWTNbDAUytNgTDR8"},{"id":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com#mRpKhRQf","controller":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com","type":"Multikey","publicKeyMultibase":"z6Mkvqc9EJp3YQWupmSrCEko12aq8WtUUiW65iFfmRpKhRQf"}]},"proof":[{"type":"DataIntegrityProof","cryptosuite":"eddsa-jcs-2022","verificationMethod":"did:key:z6Mkt8ZfdufKQWY1svrZenBeXsTvWjukWWTNbDAUytNgTDR8#z6Mkt8ZfdufKQWY1svrZenBeXsTvWjukWWTNbDAUytNgTDR8","created":"2025-01-10T19:27:34Z","proofPurpose":"assertionMethod","proofValue":"z6wvuMGVbY29jAGpj6rSKwF4zg3cUWC4rKAi4tZbJZnu67vC3Zg143r5WMNeH28oZUYG9xBgqDqLy3GZS7SA9EDK"}]}
{"versionId":"2-QmYJji9MhMNMwjWpcR7PqfcEgN3wGxCaBzfHrDe1LMxXjC","versionTime":"2025-01-10T19:27:34Z","parameters":{"witnesses":[],"witnessThreshold":0},"state":{"@context":["https://www.w3.org/ns/did/v1","https://w3id.org/security/multikey/v1"],"id":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com","controller":["did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com"],"assertionMethod":["did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com#VSqMo7Va","did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com#nVmTqzjV"],"verificationMethod":[{"id":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com#VSqMo7Va","controller":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com","type":"Multikey","publicKeyMultibase":"z6Mkf8NpNCYVtmgxeamda3mszJTqDur6TpJnyQUiVSqMo7Va"},{"id":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com#nVmTqzjV","controller":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com","type":"Multikey","publicKeyMultibase":"z6MkoADLG9zCh4CymnUZnDahLzx2pfyNkCzxKQFqnVmTqzjV"}]},"proof":[{"type":"DataIntegrityProof","cryptosuite":"eddsa-jcs-2022","verificationMethod":"did:key:z6Mkvqc9EJp3YQWupmSrCEko12aq8WtUUiW65iFfmRpKhRQf#z6Mkvqc9EJp3YQWupmSrCEko12aq8WtUUiW65iFfmRpKhRQf","created":"2025-01-10T19:27:34Z","proofPurpose":"assertionMethod","proofValue":"z5TGXBbSbjgvJ4eSRSAYVBMffosJtWpxUimwSmMokfzeoynvnEv9sBPRYxEMRjihZUWCWcdLuwWj5utjyL8JxpGhz"}]}

To something that looks like this (with a .json extension)

[
  {"versionId":"1-QmNt8Q34JdjyfshJkoLZhJdx725QfBeekiNJhwZ7KUP4N2","versionTime":"2025-01-10T19:27:34Z","parameters":{"method":"did:webvh:0.5","scid":"QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP","updateKeys":["z6Mkt8ZfdufKQWY1svrZenBeXsTvWjukWWTNbDAUytNgTDR8"],"portable":false,"nextKeyHashes":[],"witnesses":[],"witnessThreshold":0,"deactivated":false},"state":{"@context":["https://www.w3.org/ns/did/v1","https://w3id.org/security/multikey/v1"],"id":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com","controller":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com","assertionMethod":["did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com#ytNgTDR8","did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com#mRpKhRQf"],"verificationMethod":[{"id":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com#ytNgTDR8","controller":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com","type":"Multikey","publicKeyMultibase":"z6Mkt8ZfdufKQWY1svrZenBeXsTvWjukWWTNbDAUytNgTDR8"},{"id":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com#mRpKhRQf","controller":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com","type":"Multikey","publicKeyMultibase":"z6Mkvqc9EJp3YQWupmSrCEko12aq8WtUUiW65iFfmRpKhRQf"}]},"proof":[{"type":"DataIntegrityProof","cryptosuite":"eddsa-jcs-2022","verificationMethod":"did:key:z6Mkt8ZfdufKQWY1svrZenBeXsTvWjukWWTNbDAUytNgTDR8#z6Mkt8ZfdufKQWY1svrZenBeXsTvWjukWWTNbDAUytNgTDR8","created":"2025-01-10T19:27:34Z","proofPurpose":"assertionMethod","proofValue":"z6wvuMGVbY29jAGpj6rSKwF4zg3cUWC4rKAi4tZbJZnu67vC3Zg143r5WMNeH28oZUYG9xBgqDqLy3GZS7SA9EDK"}]},
  {"versionId":"2-QmYJji9MhMNMwjWpcR7PqfcEgN3wGxCaBzfHrDe1LMxXjC","versionTime":"2025-01-10T19:27:34Z","parameters":{"witnesses":[],"witnessThreshold":0},"state":{"@context":["https://www.w3.org/ns/did/v1","https://w3id.org/security/multikey/v1"],"id":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com","controller":["did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com"],"assertionMethod":["did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com#VSqMo7Va","did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com#nVmTqzjV"],"verificationMethod":[{"id":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com#VSqMo7Va","controller":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com","type":"Multikey","publicKeyMultibase":"z6Mkf8NpNCYVtmgxeamda3mszJTqDur6TpJnyQUiVSqMo7Va"},{"id":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com#nVmTqzjV","controller":"did:webvh:QmVt4QVuJbvMSuSHLae6kJwh9k9hZFV58yZZ4XKCxZa5RP:example.com","type":"Multikey","publicKeyMultibase":"z6MkoADLG9zCh4CymnUZnDahLzx2pfyNkCzxKQFqnVmTqzjV"}]},"proof":[{"type":"DataIntegrityProof","cryptosuite":"eddsa-jcs-2022","verificationMethod":"did:key:z6Mkvqc9EJp3YQWupmSrCEko12aq8WtUUiW65iFfmRpKhRQf#z6Mkvqc9EJp3YQWupmSrCEko12aq8WtUUiW65iFfmRpKhRQf","created":"2025-01-10T19:27:34Z","proofPurpose":"assertionMethod","proofValue":"z5TGXBbSbjgvJ4eSRSAYVBMffosJtWpxUimwSmMokfzeoynvnEv9sBPRYxEMRjihZUWCWcdLuwWj5utjyL8JxpGhz"}]}
]

Your argument is:

.jsonl is an obscure file format
parsing the data is more complicated
parsing the data performs worse

My argument is:

Although it is obscure it is extremely simple
I don't think it's that complicated; it just requires splitting on \n
the parsing performance might be slightly worse, but the memory requirements can be lower as you don't have to load the entire log into memory every time.
streaming is impossible with json
entries can be accessed and manipulated by line
writing a new log entry can be an append operation and doesn't require a full file rewrite like in JSON

Does that sum it up? I'd be interested in hearing any implementers point of view of whether JSON Lines is a dealbreaker when looking to implement did:webvh
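To make the append and memory points above concrete, here is a minimal Python sketch. The helper names (`append_jsonl`, `append_json_array`, `iter_jsonl`) and the file names are hypothetical, and the sketch deliberately leaves out everything the spec requires around verifying entries.

```python
import json
import os
import tempfile


def append_jsonl(path, entry):
    # JSONL: a new entry is one appended line; existing bytes are untouched.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, separators=(",", ":")) + "\n")


def append_json_array(path, entry):
    # JSON array: read and parse the whole file, then rewrite it in full.
    entries = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            entries = json.load(f)
    entries.append(entry)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(entries, f)


def iter_jsonl(path):
    # Yield one parsed entry at a time instead of loading the whole log.
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        jsonl_path = os.path.join(tmp, "log.jsonl")
        json_path = os.path.join(tmp, "log.json")
        for i in range(1, 4):
            entry = {"versionId": f"{i}-placeholder"}
            append_jsonl(jsonl_path, entry)
            append_json_array(json_path, entry)
        for entry in iter_jsonl(jsonl_path):
            print(entry["versionId"])
```

Both files end up holding the same entries; the difference is that `append_json_array` does work proportional to the size of the whole log on every write, while `append_jsonl` writes only the new line.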

filip26 · 2025-01-14T21:59:44Z

As an implementer 😉, yes, this is indeed a bit of a blocker:

I don’t want to preprocess data just to replace \n with , - It’s redundant, negates whatever minor benefits JSON Lines might provide, and makes things even worse
I don’t want to introduce or rely on an unmaintained JSON Lines implementation.
I don’t want to implement or maintain JSON Lines myself.

brianorwhatever · 2025-01-14T22:14:15Z

haha yes I meant other implementers as I suspected that was your answer.

All 3 of these points are essentially the same point, as "implementation" and "preprocessing" are the same thing. There isn't anything more to maintain other than splitting the file on newline characters. After that you have an array of strings to be processed as JSON and verified as per the spec. Any programming language that can read a file and split a string can easily do this.
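The split-then-parse step described above might look like the following in Python. This is a sketch, not the spec's processing rules: the function name `parse_jsonl`, the tolerance for blank lines and `\r\n` endings, and the error message format are all assumptions.

```python
import json


def parse_jsonl(text):
    # Split on "\n" only; str.splitlines() would also split on characters
    # such as U+2028, which may appear unescaped inside JSON strings.
    entries = []
    for number, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")
        if not line.strip():
            continue  # assumption: tolerate a trailing newline or blank lines
        try:
            entries.append(json.loads(line))
        except json.JSONDecodeError as err:
            raise ValueError(f"invalid JSON on line {number}: {err}") from err
    return entries


if __name__ == "__main__":
    sample = '{"versionId":"1-placeholder"}\n{"versionId":"2-placeholder"}\n'
    for entry in parse_jsonl(sample):
        print(entry["versionId"])
```

Each element of the returned list is an ordinary parsed JSON object, so everything after this step is the same as it would be for entries taken from a JSON array.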

I agree - if it were more work than this it wouldn't be worth the benefits (minor to you, major to me).

I am interested in seeing how this shakes out in CEL (see digitalbazaar/cel-spec#3 and digitalbazaar/cel-spec#12) and hopefully we can some day align this spec with that one 😄

Thanks for the discussion

filip26 · 2025-01-14T22:19:04Z

@brianorwhatever It’s not the same 😉 - one point is about time complexity, while the other two focus on portability, implementation and maintenance costs.

How many languages have solid support for JSON Lines? That’s why it’s obscure: it offers little value, has limited support, and is only valid for a narrow set of use cases. webvh is not one of them.

[Updated after the comment below: Obviously, I meant libraries/packages/etc. - simply existing solid implementations to use. This is getting ridiculous. I’m sorry, but I don’t understand the strong pushback. My intention is solely to help and make this easier for others to adopt - it should be a primary goal for any specification.]

brianorwhatever · 2025-01-14T22:26:38Z

0 languages have support for JSON Lines. They don't need it. We could probably write the required code for the top 20 programming languages in the time we've spent on this thread 🥲

brianorwhatever · 2025-01-15T04:53:35Z

In any software system there is a point where new software needs to be written and libraries aren't necessary. I am proposing this lives in that area, and I don't believe it takes up too much of it. I acknowledge the slight performance cost of JSONL, but I currently believe the benefits it gives (easy appending, streaming support, line-based tools) to be worth it.

brianorwhatever mentioned this issue Jan 14, 2025

Use array for data model digitalbazaar/cel-spec#12

Open
