Replace JSON Lines with JSON to simplify implementation, improve processing speed, and enhance extensibility #160
I ran some comparisons in Python, Go and Node (see below).

Comparison results (raw per-run timing output omitted; each language was tested with 10,000, 100,000, and 1,000,000 items):

- Python: a single JSON parse is roughly 2–3x faster than JSONL parsing at all tested sizes. For 1 million items, it finishes in 1228 ms compared to JSONL's 2834 ms.
- Go: a single JSON parse is consistently faster at all sizes, with the gap ranging from 7 ms at 10,000 items to about 865 ms at 1 million items.
- Node: a single JSON parse is faster for 10,000 and 100,000 items. At 1 million items, the JSONL parse finishes sooner (1441 ms) than the single JSON parse (2218 ms).

These results do show a slight performance preference for a single JSON array in Python and Go, with Node showing some interesting behavior at larger sizes. Still, JSONL brings important benefits for our use case:
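As a rough sketch of how such a comparison can be reproduced in Python (the item count, record shape, and field names here are my own invention, not the original benchmark):

```python
import json
import time

# Build the same records in both shapes (size and contents are illustrative).
items = [{"id": i, "value": f"item-{i}"} for i in range(100_000)]
as_array = json.dumps(items)                          # one JSON array
as_jsonl = "\n".join(json.dumps(it) for it in items)  # one object per line

start = time.perf_counter()
from_array = json.loads(as_array)
array_ms = (time.perf_counter() - start) * 1000

start = time.perf_counter()
from_jsonl = [json.loads(line) for line in as_jsonl.splitlines()]
jsonl_ms = (time.perf_counter() - start) * 1000

# Both shapes decode to the same list of objects.
assert from_array == from_jsonl
print(f"array: {array_ms:.1f} ms, jsonl: {jsonl_ms:.1f} ms")
```

Absolute timings will vary by machine and JSON library, which is part of why the language-to-language results above differ.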
Given these advantages, the small performance tradeoff is acceptable for scenarios where streaming and incremental processing matter. I have also opened digitalbazaar/cel-spec#12 to try to move that specification toward at least an array-based data model.
Sorry, @brianorwhatever, but your comparison of parsing speeds between JSON objects and JSON Lines isn't relevant to the issue I raised. There is no valid justification for maintaining both JSON and JSON Lines, as it only serves to introduce additional complexity.
@filip26 I have listed four reasons that justify JSON Lines. The comparison of parsing speeds is in response to "improve processing speed" in the issue title.
@brianorwhatever Clearly, we’re not on the same page when it comes to computer science and engineering. I interpret your argument as an attempt to justify introducing it, though I’m not sure why. I raised this issue to improve WebH - take it or leave it.
@brianorwhatever I think native JSON parsing in Python is still notoriously slow; it might be fairer to use an additional library like orjson.
FWIW, others, both artificial and human, are on the same page.
@brianorwhatever You're comparing JSON objects and JSON Lines, and that’s exactly my point. Choose one - JSON or JSON Lines - not both, to keep things simple. Having both adds unnecessary complexity without providing any real value. If you dislike JSON objects, consider using JSON arrays (just add the commas) and stick with JSON. That approach would be far better than forcing everyone to adopt and maintain a relatively obscure technology designed for a completely different purpose. I’ve also noticed - though perhaps I’m mistaken - a tendency to view the world through the lens of a single programming language. Let’s also consider portability; after all, there are approximately 900 active programming languages to account for.
Ok, now we are getting somewhere, although I'm not sure I understand. JSON Lines is exactly that - lines of JSON. You can't have JSON Lines without JSON. Further, we are using Data Integrity and specifically

So I think what you are arguing for is changing the file from something that looks like this (with a
To something that looks like this (with a
Your argument is:
My argument is:
Does that sum it up? I'd be interested in hearing any implementer's point of view on whether JSON Lines is a dealbreaker when looking to implement.
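To make the contrast concrete, here is a small sketch of the two shapes under discussion (the entry fields are invented for illustration; real log entries in the spec carry more structure):

```python
import json

# JSON Lines: one complete JSON object per line.
jsonl_text = (
    '{"versionId": "1-abc", "versionTime": "2024-01-01T00:00:00Z"}\n'
    '{"versionId": "2-def", "versionTime": "2024-02-01T00:00:00Z"}'
)

# Plain JSON: the same objects wrapped in an array, separated by commas.
json_text = (
    '[{"versionId": "1-abc", "versionTime": "2024-01-01T00:00:00Z"},\n'
    ' {"versionId": "2-def", "versionTime": "2024-02-01T00:00:00Z"}]'
)

entries_from_jsonl = [json.loads(line) for line in jsonl_text.splitlines()]
entries_from_json = json.loads(json_text)

# Either way, the decoded data model is identical.
assert entries_from_jsonl == entries_from_json
```

The disagreement is therefore not about the data model but about the on-disk framing of the same sequence of objects.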
As an implementer 😉, yes, this is indeed a bit of a blocker:
haha yes, I meant other implementers, as I suspected that was your answer. All 3 of these points are essentially the same point, since "implementation" and "preprocessing" are the same thing. There isn't anything more to maintain other than splitting the file on newline characters. After that you have an array of strings to be parsed as JSON and verified per the spec. Any programming language that can read a file and split a string can easily do this. I agree - if it were more work than this it wouldn't be worth the benefits (minor to you, major to me). I am interested in seeing how this shakes out in CEL (see digitalbazaar/cel-spec#3 and digitalbazaar/cel-spec#12) and hopefully we can someday align this spec with that one 😄 Thanks for the discussion
@brianorwhatever It’s not the same 😉 - one point is about time complexity, while the other two focus on portability, implementation, and maintenance costs. How many languages have solid support for JSON Lines? That’s why I call it obscure: it offers little value, has limited support, and is only valid for a narrow set of use cases. webvh is not one of them. [Updated after the comment below: obviously, I meant libraries/packages/etc. - simply existing solid implementations to use. This is getting ridiculous. I’m sorry, but I don’t understand the strong pushback. My intention is solely to help and make this easier for others to adopt - that should be a primary goal for any specification.]
0 languages have support for JSON Lines. They don't need it. We could probably write the required code for the top 20 programming languages in the time we've spent on this thread 🥲
In any software system there is a point where new software needs to be written and libraries aren't necessary. I am proposing this lives in that area, and I don't believe it takes up too much of it. I acknowledge the slight performance cost of JSONL, but I currently believe the benefits it gives (easy appending, streaming support, line-based tools) to be worth it.
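The "easy appending" benefit above can be sketched as follows (the file name and entry fields are hypothetical): with JSONL, adding an entry is a plain file append, whereas a single JSON array would have to be re-serialized or patched before its closing bracket.

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "did.jsonl")  # hypothetical log file

# Write an initial entry.
with open(path, "w") as f:
    f.write(json.dumps({"versionId": "1-abc"}) + "\n")

# Appending only touches the end of the file; no re-serialization needed.
with open(path, "a") as f:
    f.write(json.dumps({"versionId": "2-def"}) + "\n")

# Reading back is a line-by-line parse.
with open(path) as f:
    entries = [json.loads(line) for line in f if line.strip()]

assert [e["versionId"] for e in entries] == ["1-abc", "2-def"]
```

The same append-only property is what makes JSONL friendly to streaming consumers and line-based tools such as `tail` and `grep`.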
Hi,
I’d like to propose avoiding JSON Lines for the following reasons:
- Added Complexity
- Limited Extensibility
- Inefficient Processing
Using JSON improves adoption, speeds up processing, and supports extensibility.
Please consider the outcome. Thank you.
JSON Lines was likely intended to serve as a replacement for CSV.