VStreamer: improve representation of integers in json data types #12630

Merged
merged 2 commits into from
Mar 16, 2023

Conversation

rohit-nayak-ps

@rohit-nayak-ps rohit-nayak-ps commented Mar 14, 2023

Description

The binlog parser in vstreamer currently uses the github.com/spyzhov/ajson module to decode the value from the binlog image into its JSON value. However, the library supports only a single Numeric type (float64) as a catch-all for all numeric types, including signed and unsigned integers. As a consequence, the generated JSON represents integers as floats, and the string representation in a VEvent can contain decimals or values in scientific notation. Integers can therefore be stored as floats on the target, and larger integers are sent in scientific notation in VStream events.

This results in VDiff failures because the stored JSON strings differ. Parsing the VEvents sent through the VStream API can also result in errors if, for example, the JSON is parsed by a Go client. See #8686.
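
The effect can be reproduced with the Go standard library alone. The following is a minimal sketch that does not use ajson: `formatAsFloat` is a hypothetical helper that mimics a parser whose only numeric type is float64, and the `'g'` output format is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatAsFloat is a hypothetical helper (not part of ajson or Vitess).
// It mimics a parser whose only numeric type is float64: the literal is
// parsed into a float64 and printed back in its shortest 'g' form.
func formatAsFloat(literal string) (string, error) {
	f, err := strconv.ParseFloat(literal, 64)
	if err != nil {
		return "", err
	}
	return strconv.FormatFloat(f, 'g', -1, 64), nil
}

func main() {
	for _, lit := range []string{"2313123", "9223372036854775807"} {
		out, err := formatAsFloat(lit)
		if err != nil {
			panic(err)
		}
		// The printed form no longer matches the integer literal.
		fmt.Printf("%s -> %s\n", lit, out)
	}
}
```

A consumer that expects an integer literal receives a different string in both cases, which is the kind of mismatch a string comparison would flag.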

This PR uses a forked version of the library, https://github.com/rohit-nayak-ps/ajson, that adds Integer and UnsignedInteger data type JSON Nodes. Once we submit the related changes upstream to github.com/spyzhov/ajson and they get merged we will switch back to using upstream again.
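
The idea behind separate integer node types can be sketched as follows. This is not the fork's actual API: the `number` type, its `kind` tag, and the constant names are hypothetical and only illustrate why keeping the original integer type avoids the float formatting.

```go
package main

import (
	"fmt"
	"strconv"
)

// kind is a hypothetical tag recording which numeric type a value had.
type kind int

const (
	kindFloat kind = iota
	kindInt
	kindUint
)

// number is a hypothetical tagged numeric value; only the field that
// matches kind is meaningful.
type number struct {
	kind kind
	f    float64
	i    int64
	u    uint64
}

// String formats integers with integer formatting so they never pass
// through float64, and falls back to float formatting otherwise.
func (n number) String() string {
	switch n.kind {
	case kindInt:
		return strconv.FormatInt(n.i, 10)
	case kindUint:
		return strconv.FormatUint(n.u, 10)
	default:
		return strconv.FormatFloat(n.f, 'g', -1, 64)
	}
}

func main() {
	fmt.Println(number{kind: kindInt, i: -1})
	fmt.Println(number{kind: kindUint, u: 18446744073709551615})
	fmt.Println(number{kind: kindFloat, f: 1.5})
}
```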

Minor refactoring is also done as part of this PR.

Related Issue(s)

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

…ssue of integers being parsed as float64 by the source binlog parser. This results in larger integers being stored as floats on the target and sent with scientific notation in vstream events.

Signed-off-by: Rohit Nayak <[email protected]>
@vitess-bot vitess-bot bot added NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says labels Mar 14, 2023
@vitess-bot

vitess-bot bot commented Mar 14, 2023

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • If a test is added or modified, there should be documentation on top of the test to explain what the expected behavior is and what the test does.

If a new flag is being introduced:

  • Is it really necessary to add this flag?
  • Flag names should be clear and intuitive (as far as possible)
  • Help text should be descriptive.
  • Flag names should use dashes (-) as word separators rather than underscores (_).

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow should be required, the maintainer team should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should include a link to an issue that describes the bug.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from VTop, if used there.

@rohit-nayak-ps rohit-nayak-ps added Type: Bug Component: VReplication Forwardport to: main This will forward port the PR to the main branch and removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says labels Mar 14, 2023
@rohit-nayak-ps rohit-nayak-ps marked this pull request as ready for review March 14, 2023 21:31
@rohit-nayak-ps rohit-nayak-ps requested review from dbussink and removed request for systay and ajm188 March 14, 2023 21:31
data: []byte{9, 255, 255, 255, 255, 255, 255, 255, 127},
expected: `9223372036854775807`,
}, {
name: "uint16/1",

Any reason why we use uint16/1 here and int64 -1 above?
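
For readers unfamiliar with the test data above, here is a hedged sketch of how such a byte slice maps to the expected string. It assumes the leading byte 9 is the MySQL binary JSON type tag for a signed 64-bit integer and that the remaining eight bytes hold the value in little-endian order; `decodeInt64` and `typeInt64` are hypothetical names, not the Vitess parser.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strconv"
)

// typeInt64 is the assumed type tag for a signed 64-bit integer.
const typeInt64 = 9

// decodeInt64 is a hypothetical helper: it checks the type tag, reads the
// eight little-endian value bytes, and formats the result as an integer.
func decodeInt64(data []byte) (string, error) {
	if len(data) != 9 || data[0] != typeInt64 {
		return "", fmt.Errorf("not a 9-byte int64 value: % x", data)
	}
	v := int64(binary.LittleEndian.Uint64(data[1:]))
	return strconv.FormatInt(v, 10), nil
}

func main() {
	s, err := decodeInt64([]byte{9, 255, 255, 255, 255, 255, 255, 255, 127})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // prints 9223372036854775807
}
```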

@@ -65,7 +65,6 @@ require (
github.com/spf13/cobra v1.6.1
github.com/spf13/pflag v1.0.5
github.com/spf13/viper v1.15.0
github.com/spyzhov/ajson v0.7.2

Given spyzhov/ajson#63 seems positive about merging this upstream, would it be preferable here to use a go mod replacement instead of a hard fork? We can point the replacement at the fork for now until it's part of upstream.

That also makes it easier to do the actual PR upstream since you don't need to rename packages etc. then there as well in our fork.
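
As an illustration of the suggestion, a module replacement in go.mod could look like the fragment below. The `require` line is the one shown in the diff above; `<fork-version>` is a placeholder, since the fork's version or pseudo-version is not stated in this thread.

```
require github.com/spyzhov/ajson v0.7.2

// Point the upstream import path at the fork until the changes are merged upstream.
// <fork-version> is a placeholder, not a real version.
replace github.com/spyzhov/ajson => github.com/rohit-nayak-ps/ajson <fork-version>
```

With a replacement, import paths in the code keep referring to github.com/spyzhov/ajson, so switching back to upstream later only requires removing the `replace` line.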


Good idea, will make the change now.


@harshit-gangal harshit-gangal left a comment


@dbussink comment looks reasonable.

…to the upstream module once changes are made there

Signed-off-by: Rohit Nayak <[email protected]>
@dbussink

While this does somewhat improve on the current situation, JSON vreplication is still broken with this change. It only fixes integers in the range from 2^53 to 2^64, and it doesn't handle decimals at all (decimals are already broken today).
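
The 2^53 boundary mentioned here comes from the 53-bit significand of a float64. A minimal Go sketch of the effect, where `survivesFloat64` is a hypothetical helper written for this illustration:

```go
package main

import "fmt"

// two64 is 2^64 as an untyped constant; it is only used in a float64
// comparison, where it is exactly representable.
const two64 = 1 << 64

// survivesFloat64 reports whether v is unchanged after a round trip
// through float64. Values that round up to 2^64 are rejected before the
// conversion back, because that conversion would overflow uint64.
func survivesFloat64(v uint64) bool {
	f := float64(v)
	if f >= two64 {
		return false
	}
	return uint64(f) == v
}

func main() {
	const limit = uint64(1) << 53
	fmt.Println(survivesFloat64(limit))     // 2^53 is exactly representable
	fmt.Println(survivesFloat64(limit + 1)) // 2^53 + 1 is rounded to 2^53
}
```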

Take the following table definition:

CREATE TABLE t1 (id int not null auto_increment primary key, jdoc JSON);    

When inserting data using the MySQL JSON object parser, it shows that MySQL parses into doubles itself (according to the JSON standard):

INSERT INTO t1 (jdoc) VALUES('{"int": 2313123, "very_large_int": 21312332514254356413641614614, "large_decimal": 31234321532546461346413641.5154151, "double": 3251464164.341e0 }');

test/main> select * from t1;
+--------------------------------------------------------------------------------------------------------------------------------------------------+
| jdoc                                                                                                                                             |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
| {"int": 2313123, "double": 3251464164.341, "large_decimal": 3.123432153254646e25, "very_large_int": 2.1312332514254357e28}                       |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
test/main> drop table t1;

But it's also possible to build a JSON object using JSON_OBJECT and literals in MySQL. Those literals can then have other numeric types:

test/main> INSERT INTO t1 (jdoc) VALUES(JSON_OBJECT("int", 2313123, "very_large_int", 21312332514254356413641614614, "large_decimal", 31234321532546461346413641.5154151, "double", 3251464164.341e0));
test/main> select * from t1;
+----+--------------------------------------------------------------------------------------------------------------------------------------------------+
| id | jdoc                                                                                                                                             |
+----+--------------------------------------------------------------------------------------------------------------------------------------------------+
|  1 | {"int": 2313123, "double": 3251464164.341, "large_decimal": 3.123432153254646e25, "very_large_int": 2.1312332514254357e28}                       |
|  2 | {"int": 2313123, "double": 3251464164.341, "large_decimal": 31234321532546461346413641.5154151, "very_large_int": 21312332514254356413641614614} |
+----+--------------------------------------------------------------------------------------------------------------------------------------------------+

What can be seen here is that MySQL maintains the types and stores decimals (and the very large integer is also stored as a decimal).

But if we now run a vreplication workflow on this table, for example through online DDL, it breaks the stored data (alter table t1 add column foo varchar(100) using online DDL):

test/|⚠ main ⚠|> select * from t1;
+----+----------------------------------------------------------------------------------------------------------------------------+------+
| id | jdoc                                                                                                                       | foo  |
+----+----------------------------------------------------------------------------------------------------------------------------+------+
|  1 | {"int": 2313123, "double": 3251464164.341, "large_decimal": 3.123432153254646e25, "very_large_int": 2.1312332514254357e28} | NULL |
|  2 | {"int": 2313123, "double": 3251464164.341, "large_decimal": 3.123432153254646e25, "very_large_int": 2.1312332514254357e28} | NULL |
+----+----------------------------------------------------------------------------------------------------------------------------+------+

What can be seen here is that the decimal values and the integers outside the int64 range are now also converted into doubles.
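
The same loss can be shown on the consumer side with Go's encoding/json. This is only an illustration of the float64 conversion, not a fix for vreplication: by default the decoder turns every number into a float64, while `Decoder.UseNumber` keeps the original literal as a `json.Number`. The `decode` helper is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decode is a hypothetical helper that decodes a JSON object into a map.
// With useNumber set, numbers are kept as json.Number (the literal text)
// instead of being converted to float64.
func decode(doc string, useNumber bool) (map[string]interface{}, error) {
	dec := json.NewDecoder(strings.NewReader(doc))
	if useNumber {
		dec.UseNumber()
	}
	var m map[string]interface{}
	if err := dec.Decode(&m); err != nil {
		return nil, err
	}
	return m, nil
}

func main() {
	const doc = `{"large_decimal": 31234321532546461346413641.5154151, "very_large_int": 21312332514254356413641614614}`

	lossy, err := decode(doc, false)
	if err != nil {
		panic(err)
	}
	exact, err := decode(doc, true)
	if err != nil {
		panic(err)
	}

	for _, key := range []string{"large_decimal", "very_large_int"} {
		// lossy holds float64 values; exact holds the original literals.
		fmt.Printf("%s: float64=%v literal=%v\n", key, lossy[key], exact[key])
	}
}
```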

@rohit-nayak-ps rohit-nayak-ps merged commit 2b43fd7 into vitessio:release-16.0 Mar 16, 2023
@rohit-nayak-ps rohit-nayak-ps deleted the rn-json-bigint branch March 16, 2023 16:41
@hmaurer hmaurer mentioned this pull request Mar 21, 2024
Labels
Component: VReplication Forwardport to: main This will forward port the PR to the main branch Type: Bug
4 participants