[Work In Progress] VReplicating JSON Columns: fix data loss with large NUMERIC and DECIMAL values #12731

Closed
wants to merge 1 commit into from

Conversation

rohit-nayak-ps
Contributor

@rohit-nayak-ps rohit-nayak-ps commented Mar 24, 2023

Description

Motivation

Large fixed-point NUMERIC and DECIMAL values in JSON objects are not replicated correctly by VReplication today. The binlog JSON parser converts such numbers to float64, which loses precision.

For example, a JSON value such as {"a": 12345678901234567890123456789012345678901234567890123456789012345678901234567890, "b": 987654321.012345678901234567890} on the source is replicated as {"a": 1.2345678901234573e79, "b": 987654321.0123456}.
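
The precision loss described above can be reproduced outside Vitess. The sketch below uses only Go's standard encoding/json package, not the Vitess binlog parser or ajson, and the helper names viaFloat64 and viaNumber are hypothetical. It decodes the same document twice: once into float64 (the lossy path) and once with Decoder.UseNumber, which keeps the number's original text.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// viaFloat64 decodes doc so that the number under key becomes a float64,
// then formats it with the shortest digits that round-trip that float64.
// (Hypothetical helper, for illustration only.)
func viaFloat64(doc, key string) (string, error) {
	var m map[string]float64
	if err := json.Unmarshal([]byte(doc), &m); err != nil {
		return "", err
	}
	return strconv.FormatFloat(m[key], 'f', -1, 64), nil
}

// viaNumber decodes doc with UseNumber, so the number is kept as a
// json.Number holding the literal text from the document.
// (Hypothetical helper, for illustration only.)
func viaNumber(doc, key string) (string, error) {
	dec := json.NewDecoder(strings.NewReader(doc))
	dec.UseNumber()
	var m map[string]any
	if err := dec.Decode(&m); err != nil {
		return "", err
	}
	n, ok := m[key].(json.Number)
	if !ok {
		return "", fmt.Errorf("value under %q is not a number", key)
	}
	return n.String(), nil
}

func main() {
	doc := `{"b": 987654321.012345678901234567890}`
	lossy, _ := viaFloat64(doc, "b")
	exact, _ := viaNumber(doc, "b")
	fmt.Println("float64:", lossy)
	fmt.Println("exact:  ", exact)
}
```

Keeping the literal text, rather than a binary float, is the general idea behind making the streamer side exact; how ajson represents such values internally is not shown here.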

Approach

VStreamer

The VStreamer needs to deserialize the binlog values exactly. This is done by adding support for fixed type integers and decimals to ajson, the library we use today to build a JSON value as an AST and stringify it while creating the corresponding VEvent.

VPlayer

The VPlayer needs to generate MySQL queries so that MySQL retains precision. MySQL keeps the exact value of a fixed-point number only if it is passed as an argument to json_object() when inserting or updating. If the document is instead given as a JSON string literal, the MySQL parser converts the number to a float and loses precision. The same holds for JSON arrays.

The snippets below illustrate the issue:

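-- Assumed schema: a single JSON column named j (matches the output below).
create table jsontest (j json);

-- JSON passed as a string literal: the numbers lose precision.
insert into jsontest values 
('{"a": 12345678901234567890123456789012345678901234567890123456789012345678901234567890,
 "b": 987654321.012345678901234567890}');

-- JSON built with json_object(): the numbers are stored exactly.
insert into jsontest values 
(json_object("a",12345678901234567890123456789012345678901234567890123456789012345678901234567890, 
"b", 987654321.012345678901234567890)); 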

select * from jsontest;
+-------------------------------------------------------------------------------------------------------------------------------+
| j                                                                                                                             |
+-------------------------------------------------------------------------------------------------------------------------------+
| {"a": 1.2345678901234573e79, "b": 987654321.0123456}                                                                          |
| {"a": 12345678901234567890123456789012345678901234567890123456789012345678901234567890, "b": 987654321.012345678901234567890} |
+-------------------------------------------------------------------------------------------------------------------------------+

So we also need to generate json_object() and json_array() calls when inserting JSON objects (dictionaries) and arrays, so that the exact value is replicated.
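
As a rough illustration of that idea, the sketch below turns a JSON document into a nested JSON_OBJECT()/JSON_ARRAY() expression, emitting each number as its original literal text. This is not the VPlayer code from this PR: the function names toJSONExpr, buildExpr and sqlString are hypothetical, the decoding uses Go's standard encoding/json rather than ajson, and the string escaping is deliberately minimal.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// sqlString quotes s as a MySQL string literal. Hypothetical helper: it only
// escapes backslashes and single quotes and is not a complete escaping routine.
func sqlString(s string) string {
	r := strings.NewReplacer(`\`, `\\`, `'`, `\'`)
	return "'" + r.Replace(s) + "'"
}

// toJSONExpr renders a decoded JSON value as a MySQL expression.
// Numbers arrive as json.Number and are written out verbatim, so MySQL
// parses them as numeric literals instead of as part of a JSON string.
func toJSONExpr(v any) string {
	switch t := v.(type) {
	case map[string]any:
		keys := make([]string, 0, len(t))
		for k := range t {
			keys = append(keys, k)
		}
		sort.Strings(keys) // fixed order so the generated SQL is deterministic
		args := make([]string, 0, 2*len(keys))
		for _, k := range keys {
			args = append(args, sqlString(k), toJSONExpr(t[k]))
		}
		return "JSON_OBJECT(" + strings.Join(args, ", ") + ")"
	case []any:
		args := make([]string, 0, len(t))
		for _, e := range t {
			args = append(args, toJSONExpr(e))
		}
		return "JSON_ARRAY(" + strings.Join(args, ", ") + ")"
	case json.Number:
		return t.String()
	case string:
		return sqlString(t)
	case bool:
		if t {
			return "true"
		}
		return "false"
	default: // JSON null decodes to a nil interface value
		return "NULL"
	}
}

// buildExpr decodes doc with UseNumber and renders it with toJSONExpr.
func buildExpr(doc string) (string, error) {
	dec := json.NewDecoder(strings.NewReader(doc))
	dec.UseNumber()
	var v any
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	return toJSONExpr(v), nil
}

func main() {
	expr, err := buildExpr(`{"a": 12345678901234567890, "b": [987654321.012345678901234567890, "x"]}`)
	if err != nil {
		panic(err)
	}
	fmt.Println("insert into jsontest values (" + expr + ")")
}
```

Sorting the keys is only there to make the generated statement reproducible; a real implementation would also need proper escaping or bind variables for string values.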

Status

  • VStreamer now generates data correctly for DECIMALs, using a local change to ajson that has not yet been pushed to our ajson fork. #12630 (VStreamer: improve representation of integers in json data types) already fixed issues with large integers such as 930701976723823, which was vreplicated as 9.30701976723823e+14, and 1234567890, which was vreplicated as 1234567890.0.

  • TODO:

Related Issue(s)

#8686

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on the CI
  • Documentation was added or is not required

Deployment Notes

vitess-bot bot added the NeedsDescriptionUpdate (The description is not clear or comprehensive enough, and needs work) and NeedsWebsiteDocsUpdate (What it says) labels on Mar 24, 2023
Contributor

vitess-bot bot commented Mar 24, 2023

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • If a test is added or modified, there should be documentation on top of the test to explain what the expected behavior is and what the test does.

If a new flag is being introduced:

  • Is it really necessary to add this flag?
  • Flag names should be clear and intuitive (as far as possible)
  • Help text should be descriptive.
  • Flag names should use dashes (-) as word separators rather than underscores (_).

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow should be required, the maintainer team should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should include a link to an issue that describes the bug.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from VTop, if used there.

rohit-nayak-ps added the Type: Bug and Component: VReplication labels and removed the NeedsDescriptionUpdate and NeedsWebsiteDocsUpdate labels on Mar 24, 2023
rohit-nayak-ps changed the title from "[Work In Progress] VReplicating JSON Columns: switch to using Vitess sqlparser' ast package" to "[Work In Progress] VReplicating JSON Columns: fix data loss with large NUMERIC and DECIMAL values" on Mar 27, 2023
rohit-nayak-ps added the Skip CI (Skip CI actions from running) label on Mar 27, 2023
rohit-nayak-ps force-pushed the rn-json-object branch 3 times, most recently from 052912f to 70fdc68, on March 27, 2023 09:42
…perly. Add more decimal tests to unit and e2e tests

Signed-off-by: Rohit Nayak <[email protected]>
@rohit-nayak-ps
Contributor Author

Closed in favor of #12761
