Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More carry calculation fixing #835

Merged

Conversation

reunanen
Copy link
Contributor

Add test cases provided by @azrafe7, and update carry calculation to a method that seems to actually work.

I can't say that I fully understand why exactly, but I too compared this against the naive method (using a native uint128 type), and this approach seems to work when the previous one does not always give exactly the same results.

@tomwiel
Copy link

tomwiel commented May 12, 2024

A small observation: This change leads to five mult-operations for the product,
while other implementations (like clipper1 or Emulate64x64to128) just need four.

@reunanen
Copy link
Contributor Author

@tomwiel Right – and that's because carry is calculated separately, which is leftover from a previous version that bailed out after a straightforward (carry-ignoring) multiplication, in case the result was not equal. Good catch, let me try to improve on this front...

@AngusJohnson
Copy link
Owner

AngusJohnson commented May 12, 2024

I can't say that I fully understand why exactly

I think I've finally gotten my head around this ...

  a * b ==>
  split a and b into upper and lower 32bits
  (aHi + aLo) * (bHi + bLo) ==>
  (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo) [ie 4 multiples]
  1. aHi * bHi: XXXXXXXX00000000 * XXXXXXXX00000000
  2. aHi * bLo: XXXXXXXX00000000 * 00000000XXXXXXXX
  3. aLo * bHi: XXXXXXXX00000000 * 00000000XXXXXXXX
  4. aLo * bLo: 00000000XXXXXXXX * 00000000XXXXXXXX
     { overflow bits }
  1. XXXXXXXX XXXXXXXX 00000000 00000000 +
  2. 00000000 XXXXXXXX XXXXXXXX 00000000 +
  3. 00000000 XXXXXXXX XXXXXXXX 00000000 +
  4. 00000000 00000000 XXXXXXXX XXXXXXXX
  given bit shifting to keep each multiplication within 64 bits
  then overflow bits equals ...
  a. all of (1) PLUS
  b. upper 32bits of both (2) and (3) PLUS
  c. overflow of addition of (4) and lower 32bits of both (2) and (3)
  note: overflow of addition in c. will be between 0 and 2.

What I hadn't appreciated until now is the addition c. above can potentially overflow by 2.

@reunanen
Copy link
Contributor Author

This change leads to five mult-operations for the product

This should be fixed now.

@AngusJohnson AngusJohnson merged commit 82cd887 into AngusJohnson:main May 12, 2024
7 checks passed
@reunanen reunanen deleted the more-carry-calculation-fixing branch May 12, 2024 13:44
@azrafe7
Copy link

azrafe7 commented May 12, 2024

I can't say that I fully understand why exactly

I think I've finally gotten my head around this ...

  // aLo = a & 0xFFFFFFFF;
  // aHi = a & 0xFFFFFFFF00000000;
  // bLo = b & 0xFFFFFFFF;
  // bHi = b & 0xFFFFFFFF00000000;

  // a * b == (aHi + aLo) * (bHi + bLo)
  // a * b == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
  // (aHi * bHi) => up to 128bits where bottom 64bits must be 0
  // (aHi * bLo) and (bHi * aLo)  => up to 96bits where bottom 32bits must be 0
  // (aLo * bLo) => up to 64bits

  // 64bit overflow carry of a * b consists of 
  // 1. all of (aHi * bHi) PLUS 
  // 2. the upper 32bits of both (aHi * bLo) and (bHi * aLo) PLUS
  // 3. 0 - 2: the overflow of ((aHi * bLo)<<32) + (bHi * aLo)<<32 + (aLo * bLo) 

What I hadn't appreciated until now is that ((aHi * bLo)<<32) + (bHi * aLo)<<32 + (aLo * bLo) can potentially overflow by 2.

Yes, just realized that much later than you. 😅

And found out that that extra_carry can be computed as:

uint64_t extra_carry = (((aHiShr * bLo) & 0xFFFFFFFF) +
                        ((bHiShr * aLo) & 0xFFFFFFFF) +
                        ((aLo * bLo) >> 32)) >> 32;

@AngusJohnson
Copy link
Owner

And found out that that extra_carry can be computed as:

Neat. Thanks.
And I've just rewritten my overflow explanation above so I hope it's less confusing about when aHi etc are shifted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants