Precompilation throws: InexactError: check_top_bit(UInt64, -3141633) #177

Open
nathanaelbosch opened this issue May 10, 2023 · 14 comments

nathanaelbosch commented May 10, 2023

I wanted to use a package that relies on Octavian, but the precompilation step fails in the following way:

julia> using Octavian
[ Info: Precompiling Octavian [6fd5a793-0b7e-452c-907f-f8bfe9c57db4]
ERROR: LoadError: InexactError: check_top_bit(UInt64, -3141633)
Stacktrace:
  [1] throw_inexacterror(f::Symbol, #unused#::Type{UInt64}, val::Int64)
    @ Core ./boot.jl:634
  [2] check_top_bit
    @ ./boot.jl:648 [inlined]
  [3] toUInt64
    @ ./boot.jl:759 [inlined]
  [4] UInt64
    @ ./boot.jl:789 [inlined]
  [5] convert
    @ ./number.jl:7 [inlined]
  [6] cconvert
    @ ./essentials.jl:492 [inlined]
  [7] malloc
    @ ./libc.jl:355 [inlined]
  [8] valloc
    @ ~/.julia/packages/VectorizationBase/0dXyA/src/alignment.jl:36 [inlined]
  [9] init_bcache
    @ ~/.julia/packages/Octavian/XhL0C/src/init.jl:19 [inlined]
 [10] __init__()
    @ Octavian ~/.julia/packages/Octavian/XhL0C/src/init.jl:3
 [11] macro expansion
    @ ~/.julia/packages/Octavian/XhL0C/src/Octavian.jl:80 [inlined]
 [12] macro expansion
    @ ~/.julia/packages/SnoopPrecompile/1XXT1/src/SnoopPrecompile.jl:119 [inlined]
 [13] top-level scope
    @ ~/.julia/packages/Octavian/XhL0C/src/Octavian.jl:77
 [14] include
    @ ./Base.jl:457 [inlined]
 [15] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::Nothing)
    @ Base ./loading.jl:2010
 [16] top-level scope
    @ stdin:2
in expression starting at /home/me/.julia/packages/Octavian/XhL0C/src/Octavian.jl:1
in expression starting at stdin:2
ERROR: Failed to precompile Octavian [6fd5a793-0b7e-452c-907f-f8bfe9c57db4] to "/home/me/julia/compiled/v1.9/Octavian/jl_1m6rbd".

This is on a compute cluster, so the error might be linked to the setup I suppose, but I don't know enough about such things to figure out how to solve this. Any pointers?

EDIT: This happens both on the new Julia 1.9.0, as well as on 1.8.5
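
For context, the InexactError at the top of the stack trace is what Julia raises when a negative Int64 is converted to UInt64, which is what happens when a negative byte count reaches Libc.malloc. A minimal sketch that reproduces only the conversion, outside Octavian (the helper name conversion_fails is made up for illustration):

```julia
# Hypothetical helper: returns true if converting `n` to UInt64 throws an InexactError.
function conversion_fails(n::Integer)
    try
        convert(UInt64, n)
        return false
    catch err
        return err isa InexactError
    end
end

conversion_fails(-3141633)  # the value from the error message above
conversion_fails(4096)      # a valid, non-negative size
```

So the question is less about malloc itself and more about why the requested size is negative in the first place.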

@chriselrod
Collaborator

This is on a compute cluster, so the error might be linked to the setup I suppose

It's probably not reading the cache sizes correctly.

@nathanaelbosch
Author

Any idea on how I can fix this?

@chriselrod
Collaborator

chriselrod commented May 10, 2023

What do you get for the following? (The output shown is from my machine.)

julia> using CPUSummary

julia> CPUSummary.cache_size(Val(1))
static(32768)

julia> CPUSummary.cache_size(Val(2))
static(1048576)

julia> CPUSummary.cache_size(Val(3))
static(1441792)

julia> using Hwloc

julia> Hwloc.cachesize()
(L1 = 32768, L2 = 1048576, L3 = 20185088)

@nathanaelbosch
Author

julia> CPUSummary.cache_size(Val(1))
static(32768)

julia> CPUSummary.cache_size(Val(2))
static(4194304)

julia> CPUSummary.cache_size(Val(3))
static(1048576)

julia> Hwloc.cachesize()
(L1 = 32768, L2 = 4194304, L3 = 16777216)

@chriselrod
Collaborator

chriselrod commented May 10, 2023

That all looks correct, except -- you have 4 MiB of L2 cache? What CPU are you on?

julia> versioninfo()
Julia Version 1.10.0-DEV.1254
Commit b9b8b38ec0 (2023-05-09 20:47 UTC)
Platform Info:
  OS: Linux (x86_64-generic-linux)
  CPU: 28 × Intel(R) Core(TM) i9-9940X CPU @ 3.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake-avx512)
  Threads: 41 on 28 virtual cores
Environment:
  JULIA_PATH = @.
  LD_LIBRARY_PATH = /usr/local/lib/
  JULIA_NUM_THREADS = 28

julia> ccall(:jl_getpagesize, Int, ())
4096

and let's confirm jl_getpagesize is returning correctly.

@chriselrod
Collaborator

It'd be easier for you to debug these yourself and tell me what is wrong.
Why do we have an invalid call to malloc?

  [7] malloc
    @ ./libc.jl:355 [inlined]
  [8] valloc
    @ ~/.julia/packages/VectorizationBase/0dXyA/src/alignment.jl:36 [inlined]
  [9] init_bcache
    @ ~/.julia/packages/Octavian/XhL0C/src/init.jl:19 [inlined]

https://github.com/JuliaLinearAlgebra/Octavian.jl/blob/00d50b3fb270f23d7f94dedd261cd95a3fb25af3/src/init.jl#LL16C1-L27C4

function init_bcache()
  if bcache_count() ≢ Zero()
    if BCACHEPTR[] == C_NULL
      BCACHEPTR[] = VectorizationBase.valloc(
        Threads.nthreads() * second_cache_size() * bcache_count(),
        Cvoid,
        ccall(:jl_getpagesize, Int, ())
      )
    end
  end
  nothing
end

calls

function valloc(
  N::Union{Integer,StaticInt},
  ::Type{T} = Float64,
  a = max(register_size(), cache_linesize())
) where {T}
  # We want alignment to both vector and cacheline-sized boundaries
  size_T = max(1, sizeof(T))
  reinterpret(
    Ptr{T},
    align(reinterpret(UInt, Libc.malloc(size_T * N + a - 1)), a)
  )
end

https://github.com/JuliaSIMD/VectorizationBase.jl/blob/9174dcca731144935e438d44ba07f4e4ec3a66c6/src/alignment.jl#L29-L40
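
The over-allocation by a - 1 bytes followed by align is the usual round-up-to-alignment trick. A minimal sketch of the rounding step, assuming a is a power of two (the name align_up is illustrative and is not VectorizationBase's align):

```julia
# Round `x` up to the next multiple of `a`; only valid when `a` is a power of two.
function align_up(x::UInt, a::UInt)
    @assert ispow2(a)
    mask = a - one(UInt)
    return (x + mask) & ~mask
end

align_up(UInt(4097), UInt(4096))  # an address just past a page boundary rounds up to the next page
```

Since the size passed to malloc is size_T * N + a - 1, a negative N makes the whole request negative before any alignment is applied.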

So

N = Threads.nthreads() * Octavian.second_cache_size() * Octavian.bcache_count()
a = ccall(:jl_getpagesize, Int, ())
N + a - 1

Seems to be negative.

You can copy paste the definitions of Octavian.bcache_count and Octavian.second_cache_size.

@nathanaelbosch
Author

nathanaelbosch commented May 11, 2023

Thanks a lot for your help, I really appreciate it.

That all looks correct, except -- you have 4 MiB of L2 cache? What CPU are you on?

julia> versioninfo()
Julia Version 1.9.0
Commit 8e630552924 (2023-05-07 11:25 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, cascadelake)
  Threads: 2 on 64 virtual cores
Environment:
  JULIA_NUM_THREADS = auto
  JULIA_STACKTRACE_MINIMAL = true

and let's confirm jl_getpagesize is returning correctly.

julia> ccall(:jl_getpagesize, Int, ())
4096

It'd be easier for you to debug these yourself and tell me what is wrong.
So [...] Seems to be negative.

It is negative indeed! I get

julia> N + a - 1
-6287361

This line seems to be the issue:

_second_cache_size(scs::StaticInt, ::True) = scs - cache_size(first_cache())

According to CPUSummary I have

julia> (CPUSummary.cache_size(second_cache()), CPUSummary.cache_size(first_cache()))
(static(1048576), static(4194304))

so the former minus the latter gives a negative number.
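
Plugging the reported sizes into the expression passed to malloc reproduces both negative values seen in this thread. This is only arithmetic on the numbers quoted above; bcache_count = 1 is an assumption chosen because it matches the observed results, and malloc_request is a made-up name:

```julia
second_cache = 1_048_576   # CPUSummary.cache_size(second_cache()) as reported above
first_cache  = 4_194_304   # CPUSummary.cache_size(first_cache()) as reported above
pagesize     = 4096        # ccall(:jl_getpagesize, Int, ())
bcache_count = 1           # assumption, consistent with the observed values

# The subtraction performed by _second_cache_size(scs, ::True)
scs = second_cache - first_cache

# Size handed to malloc by valloc when sizeof(T) is treated as 1
malloc_request(nthreads) = nthreads * scs * bcache_count + pagesize - 1
```

With one thread this gives the -3141633 from the precompilation error; with two threads it gives the -6287361 shown above.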

You mentioned that the results I wrote earlier (#177 (comment)) looked correct, but were they? The L3 numbers reported by CPUSummary and Hwloc differ, and if CPUSummary reported the Hwloc number, the result would not be negative (but again, I really don't know anything about hardware, so this might make no sense).

@chriselrod
Collaborator

chriselrod commented May 11, 2023

Cascadelake has 1 MiB of L2 cache/core. So the 4 MiB reported is wrong.
Furthermore

julia> sc = Octavian.second_cache()
static(3)

julia> Octavian.cache_inclusive(sc)
static(false)

the cache is not inclusive either, so it shouldn't be subtracting.
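
One way the subtraction could be guarded is sketched below. This is not Octavian's code: usable_second_cache is a hypothetical function, and falling back to the unmodified size when the reported numbers look implausible is an assumption about what a sensible default would be.

```julia
# Hypothetical guard around the "second cache minus first cache" computation.
function usable_second_cache(second::Int, first::Int, inclusive::Bool)
    # A non-inclusive cache does not duplicate the smaller cache's contents,
    # so nothing needs to be subtracted.
    inclusive || return second
    # Reported sizes that would give a non-positive result are treated as unreliable.
    second > first || return second
    return second - first
end

usable_second_cache(1_048_576, 4_194_304, false)  # sizes reported in this issue
```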

CPUSummary is supposed to report per-core sizes, hence the discrepancy for the L3 cache vs hwloc.
Octavian also shouldn't be trying to use a greater allotment of cache than the number of threads it has, as it can't assume the other threads aren't busy working on something else (if they weren't, Octavian itself could/should've been multithreaded).
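
The per-core convention can be illustrated with the numbers posted earlier in this thread. The sketch assumes the i9-9940X has 14 physical cores (28 hardware threads at two per core); l3_budget is a hypothetical helper, not a CPUSummary or Octavian function.

```julia
l3_per_core    = 1_441_792    # CPUSummary.cache_size(Val(3)) on the i9-9940X
l3_total       = 20_185_088   # Hwloc.cachesize().L3 on the same machine
physical_cores = 14           # assumption about this CPU

# Cache budget proportional to the number of threads in use, capped at the core count.
l3_budget(nthreads) = l3_per_core * min(nthreads, physical_cores)

l3_per_core * physical_cores == l3_total  # per-core size times core count matches Hwloc's total
```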

@sloede

sloede commented May 29, 2023

I am getting the same errors on my machine. It is not a compute cluster but a virtual machine from a large German cloud provider (Hetzner). I get similarly inconsistent numbers:

julia> (CPUSummary.cache_size(second_cache()), CPUSummary.cache_size(first_cache()))
(static(2097152), static(4194304))

That is, the first cache is reported as much larger than the second one, and thus N is already negative.

@nathanaelbosch did you find a way to fix this issue for you?
@chriselrod If I reach out to their support regarding this, what exactly should I tell them (preferably without having to rely on Julia terminology)? That their setup reports the wrong cache sizes for the L2 and L3 caches?

@nathanaelbosch
Author

@sloede Unfortunately I did not find a way to fix this. But I would be very interested in a solution to this issue.

@chriselrod
Collaborator

As a workaround, we could hardcode values for certain architectures, e.g. check for

julia> Sys.CPU_NAME
"cascadelake"

@sloede

sloede commented Jun 2, 2023

Is the amount of cache fixed for certain architectures? In my case, the CPUs identify as Skylake, as far as I can tell.

@chriselrod
Collaborator

chriselrod commented Jun 2, 2023

Yes. CPUSummary reports cache per core, so that Octavian can assume the total L3 cache it can use is proportional to the number of cores it is using. (If other cores are doing something else, they're likely to use/want some chunk of the L3 themselves.)

Skylake-avx512 and cascadelake are almost the same thing. They both have the same amount of L1 and L2 cache per core: 32 KiB L1d, 32 KiB L1i, 1024 KiB L2.

They also have something like a 1.375 MiB L3 slice per core, shared among all cores.
I'm assuming your server is skylake-avx512 rather than skylake?
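
The hardcoding workaround could look roughly like the following table keyed on Sys.CPU_NAME. The sizes are the per-core figures quoted in this comment (L3 taken as 1.375 MiB); KNOWN_CACHE_SIZES and known_cache_sizes are hypothetical names, and nothing like this is confirmed to exist in CPUSummary or Octavian.

```julia
const KiB = 1024

# Per-core sizes in bytes, as quoted above for skylake-avx512 and cascadelake.
const SKYLAKE_X_SIZES = (L1d = 32 * KiB, L2 = 1024 * KiB, L3 = 1408 * KiB)

# Hypothetical fallback table for machines whose reported cache sizes are unreliable.
const KNOWN_CACHE_SIZES = Dict(
    "skylake-avx512" => SKYLAKE_X_SIZES,
    "cascadelake"    => SKYLAKE_X_SIZES,
)

# Returns the hardcoded sizes for `name`, or `nothing` if the architecture is not listed.
known_cache_sizes(name::AbstractString = Sys.CPU_NAME) =
    get(KNOWN_CACHE_SIZES, name, nothing)
```

A caller would use the table only when it returns something other than nothing, and otherwise fall back to whatever the hardware query reports.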

@sloede

sloede commented Jun 2, 2023

I'm assuming your server is skylake-avx512 rather than skylake?

Yes, it looks like it - `julia -e 'using InteractiveUtils; versioninfo(verbose=true)'` gives me the following:

Julia Version 1.9.0
Commit 8e630552924 (2023-05-07 11:25 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      Ubuntu 20.04.6 LTS
  uname: Linux 5.15.0-72-generic #79-Ubuntu SMP Wed Apr 19 08:22:18 UTC 2023 x86_64 x86_64
  CPU: Intel Xeon Processor (Skylake, IBRS): 
              speed         user         nice          sys         idle          irq
       #1  2099 MHz      10207 s          0 s        968 s      67491 s          0 s
       #2  2099 MHz      10011 s          0 s       1001 s      67651 s          0 s
       #3  2099 MHz       9902 s          0 s        940 s      67830 s          0 s
       #4  2099 MHz      10310 s          0 s        969 s      67400 s          0 s
       #5  2099 MHz        242 s          0 s        210 s      78168 s          0 s
       #6  2099 MHz        250 s          0 s        200 s      78101 s          0 s
       #7  2099 MHz        235 s          0 s        212 s      78164 s          0 s
       #8  2099 MHz        265 s          8 s        210 s      78115 s          0 s
  Memory: 8.0 GB (7221.390625 MB free)
  Uptime: 7880.56 sec
  Load Avg:  0.54  0.12  0.04
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake-avx512)
  Threads: 1 on 8 virtual cores
Environment:
  GITHUB_PATH = /_work/github-runner-1-3/_temp/_runner_file_commands/add_path_cec3c7ec-60a7-4b35-842d-3fdcb4ffc5f2
  HOME = /root
  GITHUB_EVENT_PATH = /_work/github-runner-1-3/_temp/_github_workflow/event.json
  PATH = /opt/hostedtoolcache/julia/1.9.0/x64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/actions-runner
