Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clang inlining optimisation issue with pure release static build in json parsing - possibly null termination code change caused it #1566

Open
arturbac opened this issue Jan 14, 2025 · 2 comments

Comments

@arturbac
Copy link
Contributor

In one of my projects when i tried to upgrade version from v3.2.5 i got an error in system tests.
I will point out everything I have found so far to narrow down it.
it affects only static linked release build, on asan/ubsan instrumented and debug sahred builds it is ok.

3.2.6 - everything was still so far ok.
3.3.0 - release of non null termianted support ~7 errors in release build of >1000 tests
in later versions up to 4.2.2 there is still one error in my system tests but it is not possible to make repro as repro json works.
error is deterministic, always the same.

I tried parsing with null termination, without, even creating resize string buffer twice the size to ensure 0.
In below test You can see that I am outputing json string for which parsing fails so this makes sure problem is not in my code with invalid json data.

template<glz::read_json_supported T, class Buffer>
[[nodiscard]]
expected<T, glz::error_ctx> read_json(Buffer && buffer) noexcept
  {
  T value{};
  glz::context ctx{};
  glz::error_ctx const ec = read<glz::opts{.null_terminated = true}>(value, std::forward<Buffer>(buffer), ctx);
  if(ec)
    return unexpected<glz::error_ctx>(ec);
  return value;
  }

std::string data{response.jsonrpc};
    size_t orgsize{data.size()};
    data.resize(orgsize*2);
    data.resize(orgsize);
    auto geocode_res{read_json<json_rpc_response>(std::move(data))};
    namespace ut = boost::ut;

    if(!geocode_res)
      {
      ut::expect(false) << [&]
      {
        return std::format(
          "failed to decode json rpc v2 response for :{} msg:{} glz ec:{}:{} loc:{}\njson:\n{}", 
          tu.test_name,
          geocode_res.error().custom_error_message,
          int(geocode_res.error().ec),
          geocode_res.error().ec,
          geocode_res.error().location,
          response.jsonrpc
        );
      };
      return false;
      }
    json_rpc_response result{std::move(geocode_res.value())};

which produces when run in batch

Running "1348_GeeQD: Domaniewska 37"...
  verify.h:140:FAILED [false] failed to decode json rpc v2 response for :1348_GeeQD: Domaniewska 37 msg: glz ec:18:expected_comma loc:153
json:
{"jsonrpc":"2.0","result":{"data":[],"total_us":698,"search_us":692,"build_date":"","map_ver":"","geob_ver":-1763427336,"crap_streets_word_filtered":true,"encoding_failure_detected":false,"joined_words":false,"no_house_words_in_query":false,"place_mismatch_limit_exceeded":false,"street_mismatch_limit_exceeded":false},"id":"42"}
FAILED

as You can see json is correct after formatting it

{
  "jsonrpc": "2.0",
  "result": {
    "data": [],
    "total_us": 698,
    "search_us": 692,
    "build_date": "",
    "map_ver": "",
    "geob_ver": -1763427336,
    "crap_streets_word_filtered": true,
    "encoding_failure_detected": false,
    "joined_words": false,
    "no_house_words_in_query": false,
    "place_mismatch_limit_exceeded": false,
    "street_mismatch_limit_exceeded": false
  },
  "id": "42"
}

So i suspected - parser code optimisation error (maybe type erasure and lifetime ?) or some static buffer in glaze containing garbage from previous runs, all within bounds of allocations.

So I stared then with changing flags for static linking build
lowered -O3 to -O1

clang-19

  • WORKS OK ```-O1 -DDEBUG -fno-omit-frame-pointer -fno-strict-aliasing -static-libstdc++ -static-libgcc``
  • WORKS OK -O1 -DDEBUG -fno-omit-frame-pointer -fstrict-aliasing -static-libstdc++ -static-libgcc
  • FAILS -O1 -DDEBUG -fomit-frame-pointer -fno-strict-aliasing -static-libstdc++ -static-libgcc

so clang-19 fails with default omit-frame-pointer in Release static linked build in glaze
clang-20 same

  • FAILS -O1 -DDEBUG -fomit-frame-pointer -fno-strict-aliasing -static-libstdc++ -static-libgcc

  • WORKS OK -O0 -DNDEBUG -static-libstdc++ -static-libgcc (no omiting frame pointer as it is -O0

looks like inlining in static build is causing code to break.
I can not narrow down it more than version 3.3.0 stated to cause it (3.2.6 was ok) and that it is related to optimisation any level when functions are inlined

@arturbac
Copy link
Contributor Author

BTW reported error is always the same and it is a hint for where in code there is some error

glz ec:18:expected_comma loc:153
``

@arturbac
Copy link
Contributor Author

another hint from AI- it may be related to alignment of data and avx.

Template Function Stack Frame Management:

Look for template functions that manipulate large string buffers on stack
Check for functions using SIMD operations or aligned memory access
Examine recursive template instantiations that could affect stack layout

Memory Access Patterns:

Search for string parsing functions using fixed buffer sizes
Check for assumptions about stack alignment in template functions
Look for SSE/AVX operations that require specific alignment

Template Instantiation Side Effects:

Functions that rely on specific stack layout
Aggressive inlining combined with large local buffers
Template metaprogramming that affects function prologue/epilogue

Specific Code Patterns to Look For:

alignas specifications in template classes
String view operations with stack-based buffers
SFINAE or template specializations that might affect code generation
Recursive template instantiations with large stack usage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant