Invalid read in SST writer with WAN backend and data volume larger than 2GB #3195

Open
franzpoeschel opened this issue May 2, 2022 · 12 comments
@franzpoeschel
Contributor

Describe the bug
I'm trying to use the WAN backend of SST to stream data on a system where no compatible libfabric backend is available. Up to 2GB per step, everything runs fine. Beyond that, I get a segfault in the writer.

Valgrind output:

==373784== Invalid read of size 1                                                                                                                                                                          
==373784==    at 0x9029A51: INT_CMwrite_raw_notify (cm.c:3122)                                                                                                                                             
==373784==    by 0x90298F3: INT_CMwrite_raw (cm.c:3093)                                                                                                                                                    
==373784==    by 0x902B0C1: INT_CMwrite_attr (cm.c:3323)                                                                                                                                                   
==373784==    by 0x9027861: INT_CMwrite (cm.c:2733)                                                                                                                                                        
==373784==    by 0x903DA6B: CMwrite (cm_interface.c:629)                                                                                                                                                   
==373784==    by 0x80CD7A3: SendSpeculativePreloadMsgs (evpath_dp.c:1299)                                                                                                                                  
==373784==    by 0x80CD3B4: EvpathWSReaderRegisterTimestep (evpath_dp.c:1211)                                                                                                                              
==373784==    by 0x80D5AF6: SendTimestepEntryToSingleReader (cp_writer.c:1161)                                                                                                                             
==373784==    by 0x80D5BA4: SendTimestepEntryToReaders (cp_writer.c:1181)                                                                                                                                  
==373784==    by 0x80D88E8: SstInternalProvideTimestep (cp_writer.c:2320)                                                                                                                                  
==373784==    by 0x80D8CFD: SstProvideTimestep (cp_writer.c:2409)                                                                                                                                          
==373784==    by 0x801EEA0: adios2::core::engine::SstWriter::EndStep() (SstWriter.cpp:360)                                                                                                                 
==373784==  Address 0xffffffffd9c9d040 is not stack'd, malloc'd or (recently) free'd                                                                                                                       
==373784==                                                                                                                                                                                                 
==373784==                                                                                                                                                                                                 
==373784== Process terminating with default action of signal 11 (SIGSEGV)                                                                                                                                  
==373784==    at 0x5D94019: raise (raise.c:46)                                                                                                                                                             
==373784==    by 0x5D940BF: ??? (in /usr/lib/x86_64-linux-gnu/libc-2.31.so)                                                                                                                                
==373784==    by 0x9029A50: INT_CMwrite_raw_notify (cm.c:3122)                                                                                                                                             
==373784==    by 0x90298F3: INT_CMwrite_raw (cm.c:3093)                                                                                                                                                    
==373784==    by 0x902B0C1: INT_CMwrite_attr (cm.c:3323)                                                                                                                                                   
==373784==    by 0x9027861: INT_CMwrite (cm.c:2733)                                                                                                                                                        
==373784==    by 0x903DA6B: CMwrite (cm_interface.c:629)                                                                                                                                                   
==373784==    by 0x80CD7A3: SendSpeculativePreloadMsgs (evpath_dp.c:1299)                                                                                                                                  
==373784==    by 0x80CD3B4: EvpathWSReaderRegisterTimestep (evpath_dp.c:1211)                                                                                                                              
==373784==    by 0x80D5AF6: SendTimestepEntryToSingleReader (cp_writer.c:1161)                                                                                                                             
==373784==    by 0x80D5BA4: SendTimestepEntryToReaders (cp_writer.c:1181)                                                                                                                                  
==373784==    by 0x80D88E8: SstInternalProvideTimestep (cp_writer.c:2320)   

GDB finds the segfault at the same line:

3119│      for (i=0; i < vec_count; i++) {
3120│          count += full_vec[i].iov_len - start;
3121│          for (j=start; j< full_vec[i].iov_len; j++) {
3122│          checksum += ((unsigned char*)full_vec[i].iov_base)[j]; // this crashes
3123│          }
3124│          start = 0;
3125│      }

The error seems to occur independently of the chosen WANDataTransport.
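
For experimenting outside of ADIOS2, the loop from the GDB listing can be written as a standalone function. This is only a sketch: the name `iov_checksum`, the `size_t` index types and the clamp on `start` are assumptions of this example and are not taken from cm.c, whose declarations are not visible in the listing.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h> /* struct iovec (POSIX) */

/* Hypothetical helper, not part of CM: sums every byte of an iovec array,
 * skipping the first `start` bytes of the first entry as the listing does.
 * All indices are size_t so that they can address buffers beyond 2 GiB. */
uint32_t iov_checksum(const struct iovec *vec, size_t vec_count,
                      size_t start, size_t *count_out)
{
    uint32_t checksum = 0;
    size_t count = 0;

    for (size_t i = 0; i < vec_count; i++) {
        const unsigned char *base = (const unsigned char *)vec[i].iov_base;

        /* Assumption of this sketch: clamp the offset so that the
         * subtraction below cannot wrap around. */
        if (start > vec[i].iov_len) {
            start = vec[i].iov_len;
        }
        count += vec[i].iov_len - start;
        for (size_t j = start; j < vec[i].iov_len; j++) {
            checksum += base[j];
        }
        start = 0; /* only the first entry is read from an offset */
    }
    if (count_out != NULL) {
        *count_out = count;
    }
    return checksum;
}
```

With two buffers holding the bytes {1, 2, 3, 4} and {10, 20} and `start` set to 1, the function returns 39 and reports a count of 5.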

To Reproduce

  • Use a data producer in ADIOS2 that writes more than 2GB per step
  • Use the SST engine with WAN backend
  • Connect any reader. In my tests, opening the stream is enough to trigger the crash, without loading any data. The crash occurs only when a reader connects.

If you need an ADIOS2-only reproducer, I can set one up. For now, I'm seeing this issue via openPMD.

Expected behavior
Ideally, no crash. The SST documentation does not mention a 2GB limit, but maybe it has one?

If it does, what is the recommended way to set up a streaming workflow with moderate data volume in a non-HPC environment? The current use case is a lab environment for the exchange of laser images.

Desktop (please complete the following information):

  • OS/Platform: nvidia/cuda:11.6.0-devel-ubuntu20.04 Singularity container on a Debian bullseye/sid machine
  • Build: ADIOS 2.8.0, built with g++ 11.1.0 in Debug mode

Additional context
I remember seeing this issue also with earlier ADIOS2 releases, but did not investigate. So I assume that the exact build type is not very relevant.


@eisenhauer
Member

Interesting... My first thought was lurking 32-bit length values (much of this code was written with control, not data, in mind), but offhand I'm not sure how that turns into a segfault here. Then again, that checksum calculation should only be done for tiny (<10K) messages, so something is clearly going bad. Let me see if I can reproduce. Shouldn't be too hard to sort.
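
As an illustration of the 32-bit suspicion: the faulting address in the valgrind output, 0xffffffffd9c9d040, has all of its upper 32 bits set. That is the pattern a value takes when only its low 32 bits survive and are then sign-extended back to 64 bits. The sketch below reproduces that bit pattern. It is not taken from cm.c, the helper name `sign_extend_low32` is hypothetical, and which variable (if any) is actually affected is not known from the traces above.

```c
#include <stdint.h>

/* Hypothetical illustration: returns the 64-bit pattern that results when
 * only the low 32 bits of `value` are kept (as in a store to a 32-bit int)
 * and are then sign-extended back to 64 bits. Uses plain bit operations,
 * so it does not rely on implementation-defined integer conversions. */
uint64_t sign_extend_low32(uint64_t value)
{
    uint64_t low = value & UINT64_C(0xffffffff);

    if (low & UINT64_C(0x80000000)) {
        /* Bit 31 is set: a signed 32-bit reading is negative, so the
         * upper half fills with ones. */
        low |= UINT64_C(0xffffffff00000000);
    }
    return low;
}
```

Values up to 0x7fffffff (2 GiB minus one byte) pass through unchanged, which would match the observation that everything works below 2GB per step. A value of 0xc0000000 (3 GiB) comes back as 0xffffffffc0000000.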

chuckatkins pushed a commit to chuckatkins/ADIOS2 that referenced this issue May 6, 2022
Code extracted from:

    https://github.com/pybind/pybind11.git

at commit 42d8593ad4225a634b481cd573f7aeb94de72418 (master).

Upstream Shortlog
-----------------

Aaron Gokaslan (45):
      cd4b49a2 Update py::kwargs examples to pass by reference (ornladios#3038)
      b4b67f02 Fix typos (ornladios#3044)
      af6218ff fix(clang-tidy): Apply performance fixes from clang-tidy (ornladios#3046)
      e0b5cbd4 chore(clang-tidy): add more modernize clang-tidy checks (ornladios#3049)
      3b30b0a5 fix(clang-tidy): clang-tidy readability and misc fixes, like adding const (ornladios#3052)
      dac74ebd fix(clang-tidy): performance fixes applied in tests and CI (ornladios#3051)
      b5357d1f fix(clang-tidy): Enable clang-tidy else-after-return and redundant void checks (ornladios#3080)
      25e470c5 fix(clang-tidy): Add cppcoreguidelines-init-vars,slicing, and throw-by-value-catch-by-reference checks (ornladios#3094)
      c4b0dc7c Add shellcheck style checking (ornladios#3114)
      9beaa925 maint(clang-tidy): Improve code readability with explicit boolean casts (ornladios#3148)
      0ac4c8af maint(clang-tidy): Improve code readability with explicit boolean casts (ornladios#3148)
      c0756ccd fix: func_handle for rule of two (ornladios#3169)
      9f204a18 fix: func_handle for rule of two (ornladios#3169)
      3893f37b maint(clang-tidy): Bugprone enable checks (ornladios#3166)
      ff590c12 maint(perf): Optimize Numpy constructor to remove copies by value. (ornladios#3183)
      9df2f1ff maint(precommit): Apply isort (ornladios#3195)
      617cb653 [Bugfix] Fix errant const methods (ornladios#3194)
      6cbabc4b maint(clang-tidy): Enable cpp-coreguideline slicing checks (ornladios#3210)
      d71ba0cb (perf): Add a missing noexcept to a pytype constructor (ornladios#3236)
      4c6bee35 fix: Set __file__ constant when using eval_file (ornladios#1300) (ornladios#3233)
      ae07d4c6 maint(Clang-Tidy): readability-const-return (ornladios#3254)
      9978ed58 Fix capsule bug (ornladios#3261)
      d0f3c51f Enable defining custom __new__ (ornladios#3265)
      6e6975e2 Fix test case with __new__ (ornladios#3285)
      0fb981b2 Add blacken-docs and pycln pre-commit hooks (ornladios#3292)
      ad966556 fix: replace free() with std::free() (ornladios#3321)
      f4c81e08 maint: Add additional linter-related pre-commit hooks (ornladios#3337)
      78ee782b feat: Add C++ binding to throw AttributeError (ornladios#3387)
      ef070f77 Add additional info to TypeError when C++->Python casting fails (ornladios#3605)
      d434b5f3 (chore): Remove deprecated c-headers (ornladios#3610)
      f8d4aa47 Add clang-tidy readability checks for sus args (ornladios#3611)
      d2ec8367 Add support for nested C++11 exceptions (ornladios#3608)
      3a8d9230 Fix caster optimization regression introduced in ornladios#3650 (ornladios#3659)
      978617f6 fix issue 3668 by removing bool casts in numpy.h (ornladios#3669)
      ce18721d Ensure TypeError use raise_from for C++->Python overload res. (ornladios#3671)
      1b841883 Minor change to improve readability (ornladios#3695)
      7f975816 chore(clang-tidy): Enable static downcast and decl naming check (ornladios#3709)
      dc9803ce Add missing clang-tidy fixes (ornladios#3715)
      d6c66d25 chore(clang-tidy): Add clang-tidy rules: prefer-member-initializer and optin.performance.Padding (ornladios#3716)
      42a8e312 Improve Python 3.11 support (ornladios#3694)
      af08a95b fix: potential memory leak in pypy (ornladios#3774)
      2dd52544 fix: missing move in eval.h (ornladios#3775)
      47079b9e (perf): Add missing move in sp matrix caster and microopt char concats (ornladios#3823)
      146695a9 fix: better exception and error handling for capsules (ornladios#3825)
      3a183d4b fix: improve str exceptions and consistency with python (ornladios#3826)

Akira Kawata (1):
      417fd120 Fix: fix typo of WITHOUT_SOABI (ornladios#2992)

Andy Maloney (3):
      14b37512 docs: fix example code in Exceptions section (match vs. matches) (ornladios#2781)
      df8494dc fix: a clang warning [-Wshadow-field-in-constructor-modified] (ornladios#2780)
      40931961 docs: fix spelling in some comments/docs (ornladios#2777)

Antony Lee (6):
      d068ab28 docs: pybind11/numpy.h does not require numpy at build time. (ornladios#2720)
      6b4297fd fix: don't trigger -Wunused-parameter in flagcheck.cpp. (ornladios#2735)
      e8c4f543 fix: prepend Pybind11Extension flags rather than appending them. (ornladios#2808)
      5bcaaa04 Add a std::filesystem::path <-> os.PathLike caster. (ornladios#2730)
      1be0a0a6 Add helper to build in-tree extensions. (ornladios#2831)
      b11ff912 fix(setup =_helpers): don't add -g0 CFLAGS sets -g (ornladios#3436)

Axel Huebl (5):
      0b3df7f9 ci: Intel icc/icpc via oneAPI (ornladios#2573)
      c78dfb69 MSVC but not Clang: /MP (ornladios#2824)
      55f6f6e9 Fix: RTD Docutils Build (ornladios#3119)
      d75b3536 CI: MSVC Debug Build (ornladios#3784)
      a7e7a6e8 Docs: No Strip in Debug (ornladios#3779)

Bertrand MICHEL (1):
      74a767d4 Dtype kind vs char  (ornladios#2864)

Bjorn (1):
      32d11c96 fix typo in pickle example (ornladios#2669)

Bobby Impollonia (1):
      75168113 fix(setup_helpers): ensure ThreadPool is closed (ornladios#3548)

Boris Rasin (2):
      01f938e7 fix: add missing std::forward calls (ornladios#3443)
      a224d0cc fix: vs2022 compilation, issue ornladios#3477 (ornladios#3497)

Boris Staletic (4):
      06b673a0 Allow NULL value in pybind11_meta_setattro (ornladios#2629)
      8adef2c7 fix: workaround for ornladios#2682 and ornladios#2422 by simply clearing the TypeError (ornladios#2685)
      f110889d Use correct duration representation when casting from datetime.timdelta to std::chrono::duration (ornladios#2870)
      5cd37507 Enable -Wstrict-aliasing warning (ornladios#2816)

Bruce Merry (4):
      ee0c5ee4 Add make_value_iterator (ornladios#3271)
      b3573ac9 feat: add `.keys` and `.values` to bind_map (ornladios#3310)
      47ed124f Fix some formatting in the v2.8.0 changelog (ornladios#3339)
      8a7c266d Fix make_key_iterator/make_value_iterator for prvalue iterators (ornladios#3348)

Chad B. Hovey (1):
      dd2d1272 Correct "which" versus "that" error. (ornladios#3430)

Changming Sun (1):
      210c8c21 fix: a warning found by static code analyzer (ornladios#2783)

Chris Ohk (1):
      1a432b42 docs: Correct minor typos (ornladios#3721)

Cris Luengo (1):
      93e69191 fix: enable py::implicitly_convertible<py::none, ...> for py::class_-wrapped types (ornladios#3059)

Dan (1):
      930bb16c Call PySys_SetArgv when initializing interpreter. (ornladios#2341)

David Hewitt (2):
      a0b97596 Allow python builtins to be used as callbacks (ornladios#1413)
      fd71bd48 Allow python builtins to be used as callbacks (ornladios#1413)

Dmitry Yershov (1):
      076c89fc tests: test recursive dispatch using visitor pattern (#3365)

Dustin Spicuzza (4):
      c0fbb02c Extract gil management functions to separate header (ornladios#2845)
      6d440946 Check dict item accesses where it isn't already checked (ornladios#2863)
      ec81e8e7 Propagate py::multiple_inheritance to all children (ornladios#3650)
      17792884 Document how to bind templates (ornladios#3665)

Edward Lockhart (1):
      23c3edcf When determining if a shared_ptr already exists, use a test on the we… (ornladios#2819)

Eric Cousineau (6):
      635e3fc9 CONTRIBUTING: Add suggestion about passing pytest flags (ornladios#2738)
      2110d2d8 enum: add missing Enum.value property (ornladios#2739)
      f676782b env: Add surrogate for pytest.deprecated_call for ptyest<3.9 (ornladios#2923)
      b6ec0e95 functions: Add doc on incorrect argument index (ornladios#2979)
      6ac8efe5 test_eval: Show example of working closure (ornladios#2743)
      f495dfc4 cast: Qualify symbol usage in PYBIND11_TYPE_CASTER (ornladios#3758)

Frank (1):
      f8b8107b fix: make FindPython2 and FindPython3 work (ornladios#2662)

Geoffrey Gunter (1):
      2d6014e4 docs: fix minor typo (ornladios#3390)

Guillaume Jacquenot (2):
      e450eb62 Removed duplicated word in docs/advanced/cast/eigen.rst (ornladios#3458)
      1eb59963 Removed duplicated word in docs/advanced/exceptions.rst (ornladios#3476)

Henry Fredrick Schreiner (5):
      5b43ac42 docs: fix missing line from ornladios#2310
      dff9b3b4 chore: add pytest-timeout, mypy
      732bf88d fix: avoid changing class outside of GIL
      4a5b81b1 chore: get back to work
      87954e7a fix: corrected dev versioning

Henry Schreiner (111):
      b8dc60ec fix: Python include directory was missing from DIRS (ornladios#2636)
      6cc233cc ci: label PRs when merged only for now
      6d4854a5 ci: correct types statement
      3e4d54bc fix: match new extension discovery with changes to classic discovery (ornladios#2640)
      ebd5c5b4 feat: way to only recompile changed files (ornladios#2643)
      f1abf5d9 docs: changelog update (ornladios#2652)
      b7c741b5 docs: back to work after 2.6.1
      02746cb6 docs: add a little more information for releases
      de78bddd docs: better badges (ornladios#2656)
      17c22b9e docs: mention branch update in checklist (ornladios#2670)
      499fcd54 ci: drop pypy2 linux, PGI 20.7, add Python 10 dev (ornladios#2724)
      ffb113d1 fix: regression with installed pybind11 overriding local one (ornladios#2716)
      d5af536f ci: update cmake action (ornladios#2734)
      5bd766bf docs: update changelog and add script to help generate it (ornladios#2733)
      79b0e2c0 docs: fix pdf build, simpler start page (ornladios#2736)
      b7dfe5cc chore: changelog update (ornladios#2750)
      5abce7fc ci: use fixed action (ornladios#2791)
      230fa53f fix: Don't override global settings for VISIBILITY if set (ornladios#2793)
      0df11d85 docs: update build description slightly (ornladios#2794)
      87f5aff4 ci: update to setup-cmake v1.6 (ornladios#2805)
      eb83feef style: avoid using unintialized variables (ornladios#2806)
      130c9954 fix: support basic dual includes (ornladios#2804)
      2db0264a style: add clang-format file (ornladios#2310)
      44105ca1 docs: mention that the changelog block in PR is special
      08bca374 docs: update changelog, nicer output for script (ornladios#2811)
      8e5d3d23 docs: prepare for 2.6.2 (ornladios#2820)
      8de7772c chore: prepare for the 2.6.2 release (ornladios#2821)
      721834b4 chore: get PyPy 3.7 wheels using NumPy 1.20 (ornladios#2837)
      e0c1dadb chore: add myself to CODEOWNERS (ornladios#2940)
      114be7f4 docs: remove recommonmark (ornladios#2955)
      5e4804bb tests: use master commit for pytest on 3.10 (ornladios#2967)
      54430436 ci: install Boost for boost checks (ornladios#2968)
      7a64b8ad docs: fix script issues for changelog compilation (ornladios#3100)
      f0a65c89 docs(fix): spelling mistake in recent commit
      ddf0efb9 chore: add nox support (ornladios#3101)
      84fdadfb chore: update pre-commit hooks
      11e12fe4 chore: move some config to pyproject.toml
      0e2e0035 style: add pyupgrade check, 2.7+
      6a644c8f docs: update changelog (ornladios#3099)
      cd061aee style: pre-commit cleanup (ornladios#3111)
      31843d45 docs: reduce visibility of 3.9.0 warning (ornladios#3105)
      2415c094 feat(package): support pipx run (ornladios#3117)
      1b10292c chore: support PDF from nox (ornladios#3121)
      6642f389 docs: update changelog (ornladios#3122)
      65e95ea8 chore: bump to 2.7.0 (ornladios#3123)
      74935f8d chore: post-release (ornladios#3128)
      7cc0ebb4 fix: the CMake config in Python package had a hard coded path (ornladios#3144)
      c14b1933 chore: increase CMake upper limit (ornladios#3124)
      5c6bdb72 fix: the CMake config in Python package had a hard coded path (ornladios#3144)
      b1fdbe69 chore: add discussions link (ornladios#3159)
      a2b78a8c chore: changelog update (ornladios#3163)
      90959848 chore: changelog update (ornladios#3163)
      078c1167 chore: bump to version 2.7.1
      5f34c42d chore: bump to version 2.7.1
      82adacb3 fix: include hex version in bump
      787d2c88 fix: include hex version in bump
      c30f57d2 chore: start development for 2.8.0
      5f4d7259 fix: version number hex
      1fafd1b4 fix: apply simpler expression with fewer workarounds
      089328f7 Revert "fix: apply simpler expression with fewer workarounds"
      fdac5fbf chore: support targeting different Python versions with nox (ornladios#3214)
      db44afa3 tests: fix pytest usage on Python 3.10 (ornladios#3221)
      04dd3262 docs: update CHANGELOG (ornladios#3276)
      b06a6f4f feat: Slice allowing None with py::object or std::optional (ornladios#1101)
      2fa3fcfd Revert "Add make_value_iterator (ornladios#3271)"
      5f46e47d tests: check simple iteration of pairs (ornladios#3296)
      21282e64 feat: reapply fixed version of ornladios#3271 (ornladios#3293)
      6ad3f874 fix(build): avoid a possible warning about shadowed variables and changing behaviors (ornladios#3220)
      d58699c9 fix(cmake): reduce chance for variable collision (ornladios#3302)
      6bce3bd7 docs: update CHANGELOG (ornladios#3304)
      a1830d5e docs: mention title conventions in PR template (ornladios#3313)
      d7a7edc1 tests: support Eigen configuration
      591db0b9 docs: update CHANGELOG for 2.8
      20aae3e6 ci: disable Eigen due to Cert issue on CentOS
      c9a319c6 chore: version 2.8.0 final
      3747dc2c Revert "All `-DDOWNLOAD_EIGEN=OFF` (to work around gitlab eigen outage)." (ornladios#3326)
      ba9f919b chore: get back to work after 2.8.0
      931f6644 ci: cancel in-progress on repeated pushes (ornladios#3370)
      f791dc86 fix: deprecate make_simple_namespace, fix Python 3.11 (ornladios#3374)
      606f81a9 style: drop pycln (ornladios#3397)
      9379b399 fix: MSVC 2017 C++17 on Python 3 regression (ornladios#3407)
      e7e2c79f fix: improve support for Python 3.11-dev (ornladios#3368)
      90707b46 fix(build): support conan's multiple includes of all files (ornladios#3420)
      f1594cb9 docs: changelog update for 2.8.1 (ornladios#3416)
      a61e354e docs: touch up manual release suggestion (ornladios#3422)
      aebd21b5 docs: rework CI a bit, more modern skipping (ornladios#3424)
      270b11d5 Revert "style: drop pycln" (ornladios#3466)
      72282f75 ci: support development releases of Python (ornladios#3419)
      ff51fcb7 docs: fix broken link (again)
      15f8d7c1 fix(build): cleaner CMake printouts & IDE folders (ornladios#3479)
      cd176cee chore: update changelog with recent PRs (ornladios#3524)
      39fbc799 fix: avoiding usage of _ if already defined (ornladios#3423)
      e50f841d fix: do not use LTS on mips64 and ppc64le (ornladios#3557)
      d4b9f347 docs: update changelog (ornladios#3556)
      cb302305 fix: restore full range of _ functions (ornladios#3571)
      45f792ef chore: prepare for 2.9
      9b4f71d1 docs: remove duplication in changelog for 2.9.0
      bf7e5f92 fix(setup): support overriding CMake args (ornladios#3577)
      21e10945 ci: move centos 8 to stream (ornladios#3675)
      0f6ad910 docs: update changelog for 2.9.1 (ornladios#3670)
      ffa34686 chore: bump to 2.9.1
      36813cfa chore: back to work
      af056b65 fix: __index__ on Enum should always be present. (ornladios#3700)
      46dcd9bc fix: minor CMake warning fix for unused variable (ornladios#3718)
      522c59ce chore: drop Python 3.5 (ornladios#3719)
      a25d40c7 tests: use 'build' in tests instead of running setup.py (ornladios#3734)
      4b42c371 style: pylint (ornladios#3720)
      5f9b090a ci: fix PyPy (ornladios#3768)
      461937d3 ci: test pypy 3.9 (ornladios#3789)
      7742be02 Revert "ci: test pypy 3.9" (ornladios#3828)
      42d8593a style: bump black (ornladios#3831)

Ivor Wanders (1):
      21911e12 A way to register additional test targets and support .py only tests. (ornladios#3590)

JYX (1):
      3df0ee6f docs: typo in classes.rst (ornladios#2926)

Jack S. Hale (1):
      4c7697db Add const T to docstring generation. (ornladios#3020)

James Foster (1):
      d57c1fab docs: update installing.rst (ornladios#2691)

Jan Iwaszkiewicz (1):
      cf006af2 Fix typos and docs style (ornladios#3088)

Jason Rhinelander (3):
      e7c9753f feat: allow kw-only args after a py::args (ornladios#3402)
      673b4be3 Fix py::kw_only when used before the first arg of a method (ornladios#3488)
      b4939fcb Expand std::string_view support to str, bytes, memoryview (ornladios#3521)

Jean-Baptiste Lespiau (1):
      af8849f4 docs: list all pybind11 exceptions (ornladios#2671)

Jeremy Maitin-Shepard (4):
      4d5ad03e Avoid use of temporary `bytes` object in string_caster for UTF-8 (ornladios#3257)
      14976c85 Eliminate duplicate TLS keys for loader_life_support stack (ornladios#3275)
      2a78abff Ensure PYBIND11_TLS_REPLACE_VALUE evaluates its arguments only once (ornladios#3290)
      62c4909c Add `custom_type_setup` attribute (ornladios#3287)

Jerome Robert (4):
      1259db6f Fix Pybind11Extension on mingw64 (ornladios#2921)
      9e8a741b fix: Mingw64 corrected and add a CI job to test it (ornladios#3132)
      c80e0593 fix: Mingw64 corrected and add a CI job to test it (ornladios#3132)
      56b49c2b ci: fix mingw checks by pinning (ornladios#3375)

JonTriebenbach (1):
      8b1944d3 Remove idioms in code comments (ornladios#3809)

Jouke Witteveen (1):
      031a700d Add make_simple_namespace function and tests (ornladios#2840)

Karthik Nishanth (1):
      e791ec4e fix: add null pointer check with std::localtime (ornladios#2846)

Kumar Aditya (1):
      948d09d6 test: Test against Python 3.10 (ornladios#2848)

Laramie Leavitt (4):
      5469c238 Adjusting `type_caster<std::reference_wrapper<T>>` to support const/non-const propagation in `cast_op`. (ornladios#2705)
      0e599589 Fix thread safety for pybind11 loader_life_support (ornladios#3237)
      b3a43d13 Use rvalue reference for std::variant cast_op<T> (ornladios#3811)
      b22ee64c Add type_caster<std::monostate> (ornladios#3818)

Liam Keegan (2):
      4f29b8a4 ci: extend msys2 mingw CI (ornladios#3207)
      bcb6d63c fix msys ci python issue (ornladios#3651)

Lishen1 (1):
      5d067e87 fix: remove redundant copy operation to fix warning (ornladios#3486)

Matthias Köppe (1):
      e0031bfc include/pybind11/numpy.h: gcc 4.8.4 does not have is_trivially_copyable (ornladios#3270)

Mattia Basaglia (2):
      07103d65 Remove extra semicolon (ornladios#3666)
      dc4717ba fix: module extension detection for python 3.10 (ornladios#3663)

Michael Kuron (1):
      48534089 fix: Intel ICC C++17 compatibility (ornladios#2729)

Michał Górny (2):
      1d3b04e8 test: Strip whitespace when comparing numpy dtypes for 1.22 compat (ornladios#3682)
      96b943be tests: update catch to 2.13.5 to fix glibc 2.34 failures (ornladios#3679)

NaDDu (1):
      750e38dc Update eval.h (ornladios#3344)

Nick Cullen (2):
      59ad1e7d reshape for numpy arrays (ornladios#984)
      503ff2a6 view for numpy arrays (ornladios#987)

Nikita Shulga (1):
      79cb013f fix: allow users to avoid thread termination in scoped_released (ornladios#2657)

Nimrod (1):
      9ec1128c Fix typo in doc (ornladios#3628)

Oleksandr Pavlyk (1):
      91a6e129 PYBIND11_OBJECT_CVT should use namespace for error_already_set() (ornladios#3797)

OnlineCop (1):
      cbae6d55 docs: fix CMake status for DOWNLOAD_EIGEN (ornladios#2857)

Peter Hawkins (1):
      44596bc4 Fix exception handling when pybind11::weakref() fails. (ornladios#3739)

Philipp Bucher (3):
      62976cfc fix: using -Werror-all for Intel (ornladios#2948)
      71fd5241 docs: fix minor typo (ornladios#3311)
      c9bbf8d2 docs: fix minor typo (ornladios#3311)

Pieter P (1):
      0c93a0f3 Fix Unicode support for ostream redirects (ornladios#2982)

Qifan Lu (1):
      d587a2fd fix: do not set docstring for function when empty (ornladios#2745)

Ralf W. Grosse-Kunstleve (78):
      cecdfadc minor cleanup: fixing or silencing flake8 errors (ornladios#2731)
      9b7bfef8 Factoring out find_registered_python_instance() from type_caster_generic::cast. (ornladios#2822)
      0432ae7c Changing pybind11::str to exclusively hold PyUnicodeObject (ornladios#2409)
      932769b0 Adding holder_caster `typename SFINAE = void` hooks to help work around the current lack of smart-pointer interoperability (ornladios#2833)
      0c42250a Splitting out detail/type_caster_base.h from cast.h, with iwyu cleanup. (ornladios#2841)
      e2e819b2 Shuffling code in test_smart_ptr.cpp to separate struct/class definitions from bindings code. Back-porting from smart_holder branch, to minimize diffs and potential for merge conflicts. (ornladios#2875)
      44678e54 Shuffling code in test_multiple_inheritance.cpp to separate struct/class definitions from bindings code. (ornladios#2890)
      0e01c243 Generalizing suppression for pypocketfft. (ornladios#2896)
      ad6bf5cd Adding PyGILState_Check() in object_api<>::operator(). (ornladios#2919)
      e25b1505 Adjusting valgrind suppression for pypocketfft to resolve systematic failures that started to appear on 2020-05-27. (ornladios#3022)
      19d99a87 Working around Centos 8 failure. (ornladios#3030)
      484b0f04 Updating and slightly enhancing instructions for running clang-tidy. (ornladios#3055)
      fbae8f31 pickle setstate: setattr __dict__ only if not empty (ornladios#2972)
      cad79c11 tests: remove very minor oversight in PR ornladios#3059. (ornladios#3066)
      795e3c4c Removing `AlignConsecutiveAssignments: true`. (ornladios#3067)
      0ad116d3 Adding codespell to .pre-commit-config.yaml (follow-on to PR ornladios#3075). (ornladios#3076)
      6d1b197b Splitting out pybind11/stl/filesystem.h. (ornladios#3077)
      bac5a0c3 Go all the way fixing clang-tidy issues to avoid the NOLINTNEXTLINE clutter and clang-format issues. This was really meant to be part of PR ornladios#3051 but was held back either out of an abundance of caution, or because of confusion caused by stray semicolons. (ornladios#3086)
      0f4761b4 Rollback of DOWNLOAD_CATCH=OFF change merged via PR ornladios#3059. (ornladios#3092)
      2d468697 NOLINT reduction (ornladios#3096)
      7472d37a Adding iostream.h thread-safety documentation. (ornladios#2995)
      9f11951b Fixing spelling errors that went undetected because the pre-commit spell check was added after the CI for PR ornladios#2995 last ran. (ornladios#3103)
      75090647 More precise return_value_policy::automatic documentation. (ornladios#2920)
      aca6c3ba * Removing stray semicolons (discovered by running clang-format v12 followed by tools/check-style.sh). (ornladios#3087)
      4359e00b Introducing PYBIND11_VERSION_HEX (ornladios#3120)
      34f587dd Removing all warning pragmas that have not effect. (ornladios#3127)
      ff97f101 Removing MSVC C4996 from pragma block at the top of pybind11.h (ornladios#3129)
      7904ba1a Adding pragma warning(disable: 4522) for MSVC <= 2017. (ornladios#3142)
      a0f862d4 Removing MSVC C4800 from pragma block at the top of pybind11.h (ornladios#3141)
      2164c2e0 Removing __INTEL_COMPILER section from pragma block at the top of pybind11.h (ornladios#3135)
      f4721a7b Accommodating environments that define __STDC_WANT_LIB_EXT1__ even if __STDC_LIB_EXT1__ is not defined by the implementation. (ornladios#3151)
      b72ca7d1 Removing MSVC C4100 from pragma block at the top of pybind11.h (ornladios#3150)
      b193d42c Removing MSVC C4996 from pragma block at the top of pybind11.h (ornladios#3129)
      85b38c69 Adding pragma warning(disable: 4522) for MSVC <= 2017. (ornladios#3142)
      e93d9459 Removing MSVC C4800 from pragma block at the top of pybind11.h (ornladios#3141)
      ed5fb66b Removing __INTEL_COMPILER section from pragma block at the top of pybind11.h (ornladios#3135)
      05852fb6 Accommodating environments that define __STDC_WANT_LIB_EXT1__ even if __STDC_LIB_EXT1__ is not defined by the implementation. (ornladios#3151)
      b4259729 Limiting pragma for ignoring GCC 7 -Wnoexcept-type to the scope of pybind11.h. (ornladios#3161)
      e2573dc9 Moving pragma for MSVC warning C4505 from pybind11.h to existing list in detail/common.h (ornladios#3160)
      46c51fc0 Limiting pragma for ignoring GCC 7 -Wnoexcept-type to the scope of pybind11.h. (ornladios#3161)
      b961ac64 Moving pragma for MSVC warning C4505 from pybind11.h to existing list in detail/common.h (ornladios#3160)
      dcbda8d7 Removing MSVC C4127 from pragma block at the top of pybind11.h (ornladios#3152)
      af700733 Removing GCC -Wunused-but-set-parameter from pragma block at the top of pybind11.h (ornladios#3164)
      61ee923b Consistent step name "Python tests". (ornladios#3180)
      4c7e509f PYBIND11_NOINLINE-related cleanup. (ornladios#3179)
      7d3b0571 Improved workaround for Centos 8 failure (follow-on to PR ornladios#3030). (ornladios#3193)
      1bcd94c4 Removing last remnants of pragma block at the top of pybind11.h (ornladios#3186)
      774b5ff9 Removing obsolete eigen.h warning suppression pragmas. (ornladios#3198)
      998d45e4 Cleanup of file-scoped and globally-scoped warning suppression pragmas across pybind11 header files. (ornladios#3201)
      c8ce4b8d Clone of @virtuald's PR ornladios#2112 with minor enhancements. (ornladios#3215)
      777352fc Adding `ssize_t_cast` to support passing `size_t` or `ssize_t` values where `ssize_t` is needed. (ornladios#3219)
      a46f6237 Minor tweaks. (ornladios#3230)
      49173e47 Minor follow-on to PR ornladios#1334 (Fix enum value's __int__ returning non-int when underlying type is bool or of char type) (ornladios#3232)
      6abf2baa CodeHealth: Enabling clang-tidy google-explicit-constructor (ornladios#3250)
      121b91f9 Fixing NOLINT mishap (ornladios#3260)
      6c65ab59 Follow-on to PR ornladios#3254, to address user code breakages. (ornladios#3263)
      9f146a56 All `-DDOWNLOAD_EIGEN=OFF` (to work around gitlab eigen outage).
      7c580586 Correct options on Eigen::MappedSparseMatrix & adding MSVC C4127 suppression around Eigen includes. (ornladios#3352)
      f7b49961 [skip ci] Tweaks in preparation for the 2.8.1 release. (ornladios#3421)
      a80b2237 chore: get back to work after 2.8.1
      9281faf4 Fixing `stict` vs `strict` typo. (ornladios#3493)
      b3d9c354 vi: replacing currently broken ICC Latest C++17 with C++14. (ornladios#3551)
      1bbaeb34 Adding dedicated test_const_name. (ornladios#3578)
      f5888108 Replacing ICC C++14 with C++17 (ornladios#3570)
      7e7c5585 Fixing obvious minor typo (missing `D` in `-DOWNLOAD_EIGEN=ON`).
      3899dc65 Documenting missing unit test coverage. (ornladios#3673)
      8581584e Manual fix-ups in preparation for clang-tidy readability-braces-around-statements.
      ddbc74c6 Adding .clang-tidy readability-braces-around-statements option.
      b4f5350d chore: use member initializer (ornladios#3704)
      7769e771 clang-tidy readability-qualified-auto (ornladios#3702)
      abc38690 Manually applying two clang-format changes that need fix-ups for clang-tidy. (ornladios#3705)
      e96221be Final manual curation in preparation for global `clang-format`ing (ornladios#3712)
      ec24786e Fully-automatic clang-format with include reordering (ornladios#3713)
      6493f496 Python 2 removal part 1: tests (C++ code is intentionally ~untouched) (ornladios#3688)
      44156477 Adding MSVC 2022 C++20 GitHub Action (ornladios#3732)
      a97e9d8c Dropping MSVC 2015 (ornladios#3722)
      c14170a7 Removing `// clang-format off` - `on` directives from test_pickling.cpp (ornladios#3738)
      009ffc33 MSVC C++20 test_eigen (ornladios#3741)

Rasmus Munk Larsen (1):
      70a58c57 Replace usage of deprecated Eigen class MappedSparseMatrix. (ornladios#3499)

Robert Haschke (4):
      b72cebeb style: clang-tidy: modernize-use-using (ornladios#2645)
      d9fa7056 style: remove redundant instance->owned = true (ornladios#2723)
      c2db53da fix: catch missing self argument in overloads constructor (ornladios#2914)
      c090c8c4 Unify cast_error message thrown by [simple|unpacking]_collector (ornladios#3013)

Robert Schütz (1):
      d00fc629 use CMAKE_INSTALL_FULL_INCLUDEDIR (ornladios#3005)

Ryan Cahoon (1):
      c2d3e220 fix: the types for return_value_policy_override in optional_caster (ornladios#3376)

Sebastian Koslowski (1):
      94a94872 docs: fix imported target name (ornladios#3689)

Sergei Izmailov (1):
      51948559 Render `py::bool_` and `py::float_` without `_` in docstrings (ornladios#3622)

Sergiu Deitsch (1):
      d2b21316 cmake: report version type in the version string (ornladios#3472)

Shane Loretz (1):
      7331d381 Raise codec errors when casting to std::string (ornladios#2903)

StarQTius (1):
      9aa676d3 fix: clear local internals after finalizing interpreter ornladios#2101 (ornladios#3744)

Stefano Rivera (1):
      465b2e0b Use sysconfig in Python >= 3.10 (ornladios#3764)

Steve Siano (1):
      6f66e760 docs: add a note about compiling the example (ornladios#2737)

Tailing Yuan (1):
      d6474ed7 fix: memory leak in cpp_function (ornladios#3228) (ornladios#3229)

Tamaki Nishino (1):
      6709abba Allow function pointer extraction from overloaded functions (ornladios#2944)

Thomas Ballinger (1):
      39a0aac8 docs fix to avoid nonexistent SmartCompile (ornladios#3241)

Tobias Leibner (1):
      7bd4b397 fix: define PYBIND11_CPP14 for recent intel compilers (ornladios#2679)

Tom de Geus (1):
      9c0aa699 Pointing out namespace in docs (ornladios#2874)

Trigve (1):
      afdc09de [master] Wrong caching of overrides  (ornladios#3465)

Vikram Pal (1):
      417067ee Add pybind11::bytearray (ornladios#2799)

Weiming Zhao (1):
      4f0727f2 Fix the enabling of default extension handling (ornladios#2938)

Wenzel Jakob (1):
      409be833 CMake: react to python version changes

Yannick Jadoul (19):
      7d6713a4 Use weakref to clean up captured function object in def_buffer (ornladios#2634)
      c58758d0 fix: add reasonable argument names to enum_ methods (ornladios#2637)
      028812ae docs: add warning about FindPython's Development component when libraries don't exist (e.g. on manylinux) (ornladios#2689)
      91a69720 docs: Update warning about Python 3.9.0 UB, now that 3.9.1 has been released (ornladios#2719)
      30eb39ed fix: also throw in the move-constructor added by the PYBIND11_OBJECT macro, after the argument has been moved-out (if necessary) (ornladios#2701)
      830f8eda tests: update pytest 6.2.1 and fix test_python_alreadyset_in_destructor (ornladios#2741)
      e612043d Fix invalid access when reinterpret_casting a non-pybind11 PyObject* to instance* (found by Valgrind in ornladios#2746) (ornladios#2755)
      e57dd471 Fix various minor memory leaks in the tests (found by Valgrind in ornladios#2746) (ornladios#2758)
      98f1bbb8 Ignore deprecation warnings about old-style __init__/__setstate__ constructors in the tests (originally done in ornladios#2746) (ornladios#2759)
      7b7ec664 ci: pin CMake to 3.19.2, fixes  issues with 3.19.3 on Linux (aarch64) and macOS (universal) (ornladios#2790)
      f243450e ci: disable builds for 3.10.0a4, and enable a nightly 3.10-dev build (ornladios#2792)
      1faf4a8a docs: the order of alternatives for variant types matters, and follows the same rules as overload resolution (ornladios#2784)
      08551463 Plug leaking function_records in cpp_function initialization in case of exceptions (found by Valgrind in ornladios#2746) (ornladios#2756)
      0f8d5f2e Add a Valgrind build on debug Python 3.9  (ornladios#2746)
      8449a808 fix: only allow integer type_caster to call __int__ method when conversion is allowed; always call __index__ (ornladios#2698)
      0bb8ca26 Always call PyNumber_Index when casting from Python to a C++ integral type, also pre-3.8 (ornladios#2801)
      587d5f84 Update breathe to 4.26.1, add make_tuple, make_iterator, and make_key_iterator (ornladios#2828)
      6cf6bf20 Fix confusing weakref constructor overload (ornladios#2832)
      fe845878 Make sure all warnings in pytest get turned into errors (ornladios#2838)

Ye Zhihao (1):
      cb60ed49 Fix enum value's __int__ returning non-int when underlying type is bool or of char type (ornladios#1334)

Yichen (1):
      3ac690b8 Explicitly export exception types. (ornladios#2999)

albanD (1):
      087b07c8 Remove workaround code that is not needed since ornladios#1211 (ornladios#2683)

blacktea (1):
      6d5d4e73 Move object in pop method of List. (ornladios#3116)

crimsoncor (1):
      9ea39dc3 Force the builtin module key to be the correct type. (ornladios#2814)

cyy (1):
      f067deb5 avoid unnecessary strlen (ornladios#3058)

dependabot[bot] (14):
      42e73807 chore(deps): bump jwlawson/actions-setup-cmake from v1.6 to v1.7 (ornladios#2818)
      c2362393 chore(deps): bump pypa/gh-action-pypi-publish from v1.4.1 to v1.4.2 (ornladios#2851)
      59f8d7f1 chore(deps): bump jwlawson/actions-setup-cmake from v1.7 to v1.8 (ornladios#2865)
      16c23fef chore(deps): bump pre-commit/action from v2.0.0 to v2.0.2 (ornladios#2935)
      bca4b36b chore(deps): bump pre-commit/action from v2.0.2 to v2.0.3 (ornladios#2964)
      bc7cf6ef chore(deps): bump jwlawson/actions-setup-cmake from 1.8 to 1.9 (ornladios#3000)
      f61855b9 chore(deps): bump ilammy/msvc-dev-cmd from 1 to 1.8.0 (ornladios#3001)
      14023c9c chore(deps): bump ilammy/msvc-dev-cmd from 1.8.0 to 1.8.1 (ornladios#3021)
      9b3b3577 chore(deps): bump ilammy/msvc-dev-cmd from 1.8.1 to 1.9.0 (ornladios#3027)
      d6841f60 chore(deps): bump jwlawson/actions-setup-cmake from 1.9 to 1.10 (ornladios#3196)
      1dc9a23c chore(deps): bump jwlawson/actions-setup-cmake from 1.10 to 1.11 (ornladios#3294)
      ed09664f chore(deps): bump ilammy/msvc-dev-cmd from 1.9.0 to 1.10.0 (ornladios#3338)
      fb9a222d chore(deps): bump pypa/gh-action-pypi-publish from 1.4.2 to 1.5.0 (ornladios#3606)
      3a1eddab chore(deps): bump jwlawson/actions-setup-cmake from 1.11 to 1.12 (ornladios#3625)

heyer2 (1):
      76a16007 fix: STATIC and SHARED flags not being detected (ornladios#2796)

jakobjw (1):
      98f9a33c Correct typo in FAQ (ornladios#2868)

jbarlow83 (2):
      79178e71 fix(setup_helpers): try import multiprocessing.synchronize too (ornladios#3043)
      2b7985e5 Improve documentation of discard_as_unraisable() API (ornladios#2697)

jesse-sony (1):
      d65edfb0 Feature/local exception translator (ornladios#2650)

jonathan-conder-sm (1):
      733f8de2 Avoid string copy if possible when passing a Python object to std::ostream (ornladios#3042)

ka-bo (2):
      e58c6897 Specified encoding in setup.py calls of open() (ornladios#3137)
      ee3ecb8a Specified encoding in setup.py calls of open() (ornladios#3137)

kururu002 (1):
      da15bb20 Cast bytearray to string  (ornladios#3707)

luzpaz (1):
      8bee61b6 docs: fix various typos (ornladios#3075)

mvoelkle-cern (1):
      e08a5811 Fix compilation with gcc < 5 (ornladios#2956)

ngc92 (1):
      56322daf fixed include for filesystem::path (ornladios#3482)

nickbridgechess (1):
      2fa4747c pythonbuf fix (ornladios#2675)

pre-commit-ci[bot] (28):
      9626483c [pre-commit.ci] pre-commit autoupdate (ornladios#3134)
      7f76d795 [pre-commit.ci] pre-commit autoupdate (ornladios#3143)
      c973660d [pre-commit.ci] pre-commit autoupdate (ornladios#3143)
      f4f4632e [pre-commit.ci] pre-commit autoupdate (ornladios#3167)
      ada6b791 [pre-commit.ci] pre-commit autoupdate (ornladios#3167)
      0be2ea06 [pre-commit.ci] pre-commit autoupdate (ornladios#3185)
      b3d18f38 [pre-commit.ci] pre-commit autoupdate (ornladios#3213)
      76d939de [pre-commit.ci] pre-commit autoupdate (ornladios#3231)
      3ed31e92 [pre-commit.ci] pre-commit autoupdate (ornladios#3266)
      077a16e9 [pre-commit.ci] pre-commit autoupdate (ornladios#3286)
      6be64304 [pre-commit.ci] pre-commit autoupdate (ornladios#3312)
      97976c16 [pre-commit.ci] pre-commit autoupdate (ornladios#3325)
      02c05573 [pre-commit.ci] pre-commit autoupdate (ornladios#3353)
      d45a8810 [pre-commit.ci] pre-commit autoupdate (ornladios#3409)
      6de30d31 [pre-commit.ci] pre-commit autoupdate (ornladios#3432)
      b322018e [pre-commit.ci] pre-commit autoupdate (ornladios#3449)
      9422d98f [pre-commit.ci] pre-commit autoupdate (ornladios#3473)
      fe65693c [pre-commit.ci] pre-commit autoupdate (ornladios#3500)
      59aa9986 [pre-commit.ci] pre-commit autoupdate (ornladios#3533)
      d0406c74 [pre-commit.ci] pre-commit autoupdate (ornladios#3563)
      89769e6e [pre-commit.ci] pre-commit autoupdate (ornladios#3574)
      2cd32e5d [pre-commit.ci] pre-commit autoupdate (ornladios#3589)
      b66328b0 [pre-commit.ci] pre-commit autoupdate (ornladios#3609)
      0986af61 [pre-commit.ci] pre-commit autoupdate (ornladios#3672)
      91f597be [pre-commit.ci] pre-commit autoupdate (ornladios#3754)
      061c6177 [pre-commit.ci] pre-commit autoupdate (ornladios#3765)
      f8a532a7 [pre-commit.ci] pre-commit autoupdate (ornladios#3800)
      67089cd3 [pre-commit.ci] pre-commit autoupdate (ornladios#3817)

xaedes (1):
      b4e1ab8c Docs: Demonstrate non-enum internal types in example (ornladios#3314)

yangliz5 (1):
      dedda228 Fix a typo in class.rst (ornladios#3648)
@eisenhauer
Member

I was unable to duplicate this when I tried. EVPath has had a couple of tweaks since then. @franzpoeschel, can you maybe try again? If it still fails, I'll probably need your setup so I can recreate it.

@franzpoeschel
Contributor Author

franzpoeschel commented Jul 15, 2022

Thank you for looking into this!
I still see the crashes with v2.8.1. I will try to create a minimal example.

@franzpoeschel
Contributor Author

I can reproduce this with a minimal ADIOS2 example:

Writer:

#include <adios2.h>
#include <numeric>
#include <string>
#include <vector>

int main(int argsc, char **argsv)
{
    std::string engine_type = "sst";
    std::string datatransport = "WAN";
    if (argsc > 1)
    {
        datatransport = argsv[1];
    }

    adios2::ADIOS adios;
    adios2::IO IO = adios.DeclareIO("IO");
    IO.SetParameter("DataTransport", datatransport);
    IO.SetEngine(engine_type);
    adios2::Engine engine = IO.Open("stream", adios2::Mode::Write);

    using datatype = double;
    constexpr size_t vecLength = 2ull * 1024 * 1024 * 1024 / sizeof(double);
    std::vector<datatype> streamData(vecLength);
    std::iota(streamData.begin(), streamData.end(), 0.);

    auto variable = IO.DefineVariable<datatype>(
        "var", {vecLength}, {0}, {vecLength}, /* constantDims = */ true);

    for (unsigned step = 0; step < 10; ++step)
    {
        engine.BeginStep();
        engine.Put(variable, streamData.data());
        engine.EndStep();
    }
    engine.Close();
}

Reader:

#include <adios2.h>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

int main(int argsc, char **argsv)
{
    using datatype = double;
    std::string engine_type = "sst";
    std::string datatransport = "WAN";
    if (argsc > 1)
    {
        datatransport = argsv[1];
    }

    adios2::ADIOS adios;
    adios2::IO IO = adios.DeclareIO("IO");
    IO.SetParameter("DataTransport", datatransport);
    IO.SetEngine(engine_type);
    adios2::Engine engine = IO.Open("stream", adios2::Mode::Read);
    std::vector<datatype> streamData;

    unsigned currentStep = 0;

    auto loopbody = [&engine, &streamData, &currentStep](
                        adios2::Variable<datatype> &variable) {
        engine.Get(variable, streamData.data());
        engine.EndStep();
        std::cout << currentStep++ << std::endl;
    };

    engine.BeginStep();
    auto variable = IO.InquireVariable<datatype>("var");
    if (!variable)
    {
        throw std::runtime_error("[Reader] Failed inquiring variable");
    }
    streamData.resize(variable.Shape()[0]);
    loopbody(variable);

    while (engine.BeginStep() == adios2::StepStatus::OK)
    {
        loopbody(variable);
    }
    engine.Close();
}

CMakeLists.txt:

cmake_minimum_required(VERSION 3.12.0)

project(adios_stream)

find_package(ADIOS2 REQUIRED)

add_executable(stream_write stream_write.cpp)
add_executable(stream_read stream_read.cpp)

target_link_libraries(stream_write PRIVATE adios2::cxx11)
target_link_libraries(stream_read PRIVATE adios2::cxx11)

Changing one of the 1024 to a 1023 will make things work completely fine, but the above configuration will not work.

The faulty behavior depends on the ADIOS2 version:

  • vecLength = 2ull * 1024 * 1024 * 1023 / sizeof(double) works in both v2.8.2 and v2.7.1
  • 2ull * 1024 * 1024 * 1024 will crash in v2.7.1 and v2.8.0 with the segfault at the checksum calculation as in the entry post, but will hang in v2.8.2 (1)
  • 4ull * 1024 * 1024 * 1024 will crash in v2.7.1, v2.8.0 and v2.8.2, this time with a different error on the reading side that is the same in all three versions (2)
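
For reference, the byte counts behind these three cases can be checked at compile time. This is only a sketch of the arithmetic: whether a signed 32-bit size is what actually fails inside SST or EVPath is an assumption on my part, not something established here. The names kWorks, kTwoGiB, kFourGiB and fitsInInt32 are made up for illustration.

```cpp
#include <cstdint>

// Bytes per step for the three vector lengths listed above
// (vecLength * sizeof(double) gives back the numerator).
constexpr std::uint64_t kWorks = 2ull * 1024 * 1024 * 1023;   // runs fine
constexpr std::uint64_t kTwoGiB = 2ull * 1024 * 1024 * 1024;  // crash or hang
constexpr std::uint64_t kFourGiB = 4ull * 1024 * 1024 * 1024; // crash on reader

// True when a byte count still fits into a signed 32-bit integer.
constexpr bool fitsInInt32(std::uint64_t bytes)
{
    return bytes <= static_cast<std::uint64_t>(INT32_MAX);
}

// The working case stays just below the signed 32-bit boundary ...
static_assert(fitsInInt32(kWorks), "2 * 1024 * 1024 * 1023 fits in int32");
// ... while 2 GiB is exactly INT32_MAX + 1 ...
static_assert(kTwoGiB == static_cast<std::uint64_t>(INT32_MAX) + 1,
              "2 GiB is one past INT32_MAX");
static_assert(!fitsInInt32(kTwoGiB), "2 GiB does not fit in int32");
// ... and 4 GiB is one past the unsigned 32-bit range as well.
static_assert(kFourGiB == static_cast<std::uint64_t>(UINT32_MAX) + 1,
              "4 GiB is one past UINT32_MAX");
```

If this compiles, the three cases sit exactly on either side of the 32-bit boundaries, which would fit the observation that swapping a single 1024 for 1023 makes the problem disappear.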

(1)
Hangup backtrace of the writer:

(gdb) backtrace
#0  futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55555557b930) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55555557b8e0, cond=0x55555557b908) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x55555557b908, mutex=0x55555557b8e0) at pthread_cond_wait.c:647
#3  0x00007ffff75097ff in SstWriterClose (Stream=0x55555557b820) at /home/franzpoeschel/git-repos/ADIOS2/source/adios2/toolkit/sst/cp/cp_writer.c:1578
#4  0x00007ffff7451a7a in adios2::core::engine::SstWriter::DoClose (this=0x55555557b6e0, transportIndex=-1) at /home/franzpoeschel/git-repos/ADIOS2/source/adios2/engine/sst/SstWriter.cpp:404
#5  0x00007ffff6f45b2b in adios2::core::Engine::Close (this=0x55555557b6e0, transportIndex=-1) at /home/franzpoeschel/git-repos/ADIOS2/source/adios2/core/Engine.cpp:70
#6  0x00007ffff7e2524b in adios2::Engine::Close (this=0x7fffffff4d88, transportIndex=-1) at /home/franzpoeschel/git-repos/ADIOS2/bindings/CXX11/adios2/cxx11/Engine.cpp:115
#7  0x0000555555559b70 in main ()

Hangup backtrace of the reader:

(gdb) backtrace
#0  futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x5555555f3ef0) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55555557c600, cond=0x5555555f3ec8) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x5555555f3ec8, mutex=0x55555557c600) at pthread_cond_wait.c:647
#3  0x00007ffff5b911b7 in INT_CMCondition_wait (cm=0x55555557c590, condition=3) at /home/franzpoeschel/git-repos/ADIOS2/thirdparty/EVPath/EVPath/cm_control.c:299
#4  0x00007ffff5b9e69e in CMCondition_wait (cm=0x55555557c590, condition=3) at /home/franzpoeschel/singularity_build/ADIOS2_build/thirdparty/EVPath/EVPath/cm_interface.c:85
#5  0x00007ffff74ffcaf in EvpathWaitForCompletion (Svcs=0x7ffff77980c0 <Svcs>, Handle_v=0x5555555f3e60) at /home/franzpoeschel/git-repos/ADIOS2/source/adios2/toolkit/sst/dp/evpath_dp.c:1105
#6  0x00007ffff7505ee9 in SstWaitForCompletion (Stream=0x55555557a860, handle=0x5555555f3e60) at /home/franzpoeschel/git-repos/ADIOS2/source/adios2/toolkit/sst/cp/cp_reader.c:2296
#7  0x00007ffff7432ce5 in adios2::core::engine::SstReader::PerformGets (this=0x55555557a6e0) at /home/franzpoeschel/git-repos/ADIOS2/source/adios2/engine/sst/SstReader.cpp:715
#8  0x00007ffff7427ff8 in adios2::core::engine::SstReader::EndStep (this=0x55555557a6e0) at /home/franzpoeschel/git-repos/ADIOS2/source/adios2/engine/sst/SstReader.cpp:477
#9  0x00007ffff7e25085 in adios2::Engine::EndStep (this=0x7fffffff4df0) at /home/franzpoeschel/git-repos/ADIOS2/bindings/CXX11/adios2/cxx11/Engine.cpp:103
#10 0x0000555555558768 in main::{lambda(adios2::Variable<double>&)#1}::operator()(adios2::Variable<double>&) const ()
#11 0x0000555555558b84 in main ()

(2)
Crash behavior:
Segfault on reader:

 547│     size_t Stride = Size / 8;
 548│     unsigned long Print = 0;
 549│     if (!Page)
 550│         return 0;
 551│     for (int i = 0; i < 8; i++)
 552│     {
 553│         size_t Index = Start + Stride * i;
 554│         unsigned char Component = 0;  
 555│         while ((Page[Index] == 0) && (Index < (Size - 1)))
 556│         {
 557│             Component++;
 558│             Index++;
 559│         }
 560│         Component += (unsigned char)Page[Index];
 561│         Print |= (((unsigned long)Component) << (8 * i));
 562│     }
/home/franzpoeschel/git-repos/ADIOS2/source/adios2/toolkit/sst/dp/evpath_dp.c
0x00007ffff74fe7e6 in writeBlockFingerprint (Page=0x7ffff0005d30 "\224", Size=4294967679) at /home/franzpoeschel/git-repos/ADIOS2/source/adios2/toolkit/sst/dp/evpath_dp.c:555
(gdb) p Index
$1 = 268435479
(gdb) p Size
$2 = 4294967679
(gdb) p Component
$3 = 0 '\000'
(gdb) p Page
$4 = 0x7ffff0005d30 "\224"

After this, the writer only prints a warning: Writer 0 (0x55555557b820): Got an unexpected connection close event.

The environment is a nvidia/cuda:11.6.0-devel-ubuntu20.04 Singularity container with a g++ 11.1.0 ADIOS2 Debug build.

@franzpoeschel
Contributor Author

The issues are reproducible for v2.8.2 on my local system (openSUSE Leap 15.4), so they do not seem to be entirely system-dependent. Setting IO.SetParameter("QueueLimit", "1"); helps reproduce the hangup on a system with limited RAM.
I didn't try any other ADIOS2 versions locally.

@eisenhauer
Member

Thanks. Let me see what I can do. (I'm currently isolating with an active CoViD infection, so I'm not exactly on top of my game, but I'm not completely non-functional.)

@franzpoeschel
Contributor Author

As said before, this is not urgent. Wishing you a quick recovery!

@eisenhauer
Member

Just a thought while I'm poking at this: the example reader code is subtly wrong. For streaming engines, variables get wiped (at least potentially), so you have to call InquireVariable again inside the loop. (There are a lot more problems supporting >2GB data blocks inside BP5 than this, but this did cause undefined results, because the var (and the start/count blocks that it contained) was freed on the next BeginStep() and had random values.)

@franzpoeschel
Contributor Author

franzpoeschel commented Jul 22, 2022

Ah, thanks for the hint. I should check if our implementation in openPMD does this correctly, then.

EDIT: Yep, we do InquireVariable before every dataset read

@eisenhauer
Member

Looking at this in the background. Part of the problem is that there's actually a Linux limitation too. Even on 64-bit systems, a single IO operation is limited to MAX_RW_COUNT bytes, where:
#define MAX_RW_COUNT (INT_MAX & PAGE_MASK)
and that works out to 0x7ffff000 (2,147,479,552). So, we can pass 64-bit sizes all the way through the system, but then get hung up when the lowest levels of EVPath network handling (actually the cmsockets transport) fail because they've been written to assume that you could submit a single writev() or read() operation that sends or reads a message.
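
The arithmetic can be double-checked with a few compile-time assertions. This sketch assumes 4 KiB pages, as on typical x86-64 Linux systems; kPageSize, kPageMask, kMaxRwCount and kTwoGiB are illustrative names and not kernel identifiers.

```cpp
#include <climits>
#include <cstdint>

// Assumption: 4 KiB pages (other page sizes give a different mask).
constexpr std::int64_t kPageSize = 4096;
// PAGE_MASK clears the low bits that address a byte within a page.
constexpr std::int64_t kPageMask = ~(kPageSize - 1);
// Mirrors the definition quoted above: INT_MAX rounded down to a page boundary.
constexpr std::int64_t kMaxRwCount = INT_MAX & kPageMask;

static_assert(kMaxRwCount == 0x7ffff000, "hex value quoted above");
static_assert(kMaxRwCount == 2147479552LL, "decimal value quoted above");

// A 2 GiB payload is exactly one page larger than a single call can move.
constexpr std::int64_t kTwoGiB = 2LL * 1024 * 1024 * 1024;
static_assert(kTwoGiB - kMaxRwCount == kPageSize, "2 GiB exceeds the cap by 4096");
```

So a 2 GiB message is one page too large to go out in a single call under these assumptions.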

The upshot is that I have mods that I'll commit, but they don't yet completely solve this problem because of the MAX_RW_COUNT issue. Doing the final fix is a bit complicated because of how it interacts with existing support for async write() operations in EVPath.
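
For illustration only, the generic way to stay under a per-call limit like this is to loop over bounded write() calls and advance by however many bytes each call reports. The sketch below is not the EVPath change discussed here and ignores the async write() interaction entirely; writeAll and maxChunk are hypothetical names.

```cpp
#include <algorithm>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper, not part of EVPath: writes all `size` bytes of `data`
// to the blocking descriptor `fd`, never requesting more than `maxChunk`
// bytes in a single write() call. Returns true once everything was written,
// false on error (errno is left as set by write()).
inline bool writeAll(int fd, const char *data, std::size_t size,
                     std::size_t maxChunk = 0x7ffff000)
{
    if (maxChunk == 0)
    {
        return false; // a zero-sized chunk could never make progress
    }
    std::size_t done = 0;
    while (done < size)
    {
        const std::size_t want = std::min(size - done, maxChunk);
        const ssize_t n = ::write(fd, data + done, want);
        if (n < 0)
        {
            if (errno == EINTR)
            {
                continue; // interrupted before any byte was written: retry
            }
            return false;
        }
        // Short writes are normal; continue from where this call stopped.
        done += static_cast<std::size_t>(n);
    }
    return true;
}
```

A caller would use writeAll(fd, buffer, length) in place of a single write(); the same pattern applies to read(), while a writev()-based path would additionally have to split its iovec entries.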

@franzpoeschel
Contributor Author

I already got the impression that going beyond 2GB will run into limitations at many different corners, given the variety of errors that I had.
So, part of this issue is probably the question of whether such large values are supposed to be supported by the WAN backend at all, or whether I should look into other engines such as Dataman. If you think this hurdle can be overcome, however, that's good news to me, since our support for SST is definitely more mature than that for other engines.

@eisenhauer
Member

I think this can be overcome. It's just going to take a bit of experimentation. The straightforward approach I tried when I first discovered it just had some rather disastrous results, so I need to sort through alternatives...
