add LoongArch aal #399

SchrodingerZhu · 2021-10-13T12:17:53Z

Signed-off-by: SchrodingerZhu [email protected]
Just to show the portability of snmalloc to loongarch.
https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN

$/home/schrodinger/Documents/qemu/build/qemu-loongarch64 -L $LA_PATH/../target/usr/ perf-singlethread-1
Count:  32768, Size:     16, ZeroMem: 0, Write: 0:     25995159 ns
Count:  32768, Size:     16, ZeroMem: 0, Write: 1:     23428680 ns
Count:  32768, Size:     16, ZeroMem: 1, Write: 0:     25952950 ns
Count:  32768, Size:     16, ZeroMem: 1, Write: 1:     25649431 ns
Count:  32768, Size:     32, ZeroMem: 0, Write: 0:     23379167 ns
Count:  32768, Size:     32, ZeroMem: 0, Write: 1:     23449238 ns
Count:  32768, Size:     32, ZeroMem: 1, Write: 0:     26265615 ns
Count:  32768, Size:     32, ZeroMem: 1, Write: 1:     26149518 ns
Count:  32768, Size:     64, ZeroMem: 0, Write: 0:     24059021 ns
Count:  32768, Size:     64, ZeroMem: 0, Write: 1:     23810705 ns
Count:  32768, Size:     64, ZeroMem: 1, Write: 0:     26568643 ns
Count:  32768, Size:     64, ZeroMem: 1, Write: 1:     26486249 ns
Count:  32768, Size:    128, ZeroMem: 0, Write: 0:     25350150 ns
Count:  32768, Size:    128, ZeroMem: 0, Write: 1:     25165214 ns
Count:  32768, Size:    128, ZeroMem: 1, Write: 0:     28383734 ns
Count:  32768, Size:    128, ZeroMem: 1, Write: 1:     28094443 ns
Count:   1024, Size:   4096, ZeroMem: 0, Write: 0:      2374569 ns
Count:   1024, Size:   4096, ZeroMem: 0, Write: 1:      2036175 ns
Count:   1024, Size:   4096, ZeroMem: 1, Write: 0:      2870639 ns
Count:   1024, Size:   4096, ZeroMem: 1, Write: 1:      2554837 ns
Count:   1024, Size:   8192, ZeroMem: 0, Write: 0:      5612747 ns
Count:   1024, Size:   8192, ZeroMem: 0, Write: 1:      4376410 ns
Count:   1024, Size:   8192, ZeroMem: 1, Write: 0:      5351678 ns
Count:   1024, Size:   8192, ZeroMem: 1, Write: 1:      5368780 ns
Count:   1024, Size:  16384, ZeroMem: 0, Write: 0:     11227398 ns
Count:   1024, Size:  16384, ZeroMem: 0, Write: 1:      9032074 ns
Count:   1024, Size:  16384, ZeroMem: 1, Write: 0:     11434806 ns
Count:   1024, Size:  16384, ZeroMem: 1, Write: 1:     11662212 ns
Count:   1024, Size:  32768, ZeroMem: 0, Write: 0:     21082663 ns
Count:   1024, Size:  32768, ZeroMem: 0, Write: 1:     17188197 ns
Count:   1024, Size:  32768, ZeroMem: 1, Write: 0:     24380463 ns
Count:   1024, Size:  32768, ZeroMem: 1, Write: 1:     24069601 ns
Count:   1024, Size:  65536, ZeroMem: 0, Write: 0:     40695674 ns
Count:   1024, Size:  65536, ZeroMem: 0, Write: 1:     32872525 ns
Count:   1024, Size:  65536, ZeroMem: 1, Write: 0:     48841939 ns
Count:   1024, Size:  65536, ZeroMem: 1, Write: 1:     48756758 ns
Count:   1024, Size: 131072, ZeroMem: 0, Write: 0:     84693715 ns
Count:   1024, Size: 131072, ZeroMem: 0, Write: 1:     66228266 ns
Count:   1024, Size: 131072, ZeroMem: 1, Write: 0:    105022838 ns
Count:   1024, Size: 131072, ZeroMem: 1, Write: 1:    104210255 ns

All multi-threaded tests currently fail with a segfault. The process seems to quit on thread creation. I may investigate it later.

Improved support for MSVC with C++17

This passes through to an underlying allocator rather than using snmalloc. This is required for using ASAN in Verona. Verona is closely coupled with snmalloc, but using it with ASAN would require more work, so we pass to the system allocator in this case.

This commit splits the sizeclass meta-data to generate better cache locality for various lookups for checking for size and start of sizeclasses. Also, contains some tidying including removing sizeclasses covering large range. This is left over from an alternative design for large classes that is no longer in use.

When we are accessing potentially out of range, then we might be accessing before the pagemap has been initialised. Move the check into the pagemap for better codegen.

The PR microsoft#359 regressed codegen for deallocation. This fixes it.

And do so by type, rather than by value. While here, introduce a C++20 concept for this Backend-offered proxy and adjust the template parameters appropriately. This will be useful for the process sandbox code, which needs to mediate stores to the pagemap, but can provide a read-only view.

Co-authored-by: David Chisnall <[email protected]>

Fortunately, C++ taketh away and C++ giveth, both, so here we are: a way to detect if we're in the middle of defining a type that uses itself as a template parameter in a way that flows into a concept check and, if so, short-circuit out of the need to actually do any checks. Wonders never cease.

Just introduce the alias publicly so we can grab it when checking concepts

Wire the concept into the rest of the tree, being careful to avoid demanding the result of fixed-pointing computation while tying the knot.

While here give it a slightly more appropriate name to better distinguish the backing store from the interface class that gets passed around.

David points out that we might not have a static way to get at the pagemap, so it is potentially useful to pass pointers to state objects down from the Allocators.

Modernise and tidy the CMake a bit:

- Use generator expressions for a lot of conditionals so that things are more reliable with multi-config generators (and less verbose).
- Remove C as a needed language. None of the code was C but we were using C to test if headers worked. This was fragile because a build with `CMAKE_CXX_COMPILER` set might have checked things compiled with the system C compiler and then failed when the specified C++ compiler used different headers.
- Rename the `BACKTRACE_HEADER` macro to `SNMALLOC_BACKTRACE_HEADER`. This is exposed into code that consumes snmalloc and so should be 'namespaced' (to the degree that's possible with C macros).
- Clean up the options and use dependent options to hide options that are not always relevant.
- Use functions instead of macros for better variable scoping.
- Factor out some duplicated bits into functions.
- Update to the latest way of telling CMake to use C++17 or C++20.
- Migrate everything that's setting global properties to setting only per-target properties.
- Link with -nostdlib++ if it's available. If it isn't, fall back to enabling the C language and linking with the C compiler.
- Make the per-test log messages verbose outputs. These kept scrolling important messages off the top of the screen for me.
- Make building as a header-only library a public option.
- Add install targets that install all of the headers and provide a config option. This works with the header-only configuration for integration with things like vcpkg.
- Fix a missing `#endif` in the `malloc_useable_size` check. This was failing to compile on all platforms because of the missing `#endif`.
- Bump the minimum version to 3.14 so that we have access to target_link_options. This is necessary to use generator expressions for linker flags.
- Make the linker error if the shim libraries depend on symbols that are not defined in the explicitly-provided libraries.
- Make the old-Ubuntu CI jobs use C++17 explicitly (previously CMake was silently ignoring the fact that the compiler didn't support C++20).
- Fix errors found by the more aggressive linking mode.

With these changes, it's now possible to install snmalloc and then, in another project, do something like this:

```cmake
find_package(snmalloc CONFIG REQUIRED)
target_link_libraries(t1 snmalloc::snmalloc)
target_link_libraries(t2 snmalloc::snmallocshim-static)
```

In this example, `t1` gets all of the compile flags necessary to include snmalloc headers for its build configuration. `t2` is additionally linked to the snmalloc static shim library.

Introduce Metaslab::from_link(SlabLink*) to encapsulate the "container of" dance. Note that Metaslab was not a standard layout type prior to this change (since both SlabLink and Metaslab defined non-static data members), and so the reinterpret_cast<>s replaced here with ::from_link() were UB, but everyone lays out classes as one expects so it was fine in practice. Most of the uses of ::from_link() are already guarded by checks that the link pointer is not nullptr, but in src/mem/corealloc.h:/debug_is_empty_impl we shift to testing the link pointer explicitly before converting to the metaslab. Despite that Metaslab is now standard layout, we still don't fall back to the inter-convertibility of a standard layout class and its first[*] data member since we're going to want to put a common initial sequence across Metaslab and ChunkRecord and the SlabLink isn't likely to be in it.
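The "container of" dance mentioned above can be sketched roughly as follows. This is a minimal illustration only: `Link` and `Meta` are hypothetical stand-ins, not snmalloc's `SlabLink` and `Metaslab`, and the sketch assumes a standard-layout type so that `offsetof` is well defined.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for SlabLink: an intrusive list link.
struct Link
{
  Link* next = nullptr;
};

// Hypothetical stand-in for Metaslab. The link is deliberately NOT the first
// member, so a plain cast from Link* to Meta* would be wrong.
struct Meta
{
  unsigned free_count = 0;
  Link link;

  // Recover the enclosing Meta from a pointer to its embedded link by
  // subtracting the member's offset. Callers must not pass nullptr.
  static Meta* from_link(Link* l)
  {
    assert(l != nullptr);
    auto* raw = reinterpret_cast<unsigned char*>(l);
    return reinterpret_cast<Meta*>(raw - offsetof(Meta, link));
  }
};
```

Keeping the arithmetic in one named function means call sites only need to guarantee the non-null precondition, which matches the explicit link test described for `debug_is_empty_impl`.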

Verona uses these.

…per (microsoft#390) * Extracted backend common functionality

The memcpy implementation is not completely stupid but is almost certainly not as good as a carefully tuned and optimised one. Building snmalloc with FreeBSD's libc memcpy + jemalloc and with this, each 10 times, does not show a statistically significant performance difference at 95% confidence. The snmalloc version has very slightly lower median and worst-case times. This is in no way a sensible benchmark, but it serves as a smoke test for significant performance regressions.

The CI self-host job now uses the checked memcpy. This also fixes an off-by-one error in the external bounds. This is triggered by ninja, so we will see breakage in CI if it is reintroduced.

In debug builds, we provide a verbose error containing the address of the allocation, the base and bounds of the allocation, and a backtrace. The backtrace was broken by the CI cleanup moving the BACKTRACE_HEADER macro into the SNMALLOC_ namespace. This is also fixed.

The test involves hijacking `abort`, which doesn't work everywhere. It also requires `backtrace` to work in configurations where stack traces are enabled. This is disabled in QEMU because `backtrace` appears to crash reliably in QEMU user mode.

For now, in the -checks build configurations, we are hitting a slow path in the pagemap on accesses so that the pages that are `PROT_NONE` don't cause crashes. These need to be made read-only, but this requires a PAL change.

* Add compiler abstractions over fast fail.
* Fix MSVC / GCC's disagreement over inline.
* Rework the inline definitions.
* Use _snprintf_s_l.

* Add extra key to freelist. This follows the encoding Cedric suggested for a signature of two things. Free list key now has a pair of keys for encoding previous pointer. This makes it harder to extract the underlying keys out of the multiplication.
* Apply SFINAE to the extract_segment.

This exposes a read-only variant of notify using, so that the underlying platform can map the range of pages read-only into the application. This improves the performance of external pointer on platforms that support lazy commit of pages, as it can access anything in the range.

The various Pals were given different meanings in CHECK_CLIENT and non-CHECK_CLIENT builds. This was because it is essential that in the CHECK_CLIENT builds access is prevented, when not requested. This PR separates the CHECK_CLIENT concept from how the Pal should be implemented.

Most ranges just deal with whatever kinds of ranges their parent deal with, but that might be Chunk- or (soon) Arena-bounded. This commit does not yet introduce nuance, but just sets the stage.

Do not hard-code FrontendSlabMetadata, but rather take it as a template argument. We're going to plumb other types through for StrictProvenance.

Expose a static CapPtr<T,B>::unsafe_from() and use that everywhere instead (though continue to allow implicit and explicit construction of CapPtr from nullptr).
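The shape of this change can be sketched as below. This is a simplified, hypothetical wrapper (`Ptr<T>`), not snmalloc's `CapPtr<T,B>`: the bounds parameter is omitted and the accessor name is an assumption chosen for illustration. It only shows the pattern of hiding the raw-pointer constructor behind a loudly named static factory while still permitting construction from `nullptr`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified pointer wrapper illustrating a hidden constructor.
template<typename T>
class Ptr
{
  T* raw;

  // Private: arbitrary raw pointers cannot be wrapped implicitly.
  constexpr explicit Ptr(T* p) : raw(p) {}

public:
  // Construction from nullptr stays available, implicitly and explicitly.
  constexpr Ptr() : raw(nullptr) {}
  constexpr Ptr(std::nullptr_t) : raw(nullptr) {}

  // The only way to wrap a raw pointer; the name makes call sites easy to
  // audit.
  static constexpr Ptr unsafe_from(T* p)
  {
    return Ptr(p);
  }

  // Illustrative accessor for getting the raw pointer back out.
  constexpr T* unsafe_ptr() const
  {
    return raw;
  }
};
```

With this shape, `Ptr<int> p = &x;` fails to compile, whereas `Ptr<int>::unsafe_from(&x)` and `Ptr<int> n = nullptr;` are both accepted.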

Make these generic, with the SmallBuddyRange taking its cue from the parent Range, since we're about to change them anyway and might want to vary them again in the future.

This allows us to have a single Pipe-line of ranges where we can, nevertheless, jump over the small buddy allocator when making large allocations. This, in turn, will let us differentiate the types coming from the small end and the large "tap" on this Pipe-line.

Now that we've split the range Pipe-line externally, the small-buddy ranges should never be seeing large requests.

Update the backend concept so that metadata allocations are Arena-bounded.

Wrap the FrontendSlabMetadata with a struct that holds the Arena-bounded authority for Chunks that the Backend ships out to the Frontend or, for non-StrictProvenance architecture, encapsulates the sleight of hand that turns Chunk-bounded CapPtr-s to Arena-bounded ones.

These pieces of metadata (specifically, the Allocator structures) are never deallocated at the moment, so we need not consider how we might amplify these bounded pointers back to higher authority.

ICF currently breaks building on Morello, so allow cmake to notch it out.

* Fix pal_linux.h for older Linux systems:
  - where MADV_FREE is not defined, it is replaced with MADV_DONTNEED;
  - where GRND_NONBLOCK is not defined in <sys/random.h> but in <linux/random.h>.
* Check for linux/random.h in CMake as __has_include seems to not be reliable
* Use CMake module CheckIncludeFilesCXX as C language isn't enabled by default everywhere
* Move madvise flag ifdefs into constexpr for cleaner code
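Moving the madvise flag selection into a constant might look roughly like the sketch below. The names `madvise_free_flags` and `release_pages` are hypothetical and are not taken from pal_linux.h; the sketch only assumes a POSIX-like system that provides `<sys/mman.h>` and `madvise`.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Pick the flag once, at namespace scope, so the code that calls madvise
// needs no preprocessor conditionals of its own.
#ifdef MADV_FREE
constexpr int madvise_free_flags = MADV_FREE;
#else
// Older systems without MADV_FREE fall back to MADV_DONTNEED.
constexpr int madvise_free_flags = MADV_DONTNEED;
#endif

// Hypothetical helper: tell the kernel the range is no longer needed.
// Returns true when madvise reports success.
inline bool release_pages(void* p, std::size_t len)
{
  return madvise(p, len, madvise_free_flags) == 0;
}
```

The `#ifdef` is confined to one definition, and every caller reads as ordinary code.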

Otherwise, on platforms for which {,u}intptr_t aren't just typedef-s of other scalar types, it's ambiguous which way an implicit cast should go.

SchrodingerZhu · 2022-06-29T23:27:21Z

Just tested with the snmalloc main branch and the latest cross-tools at https://github.com/loongson/build-tools.
Still a lot of mess.

The ar tool gives an FPE.
Many operations still fail with min_page_size = 0x1000 due to madvise returning EINVAL.
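One possible explanation, not confirmed in this thread, is a page-size mismatch: `madvise` is documented to return EINVAL when the address is not page-aligned, and a kernel may be configured with pages larger than 4 KiB. A quick way to check that hypothesis is sketched below; the helper names are hypothetical and the sketch assumes a POSIX system with `sysconf`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unistd.h>

// Query the page size the running kernel actually uses, falling back to
// 4 KiB if sysconf reports an error.
inline std::size_t runtime_page_size()
{
  long r = sysconf(_SC_PAGESIZE);
  return r > 0 ? static_cast<std::size_t>(r) : 0x1000;
}

// True if addr is a multiple of page (page must be a power of two).
inline bool is_page_aligned(std::uintptr_t addr, std::size_t page)
{
  return (addr & (page - 1)) == 0;
}
```

If `runtime_page_size()` reports, say, 0x4000 on the target, then an address that is only 0x1000-aligned fails `is_page_aligned` and would be rejected by `madvise`.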

@xen0n is there any recent progress on improving the situation?

#pragma once

#if __SIZEOF_POINTER__ == 8
#  define SNMALLOC_VA_BITS_64
#else
#  define SNMALLOC_VA_BITS_32
#endif

#include <cstddef>
#include <cstdint>
namespace snmalloc
{
  /**
   * LoongArch-specific architecture abstraction layer.
   */
  class AAL_LoongArch
  {
  public:
    /**
     * Bitmap of AalFeature flags
     */
    static constexpr uint64_t aal_features =
      IntegerPointers | NoCpuCycleCounters;

    static constexpr enum AalName aal_name = LoongArch;

    static constexpr size_t smallest_page_size = 0x1000;

    /**
     * On pipelined processors, notify the core that we are in a spin loop and
     * that speculative execution past this point may not be a performance gain.
     */
    static inline void pause()
    {
      __asm__ __volatile__("dbar 0" : : : "memory");
    }

    /**
     * PRELD prefetches a cache line of data from memory into the cache.
     * The access address is the value in general register rj plus the
     * sign-extended 12-bit immediate.
     *
     * The hint in the PRELD instruction tells the processor what type of
     * access to expect and which cache level to fill. The hint has 32
     * possible values (0 to 31); 0 means load into the level 1 cache. If the
     * cache attribute of the access address is not cached, the instruction
     * cannot generate a memory access and is treated as a NOP. The PRELD
     * instruction will not trigger any exceptions related to MMU or address.
     */
    static inline void prefetch(void* ptr)
    {
      // ptr is an input operand: the instruction only reads the address.
      __asm__ volatile("preld 0, %0, 0" : : "r"(ptr));
    }
  };

  using AAL_Arch = AAL_LoongArch;
} // namespace snmalloc
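For context, the kind of caller that `pause()` is meant for can be sketched as a simple spin-wait. Everything here is hypothetical illustration rather than snmalloc code: `spin_pause` mirrors the `dbar 0` barrier from the listing when compiled for LoongArch (detected via the compiler-defined `__loongarch__` macro) and otherwise falls back to a compiler-only fence so the sketch builds on other hosts.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for AAL_LoongArch::pause().
inline void spin_pause()
{
#if defined(__loongarch__)
  __asm__ __volatile__("dbar 0" : : : "memory");
#else
  // Non-LoongArch hosts: a compiler barrier only, for illustration.
  std::atomic_signal_fence(std::memory_order_seq_cst);
#endif
}

// Hypothetical helper: spin until `flag` becomes true and return how many
// times we paused while waiting.
inline unsigned long spin_until_set(const std::atomic<bool>& flag)
{
  unsigned long spins = 0;
  while (!flag.load(std::memory_order_acquire))
  {
    spin_pause();
    ++spins;
  }
  return spins;
}
```

If the flag is already set, the loop body never runs and no pause is issued.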

Make it easier to justify our avoidance of capptr_from_client and capptr_reveal in external_pointer by performing address_cast earlier. In particular, with this change, we can see that the pointer (and so its authority, in CHERI) is not passed to any called function other than address_cast and pointer_offset, and so authority is merely propagated and neither exercised nor amplified. Remove the long-disused capptr_reveal_wild, which was added for earlier versions of external_pointer.

Signed-off-by: Schrodinger ZHU Yifan <[email protected]>

SchrodingerZhu · 2022-08-22T15:26:29Z

Closing due to #553.

ihaller and others added 30 commits August 26, 2021 12:18

Improved support for MSVC with C++17

935f3cc

Merge pull request microsoft#381 from ihaller/ihaller/msvc17

b84a7af

Improved support for MSVC with C++17

Make pagemap check for init on some gets.

27c4a6a

When we are accessing potentially out of range, then we might be accessing before the pagemap has been initialised. Move the check into the pagemap for better codegen.

Fix codegen for dealloc

7eb8769

The PR microsoft#359 regressed codegen for deallocation. This fixes it.

Rename [gs]et_meta_data to [gs]et_metaentry.

f913f8b

Co-authored-by: David Chisnall <[email protected]>

NFC: Make config objects expose their PoolState types

2e1658f

Just introduce the alias publicly so we can grab it when checking concepts

NFC: Add Concept for equality modulo references

bb6e706

NFC: Add a concept for backend Global objects

e530f56

Use backend global concept on template args

2be44d2

Wire the concept into the rest of the tree, being careful to avoid demanding the result of fixed-pointing computation while tying the knot.

Move BackendAllocator::pagemap closer to use

3710e35

While here give it a slightly more appropriate name to better distinguish the backing store from the interface class that gets passed around.

Plumb LocalState ptrs through to Pagemap accessors

3af9d35

David points out that we might not have a static way to get at the pagemap, so it is potentially useful to pass pointers to state objects down from the Allocators.

Added constness to argv in Opt (microsoft#383)

aedb666

Add missing inline from header. (microsoft#388)

fd18528

Install test headers.

6c5626f

Verona uses these.

Extracted the core elements of the BackendAllocator into a common hel…

d524ef5

…per (microsoft#390) * Extracted backend common functionality

Add compiler abstractions over fast fail. (microsoft#392)

7f71f80

* Add compiler abstractions over fast fail.
* Fix MSVC / GCC's disagreement over inline.
* Rework the inline definitions.
* Use _snprintf_s_l.

PALNoAlloc should delegate more to underlying PAL

1baf675

Prepare for AAL bits / address_bits

4a4ca96

Prepare for PAL address_bits

e212ddd

Move to AAL/PAL bits and address_bits

15e3052

mjp41 and others added 20 commits June 7, 2022 16:13

Missing PRIVATE in cmake. (microsoft#539)

e17672d

NFC: backend_helper: generalize chunk bounds

6a5f3c2

Most ranges just deal with whatever kinds of ranges their parent deal with, but that might be Chunk- or (soon) Arena-bounded. This commit does not yet introduce nuance, but just sets the stage.

NFC: capptr: re-introduce Arena bounds

966f2f1

NFC: Generalize Default Pagemap Entry

d5b155b

Do not hard-code FrontendSlabMetadata, but rather take it as a template argument. We're going to plumb other types through for StrictProvenance.

NFC: additional generalization for CHERI

41128a3

RFC: Hide CapPtr constructor

f41bb32

Expose a static CapPtr<T,B>::unsafe_from() and use that everywhere instead (though continue to allow implicit and explicit construction of CapPtr from nullptr).

NFC: Generalize smallbuddyrange bounds annotations

aa61b59

Make these generic, with the SmallBuddyRange taking its cue from the parent Range, since we're about to change them anyway and might want to vary them again in the future.

smallbuddy ranges are only for small things

a78a16e

Now that we've split the range Pipe-line externally, the small-buddy ranges should never be seeing large requests.

backend ranges: use Arena bounds throughout

86124ba

Update the backend concept so that metadata allocations are Arena-bounded.

mem/pool: Alloc-bound pooled things

1f79c76

These pieces of metadata (specifically, the Allocator structures) are never deallocated at the moment, so we need not consider how we might amplify these bounded pointers back to higher authority.

docs: Update StrictProvenance

3fce61e

func-malloc: expand CHERI tests to check no-VMEM

095e8f1

RFC: Add tests for some CHERI-specific behaviors

da19291

NFC: cmake: add SNMALLOC_LINK_ICF, default on

3e72ef6

ICF currently breaks building on Morello, so allow cmake to notch it out.

Fix wrong ifdef in pal_linux.h (microsoft#546)

c560a9a

Add buffer append method for {,u}intptr_t

467c28b

Otherwise, on platforms for which {,u}intptr_t aren't just typedef-s of other scalar types, it's ambiguous which way an implicit cast should go.

Add Morello CI

df1dbc9

nwf-msr and others added 5 commits July 7, 2022 16:57

NFC: rename ConceptBound to IsBound

b2c75df

NFC: Rename ConceptAAL to IsAAL

9e0fefc

NFC: Rename ConceptPAL to IsPAL

db3ae1c

update loongarch

009a331

Signed-off-by: Schrodinger ZHU Yifan <[email protected]>

SchrodingerZhu force-pushed the loongarch-poc branch from c8ed775 to 009a331 Compare August 22, 2022 15:21

format

263543b

Signed-off-by: Schrodinger ZHU Yifan <[email protected]>

SchrodingerZhu closed this Aug 22, 2022
