add LoongArch aal #399

Closed
wants to merge 308 commits into from

Conversation

SchrodingerZhu
Collaborator

Signed-off-by: SchrodingerZhu [email protected]
Just to show the portability of snmalloc to loongarch.
https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN

$ /home/schrodinger/Documents/qemu/build/qemu-loongarch64 -L $LA_PATH/../target/usr/ perf-singlethread-1
Count:  32768, Size:     16, ZeroMem: 0, Write: 0:     25995159 ns
Count:  32768, Size:     16, ZeroMem: 0, Write: 1:     23428680 ns
Count:  32768, Size:     16, ZeroMem: 1, Write: 0:     25952950 ns
Count:  32768, Size:     16, ZeroMem: 1, Write: 1:     25649431 ns
Count:  32768, Size:     32, ZeroMem: 0, Write: 0:     23379167 ns
Count:  32768, Size:     32, ZeroMem: 0, Write: 1:     23449238 ns
Count:  32768, Size:     32, ZeroMem: 1, Write: 0:     26265615 ns
Count:  32768, Size:     32, ZeroMem: 1, Write: 1:     26149518 ns
Count:  32768, Size:     64, ZeroMem: 0, Write: 0:     24059021 ns
Count:  32768, Size:     64, ZeroMem: 0, Write: 1:     23810705 ns
Count:  32768, Size:     64, ZeroMem: 1, Write: 0:     26568643 ns
Count:  32768, Size:     64, ZeroMem: 1, Write: 1:     26486249 ns
Count:  32768, Size:    128, ZeroMem: 0, Write: 0:     25350150 ns
Count:  32768, Size:    128, ZeroMem: 0, Write: 1:     25165214 ns
Count:  32768, Size:    128, ZeroMem: 1, Write: 0:     28383734 ns
Count:  32768, Size:    128, ZeroMem: 1, Write: 1:     28094443 ns
Count:   1024, Size:   4096, ZeroMem: 0, Write: 0:      2374569 ns
Count:   1024, Size:   4096, ZeroMem: 0, Write: 1:      2036175 ns
Count:   1024, Size:   4096, ZeroMem: 1, Write: 0:      2870639 ns
Count:   1024, Size:   4096, ZeroMem: 1, Write: 1:      2554837 ns
Count:   1024, Size:   8192, ZeroMem: 0, Write: 0:      5612747 ns
Count:   1024, Size:   8192, ZeroMem: 0, Write: 1:      4376410 ns
Count:   1024, Size:   8192, ZeroMem: 1, Write: 0:      5351678 ns
Count:   1024, Size:   8192, ZeroMem: 1, Write: 1:      5368780 ns
Count:   1024, Size:  16384, ZeroMem: 0, Write: 0:     11227398 ns
Count:   1024, Size:  16384, ZeroMem: 0, Write: 1:      9032074 ns
Count:   1024, Size:  16384, ZeroMem: 1, Write: 0:     11434806 ns
Count:   1024, Size:  16384, ZeroMem: 1, Write: 1:     11662212 ns
Count:   1024, Size:  32768, ZeroMem: 0, Write: 0:     21082663 ns
Count:   1024, Size:  32768, ZeroMem: 0, Write: 1:     17188197 ns
Count:   1024, Size:  32768, ZeroMem: 1, Write: 0:     24380463 ns
Count:   1024, Size:  32768, ZeroMem: 1, Write: 1:     24069601 ns
Count:   1024, Size:  65536, ZeroMem: 0, Write: 0:     40695674 ns
Count:   1024, Size:  65536, ZeroMem: 0, Write: 1:     32872525 ns
Count:   1024, Size:  65536, ZeroMem: 1, Write: 0:     48841939 ns
Count:   1024, Size:  65536, ZeroMem: 1, Write: 1:     48756758 ns
Count:   1024, Size: 131072, ZeroMem: 0, Write: 0:     84693715 ns
Count:   1024, Size: 131072, ZeroMem: 0, Write: 1:     66228266 ns
Count:   1024, Size: 131072, ZeroMem: 1, Write: 0:    105022838 ns
Count:   1024, Size: 131072, ZeroMem: 1, Write: 1:    104210255 ns
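
To compare rows with different `Count` values, it can help to normalise the totals. The sketch below divides a row's total time by its `Count`; the helper name `ns_per_iteration` is hypothetical, and it assumes `Count` is the number of timed iterations (how many allocations each iteration performs depends on the test itself). These runs are under QEMU user-mode emulation, so the figures only indicate that the port works, not how fast real hardware would be.

```cpp
#include <cassert>
#include <cstdint>

// Average time per iteration for one row of the output above.
// Assumption: `count` is the number of timed iterations in that row.
inline double ns_per_iteration(std::uint64_t total_ns, std::uint64_t count)
{
  assert(count != 0);
  return static_cast<double>(total_ns) / static_cast<double>(count);
}
```

For the first row (Count 32768, Size 16) this gives roughly 793 ns per iteration; for the Size 131072 row without zeroing or writes (Count 1024) it gives roughly 82.7 µs.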

All multi-threaded tests currently fail with a segfault. The process seems to quit on thread creation. I may investigate this later.

ihaller and others added 30 commits August 26, 2021 12:18
Improved support for MSVC with C++17
This passes through to an underlying allocator rather than using
snmalloc.  This is required for using ASAN in Verona.  Verona is
closely coupled to snmalloc, but making snmalloc work with ASAN would
require more work, so we pass to the system allocator in this case.
This commit splits the sizeclass meta-data to generate better cache
locality for various lookups for checking for size and start of
sizeclasses.

Also, contains some tidying including removing sizeclasses covering
large range. This is left over from an alternative design for large
classes that is no longer in use.
When we are accessing potentially out of range, then we might be
accessing before the pagemap has been initialised.  Move the check
into the pagemap for better codegen.
The PR microsoft#359 regressed codegen for deallocation. This fixes it.
And do so by type, rather than by value.  While here, introduce a C++20 concept
for this Backend-offered proxy and adjust the template parameters appropriately.

This will be useful for the process sandbox code, which needs to mediate stores
to the pagemap, but can provide a read-only view.
Fortunately, C++ taketh away and C++ giveth, both, so here we are: a way to
detect if we're in the middle of defining a type that uses itself as a
template parameter in a way that flows into a concept check and, if so,
short-circuit out of the need to actually do any checks.  Wonders never cease.
Just introduce the alias publicly so we can grab it when checking concepts
Wire the concept into the rest of the tree, being careful to avoid demanding the
result of fixed-pointing computation while tying the knot.
While here give it a slightly more appropriate name to better distinguish the
backing store from the interface class that gets passed around.
David points out that we might not have a static way to get at the pagemap, so
it is potentially useful to pass pointers to state objects down from the
Allocators.
Modernise and tidy the CMake a bit:

 - Use generator expressions for a lot of conditionals so that things
   are more reliable with multi-config generators (and less verbose).
 - Remove C as a needed language.  None of the code was C but we were
   using C to test if headers worked.  This was fragile because a build
   with `CMAKE_CXX_COMPILER` set might have checked things compiled with
   the system C compiler and then failed when the specified C++ compiler
   used different headers.
 - Rename the `BACKTRACE_HEADER` macro to `SNMALLOC_BACKTRACE_HEADER`.
   This is exposed into code that consumes snmalloc and so should be
   'namespaced' (to the degree that's possible with C macros).
 - Clean up the options and use dependent options to hide options 
   that are not always relevant.
 - Use functions instead of macros for better variable scoping.
 - Factor out some duplicated bits into functions.
 - Update to the latest way of telling CMake to use C++17 or C++20.
 - Migrate everything that's setting global properties to setting only
   per-target properties.
 - Link with -nostdlib++ if it's available.  If it isn't, fall back to
   enabling the C language and linking with the C compiler.
 - Make the per-test log messages verbose outputs.  These kept scrolling
   important messages off the top of the screen for me.
 - Make building as a header-only library a public option.
 - Add install targets that install all of the headers and provide a
   config option.  This works with the header-only configuration for
   integration with things like vcpkg.
 - Fix a missing `#endif` in the `malloc_useable_size` check.  This was
   failing to compile on all platforms because of the missing `#endif`.
 - Bump the minimum version to 3.14 so that we have access to
   target_link_options.  This is necessary to use generator expressions
   for linker flags.
 - Make the linker error if the shim libraries depend on symbols that
   are not defined in the explicitly-provided libraries.
 - Make the old-Ubuntu CI jobs use C++17 explicitly (previously CMake 
   was silently ignoring the fact that the compiler didn't support C++20)
 - Fix errors found by the more aggressive linking mode.

With these changes, it's now possible to install snmalloc and then, in
another project, do something like this:

```cmake
find_package(snmalloc CONFIG REQUIRED)
target_link_libraries(t1 snmalloc::snmalloc)
target_link_libraries(t2 snmalloc::snmallocshim-static)
```

In this example, `t1` gets all of the compile flags necessary to include
snmalloc headers for its build configuration.  `t2` is additionally
linked to the snmalloc static shim library.
Introduce Metaslab::from_link(SlabLink*) to encapsulate the "container of"
dance.  Note that Metaslab was not a standard layout type prior to this change
(since both SlabLink and Metaslab defined non-static data members), and so the
reinterpret_cast<>s replaced here with ::from_link() were UB, but everyone lays
out classes as one expects so it was fine in practice.

Most of the uses of ::from_link() are already guarded by checks that the link
pointer is not nullptr, but in src/mem/corealloc.h:/debug_is_empty_impl we shift
to testing the link pointer explicitly before converting to the metaslab.

Despite that Metaslab is now standard layout, we still don't fall back to the
inter-convertibility of a standard layout class and its first[*] data member
since we're going to want to put a common initial sequence across Metaslab and
ChunkRecord and the SlabLink isn't likely to be in it.
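
A minimal sketch of the "container of" pattern described above, assuming simplified stand-in types: the `SlabLink` and `Metaslab` here are hypothetical reductions with made-up fields, not the real snmalloc definitions. The link is deliberately not the first member, so the conversion has to go through `offsetof` rather than rely on first-member inter-convertibility.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the intrusive list node.
struct SlabLink
{
  SlabLink* next = nullptr;
};

// Hypothetical stand-in for the metadata that embeds the link.
struct Metaslab
{
  std::uint16_t needed = 0; // illustrative field placed before the link
  SlabLink link;

  // Recover the enclosing Metaslab from a pointer to its `link` member.
  static Metaslab* from_link(SlabLink* l)
  {
    assert(l != nullptr); // callers test the link pointer before converting
    auto* bytes = reinterpret_cast<char*>(l);
    return reinterpret_cast<Metaslab*>(bytes - offsetof(Metaslab, link));
  }
};

// offsetof is only unconditionally supported for standard-layout types.
static_assert(std::is_standard_layout_v<Metaslab>);
```

Keeping the pointer arithmetic inside one function means that call sites no longer need a `reinterpret_cast` of their own, and the layout assumption is checked in a single place.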
Verona uses these.
The memcpy implementation is not completely stupid but is almost
certainly not as good as a carefully tuned and optimised one.

Building snmalloc with FreeBSD's libc memcpy + jemalloc and with this,
each 10 times, does not show a statistically significant performance
difference at 95% confidence.  The snmalloc version has very slightly
lower median and worst-case times.  This is in no way a sensible
benchmark, but it serves as a smoke test for significant performance
regressions.

The CI self-host job now uses the checked memcpy.

This also fixes an off-by-one error in the external bounds.  This is
triggered by ninja, so we will see breakage in CI if it is reintroduced.
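
As an illustration of where an off-by-one can creep into such a bounds check, here is a minimal sketch. The `Bounds` struct and both function names are hypothetical; the real implementation looks bounds up in the allocator's metadata and fails fast instead of returning `false`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical description of the allocation that contains a pointer.
struct Bounds
{
  std::uintptr_t base; // address of the first byte of the allocation
  std::size_t size;    // number of bytes in the allocation
};

// True if [p, p + len) lies entirely inside the allocation.
inline bool fits(const Bounds& b, const void* p, std::size_t len)
{
  auto addr = reinterpret_cast<std::uintptr_t>(p);
  if (addr < b.base)
    return false;
  std::size_t offset = addr - b.base;
  // A pointer one past the end (offset == size) is valid only for len == 0.
  return offset <= b.size && len <= b.size - offset;
}

// Copies only when the destination range is in bounds; reports failure
// otherwise (this sketch does not abort or print diagnostics).
inline bool checked_memcpy(
  void* dst, const Bounds& dst_bounds, const void* src, std::size_t len)
{
  assert(dst != nullptr && src != nullptr);
  if (!fits(dst_bounds, dst, len))
    return false;
  std::memcpy(dst, src, len);
  return true;
}
```

Writing the comparison as `len <= size - offset` avoids both the overflow in `offset + len` and the off-by-one that results from using `<` where `<=` is needed.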

In debug builds, we provide a verbose error containing the address of
the allocation, the base and bounds of the allocation, and a backtrace.

The backtrace was broken by the CI cleanup moving the BACKTRACE_HEADER
macro into the SNMALLOC_ namespace.  This is also fixed.

The test involves hijacking `abort`, which doesn't work everywhere.  It
also requires `backtrace` to work in configurations where stack traces
are enabled.  This is disabled in QEMU because `backtrace` appears to
crash reliably in QEMU user mode.

For now, in the -checks build configurations, we are hitting a slow path
in the pagemap on accesses so that the pages that are `PROT_NONE` don't
cause crashes.  These need to be made read-only, but this requires a PAL
change.
* Add compiler abstractions over fast fail.

* Fix MSVC / GCC's disagreement over inline.

* Rework the inline definitions.

* Use _snprintf_s_l.
* Add extra key to freelist.  This follows the encoding Cedric suggested
  for a signature of two things. Free list key now has a pair of keys
  for encoding previous pointer. This makes it harder to extract the
  underlying keys out of the multiplication.

* Apply SFINAE to the extract_segment.
This exposes a read-only variant of notify using, so that the underlying
platform can map the range of pages read-only into the application.  This
improves performance of external pointer on platforms that support lazy
commit of pages, as it can access anything in the range.
The various Pals were given different meanings in CHECK_CLIENT and
non-CHECK_CLIENT builds.  This was because it is essential
that in the CHECK_CLIENT builds access is prevented, when not requested.

This PR separates the CHECK_CLIENT concept from how the Pal should be
implemented.
mjp41 and others added 20 commits June 7, 2022 16:13
Most ranges just deal with whatever kinds of ranges their parent deals with, but
that might be Chunk- or (soon) Arena-bounded.  This commit does not yet
introduce nuance, but just sets the stage.
Do not hard-code FrontendSlabMetadata, but rather take it as a template argument.
We're going to plumb other types through for StrictProvenance.
Expose a static CapPtr<T,B>::unsafe_from() and use that everywhere instead
(though continue to allow implicit and explicit construction of CapPtr from
nullptr).
Make these generic, with the SmallBuddyRange taking its cue from the parent
Range, since we're about to change them anyway and might want to vary them again
in the future.
This allows us to have a single Pipe-line of ranges where we can, nevertheless,
jump over the small buddy allocator when making large allocations.  This, in
turn, will let us differentiate the types coming from the small end and the
large "tap" on this Pipe-line.
Now that we've split the range Pipe-line externally, the small-buddy ranges
should never be seeing large requests.
Update the backend concept so that metadata allocations are Arena-bounded.
Wrap the FrontendSlabMetadata with a struct that holds the Arena-bounded
authority for Chunks that the Backend ships out to the Frontend or, for
non-StrictProvenance architecture, encapsulates the sleight of hand that turns
Chunk-bounded CapPtr-s to Arena-bounded ones.
These pieces of metadata (specifically, the Allocator structures) are never
deallocated at the moment, so we need not consider how we might amplify these
bounded pointers back to higher authority.
ICF currently breaks building on Morello, so allow cmake to notch it out.
* Fix pal_linux.h for older linux systems

Where MADV_FREE is not defined - replaced with MADV_DONTNEED
Where GRND_NONBLOCK is not defined in <sys/random.h> but in <linux/random.h>

* Check for linux/random.h in CMake

as __has_include seems to not be reliable

* Use CMake module CheckIncludeFilesCXX

as C language isn't enabled by default everywhere

* Move madvise flag ifdefs into constexpr for cleaner code
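
A minimal sketch of that last step, assuming a POSIX system with `<sys/mman.h>`: the preprocessor test is done once to define a constant, and the call site stays free of `#ifdef`s. The names `madvise_free_flags` and `release_pages` are illustrative and not necessarily the ones used in `pal_linux.h`.

```cpp
#include <cstddef>
#include <sys/mman.h>

// Choose the advice value once. Older Linux headers do not define
// MADV_FREE, in which case MADV_DONTNEED is used instead.
#ifdef MADV_FREE
inline constexpr int madvise_free_flags = MADV_FREE;
#else
inline constexpr int madvise_free_flags = MADV_DONTNEED;
#endif

// Hypothetical helper: tell the kernel the pages are no longer needed.
// Returns whatever madvise returns (0 on success, -1 with errno set).
inline int release_pages(void* p, std::size_t size)
{
  return madvise(p, size, madvise_free_flags);
}
```

Note that the two advice values are not equivalent: `MADV_FREE` lets the kernel reclaim pages lazily, while `MADV_DONTNEED` drops them immediately, so the fallback trades some performance for portability.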
Otherwise, on platforms for which {,u}intptr_t aren't just typedef-s of
other scalar types, it's ambiguous which way an implicit cast should go.
@SchrodingerZhu
Collaborator Author

Just tested with snmalloc main branch and the latest cross-tools at https://github.com/loongson/build-tools.
Still a lot of mess.

  • The ar tool crashes with an FPE.
  • Many operations still fail with min_page_size = 0x1000 because madvise returns EINVAL.

@xen0n is there any recent progress on improving the situation?

#pragma once

#if __SIZEOF_POINTER__ == 8
#  define SNMALLOC_VA_BITS_64
#else
#  define SNMALLOC_VA_BITS_32
#endif

#include <cstddef>
#include <cstdint>
namespace snmalloc
{
  /**
   * LoongArch-specific architecture abstraction layer.
   */
  class AAL_LoongArch
  {
  public:
    /**
     * Bitmap of AalFeature flags
     */
    static constexpr uint64_t aal_features =
      IntegerPointers | NoCpuCycleCounters;

    static constexpr enum AalName aal_name = LoongArch;

    static constexpr size_t smallest_page_size = 0x1000;

    /**
     * On pipelined processors, notify the core that we are in a spin loop and
     * that speculative execution past this point may not be a performance gain.
     */
    static inline void pause()
    {
      __asm__ __volatile__("dbar 0" : : : "memory");
    }

    /**
     * PRELD prefetches a cache line of data from memory into the cache.
     * The access address is the value in general register rj plus the
     * sign-extended 12-bit immediate.
     *
     * The hint in the PRELD instruction tells the processor which type of
     * access to expect and which cache level to fill. The hint has 32
     * possible values (0 to 31); 0 means load into the level 1 cache. If the
     * cache attribute of the access address is not cached, the instruction
     * generates no memory access and is treated as a NOP. PRELD does not
     * trigger any exceptions related to the MMU or the address.
     */
    static inline void prefetch(void* ptr)
    {
      // ptr is an input operand: PRELD only reads the address register.
      __asm__ volatile("preld 0, %0, 0" : : "r"(ptr));
    }
  };

  using AAL_Arch = AAL_LoongArch;
} // namespace snmalloc
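
For context on how a `pause()` hook like the one above is typically consumed, here is a minimal sketch of a spin lock parameterised on an AAL-like type. `SketchAal` and `SpinLock` are hypothetical and not part of snmalloc; the `__loongarch__` branch mirrors the `dbar 0` used in the header above, and other targets fall back to a plain compiler barrier so the sketch still builds elsewhere.

```cpp
#include <atomic>

// Hypothetical stand-in for an architecture abstraction layer.
struct SketchAal
{
  static inline void pause()
  {
#if defined(__loongarch__)
    // Same barrier instruction as in the header above.
    __asm__ __volatile__("dbar 0" : : : "memory");
#else
    // Fallback for other targets: compiler-only barrier, no CPU hint.
    std::atomic_signal_fence(std::memory_order_seq_cst);
#endif
  }
};

// Test-and-set spin lock that calls Aal::pause() while waiting.
template<typename Aal>
class SpinLock
{
  std::atomic<bool> locked{false};

public:
  void lock()
  {
    while (locked.exchange(true, std::memory_order_acquire))
      Aal::pause();
  }

  // Returns true if the lock was acquired without waiting.
  bool try_lock()
  {
    return !locked.exchange(true, std::memory_order_acquire);
  }

  void unlock()
  {
    locked.store(false, std::memory_order_release);
  }
};
```

Passing the AAL as a template parameter keeps the lock itself architecture-neutral, which is the same reason the header ends by aliasing `AAL_Arch`.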

nwf-msr and others added 5 commits July 7, 2022 16:57
Make it easier to justify our avoidance of capptr_from_client and
capptr_reveal in external_pointer by performing address_cast earlier.
In particular, with this change, we can see that the pointer (and so its
authority, in CHERI) is not passed to any called function other than
address_cast and pointer_offset, and so authority is merely propagated
and neither exercised nor amplified.

Remove the long-disused capptr_reveal_wild, which was added for earlier
versions of external_pointer.
Signed-off-by: Schrodinger ZHU Yifan <[email protected]>
Signed-off-by: Schrodinger ZHU Yifan <[email protected]>
@SchrodingerZhu
Collaborator Author

Closing due to #553.
