add LoongArch aal #399
Closed
SchrodingerZhu wants to merge 308 commits into microsoft:snmalloc1 from SchrodingerZhu:loongarch-poc
Improved support for MSVC with C++17
This passes through to an underlying allocator rather than using snmalloc. This is required for using ASAN in Verona. Verona is closely coupled with snmalloc, but using it with ASAN would require more work, so we pass to the system allocator in this case.
This commit splits the sizeclass meta-data to generate better cache locality for various lookups for checking for size and start of sizeclasses. Also, contains some tidying including removing sizeclasses covering large range. This is left over from an alternative design for large classes that is no longer in use.
When we are accessing potentially out of range, then we might be accessing before the pagemap has been initialised. Move the check into the pagemap for better codegen.
The PR microsoft#359 regressed codegen for deallocation. This fixes it.
And do so by type, rather than by value. While here, introduce a C++20 concept for this Backend-offered proxy and adjust the template parameters appropriately. This will be useful for the process sandbox code, which needs to mediate stores to the pagemap, but can provide a read-only view.
Co-authored-by: David Chisnall <[email protected]>
Fortunately, C++ taketh away and C++ giveth, both, so here we are: a way to detect if we're in the middle of defining a type that uses itself as a template parameter in a way that flows into a concept check and, if so, short-circuit out of the need to actually do any checks. Wonders never cease.
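The detection trick described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `is_type_complete_v`, `check_or_defer`, `OnlyDeclared`, and `Defined` are names assumed here for the example, and `std::is_trivially_copyable_v` merely stands in for whatever the real concept would require.

```cpp
#include <cassert>
#include <type_traits>

// Primary template: treat T as incomplete unless the specialization applies.
template<typename T, typename = void>
constexpr bool is_type_complete_v = false;

// sizeof(T) is ill-formed for an incomplete type, so SFINAE selects this
// partial specialization only when T is fully defined.
template<typename T>
constexpr bool is_type_complete_v<T, std::void_t<decltype(sizeof(T))>> = true;

// Hypothetical stand-in for a concept check: skip the real requirement while
// T is still incomplete, and enforce it once T is complete.
template<typename T>
constexpr bool check_or_defer()
{
  if constexpr (!is_type_complete_v<T>)
  {
    return true; // short-circuit: nothing can be checked yet
  }
  else
  {
    return std::is_trivially_copyable_v<T>; // assumed example requirement
  }
}

struct OnlyDeclared;       // declared but never defined: incomplete
struct Defined { int x; }; // complete
```

One caveat with this pattern: the result for a given type is fixed at the first instantiation, so asking about the same type before and after its definition yields the earlier answer.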
Just introduce the alias publicly so we can grab it when checking concepts
Wire the concept into the rest of the tree, being careful to avoid demanding the result of fixed-pointing computation while tying the knot.
While here give it a slightly more appropriate name to better distinguish the backing store from the interface class that gets passed around.
David points out that we might not have a static way to get at the pagemap, so it is potentially useful to pass pointers to state objects down from the Allocators.
Modernise and tidy the CMake a bit:

- Use generator expressions for a lot of conditionals so that things are more reliable with multi-config generators (and less verbose).
- Remove C as a needed language. None of the code was C but we were using C to test if headers worked. This was fragile because a build with `CMAKE_CXX_COMPILER` set might have checked things compiled with the system C compiler and then failed when the specified C++ compiler used different headers.
- Rename the `BACKTRACE_HEADER` macro to `SNMALLOC_BACKTRACE_HEADER`. This is exposed into code that consumes snmalloc and so should be 'namespaced' (to the degree that's possible with C macros).
- Clean up the options and use dependent options to hide options that are not always relevant.
- Use functions instead of macros for better variable scoping.
- Factor out some duplicated bits into functions.
- Update to the latest way of telling CMake to use C++17 or C++20.
- Migrate everything that's setting global properties to setting only per-target properties.
- Link with -nostdlib++ if it's available. If it isn't, fall back to enabling the C language and linking with the C compiler.
- Make the per-test log messages verbose outputs. These kept scrolling important messages off the top of the screen for me.
- Make building as a header-only library a public option.
- Add install targets that install all of the headers and provide a config option. This works with the header-only configuration for integration with things like vcpkg.
- Fix a missing `#endif` in the `malloc_useable_size` check. This was failing to compile on all platforms because of the missing `#endif`.
- Bump the minimum version to 3.14 so that we have access to target_link_options. This is necessary to use generator expressions for linker flags.
- Make the linker error if the shim libraries depend on symbols that are not defined in the explicitly-provided libraries.
- Make the old-Ubuntu CI jobs use C++17 explicitly (previously CMake was silently ignoring the fact that the compiler didn't support C++20).
- Fix errors found by the more aggressive linking mode.

With these changes, it's now possible to install snmalloc and then, in another project, do something like this:

```cmake
find_package(snmalloc CONFIG REQUIRED)
target_link_libraries(t1 snmalloc::snmalloc)
target_link_libraries(t2 snmalloc::snmallocshim-static)
```

In this example, `t1` gets all of the compile flags necessary to include snmalloc headers for its build configuration. `t2` is additionally linked to the snmalloc static shim library.
Introduce Metaslab::from_link(SlabLink*) to encapsulate the "container of" dance. Note that Metaslab was not a standard layout type prior to this change (since both SlabLink and Metaslab defined non-static data members), and so the reinterpret_cast<>s replaced here with ::from_link() were UB, but everyone lays out classes as one expects so it was fine in practice. Most of the uses of ::from_link() are already guarded by checks that the link pointer is not nullptr, but in src/mem/corealloc.h:/debug_is_empty_impl we shift to testing the link pointer explicitly before converting to the metaslab. Despite that Metaslab is now standard layout, we still don't fall back to the inter-convertibility of a standard layout class and its first[*] data member since we're going to want to put a common initial sequence across Metaslab and ChunkRecord and the SlabLink isn't likely to be in it.
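As a rough illustration of the "container of" dance, the sketch below recovers the enclosing object from a pointer to its embedded link using `offsetof`. The field layout is invented for the example (the `needed` member is a hypothetical placeholder so that the link is not the first member); only the names `Metaslab`, `SlabLink`, and `from_link` come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

struct SlabLink
{
  SlabLink* next = nullptr;
};

struct Metaslab
{
  std::uint16_t needed = 0; // hypothetical field placed before the link
  SlabLink link;

  // Recover the enclosing Metaslab from a pointer to its embedded link.
  static Metaslab* from_link(SlabLink* l)
  {
    assert(l != nullptr); // callers are expected to test the link first
    char* raw = reinterpret_cast<char*>(l) - offsetof(Metaslab, link);
    return reinterpret_cast<Metaslab*>(raw);
  }
};

// offsetof is only guaranteed to be supported for standard-layout types.
static_assert(std::is_standard_layout_v<Metaslab>);
```

Because the offset is computed rather than assumed to be zero, the link can sit anywhere in the structure, which matches the stated wish not to rely on the first-member conversion.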
Verona uses these.
…per (microsoft#390) * Extracted backend common functionality
The memcpy implementation is not completely stupid but is almost certainly not as good as a carefully tuned and optimised one. Building snmalloc with FreeBSD's libc memcpy + jemalloc and with this, each 10 times, does not show a statistically significant performance difference at 95% confidence. The snmalloc version has very slightly lower median and worst-case times. This is in no way a sensible benchmark, but it serves as a smoke test for significant performance regressions. The CI self-host job now uses the checked memcpy. This also fixes an off-by-one error in the external bounds. This is triggered by ninja, so we will see breakage in CI if it is reintroduced. In debug builds, we provide a verbose error containing the address of the allocation, the base and bounds of the allocation, and a backtrace. The backtrace was broken by the CI cleanup moving the BACKTRACE_HEADER macro into the SNMALLOC_ namespace. This is also fixed. The test involves hijacking `abort`, which doesn't work everywhere. It also requires `backtrace` to work in configurations where stack traces are enabled. This is disabled in QEMU because `backtrace` appears to crash reliably in QEMU user mode. For now, in the -checks build configurations, we are hitting a slow path in the pagemap on accesses so that the pages that are `PROT_NONE` don't cause crashes. These need to be made read-only, but this requires a PAL change.
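A bounds-checked copy of the kind described here might look like the sketch below. It is not snmalloc's implementation: `Bounds`, `fits`, and `checked_memcpy` are hypothetical names, the bounds are passed in explicitly instead of being looked up from the allocator, and a failed check returns `false` where the real code reportedly prints a verbose error and aborts. The comparison is written so that a copy ending exactly at the end of the allocation is accepted, which is where an off-by-one typically creeps in.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical description of one allocation: [base, base + size).
struct Bounds
{
  const void* base;
  std::size_t size;
};

// True if [p, p + len) lies entirely inside the allocation.
inline bool fits(const Bounds& b, const void* p, std::size_t len)
{
  auto base = reinterpret_cast<std::uintptr_t>(b.base);
  auto addr = reinterpret_cast<std::uintptr_t>(p);
  if (addr < base)
    return false;
  std::size_t offset = addr - base;
  // Subtract instead of adding so that the check cannot overflow.
  return offset <= b.size && len <= b.size - offset;
}

// Copies only when the destination range passes the bounds check.
inline bool checked_memcpy(
  void* dst, const void* src, std::size_t len, const Bounds& dst_bounds)
{
  if (!fits(dst_bounds, dst, len))
    return false; // assumption: a real build would report and abort here
  std::memcpy(dst, src, len);
  return true;
}
```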
* Add compiler abstractions over fast fail.
* Fix MSVC / GCC's disagreement over inline.
* Rework the inline definitions.
* Use _snprintf_s_l.
* Add extra key to freelist. This follows the encoding Cedric suggested for a signature of two things. Free list key now has a pair of keys for encoding previous pointer. This makes it harder to extract the underlying keys out of the multiplication.
* Apply SFINAE to the extract_segment.
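The exact encoding is not spelled out in this message, so the following is only one plausible reading of "a pair of keys" combined with "the multiplication": each address is offset by its own key before the two are multiplied. `FreeListKey`, `signed_prev`, and `check_prev` are names assumed for the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair of secret keys used to sign free-list links.
struct FreeListKey
{
  std::uintptr_t key1;
  std::uintptr_t key2;
};

// Signature binding the current and next addresses together. Offsetting each
// operand by a different key means the product does not directly expose
// either key, even if both addresses are known.
inline std::uintptr_t signed_prev(
  std::uintptr_t curr, std::uintptr_t next, const FreeListKey& key)
{
  return (curr + key.key1) * (next + key.key2); // wraps on overflow
}

// Validate a stored signature when walking the list.
inline bool check_prev(
  std::uintptr_t stored,
  std::uintptr_t curr,
  std::uintptr_t next,
  const FreeListKey& key)
{
  return stored == signed_prev(curr, next, key);
}
```

A corrupted `next` pointer then fails the check unless the attacker can also forge the matching signature.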
This exposes a read-only variant of notify-using, so that the underlying platform can map the range of pages read-only into the application. This improves performance of external pointer on platforms that support lazy commit of pages as it can access anything in the range.
The various Pals were given different meanings in CHECK_CLIENT and non-CHECK_CLIENT builds. This was because it is essential that in the CHECK_CLIENT builds access is prevented, when not requested. This PR separates the CHECK_CLIENT concept from how the Pal should be implemented.
Most ranges just deal with whatever kinds of ranges their parents deal with, but that might be Chunk- or (soon) Arena-bounded. This commit does not yet introduce nuance, but just sets the stage.
Do not hard-code FrontendSlabMetadata, but rather take it as a template argument. We're going to plumb other types through for StrictProvenance.
Expose a static CapPtr<T,B>::unsafe_from() and use that everywhere instead (though continue to allow implicit and explicit construction of CapPtr from nullptr).
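A much-reduced sketch of that interface shape is shown below. It is not the real `CapPtr`: the bound tags, the `ChunkPtr` alias, and the `unsafe_ptr()` accessor are assumptions for illustration. The point is that the raw-pointer constructor is private, so every conversion from a raw pointer is spelled `unsafe_from` at the call site, while `nullptr` still converts implicitly.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bound tags standing in for the real bounds annotations.
namespace bounds
{
  struct Chunk {};
  struct Arena {};
}

template<typename T, typename B>
class CapPtr
{
  T* raw = nullptr;

  // Private: raw pointers can only enter through unsafe_from().
  constexpr explicit CapPtr(T* p) : raw(p) {}

public:
  constexpr CapPtr() = default;

  // Construction from nullptr stays available, implicitly and explicitly.
  constexpr CapPtr(std::nullptr_t) {}

  // The one visible escape hatch for wrapping an unannotated pointer.
  static constexpr CapPtr unsafe_from(T* p)
  {
    return CapPtr(p);
  }

  // Assumed accessor name for getting the raw pointer back out.
  constexpr T* unsafe_ptr() const
  {
    return raw;
  }

  constexpr bool operator==(std::nullptr_t) const
  {
    return raw == nullptr;
  }
};

// Convenience alias used only in this example.
template<typename T>
using ChunkPtr = CapPtr<T, bounds::Chunk>;
```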
Make these generic, with the SmallBuddyRange taking its cue from the parent Range, since we're about to change them anyway and might want to vary them again in the future.
This allows us to have a single Pipe-line of ranges where we can, nevertheless, jump over the small buddy allocator when making large allocations. This, in turn, will let us differentiate the types coming from the small end and the large "tap" on this Pipe-line.
Now that we've split the range Pipe-line externally, the small-buddy ranges should never be seeing large requests.
Update the backend concept so that metadata allocations are Arena-bounded.
Wrap the FrontendSlabMetadata with a struct that holds the Arena-bounded authority for Chunks that the Backend ships out to the Frontend or, for non-StrictProvenance architectures, encapsulates the sleight of hand that turns Chunk-bounded CapPtr-s to Arena-bounded ones.
These pieces of metadata (specifically, the Allocator structures) are never deallocated at the moment, so we need not consider how we might amplify these bounded pointers back to higher authority.
ICF currently breaks building on Morello, so allow cmake to notch it out.
* Fix pal_linux.h for older linux systems
  - Where MADV_FREE is not defined - replaced with MADV_DONTNEED
  - Where GRND_NONBLOCK is not defined in <sys/random.h> but in <linux/random.h>
* Check for linux/random.h in CMake as __has_include seems to not be reliable
* Use CMake module CheckIncludeFilesCXX as C language isn't enabled by default everywhere
* Move madvise flag ifdefs into constexpr for cleaner code
Otherwise, on platforms for which {,u}intptr_t aren't just typedef-s of other scalar types, it's ambiguous which way an implicit cast should go.
Just tested with snmalloc main branch and the latest
@xen0n is there any recent progress on improving the situation?

#pragma once

#if __SIZEOF_POINTER__ == 8
#  define SNMALLOC_VA_BITS_64
#else
#  define SNMALLOC_VA_BITS_32
#endif

#include <cstddef>
#include <cstdint>

namespace snmalloc
{
  /**
   * LoongArch-specific architecture abstraction layer.
   */
  class AAL_LoongArch
  {
  public:
    /**
     * Bitmap of AalFeature flags
     */
    static constexpr uint64_t aal_features =
      IntegerPointers | NoCpuCycleCounters;

    static constexpr enum AalName aal_name = LoongArch;

    static constexpr size_t smallest_page_size = 0x1000;

    /**
     * On pipelined processors, notify the core that we are in a spin loop and
     * that speculative execution past this point may not be a performance gain.
     */
    static inline void pause()
    {
      __asm__ __volatile__("dbar 0" : : : "memory");
    }

    /**
     * PRELD prefetches a cache line of data from memory into the cache.
     * The access address is the value in general register rj plus the
     * sign-extended 12-bit immediate.
     *
     * The hint operand tells the processor what kind of access will follow
     * and which cache level the fetched data should fill. It has 32 possible
     * values (0 to 31); 0 means load to the level 1 cache. If the cache
     * attribute of the access address is not cached, the instruction
     * generates no memory access and is treated as a NOP. PRELD does not
     * trigger any exceptions related to the MMU or the address.
     */
    static inline void prefetch(void* ptr)
    {
      // ptr is an input operand: PRELD only reads the address register.
      __asm__ volatile("preld 0, %0, 0" : : "r"(ptr));
    }
  };

  using AAL_Arch = AAL_LoongArch;
} // namespace snmalloc
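To show where a `pause()` hook like the one above is typically called, here is a small spin-wait sketch parameterised on an AAL-like class. `PortableAal` and `SpinFlag` are hypothetical names for this example, and the portable `pause()` is only a compiler barrier so the snippet builds on any host; a LoongArch build would plug in the class from the listing instead.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical portable stand-in for an architecture abstraction layer.
struct PortableAal
{
  static inline void pause()
  {
    // Compiler-only barrier; a real AAL would emit a CPU hint here.
    std::atomic_signal_fence(std::memory_order_seq_cst);
  }
};

// Minimal test-and-test-and-set flag that calls Aal::pause() while waiting.
template<typename Aal>
class SpinFlag
{
  std::atomic<bool> locked{false};

public:
  bool try_lock()
  {
    return !locked.exchange(true, std::memory_order_acquire);
  }

  void lock()
  {
    while (!try_lock())
    {
      // Spin on a plain load so waiting cores do not fight over the line.
      while (locked.load(std::memory_order_relaxed))
      {
        Aal::pause();
      }
    }
  }

  void unlock()
  {
    locked.store(false, std::memory_order_release);
  }

  bool is_locked() const
  {
    return locked.load(std::memory_order_relaxed);
  }
};
```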
Make it easier to justify our avoidance of capptr_from_client and capptr_reveal in external_pointer by performing address_cast earlier. In particular, with this change, we can see that the pointer (and so its authority, in CHERI) is not passed to any called function other than address_cast and pointer_offset, and so authority is merely propagated and neither exercised nor amplified. Remove the long-disused capptr_reveal_wild, which was added for earlier versions of external_pointer.
Signed-off-by: Schrodinger ZHU Yifan <[email protected]>
SchrodingerZhu force-pushed the loongarch-poc branch from c8ed775 to 009a331 on August 22, 2022 15:21
Signed-off-by: Schrodinger ZHU Yifan <[email protected]>
close due to #553
Signed-off-by: SchrodingerZhu [email protected]
Just to show the portability of snmalloc to `loongarch`. https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN
All multi-threaded tests currently fail with a segfault; it seems that the process quits on thread creation. I may investigate it later.