Attempt to remove AWS S3 flaky cache for sccache #2953

mfuntowicz · 2025-01-24T14:50:52Z

Leverage GHA cache rather than S3

…n rustc_wrapper is present

mfuntowicz · 2025-01-25T20:22:16Z

Dockerfile_trtllm

+RUN export CMAKE_C_COMPILER_LAUNCHER=sccache && \
+    export CMAKE_CXX_COMPILER_LAUNCHER=sccache && \
+    export CMAKE_CUDA_COMPILER_LAUNCHER=sccache && \
+    mkdir $TGI_INSTALL_PREFIX && mkdir "$TGI_INSTALL_PREFIX/include" && mkdir "$TGI_INSTALL_PREFIX/lib" && \


We don't care that these environment variables are not persisted beyond this RUN step.

mfuntowicz · 2025-01-25T20:22:45Z

backends/trtllm/CMakeLists.txt

-if (NOT DEFINED CMAKE_CXX_COMPILER_LAUNCHER)
-    find_program(CCACHE_EXECUTABLE "ccache")
-    if (CCACHE_EXECUTABLE)
-        message(STATUS "Using ccache")
-        set(CMAKE_C_COMPILER_LAUNCHER "${CCACHE_EXECUTABLE}")
-        set(CMAKE_CXX_COMPILER_LAUNCHER "${CCACHE_EXECUTABLE}")
-        set(CMAKE_CUDA_COMPILER_LAUNCHER "${CCACHE_EXECUTABLE}")
-    endif ()
-else ()
-    message(STATUS "Using user specified cmake cxx compiler launcher: ${CMAKE_CXX_COMPILER_LAUNCHER}")
-    set(CMAKE_C_COMPILER_LAUNCHER "${CMAKE_CXX_COMPILER_LAUNCHER}")
-    set(CMAKE_CXX_COMPILER_LAUNCHER "${CMAKE_CXX_COMPILER_LAUNCHER}")
-    set(CMAKE_CUDA_COMPILER_LAUNCHER "${CMAKE_CXX_COMPILER_LAUNCHER}")
-endif ()
-


Specifying a compiler launcher is a purely manual choice, so remove any attempt to look smart here :D
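One of the commits in this PR mentions defining the launchers in build.rs when a rustc wrapper is present. The diff for that part is not shown here, so the following is only a minimal sketch of the idea, not the actual implementation: `launcher_defines` is a hypothetical helper that maps the value of `RUSTC_WRAPPER` to the three CMake launcher variables removed from CMakeLists.txt above.

```rust
/// CMake variables that select a compiler launcher for each language.
const LAUNCHER_VARS: [&str; 3] = [
    "CMAKE_C_COMPILER_LAUNCHER",
    "CMAKE_CXX_COMPILER_LAUNCHER",
    "CMAKE_CUDA_COMPILER_LAUNCHER",
];

/// Hypothetical helper (not from the PR): returns the (variable, value) pairs
/// to forward to CMake when a rustc wrapper such as sccache is configured.
/// Returns an empty list when no wrapper is set, so nothing is guessed.
fn launcher_defines(rustc_wrapper: Option<&str>) -> Vec<(&'static str, String)> {
    match rustc_wrapper.map(str::trim).filter(|w| !w.is_empty()) {
        Some(wrapper) => LAUNCHER_VARS
            .iter()
            .map(|var| (*var, wrapper.to_string()))
            .collect(),
        None => Vec::new(),
    }
}
```

In a build script this could be driven by `std::env::var("RUSTC_WRAPPER").ok().as_deref()`, with each returned pair passed on as a CMake definition.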

mfuntowicz · 2025-01-25T20:23:45Z

backends/trtllm/build.rs

@@ -14,7 +14,7 @@ const TENSORRT_ROOT_DIR: Option<&str> = option_env!("TENSORRT_ROOT_DIR");
 const NCCL_ROOT_DIR: Option<&str> = option_env!("NCCL_ROOT_DIR");

 const IS_GHA_BUILD: LazyLock<bool> = LazyLock::new(|| {
-    option_env!("IS_GHA_BUILD").map_or(false, |value| match value.to_lowercase().as_str() {
+    option_env!("SCCACHE_GHA_ENABLED").map_or(false, |value| match value.to_lowercase().as_str() {


We set this variable to ON so that sccache writes its output to the GHA caching layer.
It should therefore be true in GHA contexts and false otherwise.
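The hunk above is cut off before the match arms, so the exact values build.rs accepts are not visible. As a rough sketch of the same pattern, assuming "on", "true" and "1" count as enabled (an assumption, not taken from the PR), the check could look like this. `is_truthy` and `is_gha_build` are illustrative names.

```rust
/// Hypothetical helper: interprets an optional environment value as a flag.
/// The accepted spellings ("on", "true", "1") are an assumption for this sketch.
fn is_truthy(value: Option<&str>) -> bool {
    value.map_or(false, |v| {
        matches!(v.trim().to_lowercase().as_str(), "on" | "true" | "1")
    })
}

/// Reads SCCACHE_GHA_ENABLED at compile time, as the diff does with option_env!.
/// An unset variable yields `None`, which is treated as "not a GHA build".
fn is_gha_build() -> bool {
    is_truthy(option_env!("SCCACHE_GHA_ENABLED"))
}
```

Keeping the parsing in a plain function that takes `Option<&str>` makes it easy to check without touching the real environment.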

mfuntowicz · 2025-01-25T20:25:30Z

.github/workflows/build.yaml

+      - name: Inject required variables for sccache to interact with Github Actions Cache
+        uses: actions/github-script@v7
+        with:
+          script: |
+            core.exportVariable('ACTIONS_CACHE_URL', process.env.ACTIONS_CACHE_URL || '');
+            core.exportVariable('ACTIONS_RUNTIME_TOKEN', process.env.ACTIONS_RUNTIME_TOKEN || '');
+


This new step is required to expose the cache parameters inside the job, so that the TRTLLM build step can forward them to sccache.
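Because the step above exports an empty string when a variable is absent, a build could proceed with blank cache parameters. A minimal sketch of a guard against that, assuming one wanted to check from Rust before relying on the GHA cache; `missing_gha_cache_vars` is a hypothetical helper and is not part of this PR:

```rust
/// The two variables exported by the workflow step above.
const REQUIRED_VARS: [&str; 2] = ["ACTIONS_CACHE_URL", "ACTIONS_RUNTIME_TOKEN"];

/// Hypothetical helper: returns the names of required variables that are
/// unset or blank. `lookup` abstracts the environment so the function can be
/// exercised without modifying real process state.
fn missing_gha_cache_vars<F>(lookup: F) -> Vec<&'static str>
where
    F: Fn(&str) -> Option<String>,
{
    REQUIRED_VARS
        .iter()
        .copied()
        .filter(|name| lookup(*name).map_or(true, |v| v.trim().is_empty()))
        .collect()
}
```

Against the real environment this would be called as `missing_gha_cache_vars(|name| std::env::var(name).ok())`; an empty result means both parameters are available.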

Narsil · 2025-01-27T10:15:02Z

.github/workflows/build.yaml

-          cache-from: type=s3,region=us-east-1,bucket=ci-docker-buildx-cache,name=text-generation-inference-cache${{ env.LABEL }},mode=min,access_key_id=${{ secrets.S3_CI_DOCKER_BUILDX_CACHE_ACCESS_KEY_ID }},secret_access_key=${{ secrets.S3_CI_DOCKER_BUILDX_CACHE_SECRET_ACCESS_KEY }},mode=min
-          cache-to: type=s3,region=us-east-1,bucket=ci-docker-buildx-cache,name=text-generation-inference-cache${{ env.LABEL }},mode=min,access_key_id=${{ secrets.S3_CI_DOCKER_BUILDX_CACHE_ACCESS_KEY_ID }},secret_access_key=${{ secrets.S3_CI_DOCKER_BUILDX_CACHE_SECRET_ACCESS_KEY }},mode=min
+          cache-from: type=s3,region=us-east-1,bucket=ci-docker-buildx-cache,name=text-generation-inference-cache${{ env.LABEL }},mode=min,access_key_id=${{ secrets.S3_CI_DOCKER_BUILDX_CACHE_ACCESS_KEY_ID }},secret_access_key=${{ secrets.S3_CI_DOCKER_BUILDX_CACHE_SECRET_ACCESS_KEY }},mode=max
+          cache-to: type=s3,region=us-east-1,bucket=ci-docker-buildx-cache,name=text-generation-inference-cache${{ env.LABEL }},mode=min,access_key_id=${{ secrets.S3_CI_DOCKER_BUILDX_CACHE_ACCESS_KEY_ID }},secret_access_key=${{ secrets.S3_CI_DOCKER_BUILDX_CACHE_SECRET_ACCESS_KEY }},mode=max


Let's try max. I think we reverted to min because it was uploading far too many layers, which actually degraded runtime.

backend(trtllm): attempt to remove AWS S3 flaky cache for sccache

556a61d

mfuntowicz requested review from Narsil and Hugoch January 24, 2025 14:50

mfuntowicz added 6 commits January 24, 2025 17:47

backend(trtllm): what if we expose ENV instead of inline?

a8a9168

backend(trtllm): and with the right env var for gha sccache

cb452ae

backend(trtllm): relax the way to detect sccache

a434c2f

backend(trtllm): make sccache definition manually

e7064c9

backend(trtllm): ok let's try to define the launchers in build.rs whe…

cb1dab1

…n rustc_wrapper is present

backend(trtllm): export env variable in run mb?

cad4644

mfuntowicz commented Jan 25, 2025

mfuntowicz added 2 commits January 26, 2025 11:38

backend(trtllm): Cache mode max to cache intermediate layers

c632f8a

backend(trtllm): inject ompi_version build arg in dependent step

5a317ff

Narsil approved these changes Jan 27, 2025

Narsil merged commit 40b0027 into main Jan 27, 2025
16 of 17 checks passed

Narsil deleted the trtllm/gha_cache branch January 27, 2025 10:21
