feat(uv): parse the dist-manifest.json to not hardcode sha256 in rules_python #2578

Status: Open. Wants to merge 4 commits into base branch `main`.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -57,6 +57,11 @@ Unreleased changes template.
* (rules) deprecation warnings for deprecated symbols have been turned off by
default for now and can be enabled with `RULES_PYTHON_DEPRECATION_WARNINGS`
env var.
* (uv) The extension can now be fully configured via `bzlmod` APIs without
the need to patch `rules_python`. Documentation has been added to the
`rules_python` docs, but the extension is experimental and using it may break
your setup without notice. In addition, the URLs and SHA256 values are now
retrieved from the `dist-manifest.json` metadata that the `uv` project
publishes on its GitHub releases page.

{#v0-0-0-fixed}
### Fixed

74 changes: 70 additions & 4 deletions MODULE.bazel
@@ -173,14 +173,80 @@ use_repo(
"build_bazel_bazel_self",
)

# EXPERIMENTAL: This is experimental and may be removed without notice
uv = use_extension(
# TODO @aignas 2025-01-27: should this be moved to `//python/extensions:uv.bzl` or should
# it stay as it is? I think I may prefer to move it.
uv = use_extension("//python/uv:uv.bzl", "uv")

# Here is how we can define platforms for the `uv` binaries - this will affect
# all of the downstream callers because we are using the extension without
# `dev_dependency = True`.
uv.platform(
name = "aarch64-apple-darwin",
compatible_with = [
"@platforms//os:macos",
"@platforms//cpu:aarch64",
],
)
uv.platform(
name = "aarch64-unknown-linux-gnu",
compatible_with = [
"@platforms//os:linux",
"@platforms//cpu:aarch64",
],
)
uv.platform(
name = "powerpc64-unknown-linux-gnu",
compatible_with = [
"@platforms//os:linux",
"@platforms//cpu:ppc",
],
)
uv.platform(
name = "powerpc64le-unknown-linux-gnu",
compatible_with = [
"@platforms//os:linux",
"@platforms//cpu:ppc64le",
],
)
uv.platform(
name = "s390x-unknown-linux-gnu",
compatible_with = [
"@platforms//os:linux",
"@platforms//cpu:s390x",
],
)
uv.platform(
name = "x86_64-apple-darwin",
compatible_with = [
"@platforms//os:macos",
"@platforms//cpu:x86_64",
],
)
uv.platform(
name = "x86_64-pc-windows-msvc",
compatible_with = [
"@platforms//os:windows",
"@platforms//cpu:x86_64",
],
)
uv.platform(
name = "x86_64-unknown-linux-gnu",
compatible_with = [
"@platforms//os:linux",
"@platforms//cpu:x86_64",
],
)

uv_dev = use_extension(
"//python/uv:uv.bzl",
"uv",
dev_dependency = True,
)
uv.toolchain(uv_version = "0.4.25")
use_repo(uv, "uv_toolchains")
uv_dev.toolchain(
name = "uv_toolchains",
version = "0.5.24",
)
use_repo(uv_dev, "uv_toolchains")

register_toolchains(
"@uv_toolchains//:all",

16 changes: 13 additions & 3 deletions examples/bzlmod/MODULE.bazel
@@ -105,11 +105,21 @@ python.single_version_platform_override(
use_repo(python, "python_3_10", "python_3_9", "python_versions", "pythons_hub")

# EXPERIMENTAL: This is experimental and may be removed without notice
uv = use_extension("@rules_python//python/uv:uv.bzl", "uv")
uv.toolchain(uv_version = "0.4.25")
uv = use_extension(
"@rules_python//python/uv:uv.bzl",
"uv",
dev_dependency = True,
)
uv.toolchain(
name = "uv_toolchains",
version = "0.5.24",
)
use_repo(uv, "uv_toolchains")

register_toolchains("@uv_toolchains//:all")
register_toolchains(
"@uv_toolchains//:all",
dev_dependency = True,
)

# This extension allows a user to create modifications to how rules_python
# creates different wheel repositories. Different attributes allow the user

7 changes: 0 additions & 7 deletions python/uv/private/BUILD.bazel
@@ -57,7 +57,6 @@ bzl_library(
deps = [
":toolchain_types_bzl",
":uv_toolchains_repo_bzl",
":versions_bzl",
],
)

@@ -82,9 +81,3 @@ bzl_library(
"//python/private:text_util_bzl",
],
)

bzl_library(
name = "versions_bzl",
srcs = ["versions.bzl"],
visibility = ["//python/uv:__subpackages__"],
)

8 changes: 5 additions & 3 deletions python/uv/private/lock.bzl
@@ -30,9 +30,11 @@ def lock(*, name, srcs, out, upgrade = False, universal = True, args = [], **kwa
"""Pin the requirements based on the src files.

Differences with the current {obj}`compile_pip_requirements` rule:
- This is implemented in shell and uv.
- This is implemented in shell and `uv`.
- This does not error out if the output file does not exist yet.
- Supports transitions out of the box.
- The execution of the lock file generation is happening inside of a build
action in a `genrule`.

Args:
name: The name of the target to run for updating the requirements.
@@ -41,8 +43,8 @@ def lock(*, name, srcs, out, upgrade = False, universal = True, args = [], **kwa
upgrade: Tell `uv` to always upgrade the dependencies instead of
keeping them as they are.
universal: Tell `uv` to generate a universal lock file.
args: Extra args to pass to `uv`.
**kwargs: Extra kwargs passed to the {obj}`py_binary` rule.
args: Extra args to pass to the rule.
**kwargs: Extra kwargs passed to the binary rule.
"""
pkg = native.package_name()
update_target = name + ".update"

169 changes: 157 additions & 12 deletions python/uv/private/uv.bzl
@@ -22,32 +22,177 @@ load(":uv_repositories.bzl", "uv_repositories")

_DOC = """\
A module extension for working with uv.

Use it in your own setup by:
```starlark
uv = use_extension(
"@rules_python//python/uv:uv.bzl",
"uv",
dev_dependency = True,
)
uv.toolchain(
name = "uv_toolchains",
version = "0.5.24",
)
use_repo(uv, "uv_toolchains")

register_toolchains(
"@uv_toolchains//:all",
dev_dependency = True,
)
```

Since this is only for locking the requirements files, it should always be
marked as a `dev_dependency`.
"""

_DIST_MANIFEST_JSON = "dist-manifest.json"
_DEFAULT_BASE_URL = "https://github.com/astral-sh/uv/releases/download"
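The two constants above, together with the configured version, determine every URL the extension fetches: the manifest at `{base_url}/{version}/dist-manifest.json` and each archive or checksum file next to it. The following is a small Python sketch of that layout. Python is used only because it is close to Starlark, and this is not code from the PR. The helper names `release_url`, `manifest_url` and `artifact_url` are hypothetical. They mirror the `"{base_url}/{version}".format(**config)` call and the string joins in `_get_tool_urls_from_dist_manifest`.

```python
# Hypothetical helpers mirroring the URL formatting done in `uv.bzl`.
DIST_MANIFEST_JSON = "dist-manifest.json"
DEFAULT_BASE_URL = "https://github.com/astral-sh/uv/releases/download"


def release_url(base_url: str, version: str) -> str:
    """Per-release prefix, like "{base_url}/{version}".format(**config)."""
    return "{}/{}".format(base_url, version)


def manifest_url(base_url: str, version: str) -> str:
    """URL of the manifest describing one uv release."""
    return release_url(base_url, version) + "/" + DIST_MANIFEST_JSON


def artifact_url(base_url: str, version: str, fname: str) -> str:
    """URL of an archive or checksum file listed in the manifest."""
    return "{}/{}".format(release_url(base_url, version), fname)


print(manifest_url(DEFAULT_BASE_URL, "0.5.24"))
# prints https://github.com/astral-sh/uv/releases/download/0.5.24/dist-manifest.json
```

Overriding `base_url` through the `config` tag (for example, to point at a mirror) therefore only works if the mirror keeps this same `{version}/{file name}` layout.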

config = tag_class(
doc = "Configure where the binaries are going to be downloaded from.",
attrs = {
"base_url": attr.string(
doc = "Base URL to download metadata about the binaries and the binaries themselves.",
default = _DEFAULT_BASE_URL,
),
},
)

platform = tag_class(
Review comment (Collaborator):

Something that isn't clear from the doc here is how the word "for" is operating.

Does this define the platforms uv will run on, or the platform that uv will generate a lockfile for? I'm assuming the former.

doc = "Configure the available platforms for lock file generation.",
attrs = {
"compatible_with": attr.label_list(
doc = "The compatible with constraint values for toolchain resolution",
),
"name": attr.string(
doc = "The platform string used in the UV repository to denote the platform triple.",
mandatory = True,
),
},
)

uv_toolchain = tag_class(
doc = "Configure uv toolchain for lock file generation.",
attrs = {
"uv_version": attr.string(doc = "Explicit version of uv.", mandatory = True),
"name": attr.string(
doc = "The name of the toolchain repo",
default = "uv_toolchains",
),
"version": attr.string(
doc = "Explicit version of uv.",
mandatory = True,
),
},
)

def _uv_toolchain_extension(module_ctx):
config = {
"platforms": {},
}

for mod in module_ctx.modules:
if not mod.is_root and not mod.name == "rules_python":
# Only rules_python and the root module can configure this.
#
# Ignore any attempts to configure the `uv` toolchain elsewhere
#
# Only the root module may configure the uv toolchain.
# This prevents conflicting registrations with any other modules.
#
# NOTE: We may wish to enforce a policy where toolchain configuration is only allowed in the root module, or in rules_python. See https://github.com/bazelbuild/bazel/discussions/22024
continue

# Note, that the first registration will always win, giving priority to
# the root module.

for platform_attr in mod.tags.platform:
config["platforms"].setdefault(platform_attr.name, struct(
name = platform_attr.name.replace("-", "_").lower(),
compatible_with = platform_attr.compatible_with,
))

for config_attr in mod.tags.config:
config.setdefault("base_url", config_attr.base_url)

for toolchain in mod.tags.toolchain:
if not mod.is_root:
fail(
"Only the root module may configure the uv toolchain.",
"This prevents conflicting registrations with any other modules.",
"NOTE: We may wish to enforce a policy where toolchain configuration is only allowed in the root module, or in rules_python. See https://github.com/bazelbuild/bazel/discussions/22024",
)

uv_repositories(
uv_version = toolchain.uv_version,
register_toolchains = False,
config.setdefault("version", toolchain.version)
config.setdefault("name", toolchain.name)

if not config.get("version"):
return

config.setdefault("base_url", _DEFAULT_BASE_URL)
config["urls"] = _get_tool_urls_from_dist_manifest(
module_ctx,
base_url = "{base_url}/{version}".format(**config),
)
uv_repositories(
name = config["name"],
platforms = config["platforms"],
urls = config["urls"],
version = config["version"],
)
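The merge rules in `_uv_toolchain_extension` rely on `setdefault`, so the first allowed module to set a value keeps it. As the comment in the extension notes, this gives the root module priority. The following Python sketch applies the same rules. Modules and their tags are modelled as plain dicts (an assumption made so the logic can run outside Bazel), and the function name `merge_uv_config` is hypothetical. It returns `None` when no module sets a toolchain version.

```python
DEFAULT_BASE_URL = "https://github.com/astral-sh/uv/releases/download"


def merge_uv_config(modules):
    """Merge `uv` tags so that the first module to set a value wins.

    `modules` mimics `module_ctx.modules`: a list ordered root module first,
    where each entry is a plain dict standing in for a module and its tags.
    """
    config = {"platforms": {}}
    for mod in modules:
        # Only the root module and rules_python may configure the toolchain.
        if not mod["is_root"] and mod["name"] != "rules_python":
            continue
        for tag in mod.get("platform", []):
            # setdefault keeps the first registration of each platform name.
            config["platforms"].setdefault(tag["name"], {
                # Repository-friendly name, e.g. "aarch64_apple_darwin".
                "name": tag["name"].replace("-", "_").lower(),
                "compatible_with": tag["compatible_with"],
            })
        for tag in mod.get("config", []):
            config.setdefault("base_url", tag["base_url"])
        for tag in mod.get("toolchain", []):
            config.setdefault("version", tag["version"])
            config.setdefault("name", tag["name"])

    if not config.get("version"):
        # No allowed module requested a uv version: nothing to set up.
        return None
    config.setdefault("base_url", DEFAULT_BASE_URL)
    return config
```

Because `rules_python` registers its `uv.platform` tags without `dev_dependency = True` but its `uv.toolchain` tag only as a dev dependency, a downstream root module that does not call `uv.toolchain` ends up with platforms but no version. The lookup of `version` therefore has to tolerate a missing key.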

def _get_tool_urls_from_dist_manifest(module_ctx, *, base_url):
"""Download the results about remote tool sources.

This relies on the tools using the cargo packaging to infer the actual
sha256 values for each binary.
"""
dist_manifest = module_ctx.path(_DIST_MANIFEST_JSON)
module_ctx.download(base_url + "/" + _DIST_MANIFEST_JSON, output = dist_manifest)
Review comment (Collaborator):

We probably want a sha for the dist manifest download?

Using untrusted shas to verify a download is equivalent to not using shas

Reply (Collaborator, PR author):

Regarding the SHA256: the manifest I am downloading only contains links, so I don't see how an extra SHA256 value here would be beneficial.

This has been inspired by https://github.com/bazel-contrib/rules_go/blob/7f6a9bf5870f2b5ffbba1615658676dcabf9edc7/go/private/sdk.bzl#L84 where `rules_go` downloads things based on a manifest if the user does not specify anything.

Maybe we should instead add an optional `sha256s` dict with platform labels as keys and the SHA256 values as values? We could then print a warning with a `buildozer` command that adds those values to the user's `MODULE.bazel` file if they want things to be deterministic.

Reply (Collaborator):

Ah, I think I see your point now. Hm.

On the one hand, not using SHAs means that if someone tampers with the manifest (a MITM attack), they can control which `uv` binaries are actually downloaded and used.

On the other hand, using a SHA means we are tied to a particular manifest and lose the automatic behavior that makes using the manifest so appealing.

Hm, not sure how, or if, we can split the difference. These seem at odds.

Also:

  • Does the sha arg of `download()` allow Bazel to cache it better?
  • Does the sha arg feed into `MODULE.bazel.lock`? I would expect so, since this is module-phase behavior.
  • Without the sha arg set, does Bazel print a warning? (http_archive does, not sure if download does)
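The author's proposal can be read as follows: an optional mapping from target triple to SHA256 overrides the values derived from the manifest, and any platforms left unpinned are reported so that a warning can be printed. The Python sketch below shows that reading only. The `sha256s` attribute and the function name `resolve_sha256s` are hypothetical, and a generic printed warning stands in for the proposed `buildozer` hint. It does not settle the trade-off discussed above, because the manifest itself is still fetched unverified. It only shows that pinned archive hashes stop a tampered manifest from silently swapping the binaries, while pinning stays optional.

```python
def resolve_sha256s(manifest_sources, pinned_sha256s):
    """Prefer user-pinned SHA256 values over the ones read from the manifest.

    `manifest_sources` maps target triples to {"urls": [...], "sha256": ...};
    `pinned_sha256s` is the hypothetical optional `sha256s` attribute.
    Returns the resolved sources and the triples that were not pinned.
    """
    resolved = {}
    unpinned = []
    for platform, source in sorted(manifest_sources.items()):
        sha256 = pinned_sha256s.get(platform)
        if sha256 is None:
            # Fall back to the manifest, i.e. trust whatever it served.
            unpinned.append(platform)
            sha256 = source["sha256"]
        # A pinned value is what the archive would be verified against, so a
        # tampered manifest cannot silently swap the binary.
        resolved[platform] = {"urls": list(source["urls"]), "sha256": sha256}
    return resolved, unpinned


sources = {
    "aarch64-apple-darwin": {"urls": ["https://example.com/a.tar.gz"], "sha256": "aa"},
    "x86_64-unknown-linux-gnu": {"urls": ["https://example.com/b.tar.gz"], "sha256": "bb"},
}
resolved, unpinned = resolve_sha256s(sources, {"aarch64-apple-darwin": "pinned"})
if unpinned:
    print("WARNING: no pinned sha256 for: " + ", ".join(unpinned))
```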

dist_manifest = json.decode(module_ctx.read(dist_manifest))

artifacts = dist_manifest["artifacts"]
tool_sources = {}
downloads = {}
for fname, artifact in artifacts.items():
if artifact.get("kind") != "executable-zip":
continue

checksum = artifacts[artifact["checksum"]]
checksum_fname = checksum["name"]
checksum_path = module_ctx.path(checksum_fname)
downloads[checksum_path] = struct(
download = module_ctx.download(
"{}/{}".format(base_url, checksum_fname),
output = checksum_path,
block = False,
),
archive_fname = fname,
platforms = checksum["target_triples"],
)

for checksum_path, download in downloads.items():
result = download.download.wait()
if not result.success:
fail(result)

archive_fname = download.archive_fname

sha256, _, checksummed_fname = module_ctx.read(checksum_path).partition(" ")
checksummed_fname = checksummed_fname.strip(" *\n")
if archive_fname != checksummed_fname:
fail("The checksum is for a different file, expected '{}' but got '{}'".format(
archive_fname,
checksummed_fname,
))

for platform in download.platforms:
tool_sources[platform] = struct(
urls = ["{}/{}".format(base_url, archive_fname)],
sha256 = sha256,
)

return tool_sources
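`_get_tool_urls_from_dist_manifest` pairs every `executable-zip` artifact with its checksum artifact. It then reads the `<sha256> <file name>` line from the downloaded `.sha256` file, checks that the file name matches, and records the URL and SHA256 for each target triple. The Python sketch below reproduces that logic without network access. The downloaded checksum files are passed in as a dict from file name to contents (a hypothetical stand-in for `module_ctx.download` plus `module_ctx.read`). The manifest is a trimmed, made-up example that contains only the fields the Starlark code reads, and the hash is a placeholder.

```python
def parse_checksum_file(content):
    """Split a `<sha256> <file name>` line; tolerates `*` binary-mode markers."""
    sha256, _, fname = content.partition(" ")
    return sha256, fname.strip(" *\n")


def tool_sources_from_manifest(manifest, checksum_files, base_url):
    """Map each target triple to the archive URL and its SHA256.

    `checksum_files` maps checksum file names to their already downloaded
    contents, standing in for `module_ctx.download` + `module_ctx.read`.
    """
    artifacts = manifest["artifacts"]
    tool_sources = {}
    for fname, artifact in artifacts.items():
        # Only executable archives are used; their checksum entries are
        # looked up through the archive's "checksum" key.
        if artifact.get("kind") != "executable-zip":
            continue
        checksum = artifacts[artifact["checksum"]]
        sha256, checksummed_fname = parse_checksum_file(checksum_files[checksum["name"]])
        if checksummed_fname != fname:
            raise ValueError("checksum is for {!r}, expected {!r}".format(checksummed_fname, fname))
        for platform in checksum["target_triples"]:
            tool_sources[platform] = {
                "urls": ["{}/{}".format(base_url, fname)],
                "sha256": sha256,
            }
    return tool_sources


ARCHIVE = "uv-x86_64-unknown-linux-gnu.tar.gz"
manifest = {
    "artifacts": {
        ARCHIVE: {"kind": "executable-zip", "checksum": ARCHIVE + ".sha256"},
        ARCHIVE + ".sha256": {
            "kind": "checksum",
            "name": ARCHIVE + ".sha256",
            "target_triples": ["x86_64-unknown-linux-gnu"],
        },
    },
}
sources = tool_sources_from_manifest(
    manifest,
    {ARCHIVE + ".sha256": "0123abcd *" + ARCHIVE + "\n"},
    "https://github.com/astral-sh/uv/releases/download/0.5.24",
)
print(sources)
```

Unlike the Starlark version, this sketch processes checksum files one after another. The extension instead starts all checksum downloads with `block = False` and waits on them afterwards, so they run in parallel.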

uv = module_extension(
doc = _DOC,
implementation = _uv_toolchain_extension,
tag_classes = {"toolchain": uv_toolchain},
tag_classes = {
"config": config,
"platform": platform,
"toolchain": uv_toolchain,
},
)