Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bphilip/add mofed and dependencies #11479

Open
wants to merge 24 commits into
base: 3.0-dev
Choose a base branch
from

Conversation

binujp
Copy link

@binujp binujp commented Dec 17, 2024

Merge Checklist

All boxes should be checked before merging the PR (just tick any boxes which don't apply to this PR)

  • The toolchain has been rebuilt successfully (or no changes were made to it)
  • The toolchain/worker package manifests are up-to-date
  • Any updated packages successfully build (or no packages were changed)
  • Packages depending on static components modified in this PR (Golang, *-static subpackages, etc.) have had their Release tag incremented.
  • Package tests (%check section) have been verified with RUN_CHECK=y for existing SPEC files, or added to new SPEC files
  • All package sources are available
  • cgmanifest files are up-to-date and sorted (./cgmanifest.json, ./toolkit/scripts/toolchain/cgmanifest.json, .github/workflows/cgmanifest.json)
  • LICENSE-MAP files are up-to-date (./LICENSES-AND-NOTICES/SPECS/data/licenses.json, ./LICENSES-AND-NOTICES/SPECS/LICENSES-MAP.md, ./LICENSES-AND-NOTICES/SPECS/LICENSE-EXCEPTIONS.PHOTON)
  • All source files have up-to-date hashes in the *.signatures.json files
  • sudo make go-tidy-all and sudo make go-test-coverage pass
  • Documentation has been updated to match any changes to the build system
  • Ready to merge

Summary

What does the PR accomplish, why was it needed?

This PR brings in MOFED and dependent packages required to build our AI/ML distribution story. Immediately HPC images require infiniband for GPU-clustering. MOFED is thus a core requirement. There are first and third party requirments for MOFED drivers directly and dependent packages.

CUDA and GdrCopy drivers need MOFED as a build-time and runtime requirements. This is not satisfied today and CUDA driver is built without MOFED support. These packages getting integrated into core will enable that work stream to be unblocked as well.

Change Log

There are 18 new packages being added by this change to the SPECS directory. There are 8 new SIGNED-SPECS being added too. They are either directly required to build out MOFED and support infrastructure or are tools required to provide a complete infiniband solution like tools to configure the hw/driver and debug issues.

fwctl
ibarr
ibsim
iser
isert
knem
mft_kernel
mlnx-ethtool
mlnx-iproute2
mlnx-nfsrdma
mlnx-ofa_kernel
mlx-steering-dump
multiperf rshim
sockperf
srp
xpmem-lib
xpmem

Does this affect the toolchain?

No

YES/NO

Associated issues
  • #xxxx
Links to CVEs
Test Methodology

The RPMs from a local build of these specs were installed on A100, V100 and T4 GPU based VMs and functionality verified.

Binu Jose Philip added 2 commits December 15, 2024 20:31
Infiniband is a requirement for AI/ML deployments for inter-GPU communication and scale out GPU clusters. HPC team built HPC azure linux image needs infiniband mofed drivers. Their major customer singularity is on a deadline to move from Ubuntu to azure linux for AI/ML workloads. All of the required sources are open source and have spec's which are already being used by NVIDIA. This PR brings to Azure Linux MOFED driver for infiniband driver and all dependencies to help use, manage and debug the stack. These modules have already been built, integrated into an HPC image and tested on an existing 16 mode cluster owned by HPC team. Performance characteristics are within specified tolerance limits.
@microsoft-github-policy-service microsoft-github-policy-service bot added Tools 3.0-dev PRs Destined for AzureLinux 3.0 labels Dec 30, 2024
@binujp binujp marked this pull request as ready for review January 13, 2025 18:50
@binujp binujp requested a review from a team as a code owner January 13, 2025 18:50
SPECS/ibsim/ibsim.spec Outdated Show resolved Hide resolved
Comment on lines +2 to +24
# Copyright (c) 2014 Mellanox Technologies. All rights reserved.
#
# This Software is licensed under one of the following licenses:
#
# 1) under the terms of the "Common Public License 1.0" a copy of which is
# available from the Open Source Initiative, see
# http://www.opensource.org/licenses/cpl.php.
#
# 2) under the terms of the "The BSD License" a copy of which is
# available from the Open Source Initiative, see
# http://www.opensource.org/licenses/bsd-license.php.
#
# 3) under the terms of the "GNU General Public License (GPL) Version 2" a
# copy of which is available from the Open Source Initiative, see
# http://www.opensource.org/licenses/gpl-license.php.
#
# Licensee has the right to choose one of the above licenses.
#
# Redistributions of source code must retain the above copyright
# notice and one of the license notices.
#
# Redistributions in binary form must reproduce both the above copyright
# notice, one of the license notices in the documentation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to interpret this licensing comment and the source attributions (we mention both Nvidia and claim we've done this ourselves at the same time). I think we need more clarify on the licensing of the new specs.

Copy link
Author

@binujp binujp Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took this line

Licensee has the right to choose one of the above licenses.

as the guidance. Does that work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added license copyright files as needed for SPECS fwctl, iser, mofed kernel, srp etc.

@binujp binujp requested a review from romoh January 15, 2025 20:07
Comment on lines +33 to +35
%{!?_name: %define _name fwctl}
%{!?_version: %define _version 24.10}
%{!?_release: %define _release OFED.24.10.0.6.7.1}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why define these _version separately? Won't the %{version} work? Similar question for _name. And what's the logic behind _release? I see it trickles down to _kmp_rel for instance but I don't see _kmp_rel used anywhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version is defined from MOFED SPECS for alignment. Hence keeping it same for furture updates.

%endif
%global _kmp_rel %{_release1}%{?_kmp_build_num}%{?_dist}

Summary: %{_name} Driver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the specs done, please also update the version_release_matching_groups inside toolkit/scripts/check_entangled_specs.py. This is part of the PR check, which makes sure "entangled" specs get updated in lockstep within one PR. In case of (regular, signed) spec pairs, if you only update the regular one and forget about this one here, we'd end up either publishing a newer version without the signed .ko files or not publishing the new version at all (I'd need to check the details).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign pipeline changes are coming from another PR. Is this can be done with signing pipeline changes for signed SPECs?

@@ -17,6 +17,7 @@
"Fedora (ISC)": r'\n-\s+Initial (CBL-Mariner|Azure Linux) import from Fedora \d+ \(license: ISC\)(\.|\n|$)',
"Magnus Edenhill Open Source": r'\n-\s+Initial (CBL-Mariner|Azure Linux) import from Magnus Edenhill Open Source \(license: BSD\)(\.|\n|$)',
"NVIDIA": r'\n-\s+Initial (CBL-Mariner|Azure Linux) import from NVIDIA \(license: (ASL 2\.0|GPLv2)\)(\.|\n|$)',
"NVIDIA (BSD)": r'\n-\s+Initial (CBL-Mariner|Azure Linux) import from NVIDIA \(BSD\) \(license: (BSD)\)(\.|\n|$)',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for "CBL-Mariner"` in the new source attributions. The other ones are to support the old specs. And there's no need to repeat "BSD", I think.

Suggested change
"NVIDIA (BSD)": r'\n-\s+Initial (CBL-Mariner|Azure Linux) import from NVIDIA \(BSD\) \(license: (BSD)\)(\.|\n|$)',
"NVIDIA (BSD)": r'\n-\s+Initial Azure Linux import from NVIDIA \(license: (BSD)\)(\.|\n|$)',

A bigger question, perhaps: why add a new entry specially for BSD? I see the older one for Nvidia already covers two licenses. Is there a reason we don't want to just update that one?

Comment on lines +92 to +100
BuildRequires: gcc
BuildRequires: make
BuildRequires: kernel-devel = %{target_kernel_version_full}
BuildRequires: kernel-headers = %{target_kernel_version_full}
BuildRequires: binutils
BuildRequires: systemd
BuildRequires: kmod
BuildRequires: mlnx-ofa_kernel-devel = %{_version}
BuildRequires: mlnx-ofa_kernel-source = %{_version}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment for all signed specs: they don't need most BuildRequires, since they only expand already built RPMs and replace a few files. The packages providing rpm2cpio and cpio are part of the default build chroot but to be on the safe side, I'd add them to BuildRequires.

Requires: kmod

%description
fwctl signed kernel modules
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the description, that the published package will have, so I'd suggest something, that gives users more information about the package.

Comment on lines +70 to +75
# This package's "version" and "release" must reflect the unsigned version that
# was signed.
# An important consequence is that when making a change to this package, the
# unsigned version/release must be increased to keep the two versions consistent.
# Ideally though, this spec will not change much or at all, so the version will
# just track the unsigned package's version/release.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the comment - the specs entanglement PR check will handle that for you. Just make sure to update the set of spec pairs, which need to be kept in sync.

Source1: fwctl.ko
Source2: mlx5_fwctl.ko

BuildRoot: /var/tmp/%{name}-%{version}-build
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need to define BuildRoot explicitly here instead of using the default?

%postun
/sbin/depmod %{KVERSION}

%if "%{KMP}" != "1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure KMP makes much sense in the signed spec, since its only purpose is to repackage the unsigned RPM. Also, with the way it's written right now, if KMP ever was equal to 1, we'd end up with an invalid spec file (no %files section).

@@ -0,0 +1,142 @@
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a bunch of comments into this spec but they apply to most/all other signed spec files.

License: GPLv2
Url: http://nvidia.com
Group: System Environment/Base
Source0: https://linux.mellanox.com/public/repo/mlnx_ofed/24.10-0.7.0.0/SRPMS/fwctl-24.10.tgz#/%{_name}-%{_version}.tgz
Copy link
Contributor

@PawelWMS PawelWMS Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also use macros to parameterize the URL, so that it updates automatically with the macros. This makes it easier to miss necessary changes during updates.

Comment on lines +271 to +279
%if "%{KMP}" != "1"
%files -n %{non_kmp_pname}
/lib/modules/%{KVERSION}/%{install_mod_dir}/
%if %{IS_RHEL_VENDOR}
%if ! 0%{?fedora}
%config(noreplace) %{_sysconfdir}/depmod.d/%{_name}.conf
%endif
%endif
%endif
Copy link
Contributor

@PawelWMS PawelWMS Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing %license. Not strictly necessary when there's a signed spec counterpart to this one but consistency helps. Having said that, a file explaining the license IS necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.0-dev PRs Destined for AzureLinux 3.0 Packaging Tools
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants