Fix for sparse GID tables (not starting at one) #1587

Open
jlamanna wants to merge 1 commit into base: master

This fix adds support for sparse GID tables. In Kubernetes environments in particular, a pod can end up with a GID table that does not start at index "1". This happens when another pod on the host has assigned an IP address to the same NIC (usually when the NIC is exposed to multiple pods through MACVLAN).

An example:

```
$ show_gids
DEV     PORT    INDEX   GID                                     IPv4            VER     DEV
---     ----    -----   ---                                     ------------    ---     ---
mlx5_0  1       6       0000:0000:0000:0000:0000:ffff:0b00:0401 11.0.4.1        v1      net1
mlx5_0  1       7       0000:0000:0000:0000:0000:ffff:0b00:0401 11.0.4.1        v2      net1
mlx5_0  1       8       fe80:0000:0000:0000:e4ea:71ff:feb9:4970                 v1      net1
mlx5_0  1       9       fe80:0000:0000:0000:e4ea:71ff:feb9:4970                 v2      net1
```

This patch fixes two problems in the existing code when it handles such a table:

  1. The GID index loop can terminate prematurely, because testing an
    invalid GID during the search raises an error.
  2. An invalid GID can be selected (because only the address families
    are compared); on subsequent tests the RoCE call then fails, so a
    valid GID is skipped.
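
The two fixes can be sketched in isolation as follows. This is a minimal, self-contained model and not the code from this patch: `struct gid_entry`, `select_gid_index`, and `example_table` are hypothetical names, and the `populated` flag stands in for a real GID query failing on an empty slot. The sketch only shows the intended control flow: an empty slot is skipped instead of ending the scan, and an entry is accepted only if it is populated and matches both the address family and the RoCE version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one GID table slot. */
enum gid_family { GID_NONE = 0, GID_IPV4, GID_IPV6 };

struct gid_entry {
    bool populated;         /* false models an empty slot whose query would fail */
    enum gid_family family; /* address family of the GID */
    int roce_version;       /* 1 or 2 */
};

/* Models the show_gids output above: slots 0-5 are empty, 6-9 are populated. */
static const struct gid_entry example_table[10] = {
    [6] = { true, GID_IPV4, 1 },
    [7] = { true, GID_IPV4, 2 },
    [8] = { true, GID_IPV6, 1 },
    [9] = { true, GID_IPV6, 2 },
};

/*
 * Return the index of the first populated entry that matches both the
 * requested address family and the requested RoCE version, or -1 if none.
 */
static int select_gid_index(const struct gid_entry *table, int len,
                            enum gid_family want_family, int want_roce)
{
    assert(table != NULL);
    for (int i = 0; i < len; i++) {
        if (!table[i].populated)
            continue; /* fix 1: skip the empty slot, do not abort the scan */
        if (table[i].family != want_family)
            continue;
        if (table[i].roce_version != want_roce)
            continue; /* fix 2: a family match alone is not enough */
        return i;
    }
    return -1;
}
```

With the example table, a request for an IPv4 RoCE v2 GID resolves to index 7 even though indices 0 through 5 are empty.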

gcongiu (Contributor) commented Jan 22, 2025

Hi @jlamanna, this was already reported here: #1573 (comment)
