Intel Granite Rapids (Xeon 6980P 128-Core / 256-Thread) - HYDU_create_process (Too many open files) #45

CraftComputing opened this issue Nov 1, 2024 · 12 comments


@CraftComputing

CraftComputing commented Nov 1, 2024

Attempting to run the Top500 playbook on a 2P Granite Rapids server (single-server test).

  • 2x Intel Xeon 6980P 128-Core / 256-Thread

Benchmark errors out with "HYDU_create_process (lib/utils/launch.c:24): pipe error (Too many open files)"

I have attempted to work around the issue by raising the open-file limit on the host (ulimit -n 4096). Sometimes that results in this same error... other times, the script hangs at "TASK [Run the benchmark]" and never progresses.
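
For reference, the kind of shell adjustment I've been making before launching (4096 matches what I tried above; a persistent change would instead go through /etc/security/limits.conf):

# Check the current soft limit on open file descriptors for this shell
ulimit -n
# Raise it for this session only; it cannot exceed the hard limit shown by `ulimit -Hn`
ulimit -n 4096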

TASK [Run the benchmark.] *****************************************************************
fatal: [127.0.0.1]: FAILED! => changed=true
cmd:
- mpirun
- -f
- cluster-hosts
- ./xhpl
delta: '0:00:00.083887'
end: '2024-11-01 14:19:52.600072'
msg: non-zero return code
rc: 255
start: '2024-11-01 14:19:52.516185'
stderr: |-
[proxy:0@craft-6900P] HYDU_create_process (lib/utils/launch.c:24): pipe error (Too many open files)
[proxy:0@craft-6900P] launch_procs (proxy/pmip_cb.c:1003): create process returned error
[proxy:0@craft-6900P] handle_launch_procs (proxy/pmip_cb.c:588): launch_procs returned error
[proxy:0@craft-6900P] HYD_pmcd_pmip_control_cmd_cb (proxy/pmip_cb.c:498): launch_procs returned error
[proxy:0@craft-6900P] HYDT_dmxu_poll_wait_for_event (lib/tools/demux/demux_poll.c:76): callback returned error status
[proxy:0@craft-6900P] main (proxy/pmip.c:122): demux engine error waiting for event
[mpiexec@craft-6900P] control_cb (mpiexec/pmiserv_cb.c:280): assert (!closed) failed
[mpiexec@craft-6900P] HYDT_dmxu_poll_wait_for_event (lib/tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@craft-6900P] HYD_pmci_wait_for_completion (mpiexec/pmiserv_pmci.c:173): error waiting for event
[mpiexec@craft-6900P] main (mpiexec/mpiexec.c:260): process manager error waiting for completion
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>

NO MORE HOSTS LEFT ************************************************************************

PLAY RECAP ********************************************************************************
127.0.0.1 : ok=22 changed=4 unreachable=0 failed=1 skipped=7 rescued=0 ignored=0
@geerlingguy
Owner

This sounds a lot like an MPI hostname issue — for your hosts.ini file, do you have it set like the example?

# For single node benchmarking (default), use this:
[cluster]
127.0.0.1 ansible_connection=local

And when you run the benchmark, did you just run ansible-playbook main.yml --tags "setup,benchmark"? If you don't have the tags, it might've tried to set up clustering... which could make things act a little funny!

@geerlingguy
Owner

geerlingguy commented Nov 1, 2024

Worst case, though, you can nuke the build folder (rm -rf /opt/top500) and try again.

Usually if it hits "Run the benchmark" and nothing is happening, it means MPI is trying to fire off the process but for some reason is not able to.
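
One rough way to check that (a sketch, assuming pgrep is available, which it is on stock Ubuntu) is to watch whether any xhpl ranks actually appear while the task sits there:

# Prints the number of running xhpl processes every 2 seconds;
# staying at 0 the whole time means mpirun never managed to launch the ranks.
watch -n 2 'pgrep -c xhpl'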

Oh one more thing, just for context — is it running on Ubuntu or some other distro?

@CraftComputing
Author

Yes, I am running ansible-playbook main.yml --tags "setup,benchmark"

The hosts file is set up for local [127.0.0.1] as well.

I've tried the Ps/Qs configuration at the default [1/4], as well as [1/256] and [2/128], with no effect.

I nuked the /opt/top500 folder and ran again. Same result.

@geerlingguy geerlingguy changed the title Granite Rapids - HYDU_create_process (Too many open files) Intel Granite Rapids (Xeon 6980P 128-Core / 256-Thread) - HYDU_create_process (Too many open files) Nov 2, 2024
@geerlingguy
Owner

geerlingguy commented Nov 2, 2024

@CraftComputing - Can you try running the benchmark manually?

cd /opt/top500/tmp/hpl-2.3/bin/top500
mpirun -f cluster-hosts ./xhpl

I wonder if there's some output Ansible is eating up that may be helpful for debugging this. Also, for completeness, can you post the contents of the cluster-hosts file in that directory, as well as HPL.dat? (The latter is just for reference; for your system, the tuning might be better with a more even Ps/Qs.)

cat /opt/top500/tmp/hpl-2.3/bin/top500/cluster-hosts
cat /opt/top500/tmp/hpl-2.3/bin/top500/HPL.dat
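
As a hedged example of what a "more even" grid could look like for 256 ranks (16 x 16 = 256; the best split still has to be found by experiment, and HPL generally prefers P <= Q), the relevant HPL.dat lines would read something like:

1            # of process grids (P x Q)
16           Ps
16           Qs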

@CraftComputing
Author

CraftComputing commented Nov 2, 2024

hosts.ini:

# For single node benchmarking (default), use this:
[cluster]
127.0.0.1 ansible_connection=local

mpirun -f cluster-hosts ./xhpl:

[mpiexec@craft-6900P] HYDU_parse_hostfile (lib/utils/args.c:324): unable to open host file: cluster-hosts
[mpiexec@craft-6900P] mfile_fn (mpiexec/options.c:315): error parsing hostfile
[mpiexec@craft-6900P] match_arg (lib/utils/args.c:159): match handler returned error
[mpiexec@craft-6900P] HYDU_parse_array (lib/utils/args.c:181): argument matching returned error
[mpiexec@craft-6900P] parse_args (mpiexec/get_parameters.c:313): error parsing input array
[mpiexec@craft-6900P] HYD_uii_mpx_get_parameters (mpiexec/get_parameters.c:48): unable to parse user arguments
[mpiexec@craft-6900P] main (mpiexec/mpiexec.c:54): error parsing parameters

cat /opt/top500/tmp/hpl-2.3/bin/top500/cluster-hosts

10.0.0.179:512

cat /opt/top500/tmp/hpl-2.3/bin/top500/HPL.dat:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
350963         Ns
1            # of NBs
256           NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
256            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0                               Number of additional problem sizes for PTRANS
1200 10000 30000                values of N
0                               number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64        values of NB

@geerlingguy
Owner

@CraftComputing - Thanks! I'm going to boot my Ampere machine and double-check a couple of things. I think it may be what's in cluster-hosts; the fun thing is this could be related to DNS too ;)

Can you check the contents of your /etc/hosts file on that machine too? I'm going to compare cluster-hosts and my /etc/hosts on a known working machine. Something may have gotten lost in translation.

@geerlingguy
Owner

For comparison, my files:

# Inside /opt/top500/tmp/hpl-2.3/bin/top500/cluster-hosts
10.0.2.21:192

# Inside /etc/hosts
127.0.0.1 localhost
127.0.1.1 ubuntu

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

And I can confirm I can ping my system's mDNS name OR local IP and get a result:

ubuntu@ubuntu:~$ ping 10.0.2.21
PING 10.0.2.21 (10.0.2.21) 56(84) bytes of data.
64 bytes from 10.0.2.21: icmp_seq=1 ttl=64 time=0.042 ms
64 bytes from 10.0.2.21: icmp_seq=2 ttl=64 time=0.009 ms
^C
--- 10.0.2.21 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.009/0.025/0.042/0.016 ms

ubuntu@ubuntu:~$ ping ubuntu
PING ubuntu (127.0.1.1) 56(84) bytes of data.
64 bytes from ubuntu (127.0.1.1): icmp_seq=1 ttl=64 time=0.047 ms
64 bytes from ubuntu (127.0.1.1): icmp_seq=2 ttl=64 time=0.005 ms
^C
--- ubuntu ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1037ms
rtt min/avg/max/mdev = 0.005/0.026/0.047/0.021 ms

Can you confirm the same on your system? I wonder if you might have a network setup that is causing mpich to be angry :(

@CraftComputing
Author

Hosts file... 10.0.0.179 is my local IP address

127.0.0.1 localhost
127.0.1.1 craft-6900P

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
# BEGIN Ansible MPI host 127.0.0.1
10.0.0.179 127.0.0.1 127.0.0.1
# END Ansible MPI host 127.0.0.1

I can ping both the local IP 10.0.0.179 and mDNS record craft-6900p from the localhost. The network is a flat LAN, 10.0.0.0/24.

@geerlingguy
Owner

Another idea, since you have Hyper-Threading... can you modify /opt/top500/tmp/hpl-2.3/bin/top500/cluster-hosts, change the 512 to 256, and try running the manual command again?
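
If it's easier, a one-liner for that edit (assuming the file contains just the single host line):

# Replace the :512 rank count with :256 in place
sed -i 's/:512/:256/' /opt/top500/tmp/hpl-2.3/bin/top500/cluster-hosts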

@CraftComputing
Author

Changed 512 to 256, and still hanging at the same spot.

TASK [Run the benchmark.] ******************************************************************************************************************************************************************************************
task path: /home/craft/top500-benchmark/main.yml:214
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: craft
<127.0.0.1> EXEC /bin/sh -c 'echo ~craft && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /home/craft/.ansible/tmp `"&& mkdir "` echo /home/craft/.ansible/tmp/ansible-tmp-1730516294.043043-9707-136336910001751 `" && echo ansible-tmp-1730516294.043043-9707-136336910001751="` echo /home/craft/.ansible/tmp/ansible-tmp-1730516294.043043-9707-136336910001751 `" ) && sleep 0'
Using module file /usr/lib/python3/dist-packages/ansible/modules/command.py
<127.0.0.1> PUT /home/craft/.ansible/tmp/ansible-local-8737msev_9aq/tmpi3woq2tr TO /home/craft/.ansible/tmp/ansible-tmp-1730516294.043043-9707-136336910001751/AnsiballZ_command.py
<127.0.0.1> EXEC /bin/sh -c 'chmod u+x /home/craft/.ansible/tmp/ansible-tmp-1730516294.043043-9707-136336910001751/ /home/craft/.ansible/tmp/ansible-tmp-1730516294.043043-9707-136336910001751/AnsiballZ_command.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '/usr/bin/python3 /home/craft/.ansible/tmp/ansible-tmp-1730516294.043043-9707-136336910001751/AnsiballZ_command.py && sleep 0'

@geerlingguy
Owner

It sounds like my script needs a little updating; specifically, hostvars[host].ansible_processor_vcpus counts threads, not cores, resulting in the errant 512 in your cluster-hosts file.

{{ hostvars[host].ansible_default_ipv4.address }}:{{ hostvars[host].ansible_processor_vcpus }}

I'll look at a better option for that. (Or maybe have it switch depending on architecture?)
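
A sketch of one option, using the physical-core facts Ansible already gathers (ansible_processor_count is the socket count and ansible_processor_cores is cores per socket); untested here, so treat it as a starting point rather than the fix:

{{ hostvars[host].ansible_default_ipv4.address }}:{{ hostvars[host].ansible_processor_count * hostvars[host].ansible_processor_cores }}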

Separately, since you mentioned elsewhere that switching the count from 512 to 256 got HPL running... once you get a full run in, we might be able to tweak the blis build for your particular architecture, depending on whether there's a better config (see all the blis configs).

On the AmpereOne, tweaking that made almost a 40% improvement, but its architecture is vastly different from the generic arm64 config that the automatic configuration picks out.
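
For reference, switching the blis config is basically a matter of re-running its configure with an explicit config name instead of auto; the right name for Granite Rapids would need to be checked against the blis configs list, so skx below is only a guess:

# Inside the blis source tree (prefix/paths depend on how the playbook builds it)
./configure --enable-threading=openmp skx
make -j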

@geerlingguy
Owner

Opened a follow-up issue: #46

For now, we can just act like Ansible doesn't exist anymore and run the command manually :)
