
qrsh -inherit from the interactive node #246

Open
ogawa opened this issue Jan 19, 2021 · 3 comments
Labels
enhancement New feature or request


@ogawa
Collaborator

ogawa commented Jan 19, 2021

1. Submit a batch job
[username@es1 ~]$ qsub -g grpname -l rt_F=2 run.sh
Your job 1000000 ("run.sh") has been submitted
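If the job ID is needed in a script rather than copied by hand, it can be pulled out of qsub's confirmation line. This is a minimal sketch that assumes the message format shown above (array jobs print a different message and are not handled); parse_job_id is a hypothetical helper name. Grid Engine's qsub also documents a -terse option that prints only the job ID, which may be simpler if your installation supports it.

```shell
# Extract the numeric job ID from qsub's confirmation line, e.g.
#   Your job 1000000 ("run.sh") has been submitted
# Prints nothing if the line does not match this format.
parse_job_id() {
    printf '%s\n' "$1" | sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}
```

For the submission above, JOB=$(parse_job_id "$(qsub -g grpname -l rt_F=2 run.sh)") would store 1000000 in JOB.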

2. Check which compute nodes are allocated to the job
[username@es1 ~]$ qstat -j 1000000
(snip)
exec_host_list 1: g0001:80, g0002:80
(snip)
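The host names can likewise be extracted from the exec_host_list line instead of being read off by eye. This sketch assumes the line looks exactly like the one above: comma-separated hosts, each followed by a colon and a number (presumably a per-host slot count). parse_exec_hosts is a hypothetical helper name.

```shell
# Read `qstat -j` output on stdin and print each execution host on its own line.
# Assumes a line of the form "exec_host_list 1: g0001:80, g0002:80"; the
# ":<number>" after each host name is stripped.
parse_exec_hosts() {
    sed -n 's/^exec_host_list[^:]*:[[:space:]]*//p' |
        tr ',' '\n' |
        sed -e 's/^[[:space:]]*//' -e 's/:.*$//' -e '/^$/d'
}
```

With the job above, qstat -j 1000000 | parse_exec_hosts would print g0001 and g0002 on separate lines.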

3. Set environment variables
[username@es1 ~]$ export JOB_ID=1000000
[username@es1 ~]$ export SGE_TASK_ID=undefined
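Before calling qrsh -inherit it may help to confirm that both variables are actually set. A minimal sketch follows; check_inherit_env is a hypothetical helper, and requiring JOB_ID to be purely numeric is an assumption based on the job ID shown above.

```shell
# Sanity-check the variables set in step 3 before using qrsh -inherit.
# Returns non-zero (with a message on stderr) if either looks wrong.
check_inherit_env() {
    case ${JOB_ID:-} in
        '' | *[!0-9]*)
            echo "JOB_ID is unset or not a numeric job ID: '${JOB_ID:-}'" >&2
            return 1 ;;
    esac
    if [ -z "${SGE_TASK_ID:-}" ]; then
        echo "SGE_TASK_ID is unset (step 3 sets it to 'undefined')" >&2
        return 1
    fi
}
```

For example: check_inherit_env && qrsh -inherit g0001 nvidia-smi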

4. Run nvidia-smi on a compute node
[username@es1 ~]$ qrsh -inherit g0001 nvidia-smi
Wed Oct 21 16:01:12 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:3D:00.0 Off | 0 |
| N/A 31C P0 38W / 300W | 0MiB / 16160MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:3E:00.0 Off | 0 |
| N/A 29C P0 42W / 300W | 0MiB / 16160MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... On | 00000000:B1:00.0 Off | 0 |
| N/A 30C P0 42W / 300W | 0MiB / 16160MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... On | 00000000:B2:00.0 Off | 0 |
| N/A 32C P0 42W / 300W | 0MiB / 16160MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
[username@es1 ~]$
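To run the same command on every node of the job, the step 4 invocation can be wrapped in a loop. This is a sketch under a few assumptions: run_on_job_hosts and the QRSH override variable are hypothetical names introduced here, and the host list is passed as one space-separated string. The variables are set per invocation, so nothing is exported into the calling shell.

```shell
# Run a command on each compute node of a running job via qrsh -inherit.
# JOB_ID and SGE_TASK_ID are set only in each qrsh's environment, so the
# calling shell is not modified.
# QRSH is a hypothetical override hook: it defaults to "qrsh", and setting it
# to e.g. "echo" gives a dry run.
#
# Usage: run_on_job_hosts JOB_ID "HOST1 HOST2 ..." COMMAND [ARGS...]
run_on_job_hosts() {
    _job_id=$1
    _hosts=$2
    shift 2
    # Unquoted $_hosts is split on whitespace into individual host names.
    for _host in $_hosts; do
        echo "=== $_host ==="
        JOB_ID=$_job_id SGE_TASK_ID=undefined "${QRSH:-qrsh}" -inherit "$_host" "$@"
    done
}
```

For the job above, run_on_job_hosts 1000000 "g0001 g0002" nvidia-smi would print one nvidia-smi report per node, while QRSH=echo run_on_job_hosts 1000000 "g0001 g0002" nvidia-smi only prints the qrsh arguments. A for loop is used rather than piping host names into while read because qrsh, like ssh, may consume standard input, which would end such a loop after the first host.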

@ogawa ogawa added the enhancement New feature or request label Jan 19, 2021
@ogawa ogawa added this to the release-202102 milestone Jan 19, 2021
@ogawa
Collaborator Author

ogawa commented Jan 19, 2021

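Doing it as follows avoids polluting the environment, because JOB_ID and SGE_TASK_ID are set only for that one qrsh command instead of being exported into the login shell.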

$ JOB_ID=1000000 SGE_TASK_ID=undefined qrsh -inherit g0001 nvidia-smi
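The difference between this one-line form and the export in step 3 can be seen with any child process. The sketch below uses sh -c as a stand-in for qrsh; demo_scoped_env is a hypothetical name, and its body runs in a subshell so it does not change the caller's environment.

```shell
# Compare prefix assignment with export. The body is a subshell "( ... )",
# so unset/export here never affect the calling shell.
demo_scoped_env() (
    unset JOB_ID
    # Prefix assignment: only the child process sees JOB_ID.
    JOB_ID=1000000 sh -c 'echo "child sees JOB_ID=$JOB_ID"'
    echo "after prefix assignment: JOB_ID=${JOB_ID:-<unset>}"
    # export: JOB_ID stays set in this shell (and in all later children).
    export JOB_ID=1000000
    echo "after export: JOB_ID=${JOB_ID:-<unset>}"
)
```

Running demo_scoped_env prints the child's value 1000000, then <unset> after the prefix assignment, then 1000000 again after the export.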

Explain that, compared with USE_SSH, this method can also be used with resource types such as rt_G.small and rt_G.large.

@ogawa
Collaborator Author

ogawa commented Jan 27, 2021

Observed that killing a job run this way with ^C on the keyboard while it is running seems to interrupt the original job.

@ogawa
Collaborator Author

ogawa commented Jan 28, 2021

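> Observed that killing a job run this way with ^C on the keyboard while it is running seems to interrupt the original job.

This appears to have been related to a UGE malfunction.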


3 participants