Add reset command from previous branch #724

Merged · 15 commits merged into develop on Oct 31, 2023
Conversation

@aquan9 (Collaborator) commented on Sep 19, 2023:

Make a beeflow reset command with a warning message. The command just finds and removes the .beeflow directory.

This should hopefully resolve #708

This is a continuation of PR #712
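
For reference, a minimal sketch of the behavior described above, assuming a Typer-based CLI (as the beeflow client uses); the command name, options, and the hardcoded ~/.beeflow path are illustrative only, since the real code resolves the workdir from the configuration.

# Illustrative sketch only -- not the actual beeflow implementation.
from pathlib import Path
import shutil

import typer

app = typer.Typer()


@app.command()
def reset(archive: bool = typer.Option(False, '--archive', '-a',
                                       help='Back up the workdir before deleting it')):
    """Warn the user, then find and remove the BEE working directory."""
    # Hypothetical location; the real code resolves this from the configuration.
    workdir = Path.home() / '.beeflow'
    if not workdir.is_dir():
        typer.echo('Nothing to reset: no working directory found.')
        raise typer.Exit()
    typer.secho(f'This will delete {workdir} and all workflow state.', fg='red')
    if not typer.confirm('Continue?'):
        raise typer.Exit()
    if archive:
        shutil.copytree(workdir, workdir.parent / (workdir.name + '.backup'))
    shutil.rmtree(workdir)
    typer.echo('Reset complete.')


if __name__ == '__main__':
    app()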

@aquan9 added the WIP (Work in progress) label on Sep 19, 2023
@aquan9 mentioned this pull request on Sep 19, 2023
@aquan9 removed the WIP (Work in progress) label on Sep 19, 2023
@aquan9 requested a review from pagrubel on Sep 19, 2023
@pagrubel (Collaborator) left a review comment:

See beeflow/wf_manager/resources/wf_utils.py
You can use get_bee_workdir to find the path

Review threads (outdated, resolved): beeflow/client/core.py (6), docs/sphinx/commands.rst (2)
@pagrubel (Collaborator) commented:

@aquan9 As I was reviewing I found some minor places where .beeflow was still used and will commit fixes for them. However, I'm still testing. I believe I found an error when someone has a workflow running. I'll post it soon.

@pagrubel (Collaborator) commented:

This is an error that occurred when a reset was done while workflows were still running. I'm thinking we should check for running workflows using beeflow list and advise the user to either let them finish or cancel them via beeflow cancel <wf_id>.

@pagrubel (Collaborator) commented:

Oops, I forgot to post the error:

Waiting for components to cleanly stop.
Traceback (most recent call last):

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/bin/beeflow", line 6, in <module>
    sys.exit(main())

  File "/vast/home/pagrubel/BEE/BEE/beeflow/client/bee_client.py", line 554, in main
    app()

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/typer/main.py", line 289, in __call__

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/typer/main.py", line 280, in __call__

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/click/core.py", line 1157, in __call__

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/click/core.py", line 1078, in main

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/click/core.py", line 1688, in invoke

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/click/core.py", line 1688, in invoke

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/click/core.py", line 1434, in invoke

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/click/core.py", line 783, in invoke

  File "/vast/home/pagrubel/.cache/pypoetry/virtualenvs/hpc-beeflow-YDRVf3zF-py3.9/lib/python3.9/site-packages/typer/main.py", line 607, in wrapper

  File "/vast/home/pagrubel/BEE/BEE/beeflow/client/core.py", line 428, in reset
    shutil.rmtree(directory_to_delete)

  File "/projects/opt/centos8/x86_64/miniconda3/py39_4.12.0/lib/python3.9/shutil.py", line 732, in rmtree
    _rmtree_safe_fd(fd, path, onerror)

  File "/projects/opt/centos8/x86_64/miniconda3/py39_4.12.0/lib/python3.9/shutil.py", line 665, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)

  File "/projects/opt/centos8/x86_64/miniconda3/py39_4.12.0/lib/python3.9/shutil.py", line 665, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)

  File "/projects/opt/centos8/x86_64/miniconda3/py39_4.12.0/lib/python3.9/shutil.py", line 665, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)

  File "/projects/opt/centos8/x86_64/miniconda3/py39_4.12.0/lib/python3.9/shutil.py", line 671, in _rmtree_safe_fd
    onerror(os.rmdir, fullname, sys.exc_info())

  File "/projects/opt/centos8/x86_64/miniconda3/py39_4.12.0/lib/python3.9/shutil.py", line 669, in _rmtree_safe_fd
    os.rmdir(entry.name, dir_fd=topfd)

OSError: [Errno 39] Directory not empty: 'x86_64-linux-gnu'

@pagrubel (Collaborator) commented on Sep 27, 2023:

So if I had a workflow running when I did the beeflow core reset, it left a neo4j process running:

ps aux | grep pagrubel | grep -v grep | grep -E 'bee|slurmrest|neo4j'
pagrubel 3228289 6.9 1.0 46490656 2892124 ? Sl 13:41 0:29 /usr/local/openjdk-8/bin/java -cp /var/lib/neo4j/plugins:/var/lib/neo4j/conf:/var/lib/neo4j/lib/*:/var/lib/neo4j/plugins/* -server -XX:+UseG1GC -XX:-OmitStackTraceInFastThrow -XX:+AlwaysPreTouch -XX:+UnlockExperimentalVMOptions -XX:+TrustFinalNonStaticFields -XX:+DisableExplicitGC -Djdk.tls.ephemeralDHKeySize=2048 -Djdk.tls.rejectClientInitiatedRenegotiation=true -Dunsupported.dbms.udc.source=tarball -Dfile.encoding=UTF-8 org.neo4j.server.CommunityEntryPoint --home-dir=/var/lib/neo4j --config-dir=/var/lib/neo4j/conf

And more processes were left if more than one workflow was running. I think we should check for running workflows and inform the user that they will be cancelled if they continue with the reset; then we will need to kill the GDB instances for that user.

@aquan9 (Collaborator, Author) commented on Oct 3, 2023:

I'm wondering if the changes to fix this need to happen at the level of the "quit" call, because as it stands, the beeflow stop command should have the same problem.

Both beeflow stop and beeflow reset call:

resp = cli_connection.send(paths.beeflow_socket(), {'type': 'quit'})
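
For what it's worth, a hedged sketch of how reset could reuse that same shutdown path and only delete the workdir once the components have actually stopped; only cli_connection.send() and paths.beeflow_socket() come from the line quoted above, while the import path, the stop() helper, and the socket-polling heuristic are assumptions.

# Hypothetical sketch: make reset wait for the daemon to release the workdir
# before deleting it. Only cli_connection.send() and paths.beeflow_socket()
# appear in the quoted code; everything else here is illustrative.
import shutil
import time
from pathlib import Path

from beeflow.common import cli_connection, paths  # assumed module locations


def stop():
    """Ask the beeflow daemon to quit (the call shared by stop and reset)."""
    return cli_connection.send(paths.beeflow_socket(), {'type': 'quit'})


def reset(workdir, timeout=30):
    """Stop the daemon, then delete the workdir once the socket is gone."""
    stop()
    socket = Path(paths.beeflow_socket())
    deadline = time.time() + timeout
    while socket.exists() and time.time() < deadline:
        time.sleep(1)  # give components time to cleanly stop
    shutil.rmtree(workdir)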

@pagrubel (Collaborator) commented:

Discussion during the Oct 10 meeting:

Orphaned neo4j processes keep a file in ~/.beeflow, so beeflow core stop works but beeflow core reset fails, since reset deletes ~/.beeflow.
~/.beeflow/workflows/<wf_id> is bind mounted into neo4j in /tmp, so as long as an instance is running ~/.beeflow can't be deleted.

The pid for each neo4j instance is in the wf_manager database, so we could kill those.

We also need to evaluate beeflow cancel <wf_id>, which leaves orphaned neo4j instances around.

We still need to look at using a different database system, but we should fix this now.

For now, should we search for any running workflows and, if there are any, print a message telling the user they need to either wait for them to finish or cancel them?
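
A minimal sketch of the guard proposed here, assuming the list of running workflows can be obtained from the same query that backs beeflow list; the helper below is hypothetical and only illustrates the user-facing message.

# Hypothetical guard: refuse to reset while workflows are still running.
import sys


def guard_reset(running_workflows):
    """Abort the reset if any workflows are still running.

    running_workflows would come from whatever query backs `beeflow list`;
    that lookup is not shown here.
    """
    if running_workflows:
        print('The following workflows are still running:')
        for wf_id in running_workflows:
            print(f'  {wf_id}')
        print('Either wait for them to finish or cancel them with '
              '`beeflow cancel <wf_id>` before running `beeflow core reset`.')
        sys.exit(1)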

@pagrubel (Collaborator) commented:

1.) I get this error if -a is used and <bee_workdir>.backup already exists:
error.txt
If the -a/--archive flag is set, check for that directory before doing anything else, give a warning, and exit.

2.) Maybe we should only archive the archives directory and the logs. I get this error when I try to archive (when the above doesn't apply); I think it has to do with some of the active sockets and processes. I'm thinking we should only copy <bee_workdir>/archives and the logs, and maybe the db files. Would that help?
error-archive.txt

If I don't care to keep anything, everything works fine.

@pagrubel (Collaborator) commented:

@aquan9 I think if you just copy the logs and archives, the -a option will work. You may want to ask whether the user wants to copy the container_archive directory if it exists, since the user can change its location in the configuration file and the files can be quite large.
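
A sketch of what that selective archive might look like, assuming bee_workdir comes from the configuration lookup and that the relevant subdirectories are logs, archives, and (optionally) container_archive as discussed above; the function name and backup location are illustrative.

# Illustrative sketch of the selective -a/--archive behaviour discussed above.
import shutil
import sys
from pathlib import Path


def archive_workdir(bee_workdir: Path, include_containers: bool = False):
    """Copy only logs and archives (optionally container_archive) to a backup."""
    backup = bee_workdir.parent / (bee_workdir.name + '.backup')
    if backup.exists():
        print(f'{backup} already exists; move or remove it before using -a.')
        sys.exit(1)
    subdirs = ['logs', 'archives']
    if include_containers:
        # Note: container_archive is configurable and may live outside
        # bee_workdir; this sketch only handles the default location.
        subdirs.append('container_archive')
    for name in subdirs:
        src = bee_workdir / name
        if src.is_dir():
            shutil.copytree(src, backup / name)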

@pagrubel self-requested a review on October 30, 2023
@pagrubel requested a review from jtronge on October 30, 2023
@pagrubel (Collaborator) commented:

@jtronge Since I made the last changes, would you please review them?

@jtronge (Collaborator) commented on Oct 31, 2023:

This seems to work for me. If I submitted a workflow with the --no-start option, I ended up with the OSError: [Errno 39] Directory not empty: 'x86_64-linux-gnu' error when calling reset, but maybe this is expected for that case.

@pagrubel merged commit 1bd5980 into develop on Oct 31, 2023 (4 checks passed)
@pagrubel deleted the reset-beeflow2 branch on October 31, 2023
Successfully merging this pull request may close these issues: Add Beeflow Reset Command
3 participants