Update picas examples and token commands #21

Merged: 32 commits, Dec 10, 2024
Changes from 4 commits

Commits (32):
0a4f169
Update picas examples and token commands
xinan1911 Sep 18, 2024
6baea12
move Grid example to this page
xinan1911 Sep 18, 2024
d2563cf
add delete tokens instruction
xinan1911 Sep 23, 2024
e4ce4db
add deleteTokens.py
xinan1911 Sep 23, 2024
f09c006
Create token-commands.md
xinan1911 Oct 29, 2024
6479f04
Create quick-example.md
xinan1911 Oct 29, 2024
da30a73
Update quick-example.md
xinan1911 Oct 29, 2024
1920af1
Add picas-views.png
xinan1911 Oct 31, 2024
0dd5130
Add screenshot of views
xinan1911 Oct 31, 2024
de08846
Create fractal-example.md
xinan1911 Nov 13, 2024
3e929f7
Update fractal-example.md
xinan1911 Nov 13, 2024
7ed08e6
Update deleteTokens.py
xinan1911 Nov 13, 2024
0bef99e
Update README.md
xinan1911 Nov 13, 2024
371dd20
Update token-commands.md
xinan1911 Nov 13, 2024
cf748aa
Update fractal-example.md
xinan1911 Nov 13, 2024
6685fa7
Update fractal-example.md
xinan1911 Nov 13, 2024
43a26b4
Update fractal-example.md
xinan1911 Nov 13, 2024
6add1a2
Update README.md
xinan1911 Nov 13, 2024
d470d96
Merge branch 'master' into xinan1911-patch-1
hailihu Dec 3, 2024
6097047
Updated README to contain examples
hailihu Dec 3, 2024
f0d72e4
Clean up docs
hailihu Dec 4, 2024
9fb8135
Update picas-layer.png
hailihu Dec 4, 2024
53dab5c
cleanup
hailihu Dec 4, 2024
80aac95
Update Grid example
hailihu Dec 4, 2024
91b3425
Update README.md
hailihu Dec 4, 2024
346b9ae
Update README.md
hailihu Dec 5, 2024
f943058
Update local-example.py
hailihu Dec 5, 2024
c97373a
Update README.md
hailihu Dec 5, 2024
0ff92f4
Update README.md
hailihu Dec 6, 2024
faed7e4
Added examples/resetTokens.py
hailihu Dec 9, 2024
71140ca
Update README.md
hailihu Dec 9, 2024
f2cc1b0
Update local-example.py
hailihu Dec 9, 2024
191 changes: 175 additions & 16 deletions README.md
@@ -15,6 +15,11 @@ cd picasclient
pip install -U .
```

If you come across Python-related error messages, you can run the pip module for a specific Python version by invoking the corresponding interpreter:
```
python3.9 -m pip install -U .
```

Testing
=======

@@ -33,30 +38,69 @@ Examples

## Setting up the examples

The examples directory contains examples of how to use the picasclient. There are examples for running locally (laptop, cluster login), on a Slurm job scheduler and on the Grid (https://www.egi.eu/); in principle the jobs can be sent to any machine that can run this client.

To run the examples, you first need a running CouchDB instance that functions as the token broker: it stores the tokens which the worker machines can approach to fetch work to execute. To set up this CouchDB instance, see the [SURF documentation](https://doc.grid.surfsara.nl/en/latest/Pages/Practices/picas/picas_overview.html#picas-server-1); these examples assume you have an instance running and access to a database (DB) on this instance. If you are following a workshop organized by SURF, this has already been arranged for you.

Once this server is running, you can run the PiCaS examples:
- Local
- Slurm
- Grid


## Prepare the tokens


To approach the DB, you have to fill in `examples/picasconfig.py` with the information needed to log in to your CouchDB instance and the database you want to use for storing the work tokens. Specifically, the information needed is:
```
PICAS_HOST_URL="https://picas.surfsara.nl:6984"
PICAS_DATABASE=""
PICAS_USERNAME=""
PICAS_PASSWORD=""
```
### Create views
Once you can approach the server, you have to define "view" logic, so that you can easily view large numbers of tokens and filter on new, running and finished tokens. To create these views, run:

```
python createViews.py
```
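
The views end up in a CouchDB design document called `Monitor`, which provides the `Monitor/todo`, `Monitor/locked`, `Monitor/error` and `Monitor/done` views used further down. Purely as an illustration of what such a view amounts to, and assuming tokens carry numeric `lock` and `done` fields as in the upload sketch further below, a "todo" view could be defined along these lines (the actual definitions are created by `createViews.py`):

```
import couchdb
import picasconfig

# Illustrative only: a design document with a single "todo" view.
# The real view definitions live in examples/createViews.py.
todo_map = '''
function(doc) {
  // emit tokens that are neither locked nor done (assumed field names)
  if (doc.type == "token" && doc.lock == 0 && doc.done == 0) {
    emit(doc._id, doc._id);
  }
}
'''

server = couchdb.Server(picasconfig.PICAS_HOST_URL)
server.resource.credentials = (picasconfig.PICAS_USERNAME, picasconfig.PICAS_PASSWORD)
db = server[picasconfig.PICAS_DATABASE]
# fails with a conflict if the design document already exists
db["_design/Monitor"] = {"views": {"todo": {"map": todo_map}}}
```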

### Create tokens
This example includes a bash script (`./createTokens`) that generates a sensible parameter file, with each line representing a set of parameters that the fractals program can be called with. Without arguments it creates a fairly sensible set of 24 lines of parameters. You can generate different sets of parameters by calling the program with a combination of `-q`, `-d` and `-m` arguments, but at the moment no documentation exists for these; we recommend not using them for now.
```
./createTokens
```
After you run the `createTokens` script you will see output similar to the following, i.e. the path of the generated parameter file, which you can inspect with `cat`:
```
/tmp/tmp.fZ33Kd8wXK
cat /tmp/tmp.fZ33Kd8wXK
```
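
Purely to illustrate what such a parameter file amounts to, one set of command-line arguments per line, here is a hypothetical Python equivalent; the flag values below are made up and the real `./createTokens` script chooses its own:

```
import random

# Hypothetical sketch: write 24 lines, each one set of arguments for the
# fractals program. The actual ./createTokens script picks its own values.
with open("parameters.txt", "w") as fh:
    for _ in range(24):
        q = random.uniform(0.05, 0.2)
        d = random.randint(256, 2048)
        m = random.randint(400, 4400)
        fh.write("-q %.4f -d %d -m %d\n" % (q, d, m))
```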

### Upload tokens to the PiCaS server


Next you have to send some tokens containing work to the CouchDB instance. You can send two types of work in this example. For very fast running jobs, send the `quickExample.txt` file with:
```
python pushTokens.py quickExample.txt
```

For the longer-running jobs (the fractal example with its set of 24 parameter lines), send the file generated in the "Create tokens" step:
```
python pushTokens.py /tmp/tmp.fZ33Kd8wXK
```
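
Under the hood, `pushTokens.py` reads the input file line by line and stores one token document per line in the database. The sketch below is a simplified illustration; the field names follow the usual PiCaS token layout and may differ slightly from the actual script:

```
import sys

import couchdb
import picasconfig

def load_tokens(filename):
    server = couchdb.Server(picasconfig.PICAS_HOST_URL)
    server.resource.credentials = (picasconfig.PICAS_USERNAME, picasconfig.PICAS_PASSWORD)
    db = server[picasconfig.PICAS_DATABASE]
    tokens = []
    with open(filename) as fh:
        for i, line in enumerate(fh):
            tokens.append({
                "_id": "token_%d" % i,   # unique token id (assumed naming scheme)
                "type": "token",
                "lock": 0,               # 0 = not picked up by a pilot job yet
                "done": 0,               # 0 = not finished yet
                "input": line.strip(),   # the work: one line of parameters or a command
                "exit_code": "",
            })
    db.update(tokens)                    # bulk upload of all tokens

if __name__ == "__main__":
    load_tokens(sys.argv[1])
```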

### Reset tokens
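
Tokens that ended up locked or in an error state can be reset so that they are picked up again, essentially by clearing their lock (and error-related) fields. A `resetTokens.py` script is added to `examples` in a later commit of this PR; the snippet below is only a rough sketch of the idea, with assumed field names:

```
import sys

import couchdb
import picasconfig

server = couchdb.Server(picasconfig.PICAS_HOST_URL)
server.resource.credentials = (picasconfig.PICAS_USERNAME, picasconfig.PICAS_PASSWORD)
db = server[picasconfig.PICAS_DATABASE]

# Reset every token in the given view, e.g. Monitor/locked or Monitor/error
viewname = sys.argv[1]
for row in db.view(viewname):
    token = db[row["key"]]
    token["lock"] = 0        # clear the claim so a pilot job can pick it up again
    token["done"] = 0
    token["exit_code"] = ""
    db[token.id] = token
```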

### Delete tokens

To delete all the tokens in a certain view, you can use the `deleteTokens.py` script in the `examples` directory (included in this PR, see below). For example, to delete all the tokens in the `Monitor/todo` view, run:
```
python /path-to-script/deleteTokens.py Monitor/todo
```



Now we are ready to run the examples! You can start by running the quick example on different systems, or jump to the "Running the long jobs" section for a more complex example.

## Running locally

@@ -109,35 +153,131 @@ Now in a slurm job array the work will be performed (you can set the number of a

## Running on Grid

### Fractal example
In this fractal example we will implement the following pilot job workflow:

* First we define and generate the application tokens with all the necessary parameters.
* Then we define and create a shell script to process one task (*process_task.sh*) that will be sent with the job using the input sandbox. This contains some boilerplate code to e.g. set up the environment, download software or data from the Grid storage, run the application etc. This doesn't have to be a shell script; however, setting up environment variables is easiest when using a shell script, and this way setup scripts are separated from the application code.
* We also define and create a Python script to handle all the communication with the token pool server, call the process_task.sh script, catch errors and do the reporting (a schematic version of this loop is sketched after this list).
* Finally we define the :abbr:`JDL (Job Description Language)` on the User Interface machine to specify some general properties of our jobs. This is required to submit a batch of pilot jobs to the Grid that will in turn initiate the Python script as defined in the previous step.
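
Schematically, the pilot script's job is: claim a token from the todo view, run `process_task.sh` with the token's input, and report the result back to CouchDB. The sketch below is illustrative only; it talks to CouchDB directly, with assumed field names and an assumed argument order for `process_task.sh`, whereas the real example scripts use the picas client library for this loop:

```
import socket
import subprocess
import time

import couchdb
import picasconfig

server = couchdb.Server(picasconfig.PICAS_HOST_URL)
server.resource.credentials = (picasconfig.PICAS_USERNAME, picasconfig.PICAS_PASSWORD)
db = server[picasconfig.PICAS_DATABASE]

for row in db.view("Monitor/todo"):
    token = db[row["key"]]
    token["lock"] = int(time.time())        # claim the token
    token["hostname"] = socket.gethostname()
    db[token.id] = token                    # a write conflict here means another pilot claimed it first

    # run the actual work; the token's input field holds the parameters
    result = subprocess.run(["/bin/bash", "process_task.sh", token["input"], token["_id"]])

    token["exit_code"] = result.returncode
    token["done"] = int(time.time())        # mark the token as finished
    db[token.id] = token
```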


### Prerequisites

To be able to run the example you must have:

* All three Grid :ref:`prerequisites` (User Interface machine, Grid certificate, VO membership)
* An account on the PiCaS server (send your request to <[email protected]>)



### PiCaS sample example


* Log in to the :abbr:`UI (User Interface)` and download the :download:`pilot_picas_fractals.tgz </Scripts/picas-python3/pilot_picas_fractals.tgz>` example, the couchdb package for Python :download:`couchdb.tgz </Scripts/couchdb.tgz>` and the fractals source code :download:`fractals.c </Scripts/fractals.c>`.

* Untar ``pilot_picas_fractals.tgz`` and inspect the content:

```
tar -xvf pilot_picas_fractals.tgz
cd pilot_picas_fractals/
ls -l
-rwxrwxr-x 1 homer homer 1247 Jan 28 15:40 createTokens
-rw-rw-r-- 1 homer homer 1202 Jan 28 15:40 createTokens.py
-rw-rw-r-- 1 homer homer 2827 Jan 28 15:40 createViews.py
-rw-rw-r-- 1 homer homer 462 Jan 28 15:40 fractals.jdl
drwxrwxr-x 2 homer homer 116 Jan 28 15:40 sandbox
```

Detailed information on the operations performed by each of the scripts below is embedded in the comments inside the scripts themselves.

On the grid, in our scenario, you need to supply the entire environment through the sandbox (a more grid-native CVMFS example is available in the [picas-profile](https://github.com/sara-nl/picas-profile) repository). The binaries and python code need to be in this sandbox.
First we need the picas code as a tar archive so that it can be sent to the Grid:
* Download the current PiCaS version :download:`picas.tar </Scripts/picas-python3/picas.tar>` (or create the tar from your picas checkout, as in the first command below) and put both PiCaS and the couchdb.tgz file in the ``sandbox`` directory:

```
tar cfv grid-sandbox/picas.tar ../picas/
cd sandbox
mv ../../couchdb.tgz ./
mv ../../picas.tar ./
```

The CouchDB Python API needs to be available on the worker nodes as well; it is shipped as the ``couchdb.tgz`` file you just placed in the sandbox.
* And finally compile the fractals program, put it in the sandbox directory, and move one directory up again:
```
cc ../../fractals.c -o fractals -lm
cd ..
```

The sandbox directory now holds everything we need to send to the Grid worker nodes.

The parameter file for the tokens was already generated in the "Create tokens" section above (``./createTokens``, producing a file such as ``/tmp/tmp.fZ33Kd8wXK``).
Now we will start using PiCaS. For this we need the downloaded CouchDB and PiCaS packages for Python, and we need to set the host URL, database name and our credentials for the CouchDB server:

* Edit ``sandbox/picasconfig.py`` and set the PiCaS host URL, database name, username and password, then link the file into the working directory:

```
ln -s sandbox/picasconfig.py
```
* Make the CouchDB package locally available:
```
tar -xvf sandbox/couchdb.tgz
```

* Upload the tokens:

```
python createTokens.py /tmp/tmp.fZ33Kd8wXK
```

* Check your database at this link:
```
https://picas.surfsara.nl:6984/_utils/#/database/homerdb/_all_docs (replace homerdb with your Picas database name)
```

* Create the Views (pools); this is independent of the tokens and should be done only once:

```
python createViews.py
```
To submit the pilot jobs to the Grid we use DIRAC from the grid login node (:abbr:`UI (User Interface)`). To make use of the DIRAC tool, first source the DIRAC environment:

```
source /etc/diracosrc
```

* Create a proxy:
```
dirac-proxy-init -b 2048 -g lsgrid_user -M lsgrid --valid 168:00 # replace lsgrid with your VO
```

* Submit the pilot jobs:
```
dirac-wms-job-submit fractals.jdl -f jobIDs
```


The status and output of the pilot jobs can be retrieved with DIRAC commands, while each token in CouchDB shows the status of its task and the token's attachments contain the log files. Once all tokens have been processed (this can be seen in the CouchDB instance) the grid job will finish. The fractals program recursively generates an image based on the parameters received from PiCaS. At this point, some of your tokens are processed on the Grid worker nodes and some of the tokens may already have been processed on the :abbr:`UI (User Interface)`. Note that the :abbr:`UI (User Interface)` is not meant for production runs, but only for testing a few runs before submitting the pilot jobs to the Grid.

* Convert the :abbr:`UI (User Interface)` output file to .png format and display the picture:
```
convert output_token_6 output_token_6.png # replace with your output filename
display output_token_6.png
```

For the tokens that are processed on Grid, you can send the output to the :ref:`Grid Storage <grid-storage>` or some other remote location.

As we have seen, through PiCaS you have a single interface (the CouchDB instance) that stores tokens describing the work to be done. Then, on any machine where you can deploy the PiCaS client, you can perform the tasks at hand.


## Running the long jobs

The example above runs very fast (it only echoes to your shell). To get an idea of longer-running jobs there is also a "fractal" example.
The work in this example takes from 10 seconds up to 30 minutes per token. To add these tokens to your DB, do:

```
./createTokens
@@ -153,6 +293,7 @@ python pushTokens.py /tmp/tmp.abc123
Now the tokens are available in the database. Next, the binary for the fractal calculation needs to be built:

```
mkdir bin
cc src/fractals.c -o bin/fractals -lm
```

@@ -165,13 +306,31 @@ eval $INPUT
with:

```
cd bin
./fractals -o $OUTPUT $INPUT
```

to ensure the fractal code is called.

Now you can run your jobs whichever way you want (locally, Slurm, Grid) and start submitting them, using the general instructions described above!

The fractals program recursively generates an image based on the parameters received from PiCaS. Once the jobs have run successfully, you can find the output in the `bin` directory.
Convert the output file to .png format and display the picture:
```
convert output_token_6 output_token_6.png # replace with your output filename
display output_token_6.png
```

## Checking failed jobs

While your pilot jobs process tasks, you can keep track of their progress through the CouchDB web interface. There are views installed to see:

* all the tasks that still need to be done (Monitor/todo)
* the tasks that are locked (Monitor/locked)
* tasks that encountered errors (Monitor/error)
* tasks that are finished (Monitor/done)

When all your pilot jobs are finished, ideally you'd want all tasks to be 'done'. However, you will often find that not all jobs finished successfully and some are still in a 'locked' or 'error' state. If this happens, you should investigate what went wrong with these jobs. Sometimes this is due to errors with the middleware, network or storage; in those cases you can remove the locks and submit some new pilot jobs to try again. In other cases there could be errors with your task: maybe you sent the wrong parameters or forgot to download all necessary input files. Reviewing these failed tasks gives you the possibility to correct them and improve your submission scripts. After that, you can run those tasks again, either by removing their locks or by creating new tokens if needed, and then submitting new pilot jobs.
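
If you prefer the command line over the web interface, a quick way to see how many tokens are in each state is to count the rows of each view; a small sketch using the same `couchdb` package and `Monitor` views:

```
import couchdb
import picasconfig

server = couchdb.Server(picasconfig.PICAS_HOST_URL)
server.resource.credentials = (picasconfig.PICAS_USERNAME, picasconfig.PICAS_PASSWORD)
db = server[picasconfig.PICAS_DATABASE]

for state in ("todo", "locked", "error", "done"):
    # the number of rows in a view equals the number of tokens in that state
    rows = list(db.view("Monitor/" + state))
    print(state, len(rows))
```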

Picas overview
==============
43 changes: 43 additions & 0 deletions examples/deleteTokens.py
@@ -0,0 +1,43 @@
'''
Created on 17 March 2016

@author: Natalie Danezi <[email protected]>
@helpdesk: SURFsara helpdesk <[email protected]>

usage: python deleteTokens.py [viewname]
e.g. python deleteTokens.py Monitor/todo

description:
Connect to PiCaS server
Delete all the Tokens in the [viewname] View
'''

import sys

import couchdb
import picasconfig


def deleteDocs(db, viewname):
# v=db.view("Monitor/todo")
v = db.view(viewname)
for x in v:
document = db[x['key']]
db.delete(document)


def get_db():
server = couchdb.Server(picasconfig.PICAS_HOST_URL)
username = picasconfig.PICAS_USERNAME
pwd = picasconfig.PICAS_PASSWORD
server.resource.credentials = (username, pwd)
db = server[picasconfig.PICAS_DATABASE]
return db


if __name__ == '__main__':
# Create a connection to the server
db = get_db()
# Delete the Docs in [viewname]
viewname = str(sys.argv[1])
deleteDocs(db, viewname)