Add Falcon data provider #2955
base: master
Conversation
…into falcon-integration
parsl/data_provider/falcon.py
Outdated
try:
    subprocess.run(sender_command, check=True)
except subprocess.CalledProcessError as e:
    logger.debug(f"Command failed with exit code {e.returncode}: {e.stderr}")
probably should re-raise this exception so that it propagates into the rest of parsl? (so that a failed transfer then causes a task failure)
This has been addressed now.
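For example, a minimal sketch of the re-raise pattern (using a placeholder command that always fails, to show how the error would propagate as a task failure):

```python
import logging
import subprocess

logger = logging.getLogger(__name__)

# Placeholder command that always fails, standing in for the falcon transfer command.
sender_command = ["false"]

try:
    subprocess.run(sender_command, check=True)
except subprocess.CalledProcessError as e:
    # Log the failure, then re-raise so the failed transfer surfaces
    # as a task failure in Parsl rather than being silently swallowed.
    logger.error(f"Command failed with exit code {e.returncode}: {e.stderr}")
    raise
```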
parsl/data_provider/falcon.py
Outdated
"host": self.host_ip, | ||
"port": directory.query, | ||
"directory_path": directory.path | ||
} |
I'm a bit confused about this block and the below sender block and how it is using the user-supplied file URL, especially to do with hostnames and port numbers - mostly because I don't know much about Falcon.

I think this sends a command to remote host directory.netloc port 5555, telling it to deliver the directory directory.path that exists on that remote host; and then, in a few lines' time, it starts a receiver listening on local IP address host_ip port directory.query to receive the data and put it in working_dir.

Is that right?
The following code needs to be running on the source host: https://github.com/KiwiFac3/Datamover2Parsl/blob/main/Sender/sender.py. The communication is between the part you highlighted and that code, in order to send the correct files to the destination host. If it is still unclear, please let me know.
ok, so what I'm a bit confused about, specifically:

if you are specifying the local host IP as part of the provider initializer, why doesn't the local port number get specified in the same way as part of the constructor? A URL should name where the data is remotely, but a falcon: URL seems to name a local port too (in directory.query), which seems out of place.

There's a URL mechanism for specifying a port, as part of directory.netloc, parsed out as:

>>> o.netloc
'docs.python.org:80'
>>> o.hostname
'docs.python.org'
>>> o.port
80

https://docs.python.org/3/library/urllib.parse.html#module-urllib.parse

Naively I'd expect a port specified that way to override the hard-coded 5555 port number of the remote end - (although right now it would break things, I think)
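To make the URL parsing concrete, here is a small illustration using a made-up falcon: URL (the host, path and port values are purely illustrative):

```python
from urllib.parse import urlparse

# Hypothetical falcon: URL - the host, path and port numbers are made up.
o = urlparse("falcon://source-host.example:5555/data/input_dir?50505")

print(o.netloc)    # 'source-host.example:5555'
print(o.hostname)  # 'source-host.example'
print(o.port)      # 5555
print(o.path)      # '/data/input_dir'
print(o.query)     # '50505' - the local receiver port, as this PR currently uses it
```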
So the source host doesn't have the information that you specified, including the local host port that is part of the provider initializer. In order to make it known to the source host, we communicate the needed information for the transfer through zmq using a predefined port (in my case 5555), which starts the transfer (which takes considerably more time than the communication) on the local host port that is part of the provider initializer.
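A rough sketch of that control-channel/data-channel split, based only on the snippets quoted in this PR (the zmq socket type, the message framing, and the helper function name are assumptions, not the actual implementation):

```python
import json
import subprocess

import zmq

CONTROL_PORT = 5555  # fixed, pre-agreed zmq port on the source host


def request_transfer(source_host, host_ip, data_port, directory_path, working_dir):
    # Control channel: tell the source host what to send and where to send it.
    context = zmq.Context()
    socket = context.socket(zmq.PUSH)  # socket type is an assumption
    socket.connect(f"tcp://{source_host}:{CONTROL_PORT}")
    socket.send_string(json.dumps({
        "host": host_ip,            # local IP the receiver will listen on
        "port": data_port,          # per-transfer data port (from the URL query)
        "directory_path": directory_path,
    }))
    socket.close()
    context.term()

    # Data channel: start the local receiver; the remote sender connects to it
    # and the bulk transfer (the slow part) happens here.
    subprocess.run(["falcon", "receiver", "--host", host_ip, "--port", data_port,
                    "--data_dir", working_dir], check=True)
```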
but it could get the local host port from a variable like self.local_port specified in the initializer, right? same as the hostname? I don't understand why the local hostname and local port need to be treated differently in how they are supplied.
Ok, so you can't have the same port for the transfer for multiple hosts - that's why we supply the port through the query. However, the zmq port has to be fixed and pre-determined in order to actually receive the information. I cannot convey the port number supplied through the provider initializer without that connection.
ok, so I think that means it is up to the end user to manage the selection of local ports on their submitting machine (for example in their head or in their own workflow code) and to make sure the right ports get added onto the end of the right falcon: URLs?
Yes
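For example, on the submitting side this might look like the following (hosts, paths and port numbers are made up; the point is that the user picks a distinct local port per transfer and appends it as the URL query):

```python
from parsl.data_provider.files import File

# One distinct local receiver port per concurrent transfer, chosen by the user.
dataset_a = File("falcon://source-a.example/data/run1?50001")
dataset_b = File("falcon://source-b.example/data/run2?50002")
```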
It would be good to describe that in the docstring (https://github.com/Parsl/parsl/pull/2955/files#r1392354365) so that users can understand how they are meant to use this.
I believe this should be addressed in my latest commit
parsl/data_provider/falcon.py
Outdated
class FalconStaging(Staging, RepresentationMixin):
    """
    In this function the File object is used to represent the directory being transferred
You can make the docstring documentation for this class appear by adding the class name into docs/reference.rst. There is a data management / file staging section in there where you will find the other staging providers such as parsl.data_provider.globus.GlobusStaging
I think this docstring should then describe:
- what this provider is from the perspective of a Parsl user who does not know what Falcon is: one sentence describing what it is for, and a URL to learn more (for example, a Falcon getting started guide or something else hands-on)
- what falcon: URLs should look like and what they mean.
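A sketch of what such a docstring might look like (the wording, the example URL shape and the setup pointers are placeholders for the author to correct, not an authoritative description of Falcon):

```python
from parsl.data_provider.staging import Staging
from parsl.utils import RepresentationMixin


class FalconStaging(Staging, RepresentationMixin):
    """Stage directories between hosts using the Falcon transfer tool.

    Falcon is an external data transfer tool; its sender component must be
    installed and running on the source host. See
    https://github.com/KiwiFac3/Datamover2Parsl for the companion sender code
    and setup instructions.

    Directories are named with falcon: URLs of the form

        falcon://<source-host>/<directory-path>?<local-receiver-port>

    where <source-host> is the machine holding the data, <directory-path> is
    the directory to transfer, and <local-receiver-port> is a port on the
    submitting machine, chosen by the user, on which the receiver will
    listen. The user is responsible for choosing a distinct port per
    concurrent transfer.
    """
```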
I would be interested in a simple test in the CI that this code basically works - for example, can you start a receiver and sender on localhost and run a small transfer?

Tests have been conducted both on local hosts and on different hosts; I am happy to share the performance improvements.

I'm interested in tests that tell me that this code works and continues to work for (e.g.) the next year, as people make changes to the Parsl codebase and as Falcon itself changes: so that a user coming to use this provider can have some basic expectation that it won't be broken. We don't really do any automated performance tests (although I'd like that one day...)

https://github.com/KiwiFac3/Datamover2Parsl/blob/main/Test/Falcon.py is the file I used for testing; it works with no issues, and I tried to keep my code as close to the other data providers as possible to ensure that changes to the Parsl codebase will not affect my integration. I did present the results of the test at ParslFest; I can re-share them if you would like.

I still really would like the documentation to refer to somewhere where people can get the Falcon software, read more about it, etc - imagine yourself as a person browsing through the Parsl documentation.

I just added more detailed documentation; however, I would like your feedback if there is any aspect that I need to touch on.
'directory_path' --method probe"
- sleep for 0.1 seconds (in order for the receiver instance to run before sender instance)
e.g.
zmq_context = zmq.Context()
import zmq first
parsl/data_provider/falcon.py
Outdated
- Setup your virtual environments on both source and destination server
  and install required python packages.
- On the destination server, you can stage data files using FalconStaging.
  The following example shows creation of a Flacon-accessible file
typo: Flacon
parsl/data_provider/files.py
Outdated
@@ -65,7 +66,8 @@ def __repr__(self) -> str:
            f"scheme={self.scheme} " \
            f"netloc={self.netloc} " \
            f"path={self.path} " \
-           f"filename={self.filename}"
+           f"filename={self.filename}" \
this needs a following space to separate it from query:

<File at 0x7f442e7bf690 url=falcon://localhost/tmp/file.txt?8000 scheme=falcon netloc=localhost path=/tmp/file.txt filename=file.txtquery=8000>
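A toy stand-in showing the intended spacing fix (this is not the real parsl.data_provider.files.File class, just a minimal reproduction of its __repr__ formatting with the extra space and the query field added):

```python
class FileRepr:
    """Toy stand-in for File, to demonstrate the __repr__ spacing fix."""

    def __init__(self, scheme, netloc, path, filename, query):
        self.scheme, self.netloc = scheme, netloc
        self.path, self.filename, self.query = path, filename, query

    def __repr__(self) -> str:
        return f"scheme={self.scheme} " \
               f"netloc={self.netloc} " \
               f"path={self.path} " \
               f"filename={self.filename} " \
               f"query={self.query}"


print(repr(FileRepr("falcon", "localhost", "/tmp/file.txt", "file.txt", "8000")))
# scheme=falcon netloc=localhost path=/tmp/file.txt filename=file.txt query=8000
```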
print("Received JSON data:", json_data) | ||
|
||
# Parse the JSON data | ||
data = json.loads(json_data) |
import json first
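For context, a minimal sketch of the source-host end of that control channel (a rough illustration only; the socket type is an assumption and the real code is in the linked sender.py):

```python
import json

import zmq

context = zmq.Context()
socket = context.socket(zmq.PULL)  # socket type is an assumption
socket.bind("tcp://*:5555")        # fixed, pre-agreed control port

json_data = socket.recv_string()   # blocks until the provider sends a request
print("Received JSON data:", json_data)

# Parse the JSON data describing where to send which directory
data = json.loads(json_data)
host = data["host"]                      # receiver IP on the submitting machine
port = data["port"]                      # per-transfer data port (the URL query)
directory_path = data["directory_path"]  # directory to transfer
```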
parsl/data_provider/falcon.py
Outdated
zmq_context.term()

sender_command = ["falcon", "receiver", "--host", self.host_ip, "--port", directory.query, "--data_dir",
                  working_dir]
this gives a pretty ugly exception if the user hasn't specified working_dir on the relevant executor.
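One way to surface a clearer error might be an explicit check before building the command, along these lines (a fragment meant to sit in the provider's staging path; the exception type and message are suggestions only):

```python
if working_dir is None:
    raise ValueError(
        "FalconStaging requires working_dir to be set on the executor "
        "so it knows where to place the received data"
    )

receiver_command = ["falcon", "receiver", "--host", self.host_ip, "--port",
                    directory.query, "--data_dir", working_dir]
```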
parsl/data_provider/falcon.py
Outdated
zmq_socket.close()
zmq_context.term()

sender_command = ["falcon", "receiver", "--host", self.host_ip, "--port", directory.query, "--data_dir",
this is a receiver command, not a sender command, so you might want to rename this variable for people who want to read this code.
""" | ||
To run Falcon data_provider | ||
- Setup your virtual environments on both source and destination server | ||
and install required python packages. |
"install required python packages" might look better as an explicit pip command to type
I have the following requirements.txt for this - should I just link it as a reference? https://github.com/KiwiFac3/Datamover2Parsl/blob/main/requirments.txt
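One possible way to phrase that step explicitly in the docstring (a sketch only; the exact dependency list is whatever the linked requirements file pins):

```python
"""
To run the Falcon data provider:

- Set up a virtual environment on both the source and destination server and
  install the required Python packages, e.g. using the requirements file from
  https://github.com/KiwiFac3/Datamover2Parsl:

      pip install -r requirments.txt
"""
```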
@KiwiFac3 I added a few comments here - you could probably easily address them before merge. Right now I'm trying to run this myself to see how it works (which is how I ended up with most of my recent comments) -- the dependency stack underneath is a bit awkward on top of my current environment, to do with scipy/numpy/pandas versions, but let me see how that goes.

I get this:

which feels like some kind of package versioning problem between scipy and numpy and skopt -- can you send me the exact versions you have installed in your environment? @KiwiFac3

I am using the following versions: https://github.com/KiwiFac3/Datamover2Parsl/blob/main/requirments.txt

I believe I addressed all the comments; let me know if there is anything else I should address.

ok, I'll try to work out what's happening in my environment compared to that and let you know how it goes.
…into falcon-integration
@benclifford I made a slight change to the code so it wouldn't have issues in certain corner cases, please let me know if any changes are required.
I'm sure the change is fine, but this PR still has some general work to go before merging. The most important obstacle in my mind is a missing set of automated tests in CI. That is, responding to:

I'm happy to see the performance improvements, but that's not the point of CI. CI ("Continuous Integration") is about robustness, and robustness for the next dev who isn't aware of the intricacies of your code. CI (a set of individual test cases) runs for every push to a PR, and also periodically against the master branch.

Without a set of test cases that verify this PR's additions, we won't discover a misassumption by a future dev until possibly months later (perhaps only when a user complains). Thus, before this goes in, I want to see a few unit tests that verify different important aspects of Falcon in Parsl. For test examples, please see
Should I create those CI tests? If so, is there an example of any previous ones?
Yes. That is typically under the purview of the PR author.

See the aforementioned. Given your question, let me point out that CI ("Continuous Integration") is "merely" automated testing. GitHub has this built in via workflow definitions under .github/workflows/, for example:

# .github/workflows/ci.yaml
name: Parsl
on:
  pull_request:
    types:
      - opened
      - synchronize

[...rest of ci.yaml file...]

I'll not delve into those details here, but instead observe that CI creates a consistent environment and runs tests in a consistent way every time they run (e.g., on every push to a pull request). In particular, one can run the tests locally as well, which is much more efficient while developing. If the above is not enough of a pointer, please reach out for help. (Slack might be a more expedient avenue.)
Description
Integrates Falcon as a Parsl data provider.
Changed Behaviour
None of the current behaviours should change.
Type of change