diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 2335ac2e3..8baa4b4ff 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -4,14 +4,20 @@ Welcome, and thanks for your interest in contributing to the Synapse Python clie By contributing, you are agreeing that we may redistribute your work under this [license](LICENSE.md). +# Table of contents - [How to contribute](#how-to-contribute) - - [Reporting bugs or feature requests](#reporting-bugs-or-feature-requests) - - [The development life cycle](#the-development-life-cycle) - - [Fork and clone this repository](#fork-and-clone-this-repository) - - [Installing the Python Client in a virtual environment with pipenv](#installing-the-python-client-in-a-virtual-environment-with-pipenv) - - [Development](#development) - - [Testing](#testing) -- [Repository Admins](#repository-admins) + * [Reporting bugs or feature requests](#reporting-bugs-or-feature-requests) +- [Getting started](#getting-started) + + [Fork and clone this repository](#fork-and-clone-this-repository) + + [Installing the Python Client in a virtual environment with pipenv](#installing-the-python-client-in-a-virtual-environment-with-pipenv) + + [Set up pre-commit](#set-up-pre-commit) + + [Authentication](#authentication) +- [The development life cycle](#the-development-life-cycle) + * [Development](#development) + * [Testing](#testing) + + [Integration testing agaisnt the `dev` synapse server](#integration-testing-agaisnt-the-dev-synapse-server) + * [Code style](#code-style) + * [Repository Admins](#repository-admins) ## I don't want to read this whole thing I just have a question! @@ -36,10 +42,7 @@ Bug reports and feature requests can be made in two ways. The first (preferred) After a bug report is received, expect a Sage Bionetworks staff member to contact you through the submission method you chose ([Synapse Help Forum](https://www.synapse.org/#!SynapseForum:default)or [Github issue](https://github.com/Sage-Bionetworks/synapsePythonClient/issues)). After ascertaining there is enough detail for the bug report or feature request, a JIRA issue will be opened. If you want to work on fixing the issue or feature yourself, follow the next sections! -### The development life cycle - -Developing on the Python client starts with picking a issue to work on in JIRA! The open work items (bugs and new features) are tracked in JIRA in the [SYNPY Project](https://sagebionetworks.jira.com/projects/SYNPY/issues). Issues marked as `Open` are ready for your contributions! Please make sure your assign yourself the ticket and check with the maintainers if the issue you picked is suitable. - +## Getting started #### Fork and clone this repository 1. See the [Github docs](https://help.github.com/articles/fork-a-repo/) for how to make a copy (a fork) of a repository to your own Github account. @@ -66,14 +69,35 @@ Perform the following one-time steps to set up your local environment. * pipenv ```bash # Verify you are at the root directory for the cloned repository (ie: `cd synapsePythonClient`) - pipenv install # To develop locally you want to add --dev - # pipenv install --dev + pipenv install --dev + # Set your active session to the virtual environment you created pipenv shell + # Note: The 'Python Environment Manager' extension in vscode is reccomended here ``` 4. Once completed you are ready to start developing. Commands run through the CLI, or through an IDE like visual studio code within the virtual environment will have all required dependencies automatically installed. Try running `synapse -h` in your shell to read over the available CLI commands. Or view the `Usage as a library` section in the README.md to get started using the library to write more python. -#### Development + +#### Set up pre-commit +Once your virtual environment is created the `pre-commit` command will be available through your terminal. Install `pre-commit` into your git hooks via: +``` +pre-commit install +``` +When you commit your files via git it will automatically use the `.pre-commit-config.yaml` to run various checks to enforce style. + +If you want to manually run all pre-commit hooks use: +``` +pre-commit run --all-files +``` + +#### Authentication +Learn about the multiple ways one can login to Synapse [here](https://python-docs.synapse.org/build/html/getting_started/credentials.html). + +## The development life cycle + +Developing on the Python client starts with picking a issue to work on in JIRA! The open work items (bugs and new features) are tracked in JIRA in the [SYNPY Project](https://sagebionetworks.jira.com/projects/SYNPY/issues). Issues marked as `Open` are ready for your contributions! Please make sure your assign yourself the ticket and check with the maintainers if the issue you picked is suitable. + +### Development Now that you have chosen a JIRA ticket and have your own fork of this repository. It's time to start development! @@ -83,14 +107,6 @@ Now that you have chosen a JIRA ticket and have your own fork of this repository 1. (Assuming you have followed all 4 steps above in the "fork and clone this repository" section). Navigate to your cloned repository on your computer/server. -1. Make sure your `develop` branch is up to date with the `Sage-Bionetworks/synapsePythonClient` develop branch - - ``` - cd synapsePythonClient - git remote add upstream https://github.com/Sage-Bionetworks/synapsePythonClient.git - git checkout develop - git pull upstream develop - ``` 1. Create a feature branch which off the `develop` branch. The branch should be named the same as the JIRA issue you are working on (e.g., `SYNPY-1234-{feature-here}`). I recommend adding details of the actual feature in the branch name so that you don't need to go back and forth between JIRA and GitHub. @@ -112,11 +128,12 @@ Now that you have chosen a JIRA ticket and have your own fork of this repository > It is good practice to run `git diff` or `git status` to first view your changes prior to pushing! ``` + # Make sure that you have setup `pre-commit` as noted in the getting started section git commit changed_file.txt -m "Remove X parameter because it was unused" git push ``` -1. (Make sure you have follow instructions in "Install development dependencies") Once you have made your additions or changes, make sure you write tests and run the test suite. More information on testing below. +1. Once you have made your additions or changes, make sure you write tests and run the test suite. More information on testing below. 1. Once you have completed all the steps above, in Github, create a pull request from the feature branch of your fork to the `develop` branch of `Sage-Bionetworks/synapsePythonClient`. > **Note** @@ -125,7 +142,7 @@ Now that you have chosen a JIRA ticket and have your own fork of this repository > > The status of an issue can be tracked in JIRA. Once an issue has passed a code review with a Sage Bionetworks engineer, he/she will update it's status in JIRA appropriately. -#### Testing +### Testing All code added to the client must have tests. These might include unit tests (to test specific functionality of code that was added to support fixing the bug or feature), integration tests (to test that the feature is usable - e.g., it should have complete the expected behavior as reported in the feature request or bug report), or both. @@ -139,7 +156,7 @@ Here's how to run the test suite: # Unit tests pytest -vs tests/unit -# Integration tests +# Integration tests - The integration tests should be run against the `dev` synapse server pytest -vs tests/integration ``` @@ -150,6 +167,22 @@ To test a specific feature, specify the full path to the function to run: pytest -vs tests/integration/synapseclient/test_command_line_client.py::test_table_query ```` +#### Integration testing against the `dev` synapse server +The easiest way to specify the HTTP endpoints to use for all synapse requests is to modify the `~/.synapseConfig` file and modify a few key-value pairs such as below. Not this is also where you will specify the dev authentication: +``` +[authentication] +username= +authtoken= + +[endpoints] +repoEndpoint=https://repo-dev.dev.sagebase.org/repo/v1 +authEndpoint=https://repo-dev.dev.sagebase.org/auth/v1 +fileHandleEndpoint=https://repo-dev.dev.sagebase.org/file/v1 +``` + +#### Integration testing for external collaborators +As an external collaborator you will not have access to a development account and environment to run the integration tests agaisnt. Either request that a Sage Bionetworks staff member run your integration tests via a pull request, or, contact us via the [Service Desk](https://sagebionetworks.jira.com/servicedesk/customer/portal/9) to requisition a development account for integration testing only. + ### Code style The Synapse Python Client uses [`flake8`](https://pypi.org/project/flake8/) to enforce diff --git a/README.md b/README.md index 79e5c6578..406418b1b 100644 --- a/README.md +++ b/README.md @@ -93,7 +93,7 @@ The Synapse client can be used to write software that interacts with the Sage Bi syn = synapseclient.Synapse() - ## log in using username and password + ## log in using auth token syn.login(authToken='auth_token') ## retrieve a 100 by 4 matrix @@ -124,6 +124,8 @@ Authentication toward [Synapse](https://www.synapse.org/#RegisterAccount:0) can Authentication via passwords and API keys will be deprecated early 2024. +Learn about the multiple ways one can login to Synapse [here](https://python-docs.synapse.org/build/html/getting_started/credentials.html). + Synapse Utilities (synapseutils) -------------------------------- diff --git a/synapseclient/client.py b/synapseclient/client.py index 69ccaafe1..a7a57a7cd 100644 --- a/synapseclient/client.py +++ b/synapseclient/client.py @@ -1035,7 +1035,11 @@ def _getWithEntityBundle(self, entityBundle, entity=None, **kwargs): if downloadFile: if file_handle: self._download_file_entity( - downloadLocation, entity, ifcollision, submission + downloadLocation, + entity, + ifcollision, + submission, + file_handle.get("contentMd5", None), ) else: # no filehandle means that we do not have DOWNLOAD permission warning_message = ( @@ -1062,7 +1066,12 @@ def _ensure_download_location_is_directory(self, downloadLocation): return download_dir def _download_file_entity( - self, downloadLocation: str, entity: Entity, ifcollision: str, submission: str + self, + downloadLocation: str, + entity: Entity, + ifcollision: str, + submission: str, + expected_md5: typing.Union[str, None], ): # set the initial local state entity.path = None @@ -1074,6 +1083,15 @@ def _download_file_entity( # downloaded the file cached_file_path = self.cache.get(entity.dataFileHandleId, downloadLocation) + # This is to handle for cases where the names of the files in the cache and the + # requested file name match - But there is a difference in content. + if ( + expected_md5 is not None + and cached_file_path is not None + and utils.md5_for_file(cached_file_path).hexdigest() != expected_md5 + ): + cached_file_path = None + # location in .synapseCache where the file would be corresponding to its FileHandleId synapseCache_location = self.cache.get_cache_dir(entity.dataFileHandleId) diff --git a/tests/integration/synapseclient/integration_test_Entity.py b/tests/integration/synapseclient/integration_test_Entity.py index e04939b33..f5afb8781 100644 --- a/tests/integration/synapseclient/integration_test_Entity.py +++ b/tests/integration/synapseclient/integration_test_Entity.py @@ -7,13 +7,24 @@ import pytest from unittest.mock import patch +from pytest_mock import MockerFixture + + from synapseclient.core.upload.upload_functions import create_external_file_handle -from synapseclient import Activity, DockerRepository, File, Folder, Link, Project +from synapseclient import ( + Activity, + DockerRepository, + File, + Folder, + Link, + Project, + Synapse, +) from synapseclient.core.exceptions import SynapseError, SynapseHTTPError import synapseclient.core.utils as utils -def test_Entity(syn, project, schedule_for_cleanup): +def test_Entity(syn: Synapse, project: Project, schedule_for_cleanup): # Update the project project_name = str(uuid.uuid4()) project = Project(name=project_name) @@ -152,7 +163,7 @@ def test_Entity(syn, project, schedule_for_cleanup): assert os.path.basename(a_file_cached.path) == os.path.basename(a_file.path) -def test_special_characters(syn, project, schedule_for_cleanup): +def test_special_characters(syn: Synapse, project: Project): folder = syn.store( Folder( "Special Characters Here", @@ -174,7 +185,7 @@ def test_special_characters(syn, project, schedule_for_cleanup): assert folder.russian_annotation[0] == "Обезьяна прикладом" -def test_get_local_file(syn, project, schedule_for_cleanup): +def test_get_local_file(syn: Synapse, project: Project, schedule_for_cleanup): new_path = utils.make_bogus_data_file() schedule_for_cleanup(new_path) folder = Folder( @@ -204,7 +215,7 @@ def test_get_local_file(syn, project, schedule_for_cleanup): pytest.raises(SynapseError, syn.get, new_path, limitSearch="syn1") -def test_store_with_flags(syn, project, schedule_for_cleanup): +def test_store_with_flags(syn: Synapse, project: Project, schedule_for_cleanup): # -- CreateOrUpdate flag for Projects -- # If we store a project with the same name, it should become an update projUpdate = Project(project.name) @@ -287,11 +298,14 @@ def test_store_with_flags(syn, project, schedule_for_cleanup): assert ephemeralBogus.shoe_size == [11.5] -def test_get_with_downloadLocation_and_ifcollision(syn, project, schedule_for_cleanup): +def test_get_with_downloadLocation_and_ifcollision( + syn: Synapse, project: Project, schedule_for_cleanup +): # Store a File and delete it locally filepath = utils.make_bogus_binary_file() bogus = File(filepath, name="Bogus Test File", parent=project) bogus = syn.store(bogus) + schedule_for_cleanup(bogus) os.remove(filepath) # Compare stuff to this one @@ -339,7 +353,65 @@ def test_get_with_downloadLocation_and_ifcollision(syn, project, schedule_for_cl os.remove(renamedBogus.path) -def test_store_activity(syn, project, schedule_for_cleanup): +def test_get_with_cache_hit_and_miss_with_ifcollision( + syn: Synapse, project: Project, schedule_for_cleanup, mocker: MockerFixture +): + download_file_function = mocker.spy(obj=syn, name="_downloadFileHandle") + # GIVEN a File that is stored in Synapse - and removed from the local machine + filepath = utils.make_bogus_binary_file() + original_file_md5 = utils.md5_for_file(filepath).hexdigest() + bogus_file = File( + filepath, name="a_name_that_will_show_up_in_cache", parent=project + ) + bogus_file = syn.store(bogus_file) + schedule_for_cleanup(bogus_file) + os.remove(filepath) + + # WHEN I get the File from synapse + unmodified_file_from_server = syn.get( + bogus_file, + downloadLocation=os.path.dirname(filepath), + ifcollision="overwrite.local", + ) + + # THEN I expect the file to have been downloaded and match the original files MD5 + assert download_file_function.call_count == 1 + assert unmodified_file_from_server._file_handle["contentMd5"] == original_file_md5 + assert utils.md5_for_file(filepath).hexdigest() == original_file_md5 + + # AND I expect the file to be in the cache and match the original files MD5 + copy_of_file_from_server = syn.get( + bogus_file, + downloadLocation=os.path.dirname(filepath), + ifcollision="overwrite.local", + ) + assert copy_of_file_from_server.id == unmodified_file_from_server.id + assert filecmp.cmp(copy_of_file_from_server.path, unmodified_file_from_server.path) + assert download_file_function.call_count == 1 + assert copy_of_file_from_server._file_handle["contentMd5"] == original_file_md5 + assert utils.md5_for_file(filepath).hexdigest() == original_file_md5 + + # GIVEN another bogus file with the same name and non matching MD5 data + new_bogus_file = utils.make_bogus_binary_file() + os.rename(new_bogus_file, filepath) + assert utils.md5_for_file(filepath).hexdigest() != original_file_md5 + + # WHEN I get the File from synapse + copy_of_file_from_server = syn.get( + bogus_file, + downloadLocation=os.path.dirname(filepath), + ifcollision="overwrite.local", + ) + + # THEN I expect the file to have been downloaded and overwrite the file on my machine with the same name + assert copy_of_file_from_server._file_handle["contentMd5"] == original_file_md5 + assert utils.md5_for_file(filepath).hexdigest() == original_file_md5 + assert download_file_function.call_count == 2 + + os.remove(filepath) + + +def test_store_activity(syn: Synapse, project: Project, schedule_for_cleanup): # Create a File and an Activity path = utils.make_bogus_binary_file() schedule_for_cleanup(path) @@ -388,7 +460,7 @@ def test_store_activity(syn, project, schedule_for_cleanup): assert honking["id"] == honking2["id"] -def test_store_isRestricted_flag(syn, project, schedule_for_cleanup): +def test_store_isRestricted_flag(syn: Synapse, project: Project, schedule_for_cleanup): # Store a file with access requirements path = utils.make_bogus_binary_file() schedule_for_cleanup(path) @@ -402,7 +474,7 @@ def test_store_isRestricted_flag(syn, project, schedule_for_cleanup): assert intercepted.called -def test_ExternalFileHandle(syn, project, schedule_for_cleanup): +def test_ExternalFileHandle(syn: Synapse, project: Project): # Tests shouldn't have external dependencies, but this is a pretty picture of Singapore singapore_url = "http://upload.wikimedia.org/wikipedia/commons/thumb/3/3e/1_singapore_city_skyline_dusk_panorama_2011.jpg/1280px-1_singapore_city_skyline_dusk_panorama_2011.jpg" # noqa singapore = File(singapore_url, parent=project, synapseStore=False) @@ -432,7 +504,7 @@ def test_ExternalFileHandle(syn, project, schedule_for_cleanup): assert s2.externalURL == singapore_2_url -def test_synapseStore_flag(syn, project, schedule_for_cleanup): +def test_synapseStore_flag(syn: Synapse, project: Project, schedule_for_cleanup): # Store a path to a local file path = utils.make_bogus_data_file() schedule_for_cleanup(path) @@ -474,7 +546,7 @@ def test_synapseStore_flag(syn, project, schedule_for_cleanup): assert not bogus.synapseStore -def test_create_or_update_project(syn, project, schedule_for_cleanup): +def test_create_or_update_project(syn: Synapse, project: Project, schedule_for_cleanup): name = str(uuid.uuid4()) project = Project(name, a=1, b=2) @@ -498,8 +570,8 @@ def test_create_or_update_project(syn, project, schedule_for_cleanup): pytest.raises(Exception, syn.store, project, createOrUpdate=False) -def test_download_file_false(syn, project, schedule_for_cleanup): - RENAME_SUFFIX = "blah" + str(uuid.uuid4()) +def test_download_file_false(syn: Synapse, project: Project, schedule_for_cleanup): + RENAME_SUFFIX = "blah" # Upload a file filepath = utils.make_bogus_binary_file() @@ -524,7 +596,7 @@ def test_download_file_false(syn, project, schedule_for_cleanup): assert reupload.name == file.name -def test_download_file_URL_false(syn, project, schedule_for_cleanup): +def test_download_file_URL_false(syn: Synapse, project: Project): # Upload an external file handle fileThatExists = "http://dev-versions.synapse.sagebase.org/synapsePythonClient" reupload = File(fileThatExists, synapseStore=False, parent=project) @@ -555,7 +627,9 @@ def test_download_file_URL_false(syn, project, schedule_for_cleanup): # SYNPY-366 -def test_download_local_file_URL_path(syn, project, schedule_for_cleanup): +def test_download_local_file_URL_path( + syn: Synapse, project: Project, schedule_for_cleanup +): path = utils.make_bogus_data_file() schedule_for_cleanup(path) @@ -567,7 +641,9 @@ def test_download_local_file_URL_path(syn, project, schedule_for_cleanup): # SYNPY-424 -def test_store_file_handle_update_metadata(syn, project, schedule_for_cleanup): +def test_store_file_handle_update_metadata( + syn: Synapse, project: Project, schedule_for_cleanup +): original_file_path = utils.make_bogus_data_file() schedule_for_cleanup(original_file_path) @@ -594,7 +670,7 @@ def test_store_file_handle_update_metadata(syn, project, schedule_for_cleanup): assert [os.path.basename(replacement_file_path)] == new_entity.files -def test_store_DockerRepository(syn, project, schedule_for_cleanup): +def test_store_DockerRepository(syn: Synapse, project: Project): repo_name = "some/repository/path" docker_repo = syn.store(DockerRepository(repo_name, parent=project)) assert isinstance(docker_repo, DockerRepository) @@ -604,7 +680,7 @@ def test_store_DockerRepository(syn, project, schedule_for_cleanup): @pytest.mark.flaky(reruns=3, only_rerun=["SynapseHTTPError"]) def test_store__changing_externalURL_by_changing_path( - syn, project, schedule_for_cleanup + syn: Synapse, project: Project, schedule_for_cleanup ): url = "https://www.synapse.org/Portal/clear.cache.gif" ext = syn.store(File(url, name="test", parent=project, synapseStore=False)) @@ -631,7 +707,7 @@ def test_store__changing_externalURL_by_changing_path( @pytest.mark.flaky(reruns=3, only_rerun=["SynapseHTTPError"]) def test_store__changing_from_Synapse_to_externalURL_by_changing_path( - syn, project, schedule_for_cleanup + syn: Synapse, project: Project, schedule_for_cleanup ): # create a temp file temp_path = utils.make_bogus_data_file() diff --git a/tests/unit/synapseclient/core/unit_test_download.py b/tests/unit/synapseclient/core/unit_test_download.py index cb7561481..7543f3fd9 100644 --- a/tests/unit/synapseclient/core/unit_test_download.py +++ b/tests/unit/synapseclient/core/unit_test_download.py @@ -122,7 +122,7 @@ def mock_generate_headers(self, headers=None): return {} -def test_mock_download(syn): +def test_mock_download(syn: Synapse): temp_dir = tempfile.gettempdir() fileHandleId = "42" @@ -388,7 +388,7 @@ def test_mock_download(syn): class Test__downloadFileHandle(unittest.TestCase): @pytest.fixture(autouse=True, scope="function") - def init_syn(self, syn): + def init_syn(self, syn: Synapse): self.syn = syn def tearDown(self) -> None: @@ -506,7 +506,7 @@ def test_multithread_false__S3_fileHandle(self): class Test_download_from_url_multi_threaded: @pytest.fixture(autouse=True, scope="function") - def init_syn(self, syn): + def init_syn(self, syn: Synapse): self.syn = syn def test_md5_mismatch(self): @@ -564,7 +564,7 @@ def test_md5_match(self): ) -def test_download_end_early_retry(syn): +def test_download_end_early_retry(syn: Synapse): """ -------Test to ensure download retry even if connection ends early-------- """ @@ -645,7 +645,7 @@ def test_download_end_early_retry(syn): mocked_move.assert_called_once_with(temp_destination, destination) -def test_download_md5_mismatch__not_local_file(syn): +def test_download_md5_mismatch__not_local_file(syn: Synapse): """ --------Test to ensure file gets removed on md5 mismatch-------- """ @@ -712,7 +712,7 @@ def test_download_md5_mismatch__not_local_file(syn): mocked_remove.assert_called_once_with(destination) -def test_download_md5_mismatch_local_file(syn): +def test_download_md5_mismatch_local_file(syn: Synapse): """ --------Test to ensure file gets removed on md5 mismatch-------- """ @@ -743,7 +743,7 @@ def test_download_md5_mismatch_local_file(syn): assert not mocked_remove.called -def test_download_file_entity__correct_local_state(syn): +def test_download_file_entity__correct_local_state(syn: Synapse): mock_cache_path = utils.normalize_path("/i/will/show/you/the/path/yi.txt") file_entity = File(parentId="syn123") file_entity.dataFileHandleId = 123 @@ -753,6 +753,7 @@ def test_download_file_entity__correct_local_state(syn): entity=file_entity, ifcollision="overwrite.local", submission=None, + expected_md5=None, ) assert mock_cache_path == utils.normalize_path(file_entity.path) assert os.path.dirname(mock_cache_path) == file_entity.cacheDir @@ -760,7 +761,7 @@ def test_download_file_entity__correct_local_state(syn): assert os.path.basename(mock_cache_path) == file_entity.files[0] -def test_getFileHandleDownload__error_UNAUTHORIZED(syn): +def test_getFileHandleDownload__error_UNAUTHORIZED(syn: Synapse): ret_val = { "requestedFiles": [ { @@ -772,7 +773,7 @@ def test_getFileHandleDownload__error_UNAUTHORIZED(syn): pytest.raises(SynapseError, syn._getFileHandleDownload, "123", "syn456") -def test_getFileHandleDownload__error_NOT_FOUND(syn): +def test_getFileHandleDownload__error_NOT_FOUND(syn: Synapse): ret_val = { "requestedFiles": [ {