[SYNPY-1320] Upload benchmark + Documentation (#1012)
* Benchmarking synapse with a py script and adding onto our documentation
BryanFauble authored Nov 16, 2023
1 parent acbcf52 commit fc865fd
Showing 12 changed files with 1,031 additions and 538 deletions.
544 changes: 269 additions & 275 deletions Pipfile.lock

Large diffs are not rendered by default.

158 changes: 130 additions & 28 deletions README.md
@@ -61,7 +61,7 @@ Clone the [source code repository](https://github.com/Sage-Bionetworks/synapsePy

    git clone https://github.com/Sage-Bionetworks/synapsePythonClient.git
    cd synapsePythonClient
    pip install .


Command line usage
@@ -85,37 +85,139 @@ Note that a [Synapse account](https://www.synapse.org/#RegisterAccount:0) is req
Usage as a library
------------------

The Synapse client can be used to write software that interacts with the Sage Bionetworks Synapse repository. More examples are available in the [Tutorial section](https://python-docs.synapse.org/build/html/getting_started/basics.html) of the documentation.

### Examples

#### Log-in and create a Synapse object
```
import synapseclient
syn = synapseclient.Synapse()
## log in using auth token
syn.login(authToken='auth_token')
```
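
Hard-coding a token in a script makes it easy to leak. A minimal sketch, assuming the token has already been exported in an environment variable, reads it at run time instead. The variable name `SYNAPSE_AUTH_TOKEN` and the helper `read_auth_token` are assumptions made for this example, not part of the README above.

```python
import os


def read_auth_token(env=os.environ, name="SYNAPSE_AUTH_TOKEN"):
    """Return the token stored under `name`, or raise if it is missing or blank."""
    # `name` is an assumed variable name; adjust it to match your environment.
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {name} environment variable before logging in")
    return token


# Usage with the client (not executed here because it needs a real token):
#   syn = synapseclient.Synapse()
#   syn.login(authToken=read_auth_token())
```

Passing the environment as a parameter keeps the helper easy to exercise with a plain dictionary.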

#### Sync a local directory to Synapse
This is the recommended way to synchronize more than one file or directory to a Synapse project. `synapseutils` handles the scheduling required to sync an entire directory tree. Read more about the manifest file format in [`synapseutils.syncToSynapse`](https://python-docs.synapse.org/build/html/articles/synapseutils.html#synapseutils.sync.syncToSynapse).
```
import os
import synapseclient
import synapseutils
syn = synapseclient.Synapse()
## log in using auth token
syn.login(authToken='auth_token')
path = os.path.expanduser("~/synapse_project")
manifest_path = f"{path}/my_project_manifest.tsv"
project_id = "syn1234"
# Create the manifest file on disk
with open(manifest_path, "w", encoding="utf-8") as f:
    pass
# Walk the specified directory tree and create a TSV manifest file
synapseutils.generate_sync_manifest(
    syn,
    directory_path=path,
    parent_id=project_id,
    manifest_path=manifest_path,
)
# Using the generated manifest file, sync the files to Synapse
synapseutils.syncToSynapse(
    syn,
    manifestFile=manifest_path,
    sendMessages=False,
)
```
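
Before syncing, it can be useful to inspect the generated manifest. The sketch below uses only the standard library and assumes the manifest is a tab-separated file with a header row that includes `path` and `parent` columns; check the linked `syncToSynapse` documentation for the authoritative column list. The helper names `read_manifest` and `missing_columns` are hypothetical.

```python
import csv


def read_manifest(manifest_path):
    """Return (column names, rows) for a tab-separated manifest file."""
    with open(manifest_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        rows = list(reader)
        # fieldnames is None for an empty file, so fall back to an empty list.
        return list(reader.fieldnames or []), rows


def missing_columns(columns, required=("path", "parent")):
    """List the assumed-required columns that are absent from the header."""
    return [name for name in required if name not in columns]
```

Calling `missing_columns(read_manifest(manifest_path)[0])` and checking for an empty list gives a cheap sanity check before a long upload starts.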

#### Store a Project to Synapse
```
import synapseclient
from synapseclient.entity import Project
syn = synapseclient.Synapse()
## log in using auth token
syn.login(authToken='auth_token')
project = Project('My uniquely named project')
project = syn.store(project)
print(project.id)
print(project)
```

#### Store a Folder to Synapse (Does not upload files within the folder)
```
import synapseclient
from synapseclient.entity import Folder
syn = synapseclient.Synapse()
## log in using auth token
syn.login(authToken='auth_token')
folder = Folder(name='my_folder', parent="syn123")
folder = syn.store(folder)
print(folder.id)
print(folder)
```

#### Store a File to Synapse
```
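import synapseclient
from synapseclient.entity import File
syn = synapseclient.Synapse()
## log in using auth token
syn.login(authToken='auth_token')
## path to the local file to upload (placeholder)
filepath = "path/to/my_file.txt"
file = File(
    path=filepath,
    parent="syn123",
)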
file = syn.store(file)
print(file.id)
print(file)
```
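
Because storing a Folder does not upload its contents, uploading a directory tree without `synapseutils` means walking it and storing each folder and file in turn. The following is a rough sketch of that pattern, not the benchmark script from this commit. The function `upload_tree` and its two callables are hypothetical; they are injected so the traversal logic can run without a Synapse connection.

```python
import os


def upload_tree(root, parent_id, store_folder, store_file):
    """Mirror the directory tree under `root` beneath `parent_id`.

    `store_folder(name, parent_id)` must return the new folder's ID and
    `store_file(path, parent_id)` must upload one file. Both are
    hypothetical callables supplied by the caller.
    """
    # Map each local directory to the ID of its remote container.
    remote_ids = {root: parent_id}
    uploaded = 0
    for dirpath, dirnames, filenames in os.walk(root):
        current_id = remote_ids[dirpath]
        # Create sub-folders first so their IDs exist when os.walk descends.
        for dirname in sorted(dirnames):
            remote_ids[os.path.join(dirpath, dirname)] = store_folder(dirname, current_id)
        for filename in sorted(filenames):
            store_file(os.path.join(dirpath, filename), current_id)
            uploaded += 1
    return uploaded
```

With the client, the callables could wrap the calls shown above, for example `lambda name, parent: syn.store(Folder(name=name, parent=parent)).id` and `lambda path, parent: syn.store(File(path=path, parent=parent))`.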

#### Get a data matrix
```
import synapseclient
syn = synapseclient.Synapse()
## log in using auth token
syn.login(authToken='auth_token')
## retrieve a 100 by 4 matrix
matrix = syn.get('syn1901033')
## inspect its properties
print(matrix.name)
print(matrix.description)
print(matrix.path)
## load the data matrix into a dictionary with an entry for each column
with open(matrix.path, 'r') as f:
    labels = f.readline().strip().split('\t')
    data = {label: [] for label in labels}
    for line in f:
        values = [float(x) for x in line.strip().split('\t')]
        for i in range(len(labels)):
            data[labels[i]].append(values[i])
## load the data matrix into a numpy array
import numpy as np
np.loadtxt(fname=matrix.path, skiprows=1)
```


Authentication
31 changes: 31 additions & 0 deletions docs/articles/benchmarking.rst
@@ -0,0 +1,31 @@
*****************
Benchmarking
*****************

We will periodically publish benchmark results comparing the Synapse Python Client
with working directly against AWS S3. These benchmarks let us make data-driven
decisions about where to spend time optimizing the client, and they give us a way
to measure the impact of changes to the client.

===================
Results
===================


11/14/2023
==========================
The results were created on a ``t3a.micro`` EC2 instance with a 200GB disk running in us-east-1.
The script that was run can be found in ``docs/scripts``. The time to create the files on disk is not included.


+---------------------------+-------------------+---------------------+---------+---------------+
| Test                      | Synapseutils Sync | os.walk + syn.store | S3 Sync | Per file size |
+===========================+===================+=====================+=========+===============+
| 25 Files 1MB total size   | 10.43s            | 8.99s               | 1.83s   | 40KB          |
+---------------------------+-------------------+---------------------+---------+---------------+
| 775 Files 10MB total size | 243.57s           | 257.27s             | 7.64s   | 12.9KB        |
+---------------------------+-------------------+---------------------+---------+---------------+
| 10 Files 1GB total size   | 27.18s            | 33.73s              | 16.31s  | 100MB         |
+---------------------------+-------------------+---------------------+---------+---------------+
| 10 Files 100GB total size | 3211s             | 3047s               | 3245s   | 10GB          |
+---------------------------+-------------------+---------------------+---------+---------------+