This repository has been archived by the owner on Jul 16, 2024. It is now read-only.
Merge pull request #18 from moka-guys/development
Add wscleaner v1.0
Showing 14 changed files with 780 additions and 5 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
*.pyc | ||
*.egg-info | ||
wscleaner/wscleaner/config.json | ||
wscleaner/test/test_dir*.txt | ||
wscleaner/test/data |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# Workstation Cleaner Design Document | ||
|
||
Owner: Nana Mensah | ||
Date: 30/05/19 | ||
Status: Draft | ||
|
||
## Brief | ||
|
||
The Viapath Genome Informatics team use a Linux workstation to manage sequencing files. These files are uploaded to the DNAnexus service for storage, but clearing them from the workstation afterwards is time-intensive. | ||
|
||
## User Story | ||
|
||
As a Clinical Bioinformatician, I need to automate the deletion of sequencing folders that have been successfully backed up, so that I can free up time for other duties. | ||
|
||
## Functional requirements | ||
|
||
FR1. Accurately detect which sequencing folders have been successfully backed up | ||
FR2. Delete old sequencing folders that have been successfully backed up | ||
FR3. Log all activity to a local logfile | ||
|
||
## Non-functional requirements | ||
|
||
NF1. Run from the Linux command line | ||
NF2. Process runfolders within 24 hours | ||
NF3. Use any available DNAnexus SDKs | ||
NF4. Attempt to process all folders at least once | ||
|
||
## Design Summary | ||
|
||
A RunFolderManager class will instantiate objects for local runfolders, each of which has an associated DNAnexus project object. The manager loops over the runfolders and deletes each one only if all checks pass. | ||
|
||
DNAnexus projects are accessed with the dxpy module, a Python wrapper for the DNAnexus API. Credentials are cached locally using the command-line option '--set-key'. |
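
The loop described in the Design Summary could be sketched roughly as below. This is an illustration only, not the code in this commit: the `RunFolder` fields, the injected `remove` callable and the exact-six logfile comparison are assumptions made so the example runs without DNAnexus access.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RunFolder:
    """Hypothetical stand-in for a local runfolder plus its DNAnexus project state."""
    name: str
    matching_projects: int   # number of DNAnexus projects matching the runfolder name
    fastqs_uploaded: bool    # True if every local FASTQ is uploaded and closed
    logfile_count: int       # files found in the project's /Logfiles directory


@dataclass
class RunFolderManager:
    """Hypothetical manager: deletes runfolders only when every check passes."""
    runfolders: List[RunFolder]
    remove: Callable[[str], None]   # injected deletion function (assumption)
    dry_run: bool = False
    deleted: List[str] = field(default_factory=list)

    def checks_pass(self, rf: RunFolder) -> bool:
        # Assumption: "six logfiles" is treated as an exact count.
        return all([rf.matching_projects == 1, rf.fastqs_uploaded, rf.logfile_count == 6])

    def clean(self) -> List[str]:
        for rf in self.runfolders:
            if not self.checks_pass(rf) or self.dry_run:
                continue  # leave the runfolder in place
            self.remove(rf.name)
            self.deleted.append(rf.name)
        return self.deleted
```

Injecting the deletion function keeps the sketch side-effect free; a real manager would more likely call `shutil.rmtree` directly and log each decision.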
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
# Workstation Cleaner | ||
|
||
Workstation Cleaner (wscleaner) deletes local directories that have been uploaded to the DNAnexus cloud storage service. | ||
|
||
When executed, wscleaner deletes runfolders in the input (root) directory that meet all of the following criteria: | ||
|
||
* A single DNAnexus project is found matching the runfolder name | ||
* All local FASTQ files are uploaded and in a 'closed' state | ||
* Six logfiles are present in the DNAnexus project's /Logfiles directory | ||
|
||
A DNAnexus API key must be cached locally using the `--set-key` option. | ||
|
||
## Install | ||
|
||
```bash | ||
git clone https://github.com/moka-guys/workstation_housekeeping.git | ||
pip install workstation_housekeeping/wscleaner | ||
wscleaner --version # Print version number | ||
``` | ||
|
||
## Quickstart | ||
|
||
```bash | ||
wscleaner --set-key DNA_NEXUS_KEY # Cache the DNAnexus API key | ||
wscleaner ROOT_DIRECTORY | ||
``` | ||
|
||
## Usage | ||
|
||
``` | ||
wscleaner [-h] [--set-key SET_KEY] [--print-key] [--dry-run] | ||
          [--logfile LOGFILE] [--min-age MIN_AGE] [--version] | ||
          root | ||
positional arguments: | ||
  root               A directory containing runfolders to process | ||
optional arguments: | ||
  -h, --help         show this help message and exit | ||
  --set-key SET_KEY  Cache a DNA Nexus API key | ||
  --print-key        Print the cached DNA Nexus API key | ||
  --dry-run          Perform a dry run without deleting files | ||
  --logfile LOGFILE  A path for the application logfile | ||
  --min-age MIN_AGE  The age (days) a runfolder must be to be deleted | ||
  --version          Print version | ||
``` | ||
|
||
## Test | ||
|
||
```bash | ||
# Run from the cloned repo directory after installation | ||
pytest . --auth_token DNA_NEXUS_KEY | ||
``` | ||
|
||
## License | ||
|
||
Developed by Viapath Genome Informatics |
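
For readers who want to see how a command line like the one in the Usage section can be built, here is a minimal argparse sketch. It is not the parser from `wscleaner.main`: the in-memory `KEY_CACHE`, the `build_parser` name and the body of `SetKeyAction` are assumptions, and the real tool caches the key in a config file. The sketch relies on the fact that an argparse action runs as soon as its option is parsed, so `--set-key` can exit before the required `root` argument is checked.

```python
import argparse

KEY_CACHE = {}  # hypothetical stand-in for the on-disk config file


class SetKeyAction(argparse.Action):
    """Hypothetical action: cache the key, then exit as --version does."""

    def __call__(self, parser, namespace, values, option_string=None):
        KEY_CACHE["auth_token"] = values
        parser.exit(0)  # raises SystemExit, so 'root' is not required


def build_parser():
    parser = argparse.ArgumentParser(prog="wscleaner")
    parser.add_argument("root", help="A directory containing runfolders to process")
    parser.add_argument("--set-key", action=SetKeyAction, help="Cache a DNA Nexus API key")
    parser.add_argument("--print-key", action="store_true",
                        help="Print the cached DNA Nexus API key")
    parser.add_argument("--dry-run", action="store_true",
                        help="Perform a dry run without deleting files")
    parser.add_argument("--logfile", help="A path for the application logfile")
    # The real default for --min-age is not shown in the README, so none is set here.
    parser.add_argument("--min-age", type=int,
                        help="The age (days) a runfolder must be to be deleted")
    parser.add_argument("--version", action="version", version="wscleaner 1.0")
    return parser
```

Exiting from inside the action is consistent with `test_setkey` in this commit, which expects `SystemExit` when `--set-key` is passed.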
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
from setuptools import setup, find_packages | ||
|
||
setup(name='wscleaner', | ||
version='1.0', | ||
description='Package to remove uploaded runfolders from \ | ||
the Viapath Genome Informatics NGS workstation', | ||
url='https://github.com/NMNS93/wscleaner', | ||
author='Nana Mensah', | ||
author_email='[email protected]', | ||
license='MIT', | ||
packages=find_packages(), | ||
zip_safe=False, | ||
|
||
python_requires = '>=3.6.8', | ||
install_requires = ['docutils>=0.3', 'dxpy==0.279.0', 'pytest==4.4.0', 'pytest-cov==2.6.1', | ||
'Sphinx==2.0.1', 'psutil==5.6.1'], | ||
|
||
package_data = {}, | ||
|
||
entry_points={ | ||
'console_scripts': 'wscleaner = wscleaner.main:main' | ||
} | ||
|
||
) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
"""conftest.py | ||
Config for pytest. | ||
""" | ||
import pytest | ||
import pathlib | ||
|
||
def pytest_addoption(parser): | ||
    """Add command line options to pytest""" | ||
    parser.addoption("--auth_token", action="store", default=None, help="A DNANexus authentication key") | ||
|
||
@pytest.fixture | ||
def auth_token(request): | ||
    """Create pytest fixture from command line argument for authentication token""" | ||
    return request.config.getoption("--auth_token") | ||
|
||
@pytest.fixture(scope="session") | ||
def data_test_runfolders(): | ||
    """A fixture that returns a list of tuples containing (runfolder_name, fastq_list_file).""" | ||
    return [ | ||
        ('190408_NB551068_0234_AHJ7MTAFXY_NGS265B', 'test/test_dir_1_fastqs.txt'), | ||
        ('190410_NB551068_0235_AHKGMGAFXY_NGS265C', 'test/test_dir_2_fastqs.txt') | ||
    ] | ||
|
||
@pytest.fixture(scope="session", autouse=True) | ||
def create_test_dirs(request, data_test_runfolders): | ||
    """Create test data for testing. | ||
    This is an autouse fixture with session scope, meaning it runs once per session, before the first test executes. | ||
    """ | ||
    for runfolder_name, fastq_list_file in data_test_runfolders: | ||
        # Create the runfolder directory as per Illumina spec | ||
        test_path = f'test/data/{runfolder_name}/Data/Intensities/BaseCalls' | ||
        pathlib.Path(test_path).mkdir(parents=True, exist_ok=True) | ||
        # Generate empty fastq files in runfolder | ||
        with open(fastq_list_file) as f: | ||
            fastq_list = f.read().splitlines() | ||
        for fastq_file in fastq_list: | ||
            # File modes are octal: 0o777, not decimal 777 | ||
            pathlib.Path(test_path, fastq_file).touch(mode=0o777, exist_ok=True) |
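
`Path.touch` takes its `mode` as a plain integer, and permission bits are conventionally written in octal. The short sketch below shows how decimal `777` and octal `0o777` decode into different permission strings. It uses only the standard-library `stat` module; the `describe` helper is a name introduced here for illustration.

```python
import stat


def describe(mode):
    """Render permission bits the way `ls -l` would for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


# Decimal 777 is 0o1411 in octal, which is not read/write/execute for everyone.
decimal_bits = describe(777)
octal_bits = describe(0o777)
print(oct(777), decimal_bits, octal_bits)
```

The mode actually applied to a new file is also masked by the process umask, so even `0o777` may yield fewer bits on disk.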
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
============================= test session starts ============================== | ||
platform linux -- Python 3.6.8, pytest-4.4.0, py-1.8.0, pluggy-0.9.0 | ||
rootdir: /home/nana/Documents/MOKAGUYS/wscleaner | ||
plugins: cov-2.6.1 | ||
collected 9 items | ||
|
||
test/test_all.py ......... [100%] | ||
|
||
----------- coverage: platform linux, python 3.6.8-final-0 ----------- | ||
Name Stmts Miss Cover | ||
-------------------------------------------------- | ||
wscleaner/__init__.py 0 0 100% | ||
wscleaner/auth.py 35 14 60% | ||
wscleaner/lib.py 101 6 94% | ||
wscleaner/main.py 43 26 40% | ||
wscleaner/mokaguys_logger.py 10 5 50% | ||
-------------------------------------------------- | ||
TOTAL 189 51 73% | ||
|
||
|
||
========================== 9 passed in 44.68 seconds =========================== |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
"""generate.py | ||
Generates dummy data for testing. | ||
""" | ||
|
||
import pathlib | ||
|
||
def data_test_runfolders(): | ||
    """Return a list of tuples containing (runfolder_name, fastq_list_file).""" | ||
    return [ | ||
        ('190408_NB551068_0234_AHJ7MTAFXY_NGS265B', 'test/test_dir_1_fastqs.txt'), | ||
        ('190410_NB551068_0235_AHKGMGAFXY_NGS265C', 'test/test_dir_2_fastqs.txt') | ||
    ] | ||
|
||
def create_test_dirs(test_data): | ||
    """Create test data for testing. | ||
    test_data is a list of (runfolder_name, fastq_list_file) tuples; directories are written under test/data. | ||
    """ | ||
    for runfolder_name, fastq_list_file in test_data: | ||
        # Create the runfolder directory as per Illumina spec | ||
        test_path = f'test/data/{runfolder_name}/Data/Intensities/BaseCalls' | ||
        pathlib.Path(test_path).mkdir(parents=True, exist_ok=True) | ||
        # Generate empty fastq files in runfolder | ||
        with open(fastq_list_file) as f: | ||
            fastq_list = f.read().splitlines() | ||
        for fastq_file in fastq_list: | ||
            # File modes are octal: 0o777, not decimal 777 | ||
            pathlib.Path(test_path, fastq_file).touch(mode=0o777, exist_ok=True) | ||
|
||
create_test_dirs(data_test_runfolders()) |
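
The generator above writes relative to the current working directory. A variant that takes the root directory as a parameter can be exercised against a temporary directory instead. The sketch below is one way that might look; `create_runfolder` and its arguments are hypothetical names, not part of this commit.

```python
import tempfile
from pathlib import Path


def create_runfolder(root, runfolder_name, fastq_names):
    """Create an Illumina-style BaseCalls directory containing empty FASTQ files."""
    basecalls = Path(root, runfolder_name, "Data", "Intensities", "BaseCalls")
    basecalls.mkdir(parents=True, exist_ok=True)
    for name in fastq_names:
        (basecalls / name).touch(exist_ok=True)
    return basecalls


if __name__ == "__main__":
    # Example run against a throwaway directory that is removed afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        created = create_runfolder(tmp, "190408_NB551068_0234_AHJ7MTAFXY_NGS265B",
                                   ["sample_R1.fastq.gz", "sample_R2.fastq.gz"])
        print(sorted(p.name for p in created.iterdir()))
```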
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,109 @@ | ||
import pytest | ||
import dxpy | ||
from pathlib import Path | ||
import argparse | ||
import json | ||
import sys | ||
import shutil | ||
|
||
from pkg_resources import resource_filename | ||
from wscleaner.auth import SetKeyAction, dx_set_auth, CONFIG_FILE | ||
from wscleaner.main import cli_parser | ||
from wscleaner.lib import RunFolderManager, RunFolder | ||
|
||
# AUTH: Set DNAnexus authentication for tests | ||
def test_auth(auth_token): | ||
    """Test that an authentication token is passed to pytest as a command line argument""" | ||
    assert auth_token is not None | ||
|
||
@pytest.fixture(autouse=True) | ||
def set_auth(auth_token): | ||
    """Set the authentication token for all subsequent tests""" | ||
    dx_set_auth(auth_token) | ||
|
||
|
||
# FIXTURES: Define functions to use in downstream tests | ||
@pytest.fixture | ||
def rfm(): | ||
    """Return an instance of the runfolder manager with the test/data directory""" | ||
    test_path = Path(str(Path(__file__).parent), 'data') | ||
    rfm = RunFolderManager(str(test_path)) | ||
    return rfm | ||
|
||
@pytest.fixture | ||
def rfm_dry(): | ||
    """Return a dry-run instance of the runfolder manager with the test/data directory""" | ||
    test_path = Path(str(Path(__file__).parent), 'data') | ||
    rfm_dry = RunFolderManager(str(test_path), dry_run=True) | ||
    return rfm_dry | ||
|
||
# TESTS | ||
class TestAuth: | ||
    def test_set_auth(self, auth_token): | ||
        """test that the authentication token is set correctly""" | ||
        authobj = dx_set_auth(auth_token) | ||
        assert dxpy.SECURITY_CONTEXT['auth_token'] == auth_token | ||
|
||
    def test_setkey(self, monkeypatch, auth_token): | ||
        """test that the --set-key command-line argument caches the authentication token""" | ||
        # Set setkey cli arguments | ||
        sys.argv = ['python', 'wscleaner', '--set-key', auth_token] | ||
        # Mock Action object | ||
        # Parse args | ||
        with pytest.raises(SystemExit) as err: | ||
            args = cli_parser() | ||
        # Make assertions on created config file | ||
        fn = resource_filename('wscleaner',CONFIG_FILE) | ||
        with open(fn, 'r') as f: | ||
            assert auth_token in f.read() | ||
        # Delete temp config | ||
        Path(fn).unlink() | ||
|
||
class TestFolders: | ||
    def test_runfolders_ready(self, data_test_runfolders, rfm): | ||
        """Test that runfolders in the test directory pass checks for deletion. Est. 20 seconds.""" | ||
        for runfolder in rfm.find_runfolders(min_age=0): | ||
            assert all([runfolder.dx_project, rfm.check_fastqs(runfolder), rfm.check_logfiles(runfolder)]) | ||
|
||
    def test_find_fastqs(self, data_test_runfolders): | ||
        """Tests the correct number of fastqs are present in local and uploaded directories""" | ||
        for runfolder_name, fastq_list_file in data_test_runfolders: | ||
            rf = RunFolder(Path('test/data', runfolder_name)) | ||
            with open(fastq_list_file) as f: | ||
                test_folder_fastqs = len(f.readlines()) | ||
            assert len(rf.find_fastqs()) == test_folder_fastqs | ||
            assert len(rf.dx_project.find_fastqs()) == test_folder_fastqs | ||
|
||
    def test_min_age(self, rfm): | ||
        """test that find_runfolders only returns runfolders older than min_age""" | ||
        runfolders = rfm.find_runfolders(min_age=10) | ||
        # Assert that every returned runfolder is older than 10 days (recently generated test runfolders are excluded) | ||
        assert all([ rf.age > 10 for rf in runfolders ]) | ||
|
||
class TestRFM: | ||
    def test_find_runfolders(self, data_test_runfolders, rfm): | ||
        """test the runfolder manager directory finding function""" | ||
        rfm_runfolders = rfm.find_runfolders(min_age=0) | ||
        runfolder_names = [str(folder.path.name) for folder in rfm_runfolders] | ||
        test_runfolder_names = [ rf for rf, fastq_list_file in data_test_runfolders ] | ||
        runfolders_bools = [ item in runfolder_names for item in test_runfolder_names ] | ||
        assert all(runfolders_bools) | ||
|
||
    def test_validate(self, rfm): | ||
        """test the runfoldermanager _validate function correctly reads the path""" | ||
        assert rfm.root.name == Path(str(Path(__file__).parent), 'data').name | ||
|
||
    def test_delete(self, monkeypatch, rfm): | ||
        """test that the runfolder manager delete call creates the log of deleted files. | ||
        Here, the pytest monkeypatch fixture is used to overwrite the delete function and persist the test directories. | ||
        """ | ||
        test_folder = rfm.find_runfolders(min_age=0)[0] | ||
        monkeypatch.setattr(shutil, 'rmtree', lambda x: 'TEST_DELETED') | ||
        rfm.delete(test_folder) | ||
        assert test_folder.name in rfm.deleted | ||
|
||
    def test_dry_run(self, rfm_dry): | ||
        """test that the dry_run option does not cause the test directory to be deleted""" | ||
        test_folder = rfm_dry.find_runfolders(min_age=0)[0] | ||
        rfm_dry.delete(test_folder) | ||
        assert test_folder.name not in rfm_dry.deleted |
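
The delete and dry-run tests above depend on replacing `shutil.rmtree` so that nothing is removed from disk. The same idea can be shown in isolation with the standard library's `unittest.mock`, used here in place of pytest's `monkeypatch` so the example needs no test runner. The `Manager` class is a hypothetical stand-in for the runfolder manager, not the class from `wscleaner.lib`.

```python
import shutil
from unittest import mock


class Manager:
    """Hypothetical stand-in for the runfolder manager under test."""

    def __init__(self, dry_run=False):
        self.dry_run = dry_run
        self.deleted = []

    def delete(self, path):
        if self.dry_run:
            return  # a dry run never touches the filesystem
        shutil.rmtree(path)
        self.deleted.append(path)


def run_with_stubbed_rmtree(manager, path):
    """Call manager.delete with shutil.rmtree replaced by a recording stub."""
    with mock.patch.object(shutil, "rmtree") as fake_rmtree:
        manager.delete(path)
    return fake_rmtree.call_args_list
```

Because the stub records its calls, a test can assert both that deletion was requested for the expected path and that a dry run requested nothing.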
Empty file.