Multiplex PCR design, in silico
multiply is a command-line tool for designing multiplexed PCRs for a user-specified set of target genes and/or regions. It first produces a set of candidate primers for each target using primer3 (multiply generate). It then computes the number of SNPs in each primer (multiply snpcheck); potential dimers between pairs of primers (multiply align); and potential mispriming and off-target amplicons for each primer (multiply blast). Information from these three quality control steps is passed to a cost function, which is minimised by brute force or with a greedy search algorithm (multiply select).
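To make the selection step concrete, here is a minimal sketch of choosing one primer pair per target by minimising a cost function, both greedily and by brute force. It is an illustration only, not the code used by multiply select: the function names, the additive form of the cost, and the example penalties are all assumptions.

```python
from itertools import combinations, product


def multiplex_cost(selection, individual_cost, pair_cost):
    """Hypothetical additive cost: per-primer-pair penalties (e.g. SNPs,
    off-targets) plus penalties for interactions (e.g. dimers) between
    every two selected primer pairs."""
    total = sum(individual_cost[p] for p in selection)
    total += sum(
        pair_cost.get(frozenset((a, b)), 0.0)
        for a, b in combinations(selection, 2)
    )
    return total


def greedy_select(candidates, individual_cost, pair_cost):
    """Visit targets in order; for each, keep the candidate that adds the
    least cost given what has already been selected."""
    selection = []
    for options in candidates.values():
        best = min(
            options,
            key=lambda opt: multiplex_cost(
                selection + [opt], individual_cost, pair_cost
            ),
        )
        selection.append(best)
    return selection


def brute_force_select(candidates, individual_cost, pair_cost):
    """Score every combination of one candidate per target. Exact, but the
    number of combinations grows exponentially with the number of targets."""
    best = min(
        product(*candidates.values()),
        key=lambda sel: multiplex_cost(sel, individual_cost, pair_cost),
    )
    return list(best)


# Made-up example: two targets, two candidate primer pairs each.
candidates = {"geneA": ["A1", "A2"], "geneB": ["B1", "B2"]}
individual_cost = {"A1": 1.0, "A2": 2.0, "B1": 1.0, "B2": 1.5}
pair_cost = {frozenset(("A1", "B1")): 5.0}  # A1 and B1 form a strong dimer

print(greedy_select(candidates, individual_cost, pair_cost))
print(brute_force_select(candidates, individual_cost, pair_cost))
```

A greedy search can miss the best overall combination because early choices are never revisited, but it scales to multiplexes where enumerating every combination is impractical.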
The pipeline is summarised below:
First, clone the repository to your local machine:
git clone https://github.com/JasonAHendry/multiply
Then, install the software dependencies using conda:
cd multiply
conda update conda
conda env create -f environments/run.yml
conda activate multiply-run
Finally, install multiply itself with pip:
pip install -e .
Test the installation by running:
multiply
To generate a new multiplex PCR, you first need to download the reference genome (FASTA) and information about gene locations (GFF) for your target organism. To see which organisms are available for download, run:
multiply download --available
Organisms are specified by a GenusSpecies keyword; e.g. PlasmodiumFalciparum or AnophelesGambiae. For example, one would download information about Plasmodium falciparum by running:
multiply download -g PlasmodiumFalciparum
The next step is to specify your target genes and/or region(s) with a design file. Examples of design files can be seen in the /designs folder. Genes are specified by a comma-separated list of gene identifiers (the target_ids field in the design file). Regions are specified by a separate BED file, with a fourth column that gives a unique name to each region (the bed field).
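Because each region in the BED file needs a unique name in its fourth column, it can help to check the file before running the pipeline. The sketch below is one way to do that in Python. It is not part of multiply; the function name is hypothetical, and the chromosome names, coordinates and region names in the example are made up for illustration.

```python
import csv
import io


def read_named_regions(bed_text):
    """Parse tab-separated BED text and return {name: (chrom, start, end)}.

    Raises ValueError if a line has fewer than four columns, if a region
    has start >= end, or if a region name is used more than once.
    """
    regions = {}
    for row in csv.reader(io.StringIO(bed_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        if len(row) < 4:
            raise ValueError(f"expected at least 4 columns, got: {row}")
        chrom, start, end, name = row[0], int(row[1]), int(row[2]), row[3]
        if start >= end:
            raise ValueError(f"region {name!r} has start >= end")
        if name in regions:
            raise ValueError(f"duplicate region name: {name!r}")
        regions[name] = (chrom, start, end)
    return regions


# Illustrative content only; replace with your own regions.
example_bed = (
    "Pf3D7_01_v3\t100000\t101500\tregion-one\n"
    "Pf3D7_04_v3\t250000\t251200\tregion-two\n"
)

for name, (chrom, start, end) in read_named_regions(example_bed).items():
    print(name, chrom, start, end)
```

To check a file on disk, read it with `open(path).read()` and pass the text to the function.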
Candidate primers are generated using primer3. A collection of primer3 settings is available as JSON files in the folder settings/primer3. You can choose which of these settings multiply uses by passing them as a comma-separated list to the primer3_settings field of the design file.
After your target organism is downloaded and your design file is prepared, you can run the complete multiply pipeline with the following command:
multiply pipeline -d designs/<your-design-file.ini>
For multiplexes of moderate size (e.g. <20 targets), running the pipeline will typically take a few minutes. Results are written to a sub-folder of the results directory; the sub-folder's name is set by the name field of your design file.
genomes/collection.ini. Any organism available from PlasmoDB, EnsemblGenomes or RefSeq Genome can be added to the collection.
settings/primer3 folder.
multiply uses the following external software and databases:
- primer3. Individual primer pair design. https://primer3.org/
- bedtools. Genome arithmetic. https://bedtools.readthedocs.io/en/latest/
- blastn. Local alignment search. https://blast.ncbi.nlm.nih.gov/Blast.cgi
- PlasmoDB. Plasmodium reference genome. http://plasmodb.org/plasmo/
- MalariaGEN. Plasmodium genetic diversity. https://www.malariagen.net/data
- EnsemblGenomes. Additional reference genomes. https://ensemblgenomes.org/
- RefSeq Genome. Additional reference genomes. https://www.ncbi.nlm.nih.gov/genome/
Primer dimer detection uses an alignment algorithm similar to the one described by Johnston et al. (2019).
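For readers unfamiliar with dimer detection, the sketch below shows the basic idea of scanning two primers for complementarity. It is a deliberately simplified, ungapped scan that counts complementary base pairs. It is not the algorithm of Johnston et al. (2019) and not the code in multiply align; it ignores thermodynamics and the position of matches relative to the 3' ends. The function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def max_complementary_pairs(primer_a, primer_b):
    """Slide primer_b (read 3'->5') along primer_a (read 5'->3') without
    gaps and return the largest number of complementary base pairs found
    at any offset."""
    a = primer_a.upper()
    b_rev = primer_b.upper()[::-1]  # reversed so the strands are antiparallel
    best = 0
    for offset in range(-(len(b_rev) - 1), len(a)):
        matches = 0
        for j, base_b in enumerate(b_rev):
            i = offset + j
            if 0 <= i < len(a) and COMPLEMENT.get(a[i]) == base_b:
                matches += 1
        best = max(best, matches)
    return best


# ACGT is its own reverse complement, so it can pair fully with itself.
print(max_complementary_pairs("ACGT", "ACGT"))
# Two poly-A primers share no complementary bases.
print(max_complementary_pairs("AAAA", "AAAA"))
```

A practical scorer would typically weight matches near the 3' ends more heavily, since those are the ones a polymerase can extend.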
We have a preprint available on bioRxiv.
My thanks to Nada Kubikova, who gave helpful advice on primer design for multiplex PCR; and to Dan J. Bridges, Gavin Band, Mulenga Mwenda, and Annie Forster, who tested various versions of multiply.
This work was funded by the Bill and Melinda Gates Foundation (INV-003660).