Gene-PROBER - introduction

Gene-PROBER designs polynucleotide probe mixtures for the detection of a single target sequence. These probe mixtures consist of multiple, non-overlapping polynucleotides distributed across the length of the target region. The selected polynucleotides have similar melting properties and can therefore be used together as a single hybridization probe. Such polynucleotide probe mixtures are required for gene detection in microbial cells by direct-geneFISH, phageFISH, geneFISH and geneELISA.

In a series of steps, gene-PROBER guides the user through the probe design process, from similarity searches between the target and non-targets, to the selection of polynucleotides with similar melting temperatures and of their primers (when the probes will be synthesized by PCR).

Gene-PROBER can be used whenever a single target needs to be detected, for example:

- lytic or temperate viruses infecting bacterial or archaeal cells

- plasmids or chromosomal regions in bacteria or archaea

Gene-PROBER is not meant for the design of probes to detect multiple targets, e.g. multiple alleles of a single gene. Once the probe mix has been designed for a single target, the coverage across the other alleles can be checked with PolyPro software. For more details about designing polynucleotide probes for multiple alleles, check the following publication:

Moraru, C., Moraru, G., Fuchs, B. & Amann, R. Concepts and software for a rational design of polynucleotide probes. Environmental Microbiology Reports 3, 69-78 (2011).

For a detailed description of the gene and phage hybridization methods, please read the following publications:

Moraru, C., Lam, P., Fuchs, B., Kuypers, M. & Amann, R. GeneFISH - an in situ technique for linking gene presence and cell identity in environmental microorganisms. Environmental Microbiology 12, (2010).

Allers, E. et al. Single-cell and population level viral infection dynamics revealed by phageFISH, a method to visualize intracellular and free viruses. Environmental Microbiology 15, 2306–18 (2013).

Barrero-Canosa, J., Moraru, C., Zeugner, L., Fuchs, B. & Amann, R. Direct-geneFISH: a simplified protocol for the simultaneous detection and quantification of genes and rRNA in microorganisms. Environmental Microbiology (2016), doi:10.1111/1462-2920.13432.

If you are using gene-PROBER, please cite the following: TO BE UPDATED. The reference will be updated immediately after online publication.

If you have suggestions for gene-PROBER or encounter problems during probe design, please send an email to liliana.cristina.moraru( at )uni-oldenburg.de and mention gene-PROBER in the subject line.

Developer - Cristina Moraru, PhD

Research Associate, Department of The Biology of Geological Processes

Institute for Chemistry and Biology of the Marine Environment

Carl-von-Ossietzky-Str. 9-11, D-26111 Oldenburg, Germany

Email: liliana.cristina.moraru( at )uni-oldenburg.de

Gene-PROBER - user manual

Things to know before you start

1. You should read the manual - if you have come this far, you are on the right page.

2. The website can disconnect from the server when the internet connection is unstable. To prevent data loss, save the PROJECT ID generated by gene-PROBER after step 1 (if you fill in an email address, the PROJECT ID will be sent to you). If the connection is lost, reload the website by typing its address in the browser and load the project using the PROJECT ID.

STEP 1 - Project data

In this step, a new project can be created or an existing project can be loaded.

To CREATE A NEW PROJECT, the following points are important:

Define the workflow:

At this step the user chooses among 4 possible probe design workflows by marking or unmarking two checkboxes (“Non-target sequences available” and “Probe synthesis by PCR”):

1. probes to be synthesized chemically, only target sequence available

- workflow: step 1 -> step 3 -> step 4a -> step 5

2. probes to be synthesized chemically, both target and non-target sequences available

- mark checkbox 'Non-target sequences available'

- workflow: step 1 -> step 2 (optional) -> step 3 -> step 4a -> step 4c -> step 5

3. probes to be synthesized by PCR, only target sequence available

- mark checkbox 'Probe synthesis by PCR'

- workflow: step 1 -> step 3 -> step 4a -> step 4b -> step 4d -> step 5

4. probes to be synthesized by PCR, both target and non-target sequences available

- mark checkboxes “Non-target sequences available” and “Probe synthesis by PCR”

- workflow: step 1 -> step 2 (optional) -> step 3 -> step 4a -> step 4b -> step 4c -> step 4d -> step 5
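The four workflows listed above follow directly from the state of the two checkboxes. As a minimal sketch (the function name `workflow_steps` is hypothetical and not part of gene-PROBER), the mapping could be written as:

```python
def workflow_steps(non_targets_available: bool, synthesis_by_pcr: bool) -> list[str]:
    """Return the ordered step labels for the workflow selected by the two checkboxes."""
    steps = ["1"]
    if non_targets_available:
        steps.append("2 (optional)")   # BLASTn of target against non-targets
    steps += ["3", "4a"]               # generate polynucleotides, filter by %GC
    if synthesis_by_pcr:
        steps.append("4b")             # filter by primer features
    if non_targets_available:
        steps.append("4c")             # filter by polynucleotide specificity
    if synthesis_by_pcr:
        steps.append("4d")             # filter by primer specificity
    steps.append("5")                  # calculate the probe mix
    return steps


# Workflow 2: chemical synthesis, target and non-target sequences available
print(" -> ".join(workflow_steps(True, False)))
```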

When “Non-target sequences available” checkbox is marked:

- gene-PROBER will use BLASTn to search for similarities between the target and non-targets in steps 2, 4c and 4d (if the checkbox is not marked, these steps are not visible).

- a multi-FASTA file containing all sequences (target and non-target) needs to be loaded and the following information needs to be given about the target: contig name, start and end positions in contig (if the checkbox is not marked, these parameters are not visible).

When “Probe synthesis by PCR” checkbox is marked:

- gene-PROBER will calculate primer pairs for each polynucleotide in steps 4b and 4d (if the checkbox is not marked, these steps are not visible).

What are non-target sequences?

Non-target sequences are all the other DNA sequences which can be found in the FISH samples and which are not desired to be detected by the gene probes. For example, in the case of phageFISH targeting a lytic phage in an infected culture, the non-target sequences are represented by the host genome, while the target sequence is represented by the phage genome.

File import:

FASTA files must not contain spaces in the comment field.

Target: a FASTA file with a single entry (single sequence). Multiple entries (fragmented targets) are not supported.

ALL sequences: a FASTA file with one or multiple entries. The target should be found only on one contig (one entry). Multiple entries for the non-targets (e.g. draft genomes, or multiple genomic elements) are accepted. There are cases when the target is found on the same contig as non-target sequences. For example, when targeting a prophage, the target sequence (the prophage genomic region) will be found within the non-target sequences as well (the host genome). In these cases, the user should NOT give the target sequence as a separate entry in the multi-FASTA file. Whenever non-target sequences are given, the program always requires the name of the contig on which the target is found, plus the start and the end position relative to the respective contig. The program will take this information into consideration when checking for similarities between target and non-target sequences.
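Before uploading, the input files can be checked locally against the requirements above. The following is a minimal sketch, not part of gene-PROBER; the function names are hypothetical, and it assumes that the "comment field" is the text following `>` on each header line:

```python
def check_fasta_headers(fasta_text: str) -> list[str]:
    """Return all header lines (without '>') and raise if any contains whitespace."""
    headers = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].rstrip()
            if " " in header or "\t" in header:
                raise ValueError(f"header contains whitespace: {header!r}")
            headers.append(header)
    return headers


def check_target_file(fasta_text: str) -> str:
    """The target file must hold exactly one entry; return its header."""
    headers = check_fasta_headers(fasta_text)
    if len(headers) != 1:
        raise ValueError(f"expected 1 entry in the target file, found {len(headers)}")
    return headers[0]


# Hypothetical ALL-sequences file with a phage target and a host non-target
print(check_fasta_headers(">phage1\nACGT\n>host_contig1\nGGCC\n"))
```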

Pressing the 'CREATE NEW PROJECT' button:

Works only if all required inputs are given:

- user name and project name are necessary to give a unique ID to the new project. They have to be at least 6 characters in length.

- user email is required as input only if the checkbox “Check this box if you want to receive email updates about gene-PROBER.” is marked. Also fill in your email address if you wish to receive an email with the PROJECT ID for later use (to load the results of the project) and to be notified when calculations have finished. The email address is stored only if the checkbox “Check this box if you want to receive email updates about gene-PROBER.” is marked and only for the purpose of sending gene-PROBER updates to the users.

- all files have finished uploading

Pressing the 'RESET PROJECT' link:

Will reload the website and prepare the page for a new project.

To LOAD an EXISTING PROJECT, the following points are important:

Once the project has been created and the user has saved the PROJECT ID, the user can interrupt the probe design process (e.g. by closing the internet browser) and resume the project later, using the LOAD PROJECT button. The project can be interrupted even if gene-PROBER is in the middle of calculating one of the steps. The calculations will continue on the server and the results will be available later, at project load. Some of the steps required for probe design can require a long time for calculations, depending on the input parameters. In this case, or when the internet connection is unstable, the browser itself will disconnect from gene-PROBER. Again, the calculations will continue on the server and the projects can be re-loaded similarly, by using the PROJECT ID.

Projects which have been completed (step 5 calculated) can be loaded as well, again using the PROJECT ID. The user can then download the final data, or decide to re-run some steps if the results are not satisfactory.

Pressing the 'LOAD PROJECT' button:

- works only if a valid PROJECT ID has been given (e.g. from a project which has been previously processed by gene-PROBER). All projects older than 30 days will be deleted from the server, so make sure to collect your data in time.

- will load gene-PROBER projects which can be in different stages of completion (anywhere between steps 2 and 5), including a completely finished project. The status of the project, that is, which step was the last one and whether it has finished calculating or is still running, will be displayed at project load in the Project INFO tab.

STEP 2 - Similarity searches in between target and non-target sequences

The purpose of the probe design is to find a mixture of polynucleotides which will bind only to the target genome, and not to the rest of the DNA sequences found in the hybridized sample. If both target and non-target sequences are known, a BLASTn search can be performed using the target sequence as query and the non-target as database. The results from the BLASTn search will be displayed in the 'Step 2 - BLASTn results' TabPanel. Table 1 lists all the BLASTn matches, including match position in target and non-target sequences. At least one match is expected: that of the target with itself. This match is easily identifiable from the contig name ('subject' column) and from the start ('start in subject' column) and end ('end in subject' column) positions on that contig. If the non-target sequences are not available, this panel will be inactive and the user should move to the next steps.
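When the BLASTn table is inspected outside the website, the self-match can be separated from genuine non-target matches with a small helper. This is only a sketch: it assumes each row is available as a dictionary keyed by the column names mentioned above, and it treats any match lying within the target coordinates on the target contig as a self-match. The function name and the example values are hypothetical.

```python
def is_self_match(hit: dict, contig: str, target_start: int, target_end: int) -> bool:
    """True if the match lies on the target contig, inside the target coordinates."""
    # Matches on the reverse strand can be reported with start > end
    low, high = sorted((hit["start in subject"], hit["end in subject"]))
    return hit["subject"] == contig and low >= target_start and high <= target_end


hits = [
    {"subject": "host_contig1", "start in subject": 5000, "end in subject": 45000},
    {"subject": "host_contig2", "start in subject": 120, "end in subject": 310},
]
# Prophage target located at positions 5000-45000 on host_contig1 (made-up values)
non_target_hits = [h for h in hits if not is_self_match(h, "host_contig1", 5000, 45000)]
print(non_target_hits)
```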

STEP 3 - Generate polynucleotides

The program will generate all possible polynucleotides along the target region. There are two parameters which the user should set:

- target region range, should be a region without significant sequence similarities with non-targets, as indicated by the BLASTn results from step 2.

- length of individual polynucleotides, recommended 300 bp.

Visual output: a graph with the %GC for each polynucleotide plotted along the target genome.
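The idea behind this step can be illustrated with a sliding window. The sketch below is not the gene-PROBER implementation; it assumes that "all possible polynucleotides" means one window per start position (step of 1 base), uses 1-based inclusive coordinates, and the function names are hypothetical.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def generate_polynucleotides(target: str, region_start: int, region_end: int,
                             length: int = 300, step: int = 1):
    """Return (start, end, %GC) for every window of `length` bases in the region."""
    region = target[region_start - 1:region_end]
    windows = []
    for offset in range(0, len(region) - length + 1, step):
        seq = region[offset:offset + length]
        start = region_start + offset
        windows.append((start, start + length - 1, gc_percent(seq)))
    return windows


# Toy example with a window length of 4 instead of the recommended 300
for window in generate_polynucleotides("GGCCAATT", 1, 8, length=4):
    print(window)
```

Plotting the third element of each tuple against the first gives a %GC profile comparable to the graph described above.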

STEP 4 - Remove polynucleotides

From the pool of all polynucleotides, the program will remove the ones unsuited to be probes. It uses 4 removal criteria, applied in a stepwise manner. Each removal step will generate a graph with the %GC of the remaining polynucleotides plotted along the target genome.

4a. Remove polynucleotides based on the %GC of individual polynucleotides

The polynucleotides in a probe mix need to have the difference in %GC as small as possible, preferably not bigger than 5-10%. Furthermore, if possible, %GC higher than 70% should be avoided. Using the %GC plot from step 3 as a guide, the user has to choose a %GC range for polynucleotides in the probe mix. All the polynucleotides outside this range will be removed.
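As a minimal sketch of this filter (hypothetical function names; it assumes polynucleotides are given as (start, end, %GC) tuples and that the range boundaries are inclusive), the selection and the two recommendations above could be expressed as:

```python
def filter_by_gc(polynucleotides, gc_min: float, gc_max: float):
    """Keep (start, end, %GC) tuples whose %GC lies within [gc_min, gc_max]."""
    return [p for p in polynucleotides if gc_min <= p[2] <= gc_max]


def gc_range_warnings(gc_min: float, gc_max: float) -> list[str]:
    """Flag ranges that conflict with the recommendations in the text."""
    warnings = []
    if gc_max - gc_min > 10:
        warnings.append("range is wider than 10 %GC")
    if gc_max > 70:
        warnings.append("range includes %GC values above 70")
    return warnings


candidates = [(1, 300, 42.0), (2, 301, 55.0), (3, 302, 48.5)]
print(filter_by_gc(candidates, 40, 50))
print(gc_range_warnings(55, 75))
```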

4b. Remove polynucleotides based on primer features

Potential primers of 18-22 nucleotides will be generated for the ends of each polynucleotide remaining after step 4a. Primer3 will then be used to calculate different primer features, for example %GC, Tm of the primer itself, and Tm of the possible primer dimers and hairpins. Only the primer pairs meeting the parameters set by the user will be kept. All polynucleotides without a suitable primer pair will be removed.

Parameters of the PCR reaction:

- dNTP_conc (mM): The millimolar concentration of the sum of all deoxyribonucleotide triphosphates in the PCR reaction.

- primer conc (nM): The nanomolar (nM) concentration of each primer in the PCR reaction.

- Divalent ions conc (mM): The millimolar (mM) concentration of divalent cations (usually Mg2+, supplied as MgCl2) in the PCR reaction.

- Monovalent ions conc (mM): The millimolar (mM) concentration of monovalent salt cations (usually KCl) in the PCR reaction.

Parameters for primer selection:

- max (Fwd primer Tm - Rev primer Tm) (°C): The maximum temperature difference in between the Tm of the two primers in a pair. All primer pairs with a Tm difference bigger than this will be removed.

- min (primer Tm - dimer/hairpin Tm) (°C): the minimum temperature difference in between the Tm of a primer and the Tm of various dimers (dimers of the same primer or dimers of the two primers in the pair) and hairpins which can be formed by the primers. All primers and primer pairs with a Tm difference smaller than this will be removed.

- max length of homopolymers: The maximum allowable length of a mononucleotide repeat, for example AAAAAA. All primers containing a longer homopolymer will be removed.

- primer %GC range: the allowed %GC values for the primers. All primers outside this %GC range will be removed.

- primer Tm range: the allowed Tm values for the primers (calculated using Primer3 for the given PCR conditions). All primers outside this Tm range will be removed.
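Some of the selection criteria above can be sketched in a few lines. The example below does not call Primer3: Tm values are assumed to be supplied by the caller, for example from a separate Primer3 run under the given PCR conditions. All function names are hypothetical, and the dimer/hairpin criterion is left out.

```python
from itertools import groupby


def gc_percent(seq: str) -> float:
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def longest_homopolymer(primer: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max((len(list(run)) for _, run in groupby(primer.upper())), default=0)


def primer_passes(primer: str, tm: float, gc_range, tm_range, max_homopolymer: int) -> bool:
    """Check one primer against the %GC range, Tm range and homopolymer limit."""
    return (gc_range[0] <= gc_percent(primer) <= gc_range[1]
            and tm_range[0] <= tm <= tm_range[1]
            and longest_homopolymer(primer) <= max_homopolymer)


def pair_passes(fwd_tm: float, rev_tm: float, max_tm_diff: float) -> bool:
    """Check the maximum allowed Tm difference between the two primers of a pair."""
    return abs(fwd_tm - rev_tm) <= max_tm_diff


# Made-up primer and Tm value, for illustration only
print(primer_passes("ACGTGCATGCTAGCTAGGCA", 58.0, (40, 60), (55, 62), 4))
```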

4c. Remove polynucleotides based on polynucleotide specificity

The remaining polynucleotides from the previous steps will be further used as query in a BLASTn search against the ALL sequences database (including the target and non-targets), to detect potential unspecific binding sites. The polynucleotides with unspecific binding sites will be removed. Parameters for this step are:

A = Maximum allowed length of BLASTn alignments with 100% identity

B = Maximum allowed %GC of BLASTn alignments of 15 to A bases length and C percent identity

C = Maximum percent identity allowed for alignments of lengths between 15 and A bases and with B %GC.

D = Maximum percent identity allowed for alignments with a length of A to 30 bases

E = Maximum percent identity allowed for alignments with a length of 31-40 bases

G = Maximum percent identity allowed for alignments with a length of 41-50 bases

H = Maximum percent identity allowed for alignments with a length of 51 bases or more

4d. Remove polynucleotides based on primer specificity

For all remaining polynucleotides from the previous steps, the program will run BLASTn searches using all potential primers as query. The database used will be either only the target sequence, if the checkbox 'Target available as PCR template separate from non-targets' is marked, or ALL sequences (target and non-targets), if the checkbox is not marked. Primers with similarities outside the priming region of their respective polynucleotides will be removed. The parameters for removal are:

I = number of bases. This parameter helps to calculate the minimum length of BLASTn alignments which should be checked during this removal step: if alignment length >= (primer length - I), and the number of gaps and mismatches is smaller than parameter J, then the primer will be removed.

J = minimum number of mismatches or gaps allowed in alignments (of length >= primer length - I) of primers with non-targets

K = number of bases. This parameter helps to calculate the minimum length of BLASTn alignments which should be checked during this removal step: if alignment length >= (primer length - K) and < (primer length - I), and if percent identity is 100% and the last base in the alignment is the last base at the 3' end of the primer, then the primer is removed.
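The two removal rules defined by I, J and K can be written as a single predicate. This is a sketch based only on the description above, not the gene-PROBER source; the function and argument names are hypothetical, and it assumes that matches inside the primer's own priming region have already been excluded by the caller.

```python
def primer_is_unspecific(primer_len: int, aln_len: int, mismatches: int, gaps: int,
                         percent_identity: float, aln_end_in_primer: int,
                         I: int, J: int, K: int) -> bool:
    """Return True if a BLASTn alignment disqualifies the primer.

    aln_end_in_primer is the 1-based primer position of the last aligned base,
    so a value equal to primer_len means the alignment reaches the 3' end.
    """
    # Rule 1: long alignment with fewer than J mismatches plus gaps
    if aln_len >= primer_len - I and (mismatches + gaps) < J:
        return True
    # Rule 2: shorter, perfect alignment that includes the 3' end of the primer
    if (primer_len - K <= aln_len < primer_len - I
            and percent_identity == 100.0
            and aln_end_in_primer == primer_len):
        return True
    return False


# 20-nt primer with example values I=2, J=3, K=8 (made up for illustration):
# a perfect 14-base alignment ending at the 3' end triggers the second rule
print(primer_is_unspecific(20, 14, 0, 0, 100.0, 20, I=2, J=3, K=8))
```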

STEP 5 - Calculate probe mix

After all the removal steps have been performed, the program will calculate a probe mix composed of non-overlapping polynucleotides, with the aim of covering as much of the target as possible.
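The manual does not describe how the probe mix is computed. One simple way to pick non-overlapping polynucleotides is the classic earliest-end-first greedy selection, sketched below with a hypothetical function name. When all polynucleotides have the same length, this maximizes the number of selected polynucleotides and therefore the covered length; it is an illustration, not necessarily the method used by gene-PROBER.

```python
def pick_non_overlapping(polynucleotides):
    """Greedy selection of non-overlapping (start, end) intervals, 1-based inclusive."""
    chosen = []
    last_end = float("-inf")
    # Visiting candidates by increasing end position leaves the most room downstream
    for start, end in sorted(polynucleotides, key=lambda p: p[1]):
        if start > last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


# Made-up 300-base candidates remaining after the removal steps
candidates = [(150, 449), (1, 300), (700, 999), (301, 600)]
print(pick_non_overlapping(candidates))
```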

Visual outputs:

- a graph plotting the distribution along the target genome of all the polynucleotides in the probe mix.

- a graph plotting the negative derivative of the melting curve for each polynucleotide in the probe mix.

- a table with primer pairs for each polynucleotide in the probe mix

- a table with each polynucleotide in the probe mix and its properties.

Downloadable outputs:

- a csv file with all polynucleotides in the probe mix.

- a csv file with primer pairs for each polynucleotide in the probe mix.

- a csv file with the melting profiles (negative derivative of the melting curves) for each polynucleotide in the probe mix.

- a txt file containing all the input parameters used for probe design.
