VirClust is now open for community feedback

This is a new and highly improved version of VirClust. Testing is in progress, and feedback from the user community is very welcome. Please email me at liliana.cristina.moraru( at )uol.de if you experience any problems or have suggestions for improvements.

The stand-alone version, the associated databases and the manuals are available on the Download page.

What is VirClust?

VirClust is a bioinformatics tool which can be used for:

• virus clustering

• protein annotation

• core protein calculation

At its core is the grouping of viral proteins into clusters of three different levels:

• at the first level, proteins are grouped based on their reciprocal BLASTP similarities into protein clusters, or PCs.

• at the second level, PCs are grouped based on their Hidden Markov Model (HMM) similarities into protein superclusters, or PSCs.

• at the third, still experimental level, PSCs are grouped based on their HMM similarities into protein super-superclusters, or PSSCs.

More about how it works can be read here: DOI: 10.1101/2021.06.14.448304.
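
The three levels can be pictured as repeated grouping of linked items. The following is only a minimal Python sketch: it groups items into connected components, which is an assumed stand-in and not necessarily the clustering method VirClust uses. The function `group_by_links`, the protein names and the similarity links are all hypothetical.

```python
from collections import defaultdict


def group_by_links(items, links):
    """Group items into connected components, given pairwise similarity links."""
    parent = {item: item for item in items}

    def find(x):
        # Follow parent pointers to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for item in items:
        groups[find(item)].append(item)
    return sorted(sorted(group) for group in groups.values())


# Level 1: proteins linked by (hypothetical) reciprocal BLASTP hits form PCs.
proteins = ["p1", "p2", "p3", "p4", "p5"]
blastp_links = [("p1", "p2"), ("p3", "p4")]
pcs = group_by_links(proteins, blastp_links)

# Level 2: PCs linked by (hypothetical) HMM similarities form PSCs.
pc_names = [f"PC{i + 1}" for i in range(len(pcs))]
hmm_links = [("PC1", "PC2")]
pscs = group_by_links(pc_names, hmm_links)

print(pcs)   # [['p1', 'p2'], ['p3', 'p4'], ['p5']]
print(pscs)  # [['PC1', 'PC2'], ['PC3']]
```

The same grouping applied once more to the PSCs would give the third level, the PSSCs.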

How to cite the use of VirClust?

If you are using VirClust, please cite the following pre-print publication:

• Moraru, Cristina (2021): VirClust, a tool for hierarchical clustering, core gene detection and annotation of (prokaryotic) viruses. In bioRxiv. DOI: 10.1101/2021.06.14.448304.

Additionally, if you are performing viral protein annotations using VirClust, please also cite the respective databases used for the annotations. See the VirClust manuscript for the complete citations.

Developer - Cristina Moraru, PhD

Senior Scientist, Department of The Biology of Geological Processes

Institute for Chemistry and Biology of the Marine Environment

Carl-von-Ossietzky-Str. 9-11, D-26111 Oldenburg, Germany

Email: liliana.cristina.moraru( at )uni-oldenburg.de

CREATE NEW PROJECT

Minimum number of genomes:

• 3 for genome clustering (steps 3A, 4A, 2B, 3B, 2C and 3C)

• 1 for only protein clusters and annotations

Other input requirements:

• Accepted input formats: .fasta, .fna or .fa

• Sequence names should contain at least one letter.

• Each contig should be long enough to predict at least one ORF.

** no multifasta files here

* Mandatory for new projects
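
The input requirements above can be checked before uploading. This is a minimal Python sketch, not the check the server performs: the function `check_fasta_text` and its `allow_multifasta` flag are hypothetical, and the ORF-length requirement is not tested here because it depends on the gene predictor.

```python
import re
from pathlib import Path

ACCEPTED_EXTENSIONS = {".fasta", ".fna", ".fa"}


def check_fasta_text(filename, text, allow_multifasta=True):
    """Return a list of problems found in one FASTA file (empty list means OK)."""
    problems = []
    if Path(filename).suffix.lower() not in ACCEPTED_EXTENSIONS:
        problems.append("unsupported extension")

    # Header lines start with ">"; the rest of the line is the sequence name.
    headers = [line[1:].strip() for line in text.splitlines() if line.startswith(">")]
    if not headers:
        problems.append("no sequence header")
    if not allow_multifasta and len(headers) > 1:
        problems.append("more than one sequence")

    # Sequence names should contain at least one letter.
    for name in headers:
        if not re.search(r"[A-Za-z]", name):
            problems.append(f"name without letters: {name!r}")
    return problems


print(check_fasta_text("phage.fasta", ">phageA\nATGC\n"))
print(check_fasta_text("genome.txt", ">12345\nATGC\n"))
```

Set `allow_multifasta=False` for upload fields that accept only one genome per file.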

LOAD EXISTING PROJECT

Info board

Each project you create is given a project ID and can be accessed later, as long as you have run at least one calculation (that is, pressed the “Run” button in the next tab). VirClust calculations can take a long time, and the browser can disconnect from the server. Save the project ID so that you can access the results later.

PROTEIN CLUSTERING

Step 1A. Genomes to Proteins

Download results

Step 2A. Proteins to Protein Clusters (PCs)

Remove matches if

Download results

GENOME CLUSTERING

Step 3A. Order genomes hierarchically

Download results

Plot intergenomic similarities

Download results

Step 4A. Calculate stats and split in genome clusters (VGCs)

Download results

*The clustering distance ranges from a minimum of 0.1 to a maximum of 1. The higher the value, the fewer clusters result. At a value of 1, all genomes will belong to the same cluster.

*Known issue: if the chosen clustering distance results in each genome forming its own VGC, the output PDF will be empty. To solve this problem, increase the clustering distance progressively and recalculate Step 4A.
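
The effect of the clustering distance can be illustrated with a small Python sketch. It assumes a simple single-linkage rule (genomes are joined whenever their distance is at or below the threshold), which may differ from how VirClust cuts its tree. The function `cut_into_clusters` and the distance values are hypothetical.

```python
def cut_into_clusters(names, dist, threshold):
    """Single-linkage grouping: join genomes whose distance is <= threshold."""
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return sorted(clusters.values())


names = ["A", "B", "C", "D"]
# Hypothetical symmetric intergenomic distances, all between 0 and 1.
dist = [
    [0.0, 0.2, 0.8, 0.9],
    [0.2, 0.0, 0.7, 0.95],
    [0.8, 0.7, 0.0, 0.3],
    [0.9, 0.95, 0.3, 0.0],
]

for threshold in (0.1, 0.5, 1.0):
    clusters = cut_into_clusters(names, dist, threshold)
    all_singletons = len(clusters) == len(names)  # the empty-PDF situation
    print(threshold, clusters, "all singletons" if all_singletons else "")
```

In this example a threshold of 0.1 leaves every genome in its own cluster, 0.5 gives two clusters, and 1.0 puts all genomes into one.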

Output genome clustering PDF

Other options

Download results

CORE PROTEINS

Step 5A. Calculate core proteins for each VGC, based on PCs

Download results
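
As an illustration of the core protein step, the Python sketch below treats a PC as "core" when it occurs in every genome of a VGC. That definition is an assumption made for this example, and the function `core_clusters`, the genome names and the PC names are hypothetical.

```python
def core_clusters(genome_to_pcs):
    """Return the PCs found in every genome of one VGC."""
    pc_sets = [set(pcs) for pcs in genome_to_pcs.values()]
    if not pc_sets:
        return set()
    return set.intersection(*pc_sets)


# One hypothetical VGC with three genomes and the PCs each genome encodes.
vgc = {
    "genome1": {"PC1", "PC2", "PC3"},
    "genome2": {"PC1", "PC3"},
    "genome3": {"PC1", "PC3", "PC4"},
}
core = core_clusters(vgc)
print(sorted(core))  # ['PC1', 'PC3']
```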

PROTEIN ANNOTATIONS

Step 6A. Annotate proteins

Query the InterPro database using InterProScan
Query the pVOGs database using hhsearch
Query the VOGDB database using hhsearch
Query the PHROG database using hhsearch
Query the Efam database using hmmscan
Query the Efam-XC database using hmmscan
Query the NCBI database using BLASTP

Merge annotation tables
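
Merging can be pictured as collecting, for each protein, the hits from every database that was queried. The Python sketch below is only an illustration under that assumption. The function `merge_annotations`, the protein IDs and the annotation strings are hypothetical and do not reflect the actual VirClust table format.

```python
def merge_annotations(tables):
    """Turn {database: {protein_id: annotation}} into {protein_id: {database: annotation}}."""
    merged = {}
    for database, hits in tables.items():
        for protein_id, annotation in hits.items():
            merged.setdefault(protein_id, {})[database] = annotation
    return merged


# Hypothetical per-database results.
tables = {
    "PHROG": {"prot_1": "terminase large subunit", "prot_2": "portal protein"},
    "pVOGs": {"prot_1": "terminase"},
}
merged = merge_annotations(tables)
for protein_id in sorted(merged):
    print(protein_id, merged[protein_id])
```

Proteins without a hit in a given database simply lack that key in the merged record.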

PROTEIN CLUSTERING

Step 1B. PCs to Protein Superclusters (PSCs)

Keep matches if ...
conditional 1 is true

AND

OR
conditional 2 is true:

AND

AND

Download results

GENOME CLUSTERING

Step 2B. Order genomes hierarchically

Download results

Plot intergenomic similarities

Download results

Step 3B. Calculate stats and split in genome clusters (VGCs)

Download results

*The clustering distance ranges from a minimum of 0.1 to a maximum of 1. The higher the value, the fewer clusters result. At a value of 1, all genomes will belong to the same cluster.

*Known issue: if the chosen clustering distance results in each genome forming its own VGC, the output PDF will be empty. To solve this problem, increase the clustering distance progressively and recalculate Step 3B.

Output genome clustering PDF

Other options

Download results

CORE PROTEINS

Step 4B. Calculate core proteins for each VGC, based on PSCs
Download results

PROTEIN ANNOTATIONS

Step 5B. Annotate proteins

Query the InterPro database using InterProScan
Query the pVOGs database using hhsearch
Query the VOGDB database using hhsearch
Query the PHROG database using hhsearch
Query the Efam database using hmmscan
Query the Efam-XC database using hmmscan
Query the NCBI database using BLASTP

Merge annotation tables

PROTEIN CLUSTERING

Step 1C. PSCs to Protein Super-superclusters (PSSCs)

Keep matches if ...
conditional 1 is true

AND

OR
conditional 2 is true:

AND

AND

Download results

GENOME CLUSTERING

Step 2C. Order genomes hierarchically

Download results

Plot intergenomic similarities

Download results

Step 3C. Calculate stats and split in genome clusters (VGCs)

Download results

*The clustering distance ranges from a minimum of 0.1 to a maximum of 1. The higher the value, the fewer clusters result. At a value of 1, all genomes will belong to the same cluster.

*Known issue: if the chosen clustering distance results in each genome forming its own VGC, the output PDF will be empty. To solve this problem, increase the clustering distance progressively and recalculate Step 3C.

Output clustering PDF

Other options

Download results

CORE PROTEINS

Step 4C. Calculate core proteins for each VGC, based on PSSCs
Download results

PROTEIN ANNOTATIONS

Step 5C. Annotate proteins

Query the InterPro database using InterProScan
Query the pVOGs database using hhsearch
Query the VOGDB database using hhsearch
Query the PHROG database using hhsearch
Query the Efam database using hmmscan
Query the Efam-XC database using hmmscan
Query the NCBI database using BLASTP

Merge annotation tables

VirClust v2 web-server

VirClust v2 stand-alone

Annotation databases for VirClust stand-alone

For the annotation of viral genomes, VirClust relies on several previously published databases. The InterProScan and BLASTNR databases should be installed by the user as described in the manual for the VirClust v2 stand-alone version. The other databases (Efam, Efam_XC, PHROG, pVOGs and VOGDB) need to be in a VirClust-specific format and can be downloaded below. For each database used, please cite the original publication describing it.

Download database ...

Publication to cite ...

• Efam and Efam_XC: Zayed, A.A., Lücking, D., Mohssen, M., Cronin, D., Bolduc, B., Gregory, A.C., Hargreaves, K.R., Piehowski, P.D., White, R.A., Huang, E.L., Adkins, J.N., Roux, S., Moraru, C., and Sullivan, M.B. (2021) efam: an expanded, metaproteome-supported HMM profile database of viral protein families. Bioinformatics (Oxford, England), doi: 10.1093/bioinformatics/btab451.

• PHROG: Terzian, P., Olo Ndela, E., Galiez, C., Lossouarn, J., Pérez Bucio, R.E., Mom, R., Toussaint, A., Petit, M.-A., and Enault, F. (2021) PHROG: families of prokaryotic virus proteins clustered using remote homology. NAR genomics and bioinformatics, doi: 10.1093/nargab/lqab067.

• VOGDB: Kiening, M., Ochsenreiter, R., Hellinger, H.-J., Rattei, T., Hofacker, I., and Frishman, D. (2019) Conserved Secondary Structures in Viral mRNAs. Viruses, doi: 10.3390/v11050401.

• pVOGs: Grazziotin, A.L., Koonin, E.V., and Kristensen, D.M. (2017) Prokaryotic virus orthologous groups (pVOGs). A resource for comparative genomics and protein family annotation. Nucleic acids research, doi: 10.1093/nar/gkw975.
