This is a new and highly improved version of VirClust. Testing is in progress. Feedback from the user community is very welcome. Please email me at liliana.cristina.moraru( at )uol.de if you experience any problems or have suggestions for improvements.
The stand-alone version, the associated databases and the manuals are available on the Download page.
VirClust is a bioinformatics tool which can be used for:
• virus clustering
• protein annotation
• core protein calculation
At its core is the grouping of viral proteins into clusters of three different levels:
• at the first level, proteins are grouped based on their reciprocal BLASTP similarities into protein clusters, or PCs.
• at the second level, PCs are grouped based on their Hidden Markov Model (HMM) similarities into protein superclusters, or PSCs.
• at the third, still experimental level, PSCs are grouped based on their HMM similarities into protein super-superclusters, or PSSCs.
More about how it works can be read in the pre-print, DOI: 10.1101/2021.06.14.448304.
If you are using VirClust, please cite the following pre-print publication:
• Moraru, Cristina (2021): VirClust, a tool for hierarchical clustering, core gene detection and annotation of (prokaryotic) viruses. In bioRxiv. DOI: 10.1101/2021.06.14.448304.
Additionally, if you are performing viral protein annotations using VirClust, please also cite the respective databases used for the annotations; see the VirClust manuscript for the complete citations.
Senior Scientist, Department of The Biology of Geological Processes
Institute for Chemistry and Biology of the Marine Environment
Carl-von-Ossietzky-Str. 9-11, D-26111 Oldenburg, Germany
Email: liliana.cristina.moraru( at )uni-oldenburg.de
Minimum number of genomes:
• 3 for genome clustering (steps 3A, 4A, 2B, 3B, 2C and 3C)
• 1 for only protein clusters and annotations
Other input requirements:
• Accepted input formats: .fasta, .fna or .fa
• Sequence names should contain at least one letter.
• Each contig should be long enough to predict at least one ORF.
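The input requirements above can be checked before uploading. The following is a minimal sketch, not part of VirClust: the function names are hypothetical, the extension check is assumed to be case-insensitive, and the sequence name is assumed to be the first word of each FASTA header. It does not test whether a contig is long enough to yield an ORF.

```python
import re
from pathlib import Path

# Values taken from the requirements listed above.
ACCEPTED_EXTENSIONS = {".fasta", ".fna", ".fa"}
MIN_GENOMES_FOR_CLUSTERING = 3


def read_fasta_names(text):
    """Return the sequence names (first word of each '>' header line)."""
    names = []
    for line in text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            names.append(header.split()[0] if header else "")
    return names


def check_input(filename, text):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    suffix = Path(filename).suffix
    # Assumption: upper-case extensions such as .FASTA are treated as accepted.
    if suffix.lower() not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported extension: {suffix!r}")
    names = read_fasta_names(text)
    if not names:
        problems.append("no sequences found")
    for name in names:
        # Sequence names should contain at least one letter.
        if not re.search(r"[A-Za-z]", name):
            problems.append(f"sequence name without a letter: {name!r}")
    return problems


def enough_for_genome_clustering(n_genomes):
    """Genome clustering needs at least 3 genomes; 1 is enough for PCs and annotations."""
    return n_genomes >= MIN_GENOMES_FOR_CLUSTERING
```

For example, `check_input("phages.fasta", ">phage_A\nATGC\n")` returns an empty list, while a header such as `>12345` is reported because the name contains no letter.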
** no multifasta files here
* Mandatory for new projects
Each project you create is given a project ID and can be accessed again later, provided you have run at least one calculation (that is, pressed the “Run” button in the next tab). VirClust calculations can take a long time, and the browser may disconnect from the server in the meantime. Save the project ID so that you can access the results later.
*The clustering distance ranges from a minimum of 0.1 to a maximum of 1. The higher the value, the fewer clusters result. At a value of 1, all genomes will belong to the same cluster.
*Known issues: If the chosen clustering distance results in each genome forming its own VGC, then the output PDF will be empty. To solve this problem: increase the clustering distance progressively and recalculate steps 5 and 6.
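The effect of the clustering distance can be illustrated with a toy example. This is a rough sketch only: the distance values are invented, the function names are hypothetical, and single-linkage grouping is an assumption made for illustration, since the linkage method VirClust uses is not stated here. It shows that a larger distance gives fewer clusters, and how one might step the distance upwards until at least two genomes share a cluster.

```python
def cluster_at_distance(names, dist, threshold):
    """Single-linkage grouping: join genomes whose distance is <= threshold."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the root, shortening the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), d in dist.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())


def all_singletons(clusters):
    """True when every genome forms its own cluster (the empty-PDF case)."""
    return all(len(c) == 1 for c in clusters)


def smallest_non_trivial_distance(names, dist):
    """Step through 0.1, 0.2, ..., 1.0 and return the first useful distance."""
    for tenths in range(1, 11):
        threshold = tenths / 10
        if not all_singletons(cluster_at_distance(names, dist, threshold)):
            return threshold
    return None


# Invented pairwise distances between four hypothetical genomes.
GENOMES = ["g1", "g2", "g3", "g4"]
DIST = {
    ("g1", "g2"): 0.2, ("g1", "g3"): 0.8, ("g1", "g4"): 0.9,
    ("g2", "g3"): 0.7, ("g2", "g4"): 0.95, ("g3", "g4"): 0.4,
}

for threshold in (0.1, 0.5, 1.0):
    print(threshold, cluster_at_distance(GENOMES, DIST, threshold))
```

With these toy values, a distance of 0.1 leaves every genome in its own cluster, 0.5 gives two clusters, and 1.0 puts all four genomes into one.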
For the annotation of viral genomes, VirClust relies on several previously published databases. The InterProScan and BLASTNR databases should be installed by the user as described in the manual for the VirClust v2 stand-alone version. The other databases (Efam, Efam_XC, PHROG, pVOGs and VOGDB) need to be in a VirClust-specific format and can be downloaded below. For each database used, please cite the original publication describing it.
Zayed, A.A., Lücking, D., Mohssen, M., Cronin, D., Bolduc, B., Gregory, A.C., Hargreaves, K.R., Piehowski, P.D., White, R.A., Huang, E.L., Adkins, J.N., Roux, S., Moraru, C., and Sullivan, M.B. (2021) efam: an expanded, metaproteome-supported HMM profile database of viral protein families. Bioinformatics (Oxford, England), doi: 10.1093/bioinformatics/btab451.
Terzian, P., Olo Ndela, E., Galiez, C., Lossouarn, J., Pérez Bucio, R.E., Mom, R., Toussaint, A., Petit, M.-A., and Enault, F. (2021) PHROG: families of prokaryotic virus proteins clustered using remote homology. NAR genomics and bioinformatics, doi: 10.1093/nargab/lqab067.
Kiening, M., Ochsenreiter, R., Hellinger, H.-J., Rattei, T., Hofacker, I., and Frishman, D. (2019) Conserved Secondary Structures in Viral mRNAs. Viruses, doi: 10.3390/v11050401.
Grazziotin, A.L., Koonin, E.V., and Kristensen, D.M. (2017) Prokaryotic virus orthologous groups (pVOGs). A resource for comparative genomics and protein family annotation. Nucleic acids research, doi: 10.1093/nar/gkw975.