VirClust is a bioinformatics tool which can be used for:
• virus clustering
• protein annotation
• core protein calculation
At its core is the grouping of viral proteins into clusters of three different levels:
• at the first level, proteins are grouped based on their reciprocal BLASTP similarities into protein clusters, or PCs.
• at the second level, PCs are grouped based on their Hidden Markov Model (HMM) similarities into protein superclusters, or PSCs.
• at the third, still experimental level, PSCs are grouped based on their HMM similarities into protein super-superclusters, or PSSCs.
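The three levels nest: proteins fall into PCs, PCs into PSCs, and PSCs into PSSCs. A minimal Python sketch of that nesting follows. It assumes each item belongs to exactly one group at the next level. All identifiers (`prot_1`, `PC_1`, and so on) and both helper functions are hypothetical and do not reflect VirClust's actual output format.

```python
# Hypothetical identifiers; VirClust's real output uses its own naming.
protein_to_pc = {"prot_1": "PC_1", "prot_2": "PC_1", "prot_3": "PC_2", "prot_4": "PC_3"}
pc_to_psc = {"PC_1": "PSC_1", "PC_2": "PSC_1", "PC_3": "PSC_2"}
psc_to_pssc = {"PSC_1": "PSSC_1", "PSC_2": "PSSC_1"}


def cluster_path(protein_id):
    """Return the (PC, PSC, PSSC) membership of one protein."""
    pc = protein_to_pc[protein_id]
    psc = pc_to_psc[pc]
    pssc = psc_to_pssc[psc]
    return pc, psc, pssc


def members(level_map, cluster_id):
    """List the items assigned to cluster_id in a single-level mapping."""
    return sorted(item for item, cid in level_map.items() if cid == cluster_id)
```

Here `cluster_path("prot_3")` walks up all three levels, while `members(pc_to_psc, "PSC_1")` lists the PCs that make up one supercluster.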
More about how it works can be read here: DOI: 10.1101/2021.06.14.448304.
You can run VirClust as a web service by going to the 'VirClust WEB' tab. Alternatively, you can download and install VirClust on your own servers, either as a Singularity container or directly from the source code, in its own Conda environment. The Singularity container can be downloaded from the 'Downloads' tab. The source code and the corresponding Conda environment can be downloaded from the VirClust GitHub repository.
If you are using VirClust, please cite the following publication:
• Moraru, C. (2023) VirClust - A Tool for Hierarchical Clustering, Core Protein Detection and Annotation of (Prokaryotic) Viruses, Viruses 15(4), pp 1007 DOI: 10.3390/v15041007.
Additionally, if you are performing viral protein annotations using VirClust, please also cite the respective databases used for the annotations. See the VirClust manuscript for the complete citations.
I developed VirClust during my position as Senior Scientist in the Department of The Biology of Geological Processes, at the Institute for Chemistry and Biology of the Marine Environment, Germany. The VirClust website and web service are hosted at this institution.
My current position is that of Senior Scientist in the Environmental Metagenomics Department, at the Research Center One Health Ruhr of the University Alliance Ruhr, Essen, Germany.
Please post any VirClust-related problems or feature requests in the Issues section of the corresponding GitHub repository.
Minimum number of genomes:
• 3 for genome clustering (steps 4-7)
• 1 for only protein clusters and annotations
Accepted input formats: .fasta, .fna or .fa
Sequence names should contain at least one letter.
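The input requirements above can be checked locally before uploading. The following is a rough sketch, not VirClust's own validation code. The function names are hypothetical, one sequence name is assumed to stand for one genome, and the extension match is deliberately case-sensitive because only the lowercase forms are listed above.

```python
from pathlib import Path

ACCEPTED_EXTENSIONS = {".fasta", ".fna", ".fa"}
MIN_GENOMES_CLUSTERING = 3  # genome clustering (steps 4-7)
MIN_GENOMES_PROTEINS = 1    # protein clusters and annotations only


def has_accepted_extension(filename):
    """True if the file ends in one of the listed extensions (exact match)."""
    return Path(filename).suffix in ACCEPTED_EXTENSIONS


def name_has_letter(sequence_name):
    """True if the sequence name contains at least one letter."""
    return any(ch.isalpha() for ch in sequence_name)


def check_input(filename, sequence_names, genome_clustering=True):
    """Return a list of problems found; an empty list means the input passes."""
    problems = []
    if not has_accepted_extension(filename):
        problems.append(f"unsupported extension: {filename}")
    minimum = MIN_GENOMES_CLUSTERING if genome_clustering else MIN_GENOMES_PROTEINS
    if len(sequence_names) < minimum:
        problems.append(f"need at least {minimum} genome(s), got {len(sequence_names)}")
    for name in sequence_names:
        if not name_has_letter(name):
            problems.append(f"sequence name without a letter: {name!r}")
    return problems
```

For example, `check_input("phages.fna", ["phageA", "phageB"])` reports that genome clustering needs at least 3 genomes, while the same call with `genome_clustering=False` returns an empty list.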
** no multifasta files here
* Mandatory for new projects
Each project you create is given a project ID and can be accessed later, as long as you have run at least one calculation (that is, pressed the “Run” button in the next tab). VirClust calculations can take a long time and the browser can disconnect from the server. Save the project ID so that you can access the results later.
*The clustering distance can range from a minimum of 0.1 to a maximum of 1. The higher the value, the fewer clusters result. At a value of 1, all genomes will belong to the same cluster.
*Known issues: If the chosen clustering distance results in each genome forming its own viral genome cluster (VGC), then the output PDF will be empty. To solve this problem, increase the clustering distance progressively and recalculate steps 5 and 6.
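To see why a higher clustering distance gives fewer clusters, consider the toy sketch below. It is not VirClust's clustering method: it swaps in plain single-linkage grouping (two genomes share a cluster if a chain of pairwise distances at or below the threshold connects them). The genome names and distances are made up, and distances are assumed to be scaled between 0 and 1.

```python
def cluster_at_distance(names, distances, threshold):
    """Group genomes linked by pairwise distances <= threshold (single linkage)."""
    if not 0.1 <= threshold <= 1:
        raise ValueError("clustering distance must be between 0.1 and 1")
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster representative, shortening the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), dist in distances.items():
        if dist <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


# Made-up genomes and distances, for illustration only.
names = ["g1", "g2", "g3", "g4"]
distances = {
    ("g1", "g2"): 0.2, ("g1", "g3"): 0.7, ("g1", "g4"): 0.9,
    ("g2", "g3"): 0.6, ("g2", "g4"): 0.95, ("g3", "g4"): 0.8,
}
for threshold in (0.1, 0.5, 0.7, 1.0):
    print(threshold, len(cluster_at_distance(names, distances, threshold)))
```

With these toy values the cluster count drops from 4 at a distance of 0.1 (every genome on its own, the situation described under known issues) to 1 at a distance of 1.0.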
For the annotation of viral genomes, VirClust relies on several previously published databases. The InterProScan and BLASTNR databases should be installed by the user as described in the manual for the VirClust v2 stand-alone version. The other databases (Efam, Efam_XC, PHROG, pVOGs and VOGDB) need to be in a VirClust-specific format and can be downloaded below. For each database used, please cite the original publication describing it.
Zayed, A.A., Lücking, D., Mohssen, M., Cronin, D., Bolduc, B., Gregory, A.C., Hargreaves, K.R., Piehowski, P.D., White, R.A., Huang, E.L., Adkins, J.N., Roux, S., Moraru, C., and Sullivan, M.B. (2021) efam: an expanded, metaproteome-supported HMM profile database of viral protein families. Bioinformatics (Oxford, England), doi: 10.1093/bioinformatics/btab451.
Terzian, P., Olo Ndela, E., Galiez, C., Lossouarn, J., Pérez Bucio, R.E., Mom, R., Toussaint, A., Petit, M.-A., and Enault, F. (2021) PHROG: families of prokaryotic virus proteins clustered using remote homology. NAR genomics and bioinformatics, doi: 10.1093/nargab/lqab067.
Kiening, M., Ochsenreiter, R., Hellinger, H.-J., Rattei, T., Hofacker, I., and Frishman, D. (2019) Conserved Secondary Structures in Viral mRNAs. Viruses, doi: 10.3390/v11050401.
Grazziotin, A.L., Koonin, E.V., and Kristensen, D.M. (2017) Prokaryotic virus orthologous groups (pVOGs). A resource for comparative genomics and protein family annotation. Nucleic acids research, doi: 10.1093/nar/gkw975.