VirClust is now open for community feedback

VirClust is in its final testing and development phase, so it might not always run smoothly. Feedback from the user community is very welcome. Please email me at liliana.cristina.moraru( at )uol.de if you experience any problems or have suggestions for improvements.

The stand-alone version is currently in development and not yet available for download.

What is VirClust?

VirClust is a bioinformatics tool which can be used for:

• virus clustering

• protein annotation

• core protein calculation

At its core is the grouping of viral proteins into clusters of three different levels:

• at the first level, proteins are grouped based on their reciprocal BLASTP similarities into protein clusters, or PCs.

• at the second level, PCs are grouped based on their Hidden Markov Model (HMM) similarities into protein superclusters, or PSCs.

• at the third, still experimental level, PSCs are grouped based on their HMM similarities into protein super-superclusters, or PSSCs.

More about how it works can be read here: DOI: 10.1101/2021.06.14.448304.
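The three nested levels can be pictured as a chain of lookups from protein to PC to PSC to PSSC. The following is a minimal sketch of that idea only: the identifiers, the dictionaries and the helper functions are invented for illustration and do not reflect VirClust's internal data structures or output files.

```python
from typing import Dict, Optional, Tuple

# Hypothetical assignments; all identifiers below are made up for illustration.
protein_to_pc: Dict[str, str] = {
    "genomeA_p1": "PC_1",
    "genomeB_p7": "PC_1",
    "genomeC_p3": "PC_2",
    "genomeD_p2": "PC_3",
}
pc_to_psc: Dict[str, str] = {"PC_1": "PSC_1", "PC_2": "PSC_1", "PC_3": "PSC_2"}
psc_to_pssc: Dict[str, str] = {"PSC_1": "PSSC_1", "PSC_2": "PSSC_1"}


def cluster_path(protein_id: str) -> Tuple[str, str, str]:
    """Return the (PC, PSC, PSSC) that a protein belongs to."""
    pc = protein_to_pc[protein_id]
    psc = pc_to_psc[pc]
    pssc = psc_to_pssc[psc]
    return pc, psc, pssc


def finest_shared_level(protein_a: str, protein_b: str) -> Optional[str]:
    """Return the first (finest) level at which two proteins share a cluster."""
    levels = ("PC", "PSC", "PSSC")
    for level, a, b in zip(levels, cluster_path(protein_a), cluster_path(protein_b)):
        if a == b:
            return level
    return None
```

In this toy data, two proteins in the same PC are also in the same PSC and PSSC, while more distantly related proteins only meet at the higher levels.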

How to cite the use of VirClust?

If you are using VirClust, please cite the following preprint:

• Moraru, Cristina (2021): VirClust, a tool for hierarchical clustering, core gene detection and annotation of (prokaryotic) viruses. In bioRxiv. DOI: 10.1101/2021.06.14.448304.

Additionally, if you are performing viral protein annotations using VirClust, please also cite the respective databases used for the annotations. See the VirClust manuscript for the complete citations.

Developer - Cristina Moraru, PhD

Senior Scientist, Department of The Biology of Geological Processes

Institute for Chemistry and Biology of the Marine Environment

Carl-von-Ossietzky-Str. 9-11, D-26111 Oldenburg, Germany

Email: liliana.cristina.moraru( at )uni-oldenburg.de

CREATE NEW PROJECT

Minimum number of genomes:

• 3 for genome clustering (steps 4-7)

• 1 for only protein clusters and annotations

Accepted input formats: .fasta, .fna or .fa

** no multifasta files here
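Before uploading, it can help to check files locally against the requirements listed above. The sketch below is an illustrative pre-check, not part of VirClust: the function names are hypothetical, the extension match is assumed to be case-sensitive, and a "multifasta" file is assumed to mean any file with more than one `>` header line.

```python
from pathlib import Path
from typing import List

# Extensions listed as accepted on this page.
ACCEPTED_EXTENSIONS = {".fasta", ".fna", ".fa"}


def count_fasta_records(text: str) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def check_genome_file(name: str, text: str) -> List[str]:
    """Return a list of problems found for one genome file (empty if none)."""
    problems = []
    if Path(name).suffix not in ACCEPTED_EXTENSIONS:
        problems.append(f"{name}: extension must be .fasta, .fna or .fa")
    records = count_fasta_records(text)
    if records == 0:
        problems.append(f"{name}: no FASTA header line found")
    elif records > 1:
        problems.append(f"{name}: multifasta file ({records} records)")
    return problems


def allowed_analyses(n_genomes: int) -> List[str]:
    """List which analyses the stated genome minimums permit."""
    analyses = []
    if n_genomes >= 1:
        analyses.append("protein clusters and annotations")
    if n_genomes >= 3:
        analyses.append("genome clustering (steps 4-7)")
    return analyses
```

For example, `check_genome_file("phage1.fasta", ">phage1\nACGT\n")` returns an empty list, while a file with two header lines is reported as multifasta.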

* Mandatory for new projects

LOAD EXISTING PROJECT

Info board

Each project you create is given a project ID and can be accessed later, as long as you have performed at least one calculation (that is, pressed the “Run” button in the next tab). VirClust calculations can take a long time, and the browser can disconnect from the server. Save the project ID so that you can access the results later.

Our server is currently experiencing a high load. This can interrupt some VirClust processes. If more than one day has passed since you started your VirClust project and you have not received an email that it has finished, please email me. I will manually reset your project, and you will be able to access the data already calculated and continue with further calculations. Meanwhile, we are working on a more sustainable solution.

PROTEIN CLUSTERING

Remove matches if

Remove matches if (1st step):
Remove matches if (2nd step):

GENOME CLUSTERING

...based on PCs
...based on PSCs
...based on PSSCs

*The clustering distance ranges from a minimum of 0.1 to a maximum of 1. The higher the value, the fewer clusters result. At a value of 1, all genomes will belong to the same cluster.

*Known issue: If the chosen clustering distance results in each genome forming its own viral genome cluster (VGC), then the output PDF will be empty. To solve this problem, increase the clustering distance progressively and recalculate steps 5 and 6.
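The effect of the clustering distance can be illustrated with a toy example. The sketch below merges genomes whose pairwise distance is at or below the threshold (a single-linkage cut, chosen here only for simplicity). The linkage method VirClust actually uses is not stated on this page, and the genome names, distances and function names are invented.

```python
from typing import Dict, List, Tuple


def cut_into_clusters(
    genomes: List[str],
    distances: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Group genomes linked by any pairwise distance <= threshold."""
    if not 0.1 <= threshold <= 1.0:
        raise ValueError("clustering distance must be between 0.1 and 1")
    parent = {g: g for g in genomes}

    def find(g: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), dist in distances.items():
        if dist <= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return list(groups.values())


def all_singletons(clusters: List[List[str]]) -> bool:
    """True when every genome forms its own cluster (the empty-PDF case)."""
    return all(len(cluster) == 1 for cluster in clusters)


# Hypothetical genomes and pairwise distances in the range 0..1.
toy_genomes = ["A", "B", "C", "D"]
toy_distances = {
    ("A", "B"): 0.2, ("A", "C"): 0.8, ("A", "D"): 1.0,
    ("B", "C"): 0.7, ("B", "D"): 0.95, ("C", "D"): 0.5,
}
```

With these toy distances, a threshold of 0.1 leaves four singleton clusters, 0.5 gives two clusters, and 1 places all genomes in one cluster. Checking `all_singletons` before generating the PDF mirrors the advice to increase the distance and recalculate.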

PDF OPTIONS for 6a and/or 6b

Other options

CORE PROTEINS

PROTEIN ANNOTATIONS

GENOME CLUSTERING

...based on PCs
4a. Order genomes hierarchically
5a. Split in VGCs and calculate stats
6a. Output PDF
...based on PSCs
4b. Order genomes hierarchically
5b. Split in VGCs and calculate stats
6b. Output PDF
...based on PSSCs
4c. Order genomes hierarchically
5c. Split in VGCs and calculate stats
6c. Output PDF

PROTEIN ANNOTATIONS
