Introduction
MBioSEQ Ridom Typer implements CheckM2 (citation), a machine learning–based technique that evaluates microbial genome assemblies by identifying intra-species contamination and predicting genome completeness. CheckM2 delivers highly accurate quality predictions for medium- and low-quality genomes, including those from poorly represented lineages. It can even produce reliable results for phyla with only a single genomic representative.

CheckM2 uses two machine learning models to predict the genome completeness score. A 'general' gradient boosting model is designed for novel or distantly related organisms (e.g., new orders, classes, or phyla) and generalizes well to poorly represented lineages. A 'specific' neural network model provides higher accuracy for genomes closely related to the reference set (e.g., known species, genera, or families), particularly when genomes are less complete. Having both models available gives better accuracy across lineages.

Please note: Sequencing runs, especially on older Illumina machines, frequently show a low level of contamination caused by inter-run carry-over in valves and/or flexible tubing. Notably, the SKESA assembler handles contamination of up to 10% very well.

Task Entry Overview
The task entry overview shows the CheckM2 results for the sample. The Intra-species Contamination value is color-coded by contamination percentage using the following thresholds:
Result Fields
The task entry stores the following result fields for each sample:
The controlled vocabulary shown as a legend in the results, which classifies draft genome quality based on estimated genome completeness and contamination, is the one used by the developers in their first publication (Parks et al. Genome Res. 2015, 25: 1043-55 [PubMed 25977477]).

The Result tab of the Sample Overview shows the Intra-species Contamination and the Completeness fields. Those two fields are also written to the Procedure Statistics. When a comparison table is created for a project that contains the CheckM2 Intra-species Contamination Check Task Template, the two fields are automatically added to the comparison table. If the task template is explicitly selected when creating a comparison table, all result fields are added to the table.

Run times