VhgSubsetHittable {Virusparies}    R Documentation

VhgSubsetHittable: Filter VirusHunter and VirusGatherer hittables

Description

VhgSubsetHittable filters a VirusHunter or VirusGatherer hittable by user-specified criteria: virus groups, a minimum number of hits, an E-value threshold, a sequence identity threshold, and (Gatherer only) a minimum contig length.

Usage

VhgSubsetHittable(
  file,
  group_column = "best_query",
  virus_groups = NULL,
  num_hits_min = NULL,
  ViralRefSeq_E_criteria = NULL,
  ViralRefSeq_ident_criteria = NULL,
  contig_len_criteria = NULL
)

Arguments

file

A data frame containing VirusHunter or VirusGatherer hittable results.

group_column

A string indicating the column containing the virus groups specified in the virus_groups argument. Note: Gatherer hittables do not have a "best_query" column. Please provide an appropriate column for grouping.

virus_groups

A character vector specifying the virus groups to include, or a list of the form list(exclude = ...) naming the virus groups to exclude (see Examples).
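The include and exclude forms can be illustrated on a toy data frame with base R. This is only a sketch of the behaviour described here, not the package's implementation: the helper subset_by_group, the toy values, and the use of exact matching on group names are all assumptions made for illustration.

```r
# Toy data standing in for a hittable (values are made up for illustration).
toy <- data.frame(
  best_query = c("Anello_ORF1core", "Gemini_Rep", "Hepadna-Nackedna_TP"),
  num_hits = c(5, 2, 9)
)

# Hypothetical helper (not part of Virusparies) mimicking the two forms of
# virus_groups. Assumption: group names are matched exactly.
subset_by_group <- function(df, group_column, virus_groups) {
  if (is.null(virus_groups)) {
    return(df)  # no group filter requested
  }
  if (is.list(virus_groups)) {
    # list(exclude = ...) drops the named groups
    keep <- !(df[[group_column]] %in% virus_groups$exclude)
  } else {
    # a character vector keeps only the named groups
    keep <- df[[group_column]] %in% virus_groups
  }
  df[keep, , drop = FALSE]
}

included <- subset_by_group(toy, "best_query",
                            c("Gemini_Rep", "Anello_ORF1core"))
excluded <- subset_by_group(toy, "best_query",
                            list(exclude = "Hepadna-Nackedna_TP"))
```

On this toy input both calls select the same rows, because excluding the one remaining group is equivalent to including the other two.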

num_hits_min

Minimum number of hits required. Default is NULL, which means no filter based on num_hits.

ViralRefSeq_E_criteria

Maximum E-value threshold for ViralRefSeq_E criteria. Default is NULL, which means no filter based on ViralRefSeq_E.

ViralRefSeq_ident_criteria

Sequence identity percentage threshold for ViralRefSeq_ident. Default is NULL, which means no filter based on ViralRefSeq_ident. The sign of the value selects the direction of the filter: a positive value keeps observations where ViralRefSeq_ident is above the threshold, and a negative value keeps observations where ViralRefSeq_ident is below the absolute value of the threshold (for example, -90 keeps observations below 90).
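The sign convention can be sketched on a toy data frame with base R. This is an illustration under stated assumptions, not the package's code: the helper subset_by_ident and the toy values are hypothetical, and treating the threshold as inclusive is an assumption, since the text above does not say how values exactly at the threshold are handled.

```r
# Toy data standing in for a hittable (values are made up for illustration).
toy <- data.frame(
  best_query = c("A", "B", "C", "D"),
  ViralRefSeq_ident = c(45.2, 78.9, 91.5, 99.0)
)

# Hypothetical helper (not part of Virusparies) mimicking the sign convention.
# Assumption: comparisons are inclusive at the threshold.
subset_by_ident <- function(df, criteria) {
  if (is.null(criteria)) {
    return(df)  # NULL means no identity filter
  }
  if (criteria < 0) {
    # negative value: keep rows at or below the absolute threshold
    df[df$ViralRefSeq_ident <= abs(criteria), , drop = FALSE]
  } else {
    # positive value: keep rows at or above the threshold
    df[df$ViralRefSeq_ident >= criteria, , drop = FALSE]
  }
}

high <- subset_by_ident(toy, 90)   # rows C and D
low  <- subset_by_ident(toy, -90)  # rows A and B
```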

contig_len_criteria

(Gatherer only): Minimum contig length required. Default is NULL, which means no filter based on contig length.

Details

The function filters the input VirusHunter or VirusGatherer data (file) based on the criteria supplied through the arguments described above.

Value

A data frame containing only the observations that meet the specified criteria.

Author(s)

Sergej Ruff

See Also

VirusHunterGatherer is available here: https://github.com/lauberlab/VirusHunterGatherer.

Examples


path <- system.file("extdata", "virushunter.tsv", package = "Virusparies")
file <- ImportVirusTable(path)

cat("The dimensions of the VirusHunter hittable before filtering are: \n")
dim(file)

# Filter by virus group, minimum number of hits, identity, and E-value.
file_filtered <- VhgSubsetHittable(
  file,
  group_column = "best_query",
  virus_groups = "Anello_ORF1core",
  num_hits_min = 4,
  ViralRefSeq_ident_criteria = -90,
  ViralRefSeq_E_criteria = 0.00001
)

cat("The dimensions of the VirusHunter hittable after filtering are: \n")
dim(file_filtered)

# Other examples for the virus_groups argument

# Include a single group:
result1 <- VhgSubsetHittable(file, virus_groups = "Hepadna-Nackedna_TP")
# Include multiple groups:
result2 <- VhgSubsetHittable(file, virus_groups = c("Hepadna-Nackedna_TP", "Gemini_Rep"))
# Exclude a single group:
result3 <- VhgSubsetHittable(file, virus_groups = list(exclude = "Hepadna-Nackedna_TP"))
# Exclude multiple groups:
result4 <- VhgSubsetHittable(
  file,
  virus_groups = list(exclude = c("Hepadna-Nackedna_TP", "Anello_ORF1core"))
)



[Package Virusparies version 1.1.0 Index]