VhgIdenFacetedScatterPlot {Virusparies}    R Documentation

VhgIdenFacetedScatterPlot: Create a scatter plot of viral refseq identity vs. -log10 of viral refseq E-value.

Description

VhgIdenFacetedScatterPlot generates a scatter plot of viral refseq identity versus -log10 of the refseq E-value for each virus group in the best_query or ViralRefSeq_taxonomy column. Points are colored according to whether their E-value meets a specified cutoff, and the plot is faceted by the viral groups in that column.
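The two plotted quantities and the faceting can be illustrated with a minimal base R sketch. The toy table, its values, and the pairing of groups to values are made up; only the column names (best_query, ViralRefSeq_ident, ViralRefSeq_E) come from this page, and the code is not the package's implementation.

```r
# Toy hittable (made-up values); column names as on this help page
hits <- data.frame(
  best_query        = c("Anello_ORF1core", "Genomo_Rep", "Anello_ORF1core", "Genomo_Rep"),
  ViralRefSeq_ident = c(92.5, 78.0, 60.1, 45.3),
  ViralRefSeq_E     = c(1e-40, 1e-12, 1e-3, 0.05)
)

# y-axis: -log10 of the E-value, so smaller (better) E-values plot higher
hits$neg_log10_E <- -log10(hits$ViralRefSeq_E)

# one facet per viral group: split the rows by the grouping column
facets <- split(hits, hits$best_query)
vapply(facets, nrow, integer(1))   # number of points per facet
```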

Usage

VhgIdenFacetedScatterPlot(
  file,
  groupby = "best_query",
  taxa_rank = "Family",
  cutoff = 1e-05,
  conlen_bubble_plot = FALSE,
  contiglen_breaks = 5,
  theme_choice = "linedraw",
  title = "Faceted scatterplot of viral reference E-values and sequence identity",
  title_size = 16,
  title_face = "bold",
  title_colour = "#2a475e",
  subtitle = NULL,
  subtitle_size = 12,
  subtitle_face = "bold",
  subtitle_colour = "#1b2838",
  xlabel = "Viral reference sequence identity (%)",
  ylabel = "-log10 of viral reference E-values",
  axis_title_size = 12,
  xtext_size = 10,
  x_angle = NULL,
  ytext_size = 10,
  y_angle = NULL,
  legend_position = "bottom",
  legend_title_size = 12,
  legend_title_face = "bold",
  legend_text_size = 10,
  true_colour = "blue",
  false_colour = "red",
  wrap_ncol = 2,
  filter_group_criteria = NULL
)

Arguments

file

A VirusHunter or VirusGatherer hittable (e.g., as imported with ImportVirusTable).

groupby

(optional): A character string specifying the column containing the groups (default: "best_query"). Note: Gatherer hittables do not have a "best_query" column, so provide an appropriate grouping column (e.g., "ViralRefSeq_taxonomy").
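Because Gatherer hittables lack a "best_query" column, a missing grouping column is an easy mistake. The helper check_groupby() below is hypothetical and not part of Virusparies; it only sketches how such a check could look in base R, using a made-up one-row table.

```r
# Hypothetical helper (not part of Virusparies): stop early with a readable
# message if the requested grouping column is absent from the hittable.
check_groupby <- function(hittable, groupby = "best_query") {
  if (!groupby %in% colnames(hittable)) {
    stop(sprintf("Column '%s' not found; available columns: %s",
                 groupby, paste(colnames(hittable), collapse = ", ")),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Toy Gatherer-like table without a best_query column (made-up values)
vg <- data.frame(ViralRefSeq_taxonomy = "Viruses;Anelloviridae",
                 ViralRefSeq_E = 1e-10, ViralRefSeq_ident = 85)

ok <- tryCatch(check_groupby(vg), error = function(e) FALSE)  # default fails
check_groupby(vg, groupby = "ViralRefSeq_taxonomy")           # passes silently
```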

taxa_rank

(optional): When groupby is set to "ViralRefSeq_taxonomy", specify the taxonomic rank to group your data by. Supported ranks are:

  • "Subphylum"

  • "Class"

  • "Subclass"

  • "Order"

  • "Suborder"

  • "Family" (default)

  • "Subfamily"

  • "Genus" (including Subgenus)
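The format of the ViralRefSeq_taxonomy column is not described on this page. Assuming it holds a semicolon-separated lineage string, a rank could be picked out using the ICTV convention that rank names carry fixed suffixes (e.g., "-viridae" for families, "-virales" for orders). The helper extract_rank() and the lineage string are illustrative assumptions; this is not necessarily how Virusparies parses the column.

```r
# ICTV rank-name suffixes for the ranks listed above. Genus and subgenus
# names both end in "-virus"; taking the first match returns the genus.
rank_suffix <- c(Subphylum = "viricotina", Class = "viricetes",
                 Subclass = "viricetidae", Order = "virales",
                 Suborder = "virineae", Family = "viridae",
                 Subfamily = "virinae", Genus = "virus")

# Hypothetical helper: return the first taxon in each semicolon-separated
# lineage whose name ends with the suffix for `rank`, or NA if none does.
extract_rank <- function(lineage, rank = "Family") {
  suffix <- rank_suffix[[match.arg(rank, names(rank_suffix))]]
  vapply(strsplit(lineage, ";", fixed = TRUE), function(taxa) {
    taxa <- trimws(taxa)
    hit <- taxa[endsWith(taxa, suffix)]
    if (length(hit) == 0) NA_character_ else hit[1]
  }, character(1))
}

lineage <- paste("Viruses", "Monodnaviria", "Shotokuvirae", "Commensaviricota",
                 "Cardeaviricetes", "Sanitavirales", "Anelloviridae",
                 "Alphatorquevirus", sep = ";")
extract_rank(lineage, "Family")
extract_rank(lineage, "Genus")
```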

cutoff

(optional): A numeric value representing the cutoff for the refseq E-value. Points with ViralRefSeq_E less than or equal to this value are colored with true_colour (default: "blue"); all other points are colored with false_colour (default: "red") (default: 1e-5).
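The cutoff rule can be sketched in a few lines of base R using the default colours. The E-values are made up; note that a value exactly equal to the cutoff counts as meeting it.

```r
cutoff       <- 1e-5
true_colour  <- "blue"
false_colour <- "red"
evalues <- c(1e-40, 1e-5, 2e-5, 0.05)   # made-up E-values

# A hit meets the cutoff when its E-value is <= cutoff (inclusive)
point_colour <- ifelse(evalues <= cutoff, true_colour, false_colour)
point_colour
```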

conlen_bubble_plot

(optional): Logical value indicating whether the contig_len column should be used to size the bubbles in the plot. Applicable only to VirusGatherer hittable input (default: FALSE).

contiglen_breaks

(optional): Number of breaks for the contig length bubble sizes (default: 5); used only when conlen_bubble_plot = TRUE.
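How the breaks are computed is not stated here. As one hypothetical choice, evenly spaced break values across the observed contig lengths could be generated as follows; the lengths are made up and the package may place its breaks differently.

```r
contig_len <- c(312, 845, 1290, 2210, 4975)   # made-up contig lengths (nt)
contiglen_breaks <- 5

# Hypothetical: exactly `contiglen_breaks` evenly spaced values from the
# shortest to the longest contig, rounded to whole nucleotides
breaks <- round(seq(min(contig_len), max(contig_len),
                    length.out = contiglen_breaks))
breaks
```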

theme_choice

(optional): A character indicating the ggplot2 theme to apply. Options include "minimal", "classic", "light", "dark", "void", "grey" (or "gray"), "bw", "linedraw" (default), and "test". Append "_dotted" to any theme to add custom dotted grid lines (e.g., "classic_dotted").
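A theme_choice string combines a base theme name with an optional "_dotted" suffix. The hypothetical parse_theme_choice() below only illustrates, in base R, how such a string splits into its two parts; it does not build a ggplot2 theme and is not the package's code.

```r
# Hypothetical parser for theme_choice strings such as "classic_dotted"
parse_theme_choice <- function(theme_choice) {
  dotted <- endsWith(theme_choice, "_dotted")   # custom dotted grid lines?
  base <- sub("_dotted$", "", theme_choice)     # strip the suffix
  if (base == "gray") base <- "grey"            # "gray" is an alias
  valid <- c("minimal", "classic", "light", "dark", "void",
             "grey", "bw", "linedraw", "test")
  if (!base %in% valid) stop("Unknown theme_choice: ", theme_choice, call. = FALSE)
  list(base = base, dotted = dotted)
}

str(parse_theme_choice("classic_dotted"))
str(parse_theme_choice("linedraw"))
```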

title

(optional): The title of the plot (default: "Faceted scatterplot of viral reference E-values and sequence identity").

title_size

(optional): The size of the title text (default: 16).

title_face

(optional): The face (bold, italic, etc.) of the title text (default: "bold").

title_colour

(optional): The color of the title text (default: "#2a475e").

subtitle

(optional): The subtitle of the plot (default: NULL).

subtitle_size

(optional): The size of the subtitle text (default: 12).

subtitle_face

(optional): The face (bold, italic, etc.) of the subtitle text (default: "bold").

subtitle_colour

(optional): The color of the subtitle text (default: "#1b2838").

xlabel

(optional): The label for the x-axis (default: "Viral reference sequence identity (%)").

ylabel

(optional): The label for the y-axis (default: "-log10 of viral reference E-values").

axis_title_size

(optional): The size of the axis titles (default: 12).

xtext_size

(optional): The size of the x-axis text (default: 10).

x_angle

(optional): An integer specifying the angle (in degrees) for the x-axis text labels. Default is NULL, meaning no change.

ytext_size

(optional): The size of the y-axis text (default: 10).

y_angle

(optional): An integer specifying the angle (in degrees) for the y-axis text labels. Default is NULL, meaning no change.

legend_position

(optional): The position of the legend (default: "bottom").

legend_title_size

(optional): The size of the legend title text (default: 12).

legend_title_face

(optional): The face (bold, italic, etc.) of the legend title text (default: "bold").

legend_text_size

(optional): The size of the legend text (default: 10).

true_colour

(optional): The color for points that meet the cutoff condition (default: "blue").

false_colour

(optional): The color for points that do not meet the cutoff condition (default: "red").

wrap_ncol

(optional): The number of columns for faceting (default: 2).

filter_group_criteria

(optional): Character vector, numeric vector, or single character/numeric value.

  • Character vector: Names of viral groups to filter.

  • Numeric vector: Indices of viral groups to filter.

  • Single character or numeric value: Filter a single viral group.

  • NULL: No filtering is performed (default).

Details

'VhgIdenFacetedScatterPlot' takes a VirusHunter or VirusGatherer hittable and a cutoff value as inputs.

filter_group_criteria: Allows filtering of viral groups by specifying either a single character string or a vector of character strings that match unique entries in groupby. Alternatively, a single numeric value, a range, or a vector of numeric values can be used to filter groups.

For example, suppose groupby is "best_query" and its unique groups include "Anello_ORF1core" and "Genomo_Rep".

Setting filter_group_criteria to c("Anello_ORF1core", "Genomo_Rep") restricts the data to observations whose "best_query" value is 'Anello_ORF1core' or 'Genomo_Rep'. Alternatively, setting filter_group_criteria to 2:3 keeps only the second and third viral groups of "best_query" in alphabetical order. This order also matches the order of the facets in the scatter plot.

This is particularly useful when there are too many viral groups to fit in a single plot, as the groups can then be split across several plots. It also lets the user focus on specific groups of interest for more detailed analysis.
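The name- and index-based selection described above can be sketched in base R. filter_groups() is a hypothetical helper, not a Virusparies function, and the group "Circo_Rep" is made up for the example; as in the text, groups are sorted alphabetically before numeric indices are applied.

```r
# Hypothetical sketch of filter_group_criteria semantics (not the package code)
filter_groups <- function(hittable, groupby = "best_query", criteria = NULL) {
  if (is.null(criteria)) return(hittable)        # NULL: no filtering
  groups <- sort(unique(hittable[[groupby]]))    # alphabetical = facet order
  keep <- if (is.numeric(criteria)) groups[criteria] else criteria
  hittable[hittable[[groupby]] %in% keep, , drop = FALSE]
}

hits <- data.frame(
  best_query    = c("Anello_ORF1core", "Circo_Rep", "Genomo_Rep", "Circo_Rep"),
  ViralRefSeq_E = c(1e-30, 1e-8, 1e-12, 1e-3)    # made-up values
)
filter_groups(hits, criteria = c("Anello_ORF1core", "Genomo_Rep"))
filter_groups(hits, criteria = 2:3)   # 2nd and 3rd groups alphabetically
```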

Tibbles containing summary statistics (median, Q1, Q3, mean, sd, min, and max) for the 'ViralRefSeq_E' and 'ViralRefSeq_ident' values are also generated, and summary statistics for 'contig_len' are included when applicable. These summary statistics are returned together with the plot object in a list.
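The listed statistics map directly onto base R functions. The helper below is a hypothetical sketch that returns a plain data frame rather than a tibble; the identity values are made up, and the per-group variant is only one assumed way to summarise by viral group.

```r
# Hypothetical helper: the seven summary statistics for one numeric column
summarise_column <- function(x) {
  data.frame(
    median = median(x),
    Q1     = unname(quantile(x, 0.25)),
    Q3     = unname(quantile(x, 0.75)),
    mean   = mean(x),
    sd     = sd(x),
    min    = min(x),
    max    = max(x)
  )
}

ident  <- c(45.3, 60.1, 78.0, 92.5, 88.2)   # made-up identity values (%)
groups <- c("A", "A", "B", "B", "B")        # made-up group labels
summarise_column(ident)

# Per-group variant: one row per group
do.call(rbind, lapply(split(ident, groups), summarise_column))
```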

Warning: In some cases, E-values might be exactly 0. When these values are transformed with -log10, R returns Inf. To avoid this, all E-values equal to 0 are replaced with the smallest E-value greater than 0. If that smallest E-value is above the user-defined cutoff, a value of cutoff * 10^-10 is used to replace the zeros instead.
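The zero-replacement rule can be written as a small base R function. replace_zero_evalues() is a hypothetical name, not part of the package, and the fallback used when no positive E-value exists at all is an assumption not covered by the text above.

```r
# Hypothetical helper implementing the zero-replacement rule described above
replace_zero_evalues <- function(evalues, cutoff = 1e-5) {
  positive <- evalues[evalues > 0]
  # Assumption: with no positive E-values, treat the minimum as Inf so the
  # cutoff * 1e-10 fallback is used
  smallest <- if (length(positive) > 0) min(positive) else Inf
  replacement <- if (smallest <= cutoff) smallest else cutoff * 1e-10
  evalues[evalues == 0] <- replacement
  evalues
}

e1 <- replace_zero_evalues(c(0, 1e-20, 1e-3))  # smallest positive <= cutoff
e2 <- replace_zero_evalues(c(0, 1e-3, 0.5))    # smallest positive > cutoff
-log10(e1)                                     # all finite now
```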

Value

A list containing the plot object and the tibbles of summary statistics described under Details.

Author(s)

Sergej Ruff

See Also

VirusHunterGatherer is available here: https://github.com/lauberlab/VirusHunterGatherer.

Examples

path <- system.file("extdata", "virushunter.tsv", package = "Virusparies")
file <- ImportVirusTable(path)

# plot 1
plot <- VhgIdenFacetedScatterPlot(file, cutoff = 1e-5)

plot

# plot 2 with custom settings
custom_plot <- VhgIdenFacetedScatterPlot(file,
                                         cutoff = 1e-4,
                                         theme_choice = "dark",
                                         title = "Custom Scatterplot",
                                         title_size = 18,
                                         title_face = "italic",
                                         title_colour = "orange",
                                         xlabel = "Custom X Label",
                                         ylabel = "Custom Y Label",
                                         axis_title_size = 14,
                                         legend_position = "right",
                                         true_colour = "green",
                                         false_colour = "purple")

custom_plot

# import gatherer files
path2 <- system.file("extdata", "virusgatherer.tsv", package = "Virusparies")
vg_file <- ImportVirusTable(path2)

# vgplot: virusgatherer plot with ViralRefSeq_taxonomy as custom grouping
vgplot <- VhgIdenFacetedScatterPlot(vg_file, groupby = "ViralRefSeq_taxonomy")
vgplot



[Package Virusparies version 1.1.0 Index]