VhgBoxplot {Virusparies} | R Documentation |
VhgBoxplot: Generate box plots comparing E-values, identity or contig length (Gatherer only) for each virus group
Description
VhgBoxplot generates box plots comparing either E-values, identity or contig length (Gatherer only) for each group from VirusHunter or VirusGatherer hittable results.
Usage
VhgBoxplot(
file,
x_column = "best_query",
taxa_rank = "Family",
y_column = "ViralRefSeq_E",
contiglen_log10_scale = FALSE,
cut = 1e-05,
add_cutoff_line = TRUE,
cut_colour = "#990000",
reorder_criteria = "median",
theme_choice = "linedraw",
flip_coords = TRUE,
add_mean_point = FALSE,
mean_color = "white",
mean_point_size = 2,
title = "default",
title_size = 16,
title_face = "bold",
title_colour = "#2a475e",
subtitle = "default",
subtitle_size = 12,
subtitle_face = "bold",
subtitle_colour = "#1b2838",
xlabel = NULL,
ylabel = NULL,
axis_title_size = 12,
xtext_size = 10,
x_angle = NULL,
ytext_size = 10,
y_angle = NULL,
remove_group_labels = FALSE,
legend_title = "Phylum",
legend_position = "bottom",
legend_title_size = 12,
legend_title_face = "bold",
legend_text_size = 10,
facet_ncol = NULL,
group_unwanted_phyla = NULL
)
Arguments
file |
A data frame containing VirusHunter or VirusGatherer hittable results. |
x_column |
(optional): A character specifying the column containing the groups (default: "best_query"). Note: Gatherer hittables do not have a "best_query" column, so provide an appropriate grouping column when plotting Gatherer results. |
taxa_rank |
(optional): When
|
y_column |
A character specifying the column containing the values to be compared. Currently supported columns are "ViralRefSeq_ident", "contig_len" (Gatherer hittable only) and "ViralRefSeq_E" (default: "ViralRefSeq_E"). |
contiglen_log10_scale |
(optional): When |
cut |
(optional): The significance cutoff value for E-values (default: 1e-5). |
add_cutoff_line |
(optional): Whether to add a horizontal line based on |
cut_colour |
(optional): The color for the significance cutoff line (default: "#990000"). |
reorder_criteria |
Character string specifying the criteria for reordering the x-axis ('max', 'min', 'median' (default), 'mean', 'phylum'). NULL sorts alphabetically. You can also specify criteria with a 'phylum_' prefix (e.g., 'phylum_median') to sort by phylum first and then by the specified statistic within each phylum. |
theme_choice |
(optional): A character indicating the ggplot2 theme to apply. Options include "minimal", "classic", "light", "dark", "void", "grey" (or "gray"), "bw", "linedraw" (default), and "test". Append "_dotted" to any theme to add custom dotted grid lines (e.g., "classic_dotted"). |
flip_coords |
(optional): Logical indicating whether to flip the coordinates of the plot (default: TRUE). |
add_mean_point |
(optional): Logical indicating whether to add mean points to the box plot (default: FALSE). |
mean_color |
(optional): Change color of point indicating mean value in box plot (default: "white"). |
mean_point_size |
(optional): Change size of point indicating mean value in box plot (default: 2). |
title |
(optional): A character specifying the title of the plot. Default title is set based on y_column. |
title_size |
(optional): Numeric specifying the size of the title text (default: 16). |
title_face |
(optional): A character specifying the font face for the title text (default: "bold"). |
title_colour |
(optional): A character specifying the color for the title text (default: "#2a475e"). |
subtitle |
(optional): A character specifying the subtitle of the plot. Default subtitle is set based on y_column. |
subtitle_size |
(optional): Numeric specifying the size of the subtitle text (default: 12). |
subtitle_face |
(optional): A character specifying the font face for the subtitle text (default: "bold"). |
subtitle_colour |
(optional): A character specifying the color for the subtitle text (default: "#1b2838"). |
xlabel |
(optional): A character specifying the label for the x-axis (default: "Virus found in query"). |
ylabel |
(optional): A character specifying the label for the y-axis. Default is set based on y_column. |
axis_title_size |
(optional): Numeric specifying the size of the axis title text (default: 12). |
xtext_size |
(optional): Numeric specifying the size of the x-axis tick labels (default: 10). |
x_angle |
(optional): An integer specifying the angle (in degrees) for the x-axis text labels. Default is NULL, meaning no change. |
ytext_size |
(optional): Numeric specifying the size of the y-axis tick labels (default: 10). |
y_angle |
(optional): An integer specifying the angle (in degrees) for the y-axis text labels. Default is NULL, meaning no change. |
remove_group_labels |
(optional): If |
legend_title |
(optional): A character specifying the title for the legend (default: "Phylum"). |
legend_position |
(optional): A character specifying the position of the legend (default: "bottom"). |
legend_title_size |
(optional): Numeric specifying the size of the legend title text (default: 12). |
legend_title_face |
(optional): A character specifying the font face for the legend title text (default: "bold"). |
legend_text_size |
(optional): Numeric specifying the size of the legend text (default: 10). |
facet_ncol |
(optional): The number of columns for faceting (default: NULL). It is recommended to specify this when the number of viral groups is high, to ensure they fit well in one plot. |
group_unwanted_phyla |
(optional): A character string specifying which group of viral phyla to retain in the analysis. Valid values are:
All other phyla not in the specified group will be grouped into a single category:
"Non-RNA-virus" for |
Details
VhgBoxplot generates box plots comparing either E-values, identity, or contig length (Gatherer only) for each virus group from the VirusHunter or Gatherer hittable.
The user can specify whether to generate box plots for E-values, identity, or contig length (Gatherer only) by specifying the 'y_column'. This means that 'VhgBoxplot' can generate three different types of box plots. By default, 'y_column' is set to "ViralRefSeq_E" and will plot the reference E-Value on the y-axis. Grouping on the x-axis is done by the 'x_column' argument. By default, the "best_query" will be used.
Additionally, the function calculates summary statistics and identifies outliers for further analysis ("ViralRefSeq_E" and "contig_len" only). When 'y_column' is set to "ViralRefSeq_E", the output also includes 'rows_belowthres', which contains the hittable filtered for the rows below the threshold specified in the 'cut' argument.
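The kind of per-group summary statistics and outlier detection described above can be illustrated in base R. This is only a sketch under stated assumptions: the toy data frame, the neg_log10_E helper column and the use of the 1.5 * IQR rule via boxplot.stats() are illustrative choices, not necessarily how VhgBoxplot computes its output.

```r
# Toy hittable; column names follow the documentation, values are made up.
hits <- data.frame(
  best_query = rep(c("groupA", "groupB"), each = 6),
  ViralRefSeq_E = c(1e-30, 1e-28, 1e-27, 1e-26, 1e-25, 1e-2,
                    1e-12, 1e-11, 1e-10, 1e-9, 1e-8, 1e-7)
)

# Hypothetical helper column: E-values on the -log10 scale.
hits$neg_log10_E <- -log10(hits$ViralRefSeq_E)

groups <- split(hits, hits$best_query)

# Per-group summary statistics.
summary_stats <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    best_query = g$best_query[1],
    median = median(g$neg_log10_E),
    mean = mean(g$neg_log10_E),
    min = min(g$neg_log10_E),
    max = max(g$neg_log10_E)
  )
}))

# Per-group outliers, assuming the usual 1.5 * IQR box plot rule.
outliers <- do.call(rbind, lapply(groups, function(g) {
  g[g$neg_log10_E %in% boxplot.stats(g$neg_log10_E)$out, ]
}))

print(summary_stats)
print(outliers)
```

In this toy data the single weak hit in groupA (E-value 1e-2) is flagged as an outlier, while groupB has none.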
The 'cut' argument is used differently depending on the 'y_column' value:
For 'y_column' set to "contig_len" or "ViralRefSeq_ident", the 'cut' argument filters the data to plot only the values with a "ViralRefSeq_E" below the specified threshold (default: 1e-5).
For 'y_column' set to "ViralRefSeq_E", the rows are not filtered. Instead, a horizontal line (h_line) is shown in the plot to indicate the cutoff value.
This allows the user to plot only the significant contig lengths and identities while also visualizing the number of non-significant and significant values for comparison.
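The two roles of 'cut' can be sketched in base R on a toy data frame. The data values are made up, and the strict "<" comparison is an assumption about how "below the threshold" is interpreted; the sketch shows the idea rather than the package's internal code.

```r
# Toy Gatherer-style hittable; values are made up for illustration.
hits <- data.frame(
  ViralRefSeq_taxonomy = c("famA", "famA", "famB", "famB"),
  contig_len = c(1200, 800, 3000, 450),
  ViralRefSeq_E = c(1e-20, 1e-3, 1e-8, 0.5)
)
cut <- 1e-5

# y_column = "contig_len" or "ViralRefSeq_ident":
# only rows with an E-value below the cutoff are kept for plotting.
significant <- hits[hits$ViralRefSeq_E < cut, ]

# y_column = "ViralRefSeq_E":
# all rows are kept; the cutoff only determines where the line is drawn
# on the -log10 scale.
cutoff_line <- -log10(cut)
n_significant <- sum(hits$ViralRefSeq_E < cut)

print(significant)
print(cutoff_line)
```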
Warning: In some cases, E-values might be exactly 0. When these values are transformed using -log10, R returns Inf. To avoid this issue, we replace all E-values that are 0 with the smallest E-value that is greater than 0. If that smallest E-value is above the user-defined cutoff, we instead replace the zeros with a value of cutoff * 10^-10.
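The zero-replacement rule can be written as a small base R function. The function name replace_zero_evalues is hypothetical, and two behaviours are assumptions not stated in the documentation: the input contains no NA values, and a vector with no positive E-values falls back to cutoff * 10^-10.

```r
# Hypothetical helper illustrating the zero-replacement rule described above.
# Assumes 'evalues' is a numeric vector without NA values.
replace_zero_evalues <- function(evalues, cut = 1e-5) {
  positive <- evalues[evalues > 0]
  # Assumption: with no positive E-values, behave as if the smallest
  # one were above the cutoff.
  smallest <- if (length(positive) > 0) min(positive) else Inf
  replacement <- if (smallest > cut) cut * 10^-10 else smallest
  evalues[evalues == 0] <- replacement
  evalues
}

# Without replacement, a zero E-value becomes Inf on the -log10 scale.
raw <- c(0, 1e-50, 1e-3)
print(-log10(raw))

# Smallest positive E-value is below the cutoff, so it replaces the zero.
fixed <- replace_zero_evalues(raw)
print(-log10(fixed))

# Smallest positive E-value is above the cutoff, so cut * 10^-10 is used.
fallback <- replace_zero_evalues(c(0, 1e-2))
print(-log10(fallback))
```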
Value
A list containing:
The generated box plot.
Summary statistics.
Outliers ("ViralRefSeq_E" and "contig_len" only).
rows_belowthres ("ViralRefSeq_E" only).
Author(s)
Sergej Ruff
See Also
VirusHunterGatherer is available here: https://github.com/lauberlab/VirusHunterGatherer.
Examples
path <- system.file("extdata", "virushunter.tsv", package = "Virusparies")
file <- ImportVirusTable(path)
# plot 1 for E-values
plot1 <- VhgBoxplot(file, x_column = "best_query", y_column = "ViralRefSeq_E")
plot1
# plot 2 for identity
plot2 <- VhgBoxplot(file, x_column = "best_query", y_column = "ViralRefSeq_ident")
plot2
# plot 3 custom arguments used
plot3 <- VhgBoxplot(file,
x_column = "best_query",
y_column = "ViralRefSeq_E",
theme_choice = "grey",
subtitle = "Custom subtitle: Identity for custom query",
xlabel = "Custom x-axis label: Custom query",
ylabel = "Custom y-axis label: Viral Reference Evalue in -log10 scale",
legend_position = "right")
plot3
# import gatherer files
path2 <- system.file("extdata", "virusgatherer.tsv", package = "Virusparies")
vg_file <- ImportVirusTable(path2)
# plot 4: VirusGatherer plot for ViralRefSeq_taxonomy against contig length
plot4 <- VhgBoxplot(vg_file, x_column = "ViralRefSeq_taxonomy", y_column = "contig_len")
plot4