XtraSNPlocs-class {BSgenome} | R Documentation |
XtraSNPlocs objects
Description
The XtraSNPlocs class is a container for storing extra SNP locations and alleles for a given organism. While a SNPlocs object can store only molecular variations of class snp, an XtraSNPlocs object contains molecular variations of other classes (in-del, heterozygous, microsatellite, named-locus, no-variation, mixed, multinucleotide-polymorphism).
XtraSNPlocs objects are usually made in advance by a volunteer and made
available to the Bioconductor community as XtraSNPlocs data packages.
See ?available.SNPs
for how to get the list of
SNPlocs and XtraSNPlocs data packages curently available.
The main focus of this man page is on how to extract SNPs from an XtraSNPlocs object.
Usage
## S4 method for signature 'XtraSNPlocs'
snpcount(x)
## S4 method for signature 'XtraSNPlocs'
snpsBySeqname(x, seqnames,
columns=c("seqnames", "start", "end", "strand", "RefSNP_id"),
drop.rs.prefix=FALSE, as.DataFrame=FALSE)
## S4 method for signature 'XtraSNPlocs'
snpsByOverlaps(x, ranges,
columns=c("seqnames", "start", "end", "strand", "RefSNP_id"),
drop.rs.prefix=FALSE, as.DataFrame=FALSE, ...)
## S4 method for signature 'XtraSNPlocs'
snpsById(x, ids,
columns=c("seqnames", "start", "end", "strand", "RefSNP_id"),
ifnotfound=c("error", "warning", "drop"), as.DataFrame=FALSE)
## S4 method for signature 'XtraSNPlocs'
colnames(x, do.NULL=TRUE, prefix="col")
Arguments
x |
An XtraSNPlocs object. |
seqnames |
The names of the sequences for which to get SNPs. NAs and duplicates
are not allowed. The supplied |
columns |
The names of the columns to return. Valid column names are:
|
drop.rs.prefix |
Should the |
as.DataFrame |
Should the result be returned in a DataFrame instead of a GRanges object? |
ranges |
One or more regions of interest specified as a GRanges
object. A single region of interest can be specified as a character string
of the form |
... |
Additional arguments, for use in specific methods. Arguments passed to the |
ids |
The RefSNP ids to look up (a.k.a. rs ids). Can be integer or
character vector, with or without the |
ifnotfound |
What to do if SNP ids are not found. |
do.NULL , prefix |
These arguments are ignored. |
Value
snpcount
returns a named integer vector containing the number
of SNPs for each chromosome in the reference genome.
snpsBySeqname
and snpsById
both return a
GRanges object with 1 element per SNP,
unless as.DataFrame
is set to TRUE
in which case they
return a DataFrame with 1 row per SNP.
When a GRanges object is returned, the columns
requested via the columns
argument are stored as metada columns
of the object, except for the following columns: seqnames
,
start
, end
, width
, and strand
.
These "spatial columns" (in the sense that they describe the genomic
locations of the SNPs) can be accessed by calling the corresponding
getter on the GRanges object.
Summary of available columns (my_snps
being the returned object):
-
seqnames
: The name of the chromosome where each SNP is located. Access withseqnames(my_snps)
whenmy_snps
is a GRanges object. -
start
andend
: The starting and ending coordinates of each SNP with respect to the chromosome indicated inseqnames
. Coordinated are 1-based and with respect to the 5' end of the plus strand of the chromosome in the reference genome. Access withstart(my_snps)
,end(my_snps)
, orranges(my_snps)
whenmy_snps
is a GRanges object. -
width
: The number of nucleotides spanned by each SNP on the reference genome (e.g. a width of 0 means the SNP is an insertion). Access withwidth( my_snps)
whenmy_snps
is a GRanges object. -
strand
: The strand that the alleles of each SNP was reported to. Access withstrand(my_snps)
whenmy_snps
is a GRanges object. -
RefSNP_id
: The RefSNP id (a.k.a. rs id) of each SNP. Access withmcols(my_snps)$RefSNP_id
whenmy_snps
is a GRanges object. -
alleles
: The alleles of each SNP in the format used by dbSNP. Access withmcols(my_snps)$alleles
whenmy_snps
is a GRanges object. -
snpClass
: Class of each SNP. Possible values arein-del
,heterozygous
,microsatellite
,named-locus
,no-variation
,mixed
, andmultinucleotide-polymorphism
. Access withmcols(my_snps)$snpClass
whenmy_snps
is a GRanges object. -
loctype
: See ftp://ftp.ncbi.nih.gov/snp/00readme.txt for the 6 loctype codes used by dbSNP, and their meanings. WARNING: The code assigned to each SNP doesn't seem to be reliable. For example, loctype codes 1 and 3 officially stand for insertion and deletion, respectively. However, when looking at the SNP ranges it actually seems to be the other way around. Access withmcols(my_snps)$loctype
whenmy_snps
is a GRanges object.
colnames(x)
returns the names of the available columns.
Author(s)
H. Pagès
See Also
-
GRanges objects in the GenomicRanges package.
-
SNPlocs packages and objects for molecular variations of class snp.
Examples
library(XtraSNPlocs.Hsapiens.dbSNP144.GRCh38)
snps <- XtraSNPlocs.Hsapiens.dbSNP144.GRCh38
snpcount(snps)
colnames(snps)
## ---------------------------------------------------------------------
## snpsBySeqname()
## ---------------------------------------------------------------------
## Get the location, RefSNP id, and alleles for all "extra SNPs"
## located on chromosome 22 or MT:
snpsBySeqname(snps, c("ch22", "chMT"), columns=c("RefSNP_id", "alleles"))
## ---------------------------------------------------------------------
## snpsByOverlaps()
## ---------------------------------------------------------------------
## Get the location, RefSNP id, and alleles for all "extra SNPs"
## overlapping some regions of interest:
snpsByOverlaps(snps, "ch22:33.63e6-33.64e6",
columns=c("RefSNP_id", "alleles"))
## With the regions of interest being all the known CDS for hg38
## (except for the chromosome naming convention, hg38 is the same
## as GRCh38):
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
hg38_cds <- cds(txdb)
seqlevelsStyle(hg38_cds) # UCSC
seqlevelsStyle(snps) # dbSNP
seqlevelsStyle(hg38_cds) <- seqlevelsStyle(snps)
genome(hg38_cds) <- genome(snps)
snpsByOverlaps(snps, hg38_cds, columns=c("RefSNP_id", "alleles"))
## ---------------------------------------------------------------------
## snpsById()
## ---------------------------------------------------------------------
## Get the location and alleles for some RefSNP ids:
my_rsids <- c("rs367617508", "rs398104919", "rs3831697", "rs372470289",
"rs141568169", "rs34628976", "rs67551854")
snpsById(snps, my_rsids, c("RefSNP_id", "alleles"))
## See ?XtraSNPlocs.Hsapiens.dbSNP144.GRCh38 for more examples of using
## snpsBySeqname() and snpsById().