nearest-methods {GenomicRanges} | R Documentation |
Finding the nearest genomic range/position neighbor
Description
The nearest
, precede
, follow
, distance
,
nearestKNeighbors
, and distanceToNearest
methods for
GenomicRanges
objects and subclasses.
Usage
## S4 method for signature 'GenomicRanges,GenomicRanges'
precede(x, subject,
select=c("first", "all"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges,missing'
precede(x, subject,
select=c("first", "all"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges,GenomicRanges'
follow(x, subject,
select=c("last", "all"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges,missing'
follow(x, subject,
select=c("last", "all"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges,GenomicRanges'
nearest(x, subject,
select=c("arbitrary", "all"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges,missing'
nearest(x, subject,
select=c("arbitrary", "all"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges,GenomicRanges'
nearestKNeighbors(x, subject, k=1L,
select=c("arbitrary", "all"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges,missing'
nearestKNeighbors(x, subject, k=1L,
select=c("arbitrary", "all"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges,GenomicRanges'
distanceToNearest(x, subject,
ignore.strand=FALSE, ...)
## S4 method for signature 'GenomicRanges,missing'
distanceToNearest(x, subject,
ignore.strand=FALSE, ...)
## S4 method for signature 'GenomicRanges,GenomicRanges'
distance(x, y,
ignore.strand=FALSE, ...)
Arguments
x |
The query GenomicRanges instance. |
subject |
The subject GenomicRanges instance
within which the nearest neighbors are found. Can be missing,
in which case |
y |
For the |
k |
For the |
select |
Logic for handling ties. By default, all methods
select a single interval (arbitrary for When |
ignore.strand |
A |
... |
Additional arguments for methods. |
Details
nearest: Performs conventional nearest neighbor finding. Returns an integer vector containing the index of the nearest neighbor range in
subject
for each range inx
. If there is no nearest neighborNA
is returned. For details of the algorithm see the man page in the IRanges package (?nearest
).precede: For each range in
x
,precede
returns the index of the range insubject
that is directly preceded by the range inx
. Overlapping ranges are excluded.NA
is returned when there are no qualifying ranges insubject
.follow: The opposite of
precede
,follow
returns the index of the range insubject
that is directly followed by the range inx
. Overlapping ranges are excluded.NA
is returned when there are no qualifying ranges insubject
.nearestKNeighbors: Performs conventional k-nearest neighbor finding. Returns an IntegerList containing the index of the k-nearest neighbors in
subject
for each range inx
. If there is no nearest neighborNA
is returned. Ifselect="all"
is specified, ties will be included in the resulting IntegerList.Orientation and strand for
precede
andfollow
: Orientation is 5' to 3', consistent with the direction of translation. Because positional numbering along a chromosome is from left to right and transcription takes place from 5' to 3',precede
andfollow
can appear to have ‘opposite’ behavior on the+
and-
strand. Using positions 5 and 6 as an example, 5 precedes 6 on the+
strand but follows 6 on the-
strand.The table below outlines the orientation when ranges on different strands are compared. In general, a feature on
*
is considered to belong to both strands. The single exception is when bothx
andsubject
are*
in which case both are treated as+
.x | subject | orientation -----+-----------+---------------- a) + | + | ---> b) + | - | NA c) + | * | ---> d) - | + | NA e) - | - | <--- f) - | * | <--- g) * | + | ---> h) * | - | <--- i) * | * | ---> (the only situation where * arbitrarily means +)
distanceToNearest: Returns the distance for each range in
x
to its nearest neighbor in thesubject
.distance: Returns the distance for each range in
x
to the range iny
. The behavior ofdistance
has changed in Bioconductor 2.12. See the man page?distance
in the IRanges package for details.
Value
For nearest
, precede
and follow
, an integer
vector of indices in subject
, or a Hits if
select="all"
.
For nearestKNeighbors
, an IntegerList of vertices in
subject
.
For distanceToNearest
, a Hits object with a
column for the query
index (queryHits), subject
index
(subjectHits) and the distance
between the pair.
For distance
, an integer vector of distances between the ranges
in x
and y
.
Author(s)
P. Aboyoun and V. Obenchain
See Also
The GenomicRanges and GRanges classes.
The IntegerRanges class in the IRanges package.
The Hits class in the S4Vectors package.
The nearest-methods man page in the IRanges package.
-
findOverlaps-methods for finding just the overlapping ranges.
The nearest-methods man page in the GenomicFeatures package.
Examples
## -----------------------------------------------------------
## precede() and follow()
## -----------------------------------------------------------
query <- GRanges("A", IRanges(c(5, 20), width=1), strand="+")
subject <- GRanges("A", IRanges(rep(c(10, 15), 2), width=1),
strand=c("+", "+", "-", "-"))
precede(query, subject)
follow(query, subject)
strand(query) <- "-"
precede(query, subject)
follow(query, subject)
## ties choose first in order
query <- GRanges("A", IRanges(10, width=1), c("+", "-", "*"))
subject <- GRanges("A", IRanges(c(5, 5, 5, 15, 15, 15), width=1),
rep(c("+", "-", "*"), 2))
precede(query, subject)
precede(query, rev(subject))
## ignore.strand=TRUE treats all ranges as '+'
precede(query[1], subject[4:6], select="all", ignore.strand=FALSE)
precede(query[1], subject[4:6], select="all", ignore.strand=TRUE)
## -----------------------------------------------------------
## nearest()
## -----------------------------------------------------------
## When multiple ranges overlap an "arbitrary" range is chosen
query <- GRanges("A", IRanges(5, 15))
subject <- GRanges("A", IRanges(c(1, 15), c(5, 19)))
nearest(query, subject)
## select="all" returns all hits
nearest(query, subject, select="all")
## Ranges in 'x' will self-select when 'subject' is present
query <- GRanges("A", IRanges(c(1, 10), width=5))
nearest(query, query)
## Ranges in 'x' will not self-select when 'subject' is missing
nearest(query)
## -----------------------------------------------------------
## nearestKNeighbors()
## -----------------------------------------------------------
## Without an argument, k defaults to 1
query <- GRanges("A", IRanges(c(2, 5), c(8, 15)))
subject <- GRanges("A", IRanges(c(1, 4, 10, 15), c(5, 7, 12, 19)))
nearestKNeighbors(query, subject)
## Return multiple neighbors with k > 1
nearestKNeighbors(query, subject, k=3)
## select="all" returns all hits
nearestKNeighbors(query, subject, select="all")
## -----------------------------------------------------------
## distance(), distanceToNearest()
## -----------------------------------------------------------
## Adjacent, overlap, separated by 1
query <- GRanges("A", IRanges(c(1, 2, 10), c(5, 8, 11)))
subject <- GRanges("A", IRanges(c(6, 5, 13), c(10, 10, 15)))
distance(query, subject)
## recycling
distance(query[1], subject)
## zero-width ranges
zw <- GRanges("A", IRanges(4,3))
stopifnot(distance(zw, GRanges("A", IRanges(3,4))) == 0L)
sapply(-3:3, function(i)
distance(shift(zw, i), GRanges("A", IRanges(4,3))))
query <- GRanges(c("A", "B"), IRanges(c(1, 5), width=1))
distanceToNearest(query, subject)
## distance() with GRanges and TxDb see the
## ?'distance,GenomicRanges,TxDb-method' man
## page in the GenomicFeatures package.