call_repeats {trace}                R Documentation

Call Repeats for Fragments

Description

This function calls the repeat lengths for a list of fragments.

Usage

call_repeats(
  fragments_list,
  assay_size_without_repeat = 87,
  repeat_size = 3,
  correction = "none",
  force_whole_repeat_units = FALSE,
  force_repeat_pattern = FALSE,
  force_repeat_pattern_size_period = repeat_size * 0.93,
  force_repeat_pattern_size_window = 0.5
)

Arguments

fragments_list

A list of fragments_repeats objects containing fragment data.

assay_size_without_repeat

An integer specifying the assay size without the repeat, used for repeat calling. This is the length of the sequence flanking the repeat in the PCR product. Default is 87.

repeat_size

An integer specifying the repeat size for repeat calling. Default is 3.

correction

A character string: "none" (the default) for no correction, "batch" to carry out a batch correction from common samples across runs (known repeat length not required), or "repeat" to use samples with validated modal repeat lengths to correct the repeat length. Both corrections require metadata to be added first (see add_metadata()). Both "batch" and "repeat" require "batch_run_id"; "batch" also requires "batch_sample_id"; "repeat" also requires "batch_sample_modal_repeat" (and benefits from having "batch_sample_id").

force_whole_repeat_units

A logical value specifying if the peaks should be forced to be whole repeat units apart. Usually the peaks are slightly under the whole repeat unit if left unchanged.

force_repeat_pattern

A logical value specifying if the peaks should be re-called to fit the specific repeat unit pattern. This requires trace information, so you must have started with fsa files.

force_repeat_pattern_size_period

A numeric value setting the peak periodicity in bp. In fragment analysis, the peaks are usually spaced slightly below the actual repeat unit size, so you can use this value to fine-tune what the periodicity should be. Default is repeat_size * 0.93.

force_repeat_pattern_size_window

A numeric value for the size window (in bp) used when assigning the peak. The algorithm jumps to the predicted scan for the next peak, then opens a window of this size over the neighboring scans and picks the tallest scan within it. Default is 0.5.

Details

This function has several options for determining the repeat length of your samples. These include i) an option to force the peaks to be whole repeat units apart, ii) corrections that remove batch effects or accurately call repeat length by comparing to samples of known length, and iii) an algorithm for re-calling the peaks to remove any contaminating peaks or shoulder-peaks.
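The simplest case (no correction and no forcing) can be sketched as below. This assumes the basic conversion is the fragment size minus assay_size_without_repeat, divided by repeat_size, which is what the argument descriptions imply. bp_to_repeats is a hypothetical helper for illustration only, not a function in trace, and the sizes are made up.

```r
# Hypothetical helper (not part of trace): convert fragment size in bp
# to repeat length, assuming a simple linear conversion.
bp_to_repeats <- function(size_bp,
                          assay_size_without_repeat = 87,
                          repeat_size = 3) {
  (size_bp - assay_size_without_repeat) / repeat_size
}

# Made-up fragment sizes in bp
example_repeats <- bp_to_repeats(c(447, 449.8))
print(example_repeats)
```

Note that the second value is not a whole number: neighboring peaks sit slightly less than one repeat unit apart, which is the behavior the force_whole_repeat_units option addresses.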

———— correction ————

There are two related correction approaches: 'batch' and 'repeat'. Batch correction is relatively simple: you link samples across batches (runs) so that batch-to-batch variation in repeat size can be corrected. However, although the repeat size that is returned will be precise, it will not be accurate and will underestimate the real repeat length. By contrast, repeat correction can be used to call repeat lengths accurately (and it also corrects the batch effects). However, repeat correction is only as good as the sample used to call the repeat length, so it is a challenging and advanced feature. You need a sample that reliably returns the same peak as the modal peak, or you need to be willing to understand the shape of the distribution and manually validate the repeat length of each batch_sample_id for each run.
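To illustrate the idea behind batch correction, here is a minimal sketch in base R. It is not the implementation used by call_repeats(): it assumes a single shared sample measured in every run, and it assumes the correction is a constant per-run shift towards the median across runs. The data frame, its values, and correct_batch are all hypothetical.

```r
# Made-up modal repeat of one shared sample (same batch_sample_id) in each run
shared <- data.frame(
  batch_run_id = c("run1", "run2", "run3"),
  modal_repeat = c(120.2, 121.1, 119.9)
)

# Assumption: use the median across runs as the common reference
reference <- median(shared$modal_repeat)
shared$offset <- shared$modal_repeat - reference

# Hypothetical helper: shift repeats from a given run by that run's offset
correct_batch <- function(repeats, batch_run_id) {
  repeats - shared$offset[match(batch_run_id, shared$batch_run_id)]
}

# Two samples from run2 are shifted by the run2 offset
corrected <- correct_batch(c(121.1, 125.0), c("run2", "run2"))
print(corrected)
```

After this kind of shift the shared sample has the same value in every run, so runs become comparable, but nothing anchors the values to the true repeat length. That is why batch correction is precise without being accurate.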

———— force_whole_repeat_units ————

The force_whole_repeat_units option aims to correct for the systematic underestimation of fragment sizes that occurs in capillary electrophoresis. It is independent of the algorithms described above and can be used in conjunction with them. It modifies repeat lengths in a way that helps align peaks with the underlying repeat pattern, making the spacing between peaks whole repeat units (rather than ~0.9 repeats). The calculated repeat lengths start from the main peak's repeat length and increase in increments of the specified repeat_size in either direction. This option enables you to get exactly the same result as expansion_index values calculated from data from Genemapper.
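The effect of forcing whole repeat units can be sketched as follows. This is an illustration under stated assumptions, not the code inside call_repeats(): it assumes peaks are sorted by size, rounds the gap between consecutive peaks to a whole number of repeat units, and counts outwards from the main peak. force_whole_units_sketch and the example sizes are hypothetical.

```r
# Hypothetical sketch (not part of trace): snap peaks to whole repeat units
# relative to the main peak. Assumes size_bp is sorted in increasing order.
force_whole_units_sketch <- function(size_bp, main_index, main_repeat,
                                     repeat_size = 3) {
  # Whole number of repeat units between each pair of consecutive peaks
  steps <- round(diff(size_bp) / repeat_size)
  # Cumulative unit offset of every peak relative to the first peak
  offsets <- c(0, cumsum(steps))
  # Re-centre the offsets on the main peak and add its repeat length
  main_repeat + (offsets - offsets[main_index])
}

# Made-up peaks spaced ~2.8 bp apart; the third peak is the main peak
sizes <- c(441.4, 444.2, 447.0, 449.8, 452.6)
whole_repeats <- force_whole_units_sketch(sizes, main_index = 3, main_repeat = 120)
print(whole_repeats)
```

Here the raw spacing of about 2.8 bp corresponds to roughly 0.93 repeats per peak, but after rounding each neighbor is exactly one repeat away from the next.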

———— force_repeat_pattern ————

This parameter re-calls the peaks based on the specified periodicity of the peaks (force_repeat_pattern_size_period). The main application of this algorithm is to deal with contaminating peaks that interrupt the expected regular pattern of peaks. The periodicity is used to jump from one peak to the predicted position of the next, where a window (force_repeat_pattern_size_window) is opened and the tallest scan within it is picked.
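The jump-and-window idea can be sketched in base R as below. This is a simplified illustration, not the package's algorithm: it works on candidate peak sizes rather than raw scans, only walks in the increasing-size direction, and assumes the window extends the given number of bp on each side of the predicted position. recall_periodic_peaks and the example data are hypothetical.

```r
# Hypothetical sketch (not part of trace): walk along a trace in steps of
# `period` bp and keep the tallest point found within `window` bp of each
# predicted position.
recall_periodic_peaks <- function(size_bp, height, start_size,
                                  period = 2.79, window = 0.5, n_steps = 3) {
  picked <- integer(0)
  current <- start_size
  for (i in seq_len(n_steps)) {
    target <- current + period
    in_window <- which(abs(size_bp - target) <= window)
    if (length(in_window) == 0) break  # nothing near the predicted position
    best <- in_window[which.max(height[in_window])]
    picked <- c(picked, best)
    # Continue from the picked position so small errors do not accumulate
    current <- size_bp[best]
  }
  picked
}

# Made-up data: true peaks every ~2.8 bp, with contaminating peaks in between
size_bp <- c(447.0, 448.4, 449.8, 451.2, 452.6)
height  <- c(1000, 300, 800, 250, 600)
picked <- recall_periodic_peaks(size_bp, height, start_size = 447.0, n_steps = 2)
print(size_bp[picked])
```

The off-pattern peaks at 448.4 and 451.2 fall outside every window and are therefore never called.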

Value

This function modifies the list of fragments objects in place, adding the called repeats.

See Also

find_alleles(), add_metadata(), plot_batch_correction_samples(), plot_repeat_correction_model(), extract_repeat_correction_summary()

Examples


fsa_list <- lapply(cell_line_fsa_list[c(16:19)], function(x) x$clone())

find_ladders(fsa_list, show_progress_bar = FALSE)

fragments_list <- find_fragments(
  fsa_list,
  min_bp_size = 300
)

find_alleles(fragments_list)

add_metadata(fragments_list,
   metadata[c(16:19), ]
)

# Simple conversion from bp size to repeat size
call_repeats(
  fragments_list,
  assay_size_without_repeat = 87,
  repeat_size = 3
)

plot_traces(fragments_list[1], xlim = c(120, 170))

# Use force_whole_repeat_units algorithm to make sure called
# repeats are the exact number of bp apart

call_repeats(
  fragments_list,
  force_whole_repeat_units = TRUE,
  assay_size_without_repeat = 87,
  repeat_size = 3
)

plot_traces(fragments_list[1], xlim = c(120, 170))


# apply batch correction
call_repeats(
  fragments_list,
  correction = "batch",
  assay_size_without_repeat = 87,
  repeat_size = 3
)

plot_traces(fragments_list[1], xlim = c(120, 170))

# apply repeat correction
call_repeats(
  fragments_list,
  correction = "repeat",
  assay_size_without_repeat = 87,
  repeat_size = 3
)

plot_traces(fragments_list[1], xlim = c(120, 170))

# Ensure only periodic peaks are called
call_repeats(
  fragments_list,
  force_repeat_pattern = TRUE,
  force_repeat_pattern_size_period = 2.75,
  assay_size_without_repeat = 87,
  repeat_size = 3
)

plot_traces(fragments_list[1], xlim = c(120, 170))


[Package trace version 0.6.0 Index]