class Mspire::Digester
A Digester splits a protein sequence into peptides at specified sites.
    trypsin = Mspire::Digester[:trypsin]
    trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG')
    # => ['MIVIGR', 'SIVHPYITNEYEPFAAEK', 'QQILSIMAG']
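To make the trypsin rule concrete, here is a minimal standalone sketch that reproduces the unmissed digest above with one regular expression: split after K or R unless the next residue is P. It is not the Mspire implementation (which also handles whitespace, offsets and missed cleavages), and the helper name `trypsin_fragments` is hypothetical.

```ruby
# Hypothetical helper (not part of Mspire): split after every K or R
# that is not immediately followed by P, the classic trypsin rule.
def trypsin_fragments(seq)
  # Zero-width split point: preceded by K or R, not followed by P.
  seq.split(/(?<=[KR])(?!P)/)
end

p trypsin_fragments('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG')
# => ["MIVIGR", "SIVHPYITNEYEPFAAEK", "QQILSIMAG"]
p trypsin_fragments('AAKPGGR')
# => ["AAKPGGR"]  (the K before P is not cleaved)
```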
With 1 missed cleavage:
    trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
    # => ['MIVIGR', 'MIVIGRSIVHPYITNEYEPFAAEK', 'SIVHPYITNEYEPFAAEK',
    #     'SIVHPYITNEYEPFAAEKQQILSIMAG', 'QQILSIMAG']
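The size of that list follows from simple counting: with n fully cleaved fragments, a peptide spanning k consecutive fragments can start at n - k + 1 positions, and k runs from 1 to max_misses + 1 (capped at n). For the three fragments above and one missed cleavage that gives 3 + 2 = 5 peptides. A small sketch of the count (the function name is hypothetical):

```ruby
# Hypothetical helper: number of peptides produced from n_fragments fully
# cleaved fragments when up to max_misses missed cleavages are allowed.
def missed_cleavage_count(n_fragments, max_misses)
  longest = [max_misses + 1, n_fragments].min  # a peptide spans at most this many fragments
  # A peptide spanning k fragments can start at n_fragments - k + 1 positions.
  (1..longest).sum { |k| n_fragments - k + 1 }
end

p missed_cleavage_count(3, 1)  # => 5
p missed_cleavage_count(3, 0)  # => 3
```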
Return the start and end sites of digestion:
    trypsin.site_digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
    # => [[0, 6], [0, 24], [6, 24], [6, 33], [24, 33]]
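The site pairs are half-open ranges: the end index is exclusive. Slicing the sequence with `seq[start, end - start]` therefore recovers exactly the peptides listed for `digest` with one missed cleavage, which is the same mapping `digest` applies to the output of `site_digest`. A standalone check using only plain Ruby strings:

```ruby
seq = 'MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG'
site_pairs = [[0, 6], [0, 24], [6, 24], [6, 33], [24, 33]]

# Each [start, end] pair is half-open, so the peptide is seq[start, end - start]
# (equivalently seq[start...end]).
peptides = site_pairs.map { |s, e| seq[s, e - s] }
p peptides
# => ["MIVIGR", "MIVIGRSIVHPYITNEYEPFAAEK", "SIVHPYITNEYEPFAAEK",
#     "SIVHPYITNEYEPFAAEKQQILSIMAG", "QQILSIMAG"]
```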
Constants
- ENZYMES
- MASCOT_ENZYME_CONFIG_STRINGS
    ARG_C = mascot_parse('Arg-C C-Term R P no no')
    ENZYMES = <'Arg-C' enzyme>
- MULTILINE_WHITESPACE
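The body of `mascot_parse` is not shown on this page. As a rough illustration of how a Mascot-style line might map onto the `Digester.new` arguments, here is a hypothetical parser. It assumes the whitespace-separated fields are the enzyme name, the cleavage terminus, the cleavage residues and the restriction residues, followed by two yes/no flags that it leaves unparsed; `MascotEnzyme` and `parse_mascot_enzyme` are illustrative names only.

```ruby
# Hypothetical parser; the field meanings are assumptions, not mascot_parse.
MascotEnzyme = Struct.new(:name, :cleave_str, :cterm_exception, :cterm_cleavage, :flags)

def parse_mascot_enzyme(line)
  # Assumed layout: name, terminus, cleavage residues, restriction residues, flags...
  name, terminus, cleave, restrict, *flags = line.split
  MascotEnzyme.new(
    name,
    cleave,                       # residues at which cleavage occurs
    restrict,                     # residue that blocks cleavage (e.g. 'P')
    terminus.casecmp?('C-Term'),  # true => cut on the C-terminal side
    flags                         # remaining fields, left unparsed
  )
end

arg_c = parse_mascot_enzyme('Arg-C C-Term R P no no')
p arg_c.to_a  # => ["Arg-C", "R", "P", true, ["no", "no"]]
```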
Attributes
A string of residues at which cleavage occurs
True if cleavage occurs at the c-terminus of a cleavage residue, false if cleavage occurs at the n-terminus.
A C-terminal restriction residue that prevents cleavage at a potential cleavage site (optional; must be a single residue).
The name of the digester
Public Class Methods
    # File lib/mspire/digester.rb, line 41
    def initialize(name, cleave_str, cterm_exception=nil, cterm_cleavage=true)
      # One alternative per cleavage residue, e.g. 'KR' => /K|R/
      regexp = []
      0.upto(cleave_str.length - 1) {|i| regexp << cleave_str[i, 1] }
      @name = name
      @cleave_str = cleave_str
      @cleave_regexp = Regexp.new(regexp.join('|'))
      # nil or '' means no exception; more than one residue is rejected
      @cterm_exception =
        case
        when cterm_exception == nil || cterm_exception.empty? then nil
        when cterm_exception.length == 1 then cterm_exception[0]
        else
          raise ArgumentError, "cterm exceptions must be a single residue: #{cterm_exception}"
        end
      @cterm_cleavage = cterm_cleavage
      @scanner = StringScanner.new('')
    end
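The constructor turns `cleave_str` into a regular expression that matches any single cleavage residue. The standalone sketch below shows that construction (using `cleave_str.chars`, which yields the same one-character strings as the index loop) next to `Regexp.union`, which builds the same alternation but also escapes each residue. The variable names are illustrative.

```ruby
cleave_str = 'KR'

# The construction used in initialize: one alternative per residue, joined by '|'.
regexp = Regexp.new(cleave_str.chars.join('|'))
p regexp.source  # => "K|R"

# Regexp.union builds the same alternation but escapes each residue,
# which would matter only if cleave_str contained regexp metacharacters.
union = Regexp.union(cleave_str.chars)
p union.source   # => "K|R"

# Positions of candidate cleavage residues in a short sequence.
seq = 'AARGGK'
positions = (0...seq.length).select { |i| regexp.match?(seq[i]) }
p positions      # => [2, 5]
```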
Protected Class Methods
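Takes the name of an enzyme as a symbol or string, in any case, and looks up the corresponding constant in ENZYMES (returns nil if none is found). The name is downcased and runs of non-word characters become underscores, so 'Arg-C' and :arg_c refer to the same enzyme.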
    # File lib/mspire/digester.rb, line 185
    def [](enzyme_name)
      ENZYMES[ enzyme_name.to_s.downcase.gsub(/\W+/,'_').to_sym ]
    end
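The lookup key is derived by a single chain of String calls, so it can be checked in isolation. This sketch copies that expression into a hypothetical `enzyme_key` helper to show which spellings collapse to the same symbol; it assumes, as the listing above suggests, that ENZYMES is keyed by symbols of this normalized form.

```ruby
# Same normalization as Digester.[] applies before indexing ENZYMES.
def enzyme_key(enzyme_name)
  enzyme_name.to_s.downcase.gsub(/\W+/, '_').to_sym
end

p enzyme_key('Arg-C')    # => :arg_c
p enzyme_key('ARG C')    # => :arg_c
p enzyme_key(:Trypsin)   # => :trypsin
```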
Public Instance Methods
Returns the digestion sites in sequence, as determined by the cleave_regexp boundaries. The sites are the positions where each peptide begins and ends, so that [sites[n], sites[n+1] - sites[n]] is the [index, length] of peptide n.
    d = Digester.new('Trypsin', 'KR', 'P')
    seq = "AARGGR"
    sites = d.cleavage_sites(seq)           # => [0, 3, 6]
    seq[sites[0], sites[0+1] - sites[0]]    # => "AAR"
    seq[sites[1], sites[1+1] - sites[1]]    # => "GGR"
Trailing whitespace is included in the fragment.
    seq = "AAR \n GGR"
    sites = d.cleavage_sites(seq)           # => [0, 6, 9]
    seq[sites[0], sites[0+1] - sites[0]]    # => "AAR \n "
    seq[sites[1], sites[1+1] - sites[1]]    # => "GGR"
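The [index, length] rule above amounts to walking the site list two entries at a time. A minimal sketch of that conversion with `each_cons(2)`; the helper name `fragments_from_sites` is hypothetical and the site list is supplied by hand rather than computed by Mspire.

```ruby
# Turn a cleavage_sites-style boundary list into fragments: each consecutive
# pair of sites (a, b) gives the slice seq[a, b - a].
def fragments_from_sites(seq, sites)
  sites.each_cons(2).map { |a, b| seq[a, b - a] }
end

p fragments_from_sites('AARGGR', [0, 3, 6])  # => ["AAR", "GGR"]
```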
The digested section of sequence may be specified using offset and length.
    # File lib/mspire/digester.rb, line 81
    def cleavage_sites(seq, offset=0, length=seq.length-offset)
      return [0, 1] if seq.size == 1 # adding exceptions is lame--algorithm should just work
      # shift each site back one residue for N-terminal cleavage
      adjustment = cterm_cleavage ? 0 : 1
      limit = offset + length
      positions = [offset]
      pos = scan(seq, offset, limit) do |site|
        positions << (site - adjustment)
      end
      # add the final position
      if (pos < limit) || (positions.length == 1)
        positions << limit
      end
      # adding exceptions is lame.. this code probably needs to be
      # refactored (corrected).
      if !cterm_cleavage && pos == limit
        positions << limit
      end
      positions
    end
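The private `scan` helper that `cleavage_sites` relies on is not shown, so the following is only a simplified re-implementation of the boundary rule, not Mspire's code. It ignores whitespace handling and the offset/length window, and it assumes the exception residue is checked only for C-terminal cleavage (the residue right after the cut). `simple_cleavage_sites` is a hypothetical name.

```ruby
# Simplified sketch of the cleavage-boundary rule (hypothetical helper).
def simple_cleavage_sites(seq, cleave_str, cterm_exception = nil, cterm_cleavage = true)
  sites = [0]
  seq.each_char.with_index do |residue, i|
    next unless cleave_str.include?(residue)
    # C-terminal cleavage cuts after the residue, N-terminal cleavage before it.
    cut = cterm_cleavage ? i + 1 : i
    # Skip cuts at either end of the sequence; they would yield empty peptides.
    next if cut <= 0 || cut >= seq.length
    # Assumption: the exception residue only blocks C-terminal cuts (e.g. K/R before P).
    next if cterm_cleavage && cterm_exception && seq[cut] == cterm_exception
    sites << cut
  end
  sites << seq.length
end

p simple_cleavage_sites('AARGGR', 'KR', 'P')              # => [0, 3, 6]
p simple_cleavage_sites('AADGGDAA', 'D', nil, false)      # => [0, 2, 5, 8]
```

For the trypsin example at the top of the page this gives [0, 6, 24, 33], the boundaries implied by the documented site_digest output.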
Returns an array of peptides produced by digesting sequence, allowing up to max_misses missed cleavages. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.
    # File lib/mspire/digester.rb, line 126
    def digest(seq, max_misses=0, offset=0, length=seq.length-offset)
      site_digest(seq, max_misses, offset, length).map do |s, e|
        seq[s, e-s]
      end
    end
Returns the digestion sites of sequence as [start_index, end_index] pairs, allowing up to max_misses missed cleavages. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.
Each [start_index, end_index] pair is yielded to the block, if given, and the collected results are returned.
    # File lib/mspire/digester.rb, line 111
    def site_digest(seq, max_misses=0, offset=0, length=seq.length-offset, &block) # :yields: start_index, end_index
      frag_sites = cleavage_sites(seq, offset, length)
      overlay(frag_sites.length, max_misses, 1) do |start_index, end_index|
        # overlay yields indices into frag_sites; convert them to sequence positions
        start_index = frag_sites[start_index]
        end_index = frag_sites[end_index]
        block ? block.call(start_index, end_index) : [start_index, end_index]
      end
    end
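The private `overlay` helper is not shown on this page. Judging only from the documented site_digest output, it appears to pair each boundary with the next one, two, and so on, up to max_misses + 1 boundaries ahead. The hypothetical stand-in below (`site_pairs` is an illustrative name) reproduces that output and the optional-block behavior: with a block, the block's return values are collected instead of the raw pairs.

```ruby
# Hypothetical stand-in for overlay + site_digest: for each boundary index,
# pair it with up to max_misses + 1 later boundaries.
def site_pairs(frag_sites, max_misses = 0)
  pairs = []
  (0...frag_sites.length - 1).each do |i|
    (0..max_misses).each do |misses|
      j = i + 1 + misses
      break if j >= frag_sites.length  # no boundary that far ahead
      s, e = frag_sites[i], frag_sites[j]
      # Yield to the caller's block if given, otherwise collect the raw pair.
      pairs << (block_given? ? yield(s, e) : [s, e])
    end
  end
  pairs
end

sites = [0, 6, 24, 33]
p site_pairs(sites, 1)                  # => [[0, 6], [0, 24], [6, 24], [6, 33], [24, 33]]
p site_pairs(sites, 1) { |s, e| e - s } # => [6, 24, 18, 27, 9]
```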