class Mspire::Digester

A Digester splits a protein sequence into peptides at specified sites.

trypsin = Mspire::Digester[:trypsin]

trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG')
# => ['MIVIGR', 'SIVHPYITNEYEPFAAEK', 'QQILSIMAG']

With 1 missed cleavage:

trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
# => ['MIVIGR','MIVIGRSIVHPYITNEYEPFAAEK','SIVHPYITNEYEPFAAEK',
#     'SIVHPYITNEYEPFAAEKQQILSIMAG', 'QQILSIMAG']

Return the start and end sites of digestion:

trypsin.site_digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
# => [[0,6],[0,24],[6,24],[6,33],[24,33]]

Constants

ENZYMES
MASCOT_ENZYME_CONFIG_STRINGS

ARG_C = mascot_parse('Arg-C C-Term R P no no')
ENZYMES = <'Arg-C' enzyme>

MULTILINE_WHITESPACE

Attributes

cleave_str[R]

A string of residues at which cleavage occurs

cterm_cleavage[R]

True if cleavage occurs at the c-terminus of a cleavage residue, false if cleavage occurs at the n-terminus.

cterm_exception[R]

A C-terminal restriction residue which prevents cleavage at a potential cleavage site (optional).

name[R]

The name of the digester

Public Class Methods

new(name, cleave_str, cterm_exception=nil, cterm_cleavage=true)
# File lib/mspire/digester.rb, line 41
def initialize(name, cleave_str, cterm_exception=nil, cterm_cleavage=true)
  regexp = []
  0.upto(cleave_str.length - 1) {|i| regexp << cleave_str[i, 1] }

  @name = name
  @cleave_str = cleave_str
  @cleave_regexp = Regexp.new(regexp.join('|'))
  @cterm_exception = case 
                     when cterm_exception == nil || cterm_exception.empty? then nil
                     when cterm_exception.length == 1 then cterm_exception[0]
                     else
                       raise ArgumentError, "cterm exceptions must be a single residue: #{cterm_exception}"
                     end

  @cterm_cleavage = cterm_cleavage
  @scanner = StringScanner.new('')
end
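
The constructor turns cleave_str into an alternation regexp, one branch per residue. A minimal standalone sketch of that step, assuming only Ruby's standard strscan library (it ignores cterm_exception and cterm_cleavage, and the scanning loop is an illustration rather than the library's own scan method):

```ruby
require 'strscan'

# Build the same kind of pattern the constructor builds: 'KR' becomes /K|R/.
cleave_str = 'KR'
cleave_regexp = Regexp.new(cleave_str.chars.join('|'))

# Illustration only: record the scanner position just after each matching residue.
scanner = StringScanner.new('AARGGR')
ends = []
ends << scanner.pos while scanner.scan_until(cleave_regexp)

p cleave_regexp.source  # => "K|R"
p ends                  # => [3, 6]
```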

Protected Class Methods

[](enzyme_name)

Takes the name of the enzyme in any case (symbol or string) and returns the matching entry from ENZYMES, or nil if none is found.

# File lib/mspire/digester.rb, line 185
def [](enzyme_name)
  ENZYMES[ enzyme_name.to_s.downcase.gsub(/\W+/,'_').to_sym ]
end
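
The lookup key is produced by the same chain of calls shown in the source above. The sketch below isolates that normalization in a hypothetical helper, normalize_enzyme_name, which is not part of the library; it makes no claim about which keys ENZYMES actually contains.

```ruby
# Hypothetical helper mirroring the key normalization used by Digester.[]:
# stringify, downcase, collapse runs of non-word characters into '_', symbolize.
def normalize_enzyme_name(enzyme_name)
  enzyme_name.to_s.downcase.gsub(/\W+/, '_').to_sym
end

p normalize_enzyme_name('Trypsin')  # => :trypsin
p normalize_enzyme_name(:TRYPSIN)   # => :trypsin
p normalize_enzyme_name('Arg-C')    # => :arg_c
```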

Public Instance Methods

cleavage_sites(seq, offset=0, length=seq.length-offset)

Returns the digestion sites in seq, as determined by the cleave_regexp boundaries. The sites mark the positions where peptides begin and end, so that [sites[n], sites[n+1] - sites[n]] is the [index, length] of peptide n.

d = Digester.new('Trypsin', 'KR', 'P')
seq = "AARGGR"
sites = d.cleavage_sites(seq)                 # => [0, 3, 6]

seq[sites[0], sites[0+1] - sites[0]]          # => "AAR"
seq[sites[1], sites[1+1] - sites[1]]          # => "GGR"

Trailing whitespace is included in the fragment.

seq = "AAR  \n  GGR"
sites = d.cleavage_sites(seq)                 # => [0, 8, 11]

seq[sites[0], sites[0+1] - sites[0]]          # => "AAR  \n  "
seq[sites[1], sites[1+1] - sites[1]]          # => "GGR"
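
The per-index slicing in both examples can be written once with each_cons. This is a sketch only: fragments is a hypothetical helper, not a Digester method, and the site arrays are copied from the examples above rather than computed.

```ruby
# Hypothetical helper: turn a list of cleavage sites into the fragments they delimit.
# Each consecutive pair of sites is a [start, stop) boundary.
def fragments(seq, sites)
  sites.each_cons(2).map { |start, stop| seq[start, stop - start] }
end

p fragments("AARGGR", [0, 3, 6])         # => ["AAR", "GGR"]
p fragments("AAR  \n  GGR", [0, 8, 11])  # => ["AAR  \n  ", "GGR"]
```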

The digested section of sequence may be specified using offset and length.

# File lib/mspire/digester.rb, line 81
def cleavage_sites(seq, offset=0, length=seq.length-offset)
  return [0, 1] if seq.size == 1  # adding exceptions is lame--algorithm should just work

  adjustment = cterm_cleavage ? 0 : 1
  limit = offset + length

  positions = [offset]
  pos = scan(seq, offset, limit) do |pos|
    positions << (pos - adjustment)
  end

  # add the final position
  if (pos < limit) || (positions.length == 1)
    positions << limit
  end
  # adding exceptions is lame.. this code probably needs to be
  # refactored (corrected).
  if !cterm_cleavage && pos == limit
    positions << limit
  end
  positions
end

digest(seq, max_misses=0, offset=0, length=seq.length-offset)

Returns an array of peptides produced by digesting sequence, allowing for missed cleavage sites. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.

# File lib/mspire/digester.rb, line 126
def digest(seq, max_misses=0, offset=0, length=seq.length-offset)
  site_digest(seq, max_misses, offset, length).map do |s, e|
    seq[s, e-s]
  end
end
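
As the source shows, digest maps each [start, end] pair from site_digest to a substring. A standalone sketch of that mapping, with the pairs copied from the site_digest example at the top of this page instead of being computed by a Digester:

```ruby
seq = 'MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG'

# Pairs taken from the documented site_digest output for one missed cleavage.
pairs = [[0, 6], [0, 24], [6, 24], [6, 33], [24, 33]]

# Same slicing as in digest: start index plus length (end - start).
peptides = pairs.map { |s, e| seq[s, e - s] }

p peptides
# => ["MIVIGR", "MIVIGRSIVHPYITNEYEPFAAEK", "SIVHPYITNEYEPFAAEK",
#     "SIVHPYITNEYEPFAAEKQQILSIMAG", "QQILSIMAG"]
```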

site_digest(seq, max_misses=0, offset=0, length=seq.length-offset) { |start_index, end_index| ... }

Returns digestion sites of sequence as [start_index, end_index] pairs, allowing for missed cleavages. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.

Each [start_index, end_index] pair is yielded to the block, if given, and the collected results are returned.

# File lib/mspire/digester.rb, line 111
def site_digest(seq, max_misses=0, offset=0, length=seq.length-offset, &block) # :yields: start_index, end_index
  frag_sites = cleavage_sites(seq, offset, length)

  overlay(frag_sites.length, max_misses, 1) do |start_index, end_index|
    start_index = frag_sites[start_index]
    end_index = frag_sites[end_index]

    block ? block.call(start_index, end_index) : [start_index, end_index]
  end  
end
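
The overlay method called above is not shown on this page. The sketch below is a hypothetical stand-in, missed_cleavage_pairs, that pairs each site with the sites up to max_misses + 1 positions later; it is an assumption about the behavior, chosen because it reproduces the documented trypsin output, and is not the library's implementation. The site list [0, 6, 24, 33] is the set of boundaries implied by that example.

```ruby
# Hypothetical stand-in for the overlay step (not the library's code).
# For each end boundary, pair it with every start boundary that skips
# at most max_misses intermediate cleavage sites.
def missed_cleavage_pairs(sites, max_misses)
  pairs = []
  (1...sites.length).each do |e|
    first = [e - 1 - max_misses, 0].max
    (first...e).each { |s| pairs << [sites[s], sites[e]] }
  end
  pairs
end

sites = [0, 6, 24, 33]
p missed_cleavage_pairs(sites, 0)  # => [[0, 6], [6, 24], [24, 33]]
p missed_cleavage_pairs(sites, 1)  # => [[0, 6], [0, 24], [6, 24], [6, 33], [24, 33]]
```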