class BioTCM::Apps::GeneDetector

Detects gene symbols in text.

Example Usage

BioTCM::Apps::GeneDetector.new.detect(str)

Constants

DEFAULT_GENE_BLACKLIST

Default patterns for gene symbols to exclude. Any symbol matching one of these patterns is dropped from the candidate list.

DEFAULT_TEXT_CHANGELIST

Default text transformations. Each entry is a [pattern, replacement] pair applied to the text before detection.

VERSION

Version of GeneDetector

Public Class Methods

new( gene_blacklist: [], text_changelist: [], if_formalize: true )

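Initialize a gene detector.

@param gene_blacklist [Array] extra exclusion patterns, appended to DEFAULT_GENE_BLACKLIST
@param text_changelist [Array] extra [pattern, replacement] pairs, appended to DEFAULT_TEXT_CHANGELIST
@param if_formalize [Boolean] whether detected symbols are converted with to_formal_symbol before being returned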

# File lib/biotcm/apps/gene_detector.rb, line 32
def initialize(
  gene_blacklist: [],
  text_changelist: [],
  if_formalize: true
)
  @gene_blacklist = Regexp.new('(' + (DEFAULT_GENE_BLACKLIST + gene_blacklist).join(')|(') + ')')
  @text_changelist = DEFAULT_TEXT_CHANGELIST + text_changelist
  @if_formalize = if_formalize
end
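
The way the constructor folds the blacklist patterns into a single regular expression can be tried in isolation. The sketch below repeats the same construction outside the class; the pattern strings and candidate symbols are hypothetical and are not the library's defaults.

```ruby
# Hypothetical patterns, for illustration only; not DEFAULT_GENE_BLACKLIST.
default_patterns = ['^[A-Z]$', '^\d+$']
extra_patterns   = ['^CELL$']

# Same construction as in initialize: wrap each pattern in a group
# and join the groups with "|" into one alternation.
blacklist = Regexp.new('(' + (default_patterns + extra_patterns).join(')|(') + ')')

# Hypothetical candidate symbols; anything matching the blacklist is rejected.
candidates = %w[TP53 A 42 CELL BRCA1]
kept = candidates.reject { |sym| sym =~ blacklist }

puts kept.inspect
```

Because every pattern ends up in one alternation, a single match against any of them is enough to exclude a symbol.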

Public Instance Methods

detect(text)

Detect genes appearing in text. The text changelist is applied with gsub!, so the string passed in is modified in place.

@param text [String]
@return [Array] list of symbols

# File lib/biotcm/apps/gene_detector.rb, line 45
def detect(text)
  # Check dependency
  BioTCM::Databases::HGNC.ensure

  # Prepare symbols
  unless instance_variable_defined?(:@symbols)
    @symbols = BioTCM::Databases::HGNC.symbol2hgncid.keys
    @symbols.reject! { |sym| sym =~ @gene_blacklist }
  end

  # Transform text
  @text_changelist.each do |item|
    text.gsub!(item[0], item[1])
  end

  # Split sentences into words and eliminate redundancies
  rtn = text.split(/\.\s|\s?[,:!?#()\[\]{}]\s?|\s/).uniq & @symbols

  # Return approved symbols
  @if_formalize ? rtn.map(&:to_formal_symbol).uniq : rtn
end
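
The transform, split and intersect steps of detect can be followed on a small input without the HGNC database. This is a minimal sketch under stated assumptions: detect_sketch, the symbol list and the changelist entry are hypothetical stand-ins, the to_formal_symbol step is left out, and the text is copied first so the caller's string is left untouched.

```ruby
# Hypothetical helper mirroring the core steps of detect; not part of BioTCM.
def detect_sketch(text, symbols, changelist)
  # Work on a copy, unlike the library method, which calls gsub! on its argument.
  text = text.dup

  # Apply each [pattern, replacement] pair in order.
  changelist.each { |pattern, replacement| text.gsub!(pattern, replacement) }

  # Same split expression as detect: break on sentence ends, punctuation and
  # whitespace, drop duplicates, then keep only tokens that are known symbols.
  text.split(/\.\s|\s?[,:!?#()\[\]{}]\s?|\s/).uniq & symbols
end

# Hypothetical stand-in for the HGNC symbol list.
symbols = %w[BRCA1 EGFR TP53]
# Hypothetical changelist entry rewriting an alias to a symbol.
changelist = [[/\bp53\b/, 'TP53']]

text = 'Mutations in p53, BRCA1 and EGFR (ERBB1) are common. p53 is well studied.'
found = detect_sketch(text, symbols, changelist)

puts found.inspect
```

The result follows the order in which symbols first appear in the text, since Array#& keeps the order of its receiver. Note that the split expression only treats a period as a separator when whitespace follows it, so a token ending the text with a period keeps that period attached.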