class BioTCM::Apps::GeneDetector
Detects gene symbols in text.
Example Usage
BioTCM::Apps::GeneDetector.new.detect(str)
Constants
- DEFAULT_GENE_BLACKLIST
Default patterns of genes to exclude
- DEFAULT_TEXT_CHANGELIST
Default patterns of text to transform
- VERSION
Version of GeneDetector
Public Class Methods
new( gene_blacklist: [], text_changelist: [], if_formalize: true )
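Initialize a gene detector.
@param gene_blacklist [Array] extra patterns of genes to exclude, appended to DEFAULT_GENE_BLACKLIST
@param text_changelist [Array] extra [pattern, replacement] pairs for transforming text, appended to DEFAULT_TEXT_CHANGELIST
@param if_formalize [Boolean] whether to convert detected symbols with to_formal_symbol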
# File lib/biotcm/apps/gene_detector.rb, line 32
def initialize(
  gene_blacklist: [],
  text_changelist: [],
  if_formalize: true
)
  @gene_blacklist = Regexp.new('(' + (DEFAULT_GENE_BLACKLIST + gene_blacklist).join(')|(') + ')')
  @text_changelist = DEFAULT_TEXT_CHANGELIST + text_changelist
  @if_formalize = if_formalize
end
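The constructor joins all blacklist patterns into one alternation regex. The following is a minimal standalone sketch of that step, not the library's code: `build_blacklist` is a hypothetical helper, and the pattern strings are made-up stand-ins because the actual contents of DEFAULT_GENE_BLACKLIST are not shown on this page.

```ruby
# Hypothetical helper mirroring how the constructor combines patterns.
# Assumes each pattern is a String holding regex source.
def build_blacklist(defaults, extras)
  Regexp.new('(' + (defaults + extras).join(')|(') + ')')
end

# Made-up patterns for illustration only.
defaults = ['^\d+$']   # stand-in for DEFAULT_GENE_BLACKLIST
extras   = ['^CAT$']   # stand-in for a user-supplied gene_blacklist

blacklist = build_blacklist(defaults, extras)

# Same filtering idea as in detect: drop symbols matching any pattern.
symbols = %w[TP53 CAT BRCA1 123]
kept = symbols.reject { |sym| sym =~ blacklist }
```

Under these assumptions, `kept` retains only the symbols that match none of the patterns, so a word such as "CAT" can be excluded when it is more likely ordinary text than a gene symbol.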
Public Instance Methods
detect(text)
Detect genes appearing in text.
@param text [String]
@return [Array] list of symbols
# File lib/biotcm/apps/gene_detector.rb, line 45
def detect(text)
  # Check dependency
  BioTCM::Databases::HGNC.ensure

  # Prepare symbols
  unless instance_variable_defined?(:@symbols)
    @symbols = BioTCM::Databases::HGNC.symbol2hgncid.keys
    @symbols.reject! { |sym| sym =~ @gene_blacklist }
  end

  # Transform text
  @text_changelist.each do |item|
    text.gsub!(item[0], item[1])
  end

  # Split sentences into words and eliminate redundancies
  rtn = text.split(/\.\s|\s?[,:!?#()\[\]{}]\s?|\s/).uniq & @symbols

  # Return approved symbols
  @if_formalize ? rtn.map(&:to_formal_symbol).uniq : rtn
end
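The transform, split, and intersect steps of detect can be tried without the HGNC database. This is a rough sketch under stated assumptions: `detect_sketch` is a hypothetical function, the symbol list and changelist are made-up inputs, and the formalization step (`to_formal_symbol`) is left out. Unlike the listing above, which calls `gsub!` on its argument, the sketch works on a copy so the caller's string stays unchanged.

```ruby
# Hypothetical standalone version of the transform/split/intersect steps.
# symbols    : Array of known gene symbols (made-up here, HGNC in the library)
# changelist : Array of [pattern, replacement] pairs
def detect_sketch(text, symbols, changelist = [])
  # Same split pattern as in the detect listing.
  split_pattern = /\.\s|\s?[,:!?#()\[\]{}]\s?|\s/

  # Work on a copy so the caller's string is not modified.
  text = text.dup
  changelist.each { |pattern, replacement| text.gsub!(pattern, replacement) }

  # Tokenize, drop duplicates, keep only tokens that are known symbols.
  text.split(split_pattern).uniq & symbols
end

known = %w[TP53 BRCA1 EGFR]   # made-up symbol list for illustration

found = detect_sketch('Mutations in TP53 (and BRCA1) affect p53. TP53 is key.', known)

# A changelist entry can map an alias onto a symbol before tokenizing.
input = 'p53 is mutated'
mapped = detect_sketch(input, known, [[/p53/, 'TP53']])
```

One consequence of this split pattern is that a period is treated as a delimiter only when whitespace follows it, so a symbol that ends the text with a trailing period (for example "EGFR.") stays attached to the period and is not matched.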