module Mspire::Fasta
A convenience module for working with FASTA-formatted sequence databases. The file that defines this module also mixes Enumerable into Bio::FlatFile, so you can do things like this:
    accessions = Mspire::Fasta.open("file.fasta") do |fasta|
      fasta.map(&:accession)
    end
A few aliases are added to Bio::FastaFormat:
    entry.header == entry.definition
    entry.sequence == entry.seq
Mspire::Fasta.new accepts either an IO object or a String (the FASTA-formatted data itself):
    # taking an io object:
    File.open("file.fasta") do |io|
      fasta = Mspire::Fasta.new(io)
      # ... do something with it
    end

    # taking a string
    string = ">id1 a simple header\nAAASDDEEEDDD\n>id2 header again\nPPPPPPWWWWWWTTTTYY\n"
    fasta = Mspire::Fasta.new(string)
    (simple, not_simple) = fasta.partition {|entry| entry.header =~ /simple/ }
Public Class Methods
foreach(file, &block)
Yields each Bio::FastaFormat object in turn. Without a block, returns an Enumerator.
    # File lib/mspire/fasta.rb, line 48
    def self.foreach(file, &block)
      block or return enum_for(__method__, file)
      Bio::FlatFile.open(Bio::FastaFormat, file) do |fasta|
        fasta.each(&block)
      end
    end
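The `block or return enum_for(__method__, file)` line is what allows foreach to be called without a block. Below is a minimal, stdlib-only sketch of that idiom. It is an illustration, not Mspire code: `each_header` is a hypothetical helper that reads header lines directly instead of going through Bio::FlatFile, and the sample data is the string used earlier on this page.

```ruby
require 'tempfile'

# Hypothetical helper (not part of Mspire): yields each FASTA header line
# without its leading '>'. When no block is given it returns an Enumerator,
# using the same idiom as Mspire::Fasta.foreach.
def each_header(path, &block)
  block or return enum_for(__method__, path)
  File.foreach(path) do |line|
    block.call(line.chomp[1..-1]) if line.start_with?('>')
  end
end

# Write a small FASTA file to try it on.
fasta_file = Tempfile.new(['example', '.fasta'])
fasta_file.write(">id1 a simple header\nAAASDDEEEDDD\n>id2 header again\nPPPPPPWWWWWWTTTTYY\n")
fasta_file.close

# With a block: headers are yielded one at a time.
yielded = []
each_header(fasta_file.path) { |header| yielded << header }

# Without a block: an Enumerator comes back, so Enumerable methods chain.
header_enum = each_header(fasta_file.path)
simple_headers = header_enum.select { |header| header =~ /simple/ }
```

Returning an Enumerator keeps the block form and the chained form (map, select, partition) behind a single method.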
new(io)
Takes an IO object or a String that is the FASTA data itself, and returns a Bio::FlatFile.
    # File lib/mspire/fasta.rb, line 56
    def self.new(io)
      io = StringIO.new(io) if io.is_a?(String)
      Bio::FlatFile.new(Bio::FastaFormat, io)
    end
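The String handling in new comes down to wrapping the string in a StringIO before it reaches Bio::FlatFile. A minimal sketch of just that coercion step, using only the standard library, might look like the following. `coerce_to_io` is a hypothetical name introduced for illustration and is not part of Mspire.

```ruby
require 'stringio'

# Hypothetical helper (not part of Mspire): wraps a String in a StringIO and
# passes anything else through unchanged, mirroring the first line of
# Mspire::Fasta.new.
def coerce_to_io(io_or_string)
  io_or_string.is_a?(String) ? StringIO.new(io_or_string) : io_or_string
end

fasta_text = ">id1 a simple header\nAAASDDEEEDDD\n"

# A String is wrapped so it can be read like a file.
from_string = coerce_to_io(fasta_text)
first_line = from_string.gets

# An object that is already IO-like is returned as-is.
already_io = StringIO.new(fasta_text)
passed_through = coerce_to_io(already_io)
```

Because the check is `is_a?(String)`, a String argument is always treated as FASTA data and never as a filename; use open or foreach for a path.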
open(file, &block)
Opens the flat file and yields a Bio::FlatFile object.
    # File lib/mspire/fasta.rb, line 43
    def self.open(file, &block)
      Bio::FlatFile.open(Bio::FastaFormat, file, &block)
    end
uniprot_id(header)
Takes the header string and returns the UniProt ID:
    'sp|Q04917|1433F_HUMAN' #=> 'Q04917'
This can also be found with Bio::FastaFormat#accession (but it may be much slower).
    # File lib/mspire/fasta.rb, line 66
    def self.uniprot_id(header)
      header[/^[^\|]+\|([^\|]+)\|/, 1]
    end
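The pattern captures the second pipe-delimited field and requires a pipe on both sides of it. The sketch below copies the regular expression into a stand-alone method so it runs without the mspire gem. The first header is the one from the description above; the other inputs are illustrative strings chosen to show when the method returns nil.

```ruby
# Stand-alone copy of the pattern used by Mspire::Fasta.uniprot_id.
def uniprot_id(header)
  header[/^[^\|]+\|([^\|]+)\|/, 1]
end

# Three pipe-delimited fields: the middle one is returned.
standard = uniprot_id('sp|Q04917|1433F_HUMAN')

# Text after the third field does not affect the match.
with_description = uniprot_id('sp|Q04917|1433F_HUMAN trailing description')

# No pipes at all: String#[] with a non-matching regexp returns nil.
no_pipes = uniprot_id('id1 a simple header')

# Only one pipe: the closing pipe after the ID is missing, so nil again.
two_fields = uniprot_id('sp|Q04917')
```

Callers that handle headers in other formats should therefore be prepared for a nil return value.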