class CTioga2::Data::Backends::TextBackend
Constants
- InvalidLineRE
A line is invalid if it is blank or if it starts with anything other than a digit, +, - or a period (.).
This may be refined later.
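The actual value of InvalidLineRE is not shown on this page. As a rough sketch of the rule described above, and assuming that leading whitespace is ignored, a regular expression along these lines would do. The constant and helper names are hypothetical.

```ruby
# Hypothetical stand-in for InvalidLineRE (the real constant is not shown here).
# A line is invalid when it is blank, or when its first non-blank character
# is neither a digit nor one of + - .
SKETCH_INVALID_LINE_RE = /\A\s*(?:\z|[^\d+\-.\s])/

# Hypothetical helper wrapping the match.
def invalid_line?(line)
  !(line =~ SKETCH_INVALID_LINE_RE).nil?
end

invalid_line?("# a comment\n")  # invalid: starts with '#'
invalid_line?("\n")             # invalid: blank
invalid_line?("-1.5 2.0\n")     # valid data line
```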
Public Class Methods
Calls superclass method CTioga2::Data::Backends::Backend::new
# File lib/ctioga2/data/backends/backends/text.rb, line 87
def initialize
  @dummy = nil
  @current = nil            # Current is the name of the last file used. Necessary for '' specs.
  @current_data = nil       # The data of the last file used.
  @skip = 0
  @included_modules = [NaN] # to make sure we give them to
                            # Dvector.compute_formula
  @default_column_spec = "1:2"

  @separator = /\s+/

  # We don't split data by default.
  @split = false

  @param_regex = nil

  @header_line_regex = /^\#\#\s*/

  super()

  # Override Backend's cache - for now.
  @cache = {}               # A cache file_name -> data

  @param_cache = {}         # Same thing as cache, but for parameters

  @headers_cache = {}       # Same thing as cache, but for header
                            # lines.
end
Public Instance Methods
Expands a specification into several sets. This function separates the set into a file spec and a column spec. Within the column spec, the 2##6 keyword is meant to expand to 2,3,4,5,6, and 2## followed by a non-digit expands to 2,…,last column in the file. For now, the expansion stops at the first occurrence found, and only one of the two forms is handled: the regular expression in the source matches 2## followed by a non-digit (or the end of the spec), not the bounded 2##6 form. But soon…
Calls superclass method CTioga2::Data::Backends::Backend#expand_sets
# File lib/ctioga2/data/backends/backends/text.rb, line 129
def expand_sets(spec)
  if m = /(\d+)##(\D|$)/.match(spec)
    a = m[1].to_i
    trail = m[2]
    b = read_file(spec)
    b = (b.length - 1)
    ret = []
    a.upto(b) do |i|
      ret << m.pre_match + i.to_s + trail + m.post_match
    end
    return ret
  else
    m = Dir::glob(spec)
    if m.size > 0
      m.sort!
      return m
    else
      return super
    end
  end
end
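To make the open-ended ## expansion easier to follow, here is a standalone sketch of the same string manipulation. It does not read any file: the index of the last column is passed in directly, and the function name and the `last_column` parameter are hypothetical.

```ruby
# Hypothetical standalone version of the "N##" expansion.
# last_column stands in for (read_file(spec).length - 1) in the real method.
def expand_open_range(spec, last_column)
  m = /(\d+)##(\D|$)/.match(spec)
  return [spec] unless m            # nothing to expand

  first = m[1].to_i
  trail = m[2]
  (first..last_column).map do |i|
    m.pre_match + i.to_s + trail + m.post_match
  end
end

expand_open_range("data.dat@1:2##", 4)
# => ["data.dat@1:2", "data.dat@1:3", "data.dat@1:4"]
```

Only the first occurrence of ## is expanded, as in the method above.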
# File lib/ctioga2/data/backends/backends/text.rb, line 118
def extend(mod)
  super
  @included_modules << mod
end
Protected Instance Methods
Gets the data corresponding to the given column. If compute_formulas is true, the column specification is taken to be a formula (in the spirit of gnuplot's); otherwise it is taken to be a column number, and a copy of that column is returned.
# File lib/ctioga2/data/backends/backends/text.rb, line 354
def get_data_column(column, compute_formulas = false,
                    parameters = nil, header = nil)
  if compute_formulas
    formula = Utils::parse_formula(column, parameters, header)
    debug { "Using formula #{formula} for column spec: #{column}" }
    return Dvector.compute_formula(formula, @current_data,
                                   @included_modules)
  else
    if @current_data[column.to_i]
      return @current_data[column.to_i].dup
    else
      raise "Cannot find column number #{column.to_i} -- maybe you got the column separator wrong ?"
    end
  end
end
Returns an IO object suitable for reading data from the given file, which can be one of the following:
- a real file name
- a compressed file name
- a pipe command.
# File lib/ctioga2/data/backends/backends/text.rb, line 160
def get_io_object(file)
  if file == "-"
    return $stdin
  elsif file =~ /(.*?)\|\s*$/ # A pipe
    return IO.popen($1)
  else
    return Utils::open(file)
  end
end
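The dispatch above can be illustrated without opening anything. The sketch below only classifies a file argument using the same tests as get_io_object; the function name and the returned symbols are hypothetical, and unlike the original it strips whitespace around the pipe command.

```ruby
# Hypothetical classifier mirroring the branches of get_io_object.
def classify_source(file)
  if file == "-"
    [:stdin, nil]                 # "-" means standard input
  elsif (m = /(.*?)\|\s*$/.match(file))
    [:pipe, m[1].strip]           # a trailing "|" marks a pipe command
  else
    [:file, file]                 # plain (possibly compressed) file name
  end
end

classify_source("-")                    # => [:stdin, nil]
classify_source("gunzip -c data.gz |")  # => [:pipe, "gunzip -c data.gz"]
classify_source("data.dat")             # => [:file, "data.dat"]
```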
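Returns an IO object corresponding to the given file. When splitting is enabled, the file name may carry a #n suffix that selects the n-th set in the file (the first set by default); the returned IO then contains only that set.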
# File lib/ctioga2/data/backends/backends/text.rb, line 209
def get_io_set(file)
  if not @split
    return get_io_object(file)
  else
    file =~ /(.*?)(?:#(\d+))?$/; # ; to make ruby-mode indent correctly.
    filename = $1
    if $2
      set = $2.to_i
    else
      set = 1
    end
    debug { "Trying to get set #{set} from file '#{filename}'" }
    str = get_set_string(get_io_object(filename), set)
    return StringIO.new(str)
  end
end
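As a small illustration of how the file name and set number are separated in split mode, the following sketch applies the same regular expression on its own. The function name is hypothetical.

```ruby
# Hypothetical helper: split "name#n" into [name, n], with n defaulting to 1.
def split_set_suffix(file)
  m = /(.*?)(?:#(\d+))?$/.match(file)
  set = m[2] ? m[2].to_i : 1
  [m[1], set]
end

split_set_suffix("data.dat#3")  # => ["data.dat", 3]
split_set_suffix("data.dat")    # => ["data.dat", 1]
```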
Returns a string corresponding to the given set of the given io object.
Sets are 1-based.
# File lib/ctioga2/data/backends/backends/text.rb, line 180
def get_set_string(io, set)
  cur_set = 1
  last_line_is_invalid = true
  str = ""
  line_number = 0
  while line = io.gets
    line_number += 1
    if line =~ InvalidLineRE
      debug { "Found invalid line at #{line_number}" }
      if ! last_line_is_invalid
        # We begin a new set.
        cur_set += 1
        debug { "Found set #{cur_set} at line #{line_number}" }
        if(cur_set > set)
          return str
        end
      end
      last_line_is_invalid = true
    else
      last_line_is_invalid = false
      if cur_set == set
        str += line
      end
    end
  end
  return str
end
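The set-splitting logic can be tried in isolation with a StringIO. The sketch below follows the same state machine, where a run of invalid lines after at least one valid line starts a new set. It uses a hypothetical stand-in for InvalidLineRE, since the real constant is not shown on this page, and the function name is also hypothetical.

```ruby
require 'stringio'

# Hypothetical stand-in for InvalidLineRE: blank lines, or lines whose first
# non-blank character is neither a digit nor + - .
SKETCH_INVALID_RE = /\A\s*(?:\z|[^\d+\-.\s])/

# Returns the text of the set-th block of valid lines (1-based).
def nth_set(io, set)
  cur_set = 1
  last_line_is_invalid = true
  str = +""
  io.each_line do |line|
    if line =~ SKETCH_INVALID_RE
      unless last_line_is_invalid
        cur_set += 1                 # first invalid line after data: new set
        return str if cur_set > set  # the requested set is complete
      end
      last_line_is_invalid = true
    else
      last_line_is_invalid = false
      str << line if cur_set == set
    end
  end
  str
end

data = "# header\n1 2\n3 4\n\n\n5 6\n7 8\n"
nth_set(StringIO.new(data), 2)  # => "5 6\n7 8\n"
```

Leading invalid lines, such as the header here, do not start a new set because the state starts out as "last line invalid".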
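A proper writer for @param_regex. A Regexp is used as is; a string containing capturing groups is compiled into a Regexp; any other string is treated as the separator between a parameter name and its value.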
# File lib/ctioga2/data/backends/backends/text.rb, line 228
def param_regex=(val)
  if val.is_a? Regexp
    @param_regex = val
  elsif val =~ /([^\\]|^)\(/    # Has capturing groups
    @param_regex = /#{val}/
  else                          # Treat as separator
    @param_regex = /(\S+)\s*#{val}\s*(\S+)/
  end
end
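The three branches of this writer can be exercised with a plain function that returns the resulting Regexp instead of storing it. This is only a sketch; the function name is hypothetical.

```ruby
# Hypothetical function version of param_regex=, returning the Regexp.
def build_param_regex(val)
  if val.is_a?(Regexp)
    val                                   # used as is
  elsif val =~ /([^\\]|^)\(/              # has an unescaped "(": capturing groups
    Regexp.new(val)
  else                                    # treated as a name/value separator
    /(\S+)\s*#{val}\s*(\S+)/
  end
end

re = build_param_regex("=")
m = re.match("# temperature = 3.5")
[m[1], m[2]]  # => ["temperature", "3.5"]
```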
Turns an array of comments into a hash column name -> column number (1-based)
# File lib/ctioga2/data/backends/backends/text.rb, line 251
def parse_header_line(comments)
  for line in comments
    if line =~ @header_line_regex
      colnames = line.gsub(@header_line_regex,'').split(@separator)
      i = 1
      ret = {}
      for n in colnames
        ret[n] = i
        i += 1
      end
      return ret
    end
  end
  return {}
end
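A quick way to see what this produces is to run the same steps on a small array of comment lines. The sketch below passes the header regular expression and the separator as arguments, using the defaults set in initialize; the function name is hypothetical.

```ruby
# Hypothetical standalone version of parse_header_line.
def header_columns(comments, header_re = /^\#\#\s*/, separator = /\s+/)
  comments.each do |line|
    next unless line =~ header_re
    names = line.gsub(header_re, '').split(separator)
    # Column numbers are 1-based.
    return names.each_with_index.to_h { |name, i| [name, i + 1] }
  end
  {}
end

header_columns(["# a comment", "## time voltage current"])
# => {"time"=>1, "voltage"=>2, "current"=>3}
```

Only the first line matching the header pattern is used.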
Turns an array of comments into a hash parameter name -> value. Values are converted to floats.
# File lib/ctioga2/data/backends/backends/text.rb, line 239
def parse_parameters(comments)
  ret = {}
  for line in comments
    if line =~ @param_regex
      ret[$1] = $2.to_f
    end
  end
  return ret
end
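For illustration, the same extraction can be run on a few comment lines with an explicit regular expression. This sketch assumes a parameter regex of the separator form with "=" as the separator; the function name is hypothetical.

```ruby
# Hypothetical standalone version of parse_parameters.
def extract_parameters(comments, param_regex)
  comments.each_with_object({}) do |line, ret|
    m = param_regex.match(line)
    ret[m[1]] = m[2].to_f if m    # first group: name, second group: value
  end
end

comments = ["# temperature = 3.5", "# just a note", "# scans = 12"]
extract_parameters(comments, /(\S+)\s*=\s*(\S+)/)
# => {"temperature"=>3.5, "scans"=>12.0}
```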
This is called by the architecture to get the data. It splits the set name into filename@cols, reads the file if necessary and calls get_data_column for each column of the specification.
# File lib/ctioga2/data/backends/backends/text.rb, line 325
def query_dataset(set)
  if set =~ /(.*)@(.*)/
    col_spec = $2
    file = $1
  else
    col_spec = @default_column_spec
    file = set
  end
  if file.length > 0
    @current_data = read_file(file)
    @current = file
  end

  # Whether or not we need to compute formulas:
  if col_spec =~ /\$/
    compute_formulas = true
  else
    compute_formulas = false
  end

  return Dataset.dataset_from_spec(set, col_spec) do |col|
    get_data_column(col, compute_formulas,
                    @current_parameters, @current_header)
  end
end
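The parsing half of query_dataset can be shown on its own. The sketch below splits a set name into file and column spec and decides whether formulas are needed, without reading any data. The function name is hypothetical, and the default column spec is the "1:2" set in initialize.

```ruby
# Hypothetical helper returning [file, col_spec, compute_formulas].
def split_dataset_spec(set, default_column_spec = "1:2")
  if (m = /(.*)@(.*)/.match(set))
    file, col_spec = m[1], m[2]
  else
    file, col_spec = set, default_column_spec
  end
  # A "$" in the column spec means the columns are formulas.
  compute_formulas = col_spec.include?("$")
  [file, col_spec, compute_formulas]
end

split_dataset_spec('data.dat@1:3')     # => ["data.dat", "1:3", false]
split_dataset_spec('data.dat')         # => ["data.dat", "1:2", false]
split_dataset_spec('@$1:$2*2')         # => ["", "$1:$2*2", true]
```

An empty file part, as in the last call, is the case where the real method keeps using the data of the last file read.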
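Reads data from a file. If needed, extracts the file name from a specification that also contains columns (file@cols). Results are cached by file name, together with the parameters and column names found in the file's comments.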
todo the cache really should include things such as time of last modification and various parameters that influence the reading of the file, and the parameters read from the file using parse_parameters
todo There should be a real global handling of meta-data extracted from files, so that they could be included for instance in the automatic labels ? (and we could have fun improving this one ?)
todo There should be a way to read pure text columns and use them somehow, to annotate the output ? This should be implemented at the Tioga level, though (both for reading, in fancy_read, and for using hover stuff)
warning This needs Tioga r561
# File lib/ctioga2/data/backends/backends/text.rb, line 286
def read_file(file)
  if file =~ /(.*)@.*/
    file = $1
  end
  name = file               # As file will be modified.
  if ! @cache.key?(file)    # Read the file if it is not cached.
    comments = []
    fancy_read_options = {'index_col' => true,
      'skip_first' => @skip,
      'sep' => @separator,
      'comment_out' => comments
    }
    io_set = get_io_set(file)
    debug { "Fancy read '#{file}', options #{fancy_read_options.inspect}" }
    @cache[name] = Dvector.fancy_read(io_set, nil, fancy_read_options)
    if @param_regex
      # Now parsing params
      @param_cache[name] = parse_parameters(comments)
      info { "Read #{@param_cache[name].size} parameters from #{name}" }
      debug { "Parameters read: #{@param_cache[name].inspect}" }
    end
    if @header_line_regex
      @headers_cache[name] = parse_header_line(comments)
      info { "Read #{@headers_cache[name].size} column names from #{name}" }
      debug { "Got: #{@headers_cache[name].inspect}" }
    end
  end
  ## @todo These are not very satisfying; ideally, the data
  ## information should be embedded into @cache[name] rather
  ## than as external variables. Well...
  @current_parameters = @param_cache[name]
  @current_header = @headers_cache[name]
  return @cache[name]
end
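The caching behaviour of read_file boils down to a hash keyed by file name, with any @cols part stripped first. The sketch below shows only that part; the class name, the `loads` counter and the block used as a loader are hypothetical, and nothing is actually read from disk.

```ruby
# Hypothetical cache keyed by file name, mirroring the structure of read_file.
class SketchFileCache
  attr_reader :loads

  def initialize
    @cache = {}
    @loads = 0
  end

  # spec may be "file" or "file@cols"; the block stands in for the real reader.
  def fetch(spec)
    name = spec[/(.*)@.*/, 1] || spec   # drop the column part, if any
    unless @cache.key?(name)
      @cache[name] = yield(name)
      @loads += 1
    end
    @cache[name]
  end
end

cache = SketchFileCache.new
cache.fetch("data.dat@1:2") { |name| "contents of #{name}" }
cache.fetch("data.dat@1:3") { |name| "contents of #{name}" }
cache.loads  # => 1, the second call hits the cache
```

As the todo notes above point out, a key based on the name alone does not notice when the file changes on disk or when the reading options change.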