class FlexColumns::Contents::ColumnData
ColumnData is one of the core classes in flex_columns. An instance of ColumnData represents the data present in a single row for a single flex column. It stores that data and is used to set and retrieve it. It can also serialize itself to JSON and deserialize itself from JSON, adding a header and, optionally, compression for binary storage.
Clients do not interact with ColumnData itself. Instead, they interact with an instance of a generated subclass of FlexColumnsContentsBase, which delegates core methods to this object.
Constants
- FLEX_COLUMN_CURRENT_VERSION_NUMBER
What’s the current version number of our storage format? Because we only have a single version right now, this is also the only version we accept.
- MIN_SIZE_REDUCTION_RATIO_FOR_COMPRESSION
What fraction of the uncompressed length must a compressed string, including its header, fall below before we use it in preference to the uncompressed string?
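The following sketch shows the ratio test with concrete numbers. It mirrors the final comparison in to_binary_storage. It assumes a ratio of 0.95, which is the 95% figure given for :compress_if_over_length. The helper name use_compressed? is hypothetical.

```ruby
# Hypothetical helper mirroring the final comparison in to_binary_storage.
# ratio defaults to 0.95, an assumed value of MIN_SIZE_REDUCTION_RATIO_FOR_COMPRESSION.
def use_compressed?(compressed_length, uncompressed_length, ratio = 0.95)
  # The compressed form (header included) must be strictly shorter than
  # ratio * uncompressed length; otherwise the uncompressed JSON is kept.
  compressed_length < ratio * uncompressed_length
end

use_compressed?(900, 1000) # => true: 900 bytes is well under 95% of 1000
use_compressed?(960, 1000) # => false: a 4% saving is not enough
```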
Public Class Methods
Creates a new instance. field_set is the FlexColumns::Definition::FieldSet that contains the set of fields defined for this flex column. options can contain:
- :storage_string
  The data present in the column in the database. This is a String, or a Hash when a native JSON column returns one directly. It can be omitted when creating an instance for a row that has no data, or for a new row.
- :data_source
  Where did that data come from? This can be any object, but it must respond to two methods, neither of which takes arguments:
  - describe_flex_column_data_source should return a String. This is used in raised exceptions to let the client know which data caused the problem.
  - notification_hash_for_flex_column_data_source should return a Hash. This is used to generate the payload for the ActiveSupport::Notifications calls this class makes.
  In practice, this is always an instance of the FlexColumnsContentsBase subclass generated for the column.
- :unknown_fields
  Must be :preserve or :delete. This controls what happens to keys in the serialized JSON that do not correspond to any field the FieldSet knows about, when the data is re-serialized on save. :preserve keeps that data, while :delete removes it. In neither case is that data actually accessible; you must declare a field if you want access to it.
- :length_limit
  If present, this must be an Integer of at least 8. It specifies the maximum length of data that can be stored in the underlying storage mechanism (the column). When serializing data, this object will raise an exception if the serialized form is longer than this limit. This prevents the database from silently truncating the data being stored (I’m looking at you, MySQL), which would corrupt it.
- :storage
  This must be :binary, :text, or :json.
  - If :text, standard, uncompressed JSON will always be stored. Compressed data cannot be stored reliably in a text column. The database will interpret the bytes as characters, and may modify them or raise an exception if they contain byte sequences that are invalid in whatever encoding it’s using.
  - If :binary, a very small header will be written that’s just for versioning (currently FC:01,). This is followed by a marker indicating whether the data is compressed (1,) or not (0,), and then by either standard, uncompressed JSON encoded in UTF-8 or the GZipped version of the same.
  - If :json, we assume the database has a native JSON type (for example, PostgreSQL with a sufficiently recent ActiveRecord and PG gem). In that case we deal in an actual Hash, which the database processes directly.
- :compress_if_over_length
  If present, this must be an integer. It applies when :storage is :binary and the JSON string is longer than this many bytes. In that case, this class will try compressing the data before returning it from to_stored_data. If the compressed version, including its header, is less than 95% (MIN_SIZE_REDUCTION_RATIO_FOR_COMPRESSION) as long as the uncompressed version, the compressed version will be used instead.
- :binary_header
  Must be true or false. If false, then no header will be written to the binary column, even if :storage is :binary. As a consequence, compression will also be disabled, since compression requires the header.
- :null
  Must be true or false. If false, we assume the underlying column in the database is defined as non-NULL (although this is not recommended). When there’s no data, we will then store an empty string (“”) in the column rather than SQL NULL.
    # File lib/flex_columns/contents/column_data.rb, line 56
    def initialize(field_set, options = { })
      options.assert_valid_keys(:storage_string, :data_source, :unknown_fields, :length_limit, :storage, :compress_if_over_length, :binary_header, :null)

      @storage_string = options[:storage_string]
      @field_set = field_set
      @data_source = options[:data_source]
      @unknown_fields = options[:unknown_fields]
      @length_limit = options[:length_limit]
      @storage = options[:storage]
      @compress_if_over_length = options[:compress_if_over_length]
      @binary_header = options[:binary_header]
      @null = options[:null]

      raise ArgumentError, "Invalid JSON string: #{storage_string.inspect}" if storage_string && (! storage_string.kind_of?(String)) && (! storage_string.kind_of?(Hash))
      raise ArgumentError, "Must supply a FieldSet, not: #{field_set.inspect}" unless field_set.kind_of?(FlexColumns::Definition::FieldSet)
      raise ArgumentError, "Must supply a data source, not: #{data_source.inspect}" unless data_source
      raise ArgumentError, "Invalid value for :unknown_fields: #{unknown_fields.inspect}" unless [ :preserve, :delete ].include?(unknown_fields)
      raise ArgumentError, "Invalid value for :length_limit: #{length_limit.inspect}" if length_limit && (! (length_limit.kind_of?(Integer) && length_limit >= 8))
      raise ArgumentError, "Invalid value for :storage: #{storage.inspect}" unless [ :binary, :text, :json ].include?(storage)
      raise ArgumentError, "Invalid value for :compress_if_over_length: #{compress_if_over_length.inspect}" if compress_if_over_length && (! compress_if_over_length.kind_of?(Integer))
      raise ArgumentError, "Invalid value for :binary_header: #{binary_header.inspect}" unless [ true, false ].include?(binary_header)
      raise ArgumentError, "Invalid value for :null: #{null.inspect}" unless [ true, false ].include?(null)

      @field_contents_by_field_name = nil
      @unknown_field_contents_by_key = nil
    end
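The :data_source option requires only two methods. The sketch below shows a stand-in object that satisfies that contract. The class name, its constructor, and the payload keys are all hypothetical. In flex_columns itself, this role is played by the generated FlexColumnsContentsBase subclass.

```ruby
# Hypothetical stand-in for the object passed as :data_source.
class ExampleDataSource
  def initialize(model_name, id, column_name)
    @model_name = model_name
    @id = id
    @column_name = column_name
  end

  # Used in the messages of raised exceptions.
  def describe_flex_column_data_source
    "#{@model_name} ID #{@id}, column #{@column_name.inspect}"
  end

  # Used as the base payload for ActiveSupport::Notifications events.
  def notification_hash_for_flex_column_data_source
    { :model_name => @model_name, :id => @id, :column_name => @column_name }
  end
end

source = ExampleDataSource.new("User", 42, :prefs)
source.describe_flex_column_data_source # => "User ID 42, column :prefs"
```

Note that the constructor above checks only that a data source is present. The two methods are called later: describe_flex_column_data_source when an error is reported, and notification_hash_for_flex_column_data_source when a notification is fired.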
Public Instance Methods
Returns the data for the given field_name. Raises FlexColumns::Errors::NoSuchFieldError if there is no field of the given name. Returns nil if there is such a field but no data for it.
    # File lib/flex_columns/contents/column_data.rb, line 87
    def [](field_name)
      field_name = validate_and_deserialize_for_field(field_name)
      field_contents_by_field_name[field_name]
    end
Sets the data for the given field_name to the given new_value. A Symbol is stored as its String equivalent, and setting nil removes the field’s data entirely. Raises FlexColumns::Errors::NoSuchFieldError if there is no field of the given name. Returns new_value.
    # File lib/flex_columns/contents/column_data.rb, line 94
    def []=(field_name, new_value)
      field_name = validate_and_deserialize_for_field(field_name)

      # We do this for a very good reason. When encoding as JSON, Ruby's JSON library happily accepts Symbols, but
      # encodes them as simple Strings in the JSON. (This makes sense, because JSON doesn't support Symbols.) This
      # means that if you save a value in a flex column as a Symbol, and then re-read that row from the database,
      # you'll get back a String, not the Symbol you put in.
      #
      # Unfortunately, this is different from what you'll get if there is no intervening save/load cycle, where it'd
      # otherwise stay a Symbol. This difference in behavior can be the source of some really annoying bugs. While
      # ActiveRecord has this annoying behavior, this is a chance to clean it up in a small way -- so, if you set a
      # Symbol, we return a String. (And, yes, this has no bearing on Symbols stored nested inside Arrays or Hashes;
      # and that's OK.)
      new_value = new_value.to_s if new_value.kind_of?(Symbol)

      old_value = field_contents_by_field_name[field_name]

      # We deliberately delete from the hash anything that's being set to +nil+; this is so that we don't end up just
      # binding keys to +nil+, and returning them in #keys, etc. (Yes, this means that you can't distinguish a key
      # explicitly set to +nil+ from a key that's not present; this is different from Ruby's semantics for a Hash,
      # but not by very much, and it makes use of +flex_columns+ a whole lot simpler.)
      if new_value == nil
        field_contents_by_field_name.delete(field_name)
        nil
      else
        field_contents_by_field_name[field_name] = new_value
      end
    end
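The assignment rules in #[]= can be sketched outside the class using a plain Hash: Symbols become Strings, and nil removes the field. The assign helper below is hypothetical and skips the field-name validation that the real method performs.

```ruby
require "json"

# Hypothetical helper applying #[]='s rules to a plain Hash.
def assign(contents, field_name, new_value)
  # A Symbol would come back from a JSON round trip as a String, so convert now.
  new_value = new_value.to_s if new_value.kind_of?(Symbol)
  if new_value.nil?
    # nil removes the key rather than storing nil.
    contents.delete(field_name)
    nil
  else
    contents[field_name] = new_value
  end
end

contents = {}
assign(contents, :color, :blue)
contents[:color]                                 # => "blue"
assign(contents, :color, nil)
contents.key?(:color)                            # => false
JSON.parse(JSON.generate({ "a" => :blue }))["a"] # => "blue", as after a save/load
```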
Has this object been deserialized? If it’s been deserialized, then we need to do things like run validations on it, save it back to the database when someone calls save! on the parent object, and so on.
Not at all obvious: originally, we had a method called touched? that let you know whether the given object had been changed at all. It was simply set by #[]=, above. The problem is that flex_columns is very frequently used to store complex data structures, because that’s one of the things that’s dramatically easier in a serialized JSON blob than in a traditional relational structure. But if you have an array stored and you call << on it to append an element, #[]= never gets called at all. It’s still the same object, just with different contents.
We could have worked around this by saving off a copy of each field when we deserialized, then comparing them using deep equality (== should work just fine) to determine whether they’ve changed. However, this adds very significant overhead to every single use of a flex_column object, whether or not you rely on or care about this kind of tracking. We would have to deep-copy every flex column field every single time we deserialized, and if you have large objects in there, that can get extremely expensive.
Since almost every object in Ruby is mutable, even Strings, there aren’t really any easy wins here. Numbers (along with nil, true, and false) are the only commonplace immutable values. It’s not going to be a common use case for someone to use a flex_column whose fields each store one single number. Storing an array or a hash of numbers is much more common, but then you’re talking about Arrays and Hashes, which are back to being mutable.
Another option would be to freeze all of the fields on a flex column, requiring clients to reassign a field with a new object whenever they want to change it. That, however, presents an API that most users would hate: I don’t want to say user.prefs_map = user.prefs_map.merge(:foo => bar); I want to just say user.prefs_map[:foo] = bar.
Instead, once we deserialize a field, we just assume that it has changed. While this may end up causing the client to do extra work at times, it’s much higher-performance than doing the tracking every time.
(There is definitely room to add code that would make this configurable, on a per-flex-column or even per-field basis. As always, patches are welcome; as of this writing, it seems likely that it might just not be an issue big enough to worry about.)
    # File lib/flex_columns/contents/column_data.rb, line 177
    def deserialized?
      !! field_contents_by_field_name
    end
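The problem with the abandoned touched? approach can be reproduced in a few lines. TouchTrackingStore is a hypothetical class that, like the original design described above, records a change only when []= is called.

```ruby
# Hypothetical store that marks itself touched only through []=.
class TouchTrackingStore
  def initialize(data)
    @data = data
    @touched = false
  end

  def [](key)
    @data[key]
  end

  def []=(key, value)
    @touched = true
    @data[key] = value
  end

  def touched?
    @touched
  end
end

store = TouchTrackingStore.new({ "tags" => ["a"] })
store["tags"] << "b" # mutates the Array in place; []= is never called
store.touched?       # => false, even though the data changed
store["tags"]        # => ["a", "b"]
```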
Returns an Array of all field names that are currently set to something.
    # File lib/flex_columns/contents/column_data.rb, line 124
    def keys
      deserialize_if_necessary!
      field_contents_by_field_name.keys
    end
Returns a representation of this data as a Hash. This should not be used within flex_columns to manipulate data, because it does not contain a full representation of a column; in particular, unknown-field data is not included in the returned Hash. However, it’s useful for constructing a string (e.g., in FlexColumnsContentsBase#inspect) to help with debugging.
    # File lib/flex_columns/contents/column_data.rb, line 133
    def to_hash
      deserialize_if_necessary!
      field_contents_by_field_name.dup.with_indifferent_access
    end
Returns a String with the current contents of this object as JSON. (This will deserialize from JSON, if it hasn’t already happened.)
Always returns a string encoded in UTF-8, if we’re running on a Ruby >= 1.9 (that is, with encoding support).
    # File lib/flex_columns/contents/column_data.rb, line 185
    def to_json
      deserialize_if_necessary!

      json_hash = to_json_hash
      as_string = JSON.generate(json_hash, :allow_nan => true)
      as_string = as_string.encode(Encoding::UTF_8) if as_string.respond_to?(:encode)
      as_string
    end
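Both to_json and parse_json pass :allow_nan => true. Strict JSON has no representation for NaN or Infinity, so Ruby's JSON library refuses them by default. With the option set on both sides, a Float::NAN stored in a field survives a save/load cycle. The sketch below is standalone and uses only the json standard library.

```ruby
require "json"

data = { "ratio" => Float::NAN }

# By default the generator refuses NaN, which strict JSON cannot represent.
refused = begin
  JSON.generate(data)
  false
rescue JSON::GeneratorError
  true
end
refused # => true

# With :allow_nan on both sides, NaN survives a round trip.
text = JSON.generate(data, :allow_nan => true)    # => "{\"ratio\":NaN}"
JSON.parse(text, :allow_nan => true)["ratio"].nan? # => true
```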
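Returns the exact String that should be stored in the database: compressed or not, with a header or not, and so on. If there is no data, returns nil, or an empty string if :null is false. Raises FlexColumns::Errors::JsonTooLongError if the string is too long to fit in the database.
(Under PostgreSQL, with appropriate ActiveRecord and PostgreSQL support, when :storage is :json this returns a Hash instead, which the database stores directly.)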
    # File lib/flex_columns/contents/column_data.rb, line 199
    def to_stored_data
      out = nil

      deserialize_if_necessary!

      return to_json_hash if storage == :json

      instrument("serialize") do
        if storage == :json
          out = to_json_hash
        else
          out = to_json

          if out.length < 8 && out =~ /^\s*\{\s*\}\s*$/i
            out = @null ? nil : ""
          else
            out = to_binary_storage(out) if storage == :binary
          end
        end
      end

      actual_length = out ? out.length : 0
      if length_limit && actual_length > length_limit
        raise FlexColumns::Errors::JsonTooLongError.new(data_source, length_limit, out)
      end

      out
    end
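The post-processing in to_stored_data has two steps. First, empty data becomes nil or "" depending on :null. Then the length limit is enforced. The sketch below shows both steps for the :text storage path. stored_form and ExampleTooLongError are hypothetical names; the real method raises FlexColumns::Errors::JsonTooLongError. Like the original, the sketch measures length with String#length.

```ruby
# Hypothetical stand-in for FlexColumns::Errors::JsonTooLongError.
class ExampleTooLongError < StandardError; end

# Hypothetical sketch of to_stored_data's :text path, given JSON from to_json.
def stored_form(json, null:, length_limit: nil)
  out = if json.length < 8 && json =~ /\A\s*\{\s*\}\s*\z/
          null ? nil : ""   # no data: SQL NULL, or "" for a NOT NULL column
        else
          json
        end
  actual_length = out ? out.length : 0
  if length_limit && actual_length > length_limit
    raise ExampleTooLongError, "#{actual_length} characters exceeds limit of #{length_limit}"
  end
  out
end

stored_form("{}", null: true)      # => nil
stored_form("{}", null: false)     # => ""
stored_form('{"a":1}', null: true) # => "{\"a\":1}"
# stored_form('{"a":"xxxxxxxx"}', null: true, length_limit: 10) raises ExampleTooLongError
```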
Does nothing other than making sure the JSON has been deserialized. This has two effects. It ensures that the stored data (if any) is valid, and it removes any unknown keys (on save) if :unknown_fields was set to :delete.
    # File lib/flex_columns/contents/column_data.rb, line 141
    def touch!
      deserialize_if_necessary!
    end
Private Instance Methods
Compresses a string with GZip and returns its compressed representation.
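    # File lib/flex_columns/contents/column_data.rb, line 306
    def compress(json_string)
      # Start from an empty, mutable binary buffer; StringIO.new("w") would wrap the literal string "w".
      stream = StringIO.new(String.new)
      writer = Zlib::GzipWriter.new(stream)
      writer.write(json_string)
      writer.close
      stream.string
    end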
Decompresses a GZipped string and returns the decompressed version.
    # File lib/flex_columns/contents/column_data.rb, line 316
    def decompress(data, raw_data)
      begin
        input = StringIO.new(data, "r")
        reader = Zlib::GzipReader.new(input)
        reader.read
      rescue Zlib::GzipFile::Error => gze
        raise FlexColumns::Errors::InvalidCompressedDataInDatabaseError.new(data_source, raw_data, gze)
      end
    end
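The sketch below is a standalone round trip through the same GzipWriter/GzipReader pattern. It uses only the zlib, stringio, and json standard libraries. It writes into an empty binary buffer, closes the reader explicitly, and omits the library's error wrapping. The payload is an arbitrary, highly repetitive example, chosen so that compression clearly pays off.

```ruby
require "json"
require "stringio"
require "zlib"

# Gzip a string, starting from an empty binary buffer.
def compress(json_string)
  stream = StringIO.new(String.new)
  writer = Zlib::GzipWriter.new(stream)
  writer.write(json_string)
  writer.close # writes the gzip trailer; the buffer stays readable via #string
  stream.string
end

# Gunzip a string, without the FlexColumns error wrapping.
def decompress(data)
  reader = Zlib::GzipReader.new(StringIO.new(data, "r"))
  begin
    reader.read
  ensure
    reader.close
  end
end

json = JSON.generate({ "tags" => ["flex"] * 200 })
packed = compress(json)
packed.bytesize < json.bytesize # => true
decompress(packed) == json      # => true
```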
If we haven’t yet deserialized the JSON string, do it now, and store the data appropriately. This also checks for a validly-encoded string.
    # File lib/flex_columns/contents/column_data.rb, line 384
    def deserialize_if_necessary!
      unless deserialized?
        raw_data = storage_string || ''

        # PostgreSQL's JSON data type, combined with recent-enough adapters and ActiveRecord, will return JSON as a
        # Hash directly from the driver (!).
        if raw_data.kind_of?(Hash)
          store_fields!(raw_data)
          return
        end

        if raw_data.respond_to?(:valid_encoding?) && (! raw_data.valid_encoding?)
          raise FlexColumns::Errors::IncorrectlyEncodedStringInDatabaseError.new(data_source, raw_data)
        end

        if raw_data.strip.length > 0
          parsed = instrument("deserialize", :raw_data => raw_data) do
            parse_json(from_stored_data(raw_data))
          end

          store_fields!(parsed)
        else
          @field_contents_by_field_name = { }
          @unknown_field_contents_by_key = { }
        end
      end
    end
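The order of checks in deserialize_if_necessary! matters:
- A Hash from a native JSON column is used as-is.
- An invalidly encoded String is rejected before it is stripped or parsed.
- A blank or missing value means there is no data.

The sketch below condenses this flow and is hypothetical. It omits header handling, decompression, and instrumentation, and uses ArgumentError in place of IncorrectlyEncodedStringInDatabaseError.

```ruby
require "json"

# Hypothetical condensed version of deserialize_if_necessary!'s decision flow.
def deserialize_sketch(raw_data)
  raw_data ||= ""
  # A native JSON column may already hand back a Hash.
  return raw_data if raw_data.kind_of?(Hash)

  # Reject invalid byte sequences before stripping or parsing the string.
  if raw_data.respond_to?(:valid_encoding?) && !raw_data.valid_encoding?
    raise ArgumentError, "incorrectly encoded string in database"
  end

  # NULL or blank column: no data at all.
  return {} if raw_data.strip.empty?

  JSON.parse(raw_data)
end

deserialize_sketch(nil)            # => {}
deserialize_sketch('{"a":1}')["a"] # => 1
bad = "{\"a\":\"\xFF\"}".dup.force_encoding(Encoding::UTF_8)
bad.valid_encoding?                # => false, so deserialize_sketch(bad) raises
```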
Given a storage string, returns a pure-JSON string. This involves looking for a header, and, if it’s present, validating it and uncompressing the content (if compressed).
    # File lib/flex_columns/contents/column_data.rb, line 328
    def from_stored_data(storage_string)
      # \A anchors at the start of the string; ^ would also match after any newline.
      if storage_string =~ /\A(FC:(\d+),(\d+),)/i
        prefix = $1
        # Parse in base 10 explicitly; Integer("08") would otherwise be rejected as invalid octal.
        version_number = Integer($2, 10)
        compressed = Integer($3, 10)
        remaining_data = storage_string[prefix.length..-1]

        if version_number > FLEX_COLUMN_CURRENT_VERSION_NUMBER
          raise FlexColumns::Errors::InvalidFlexColumnsVersionNumberInDatabaseError.new(
            data_source, storage_string, version_number, FLEX_COLUMN_CURRENT_VERSION_NUMBER)
        end

        case compressed
        when 0 then remaining_data
        when 1 then decompress(remaining_data, storage_string)
        else
          raise FlexColumns::Errors::InvalidDataInDatabaseError.new(
            data_source, storage_string, "the compression number was #{compressed.inspect}, not 0 or 1.")
        end
      else
        storage_string
      end
    end
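The header parsing can be tested on its own. The hypothetical split_stored_data helper below returns the version, the compression flag, and the remaining payload, rather than decompressing. It uses ArgumentError in place of the FlexColumns error classes and assumes a current version of 1. It anchors the pattern with \A, so only a header at the very start of the string is recognized, and it parses numbers in base 10.

```ruby
# Hypothetical standalone version of from_stored_data's header parsing.
def split_stored_data(storage_string, current_version = 1)
  match = /\AFC:(\d+),(\d+),/i.match(storage_string)
  # No header at the very start: treat the whole string as plain JSON.
  return { :version => nil, :compressed => false, :payload => storage_string } unless match

  version = Integer(match[1], 10)
  flag = Integer(match[2], 10)
  if version > current_version
    raise ArgumentError, "storage version #{version} is newer than #{current_version}"
  end
  raise ArgumentError, "compression flag was #{flag}, not 0 or 1" unless [0, 1].include?(flag)

  { :version => version, :compressed => flag == 1, :payload => match.post_match }
end

split_stored_data('FC:01,0,{"a":1}')[:payload] # => "{\"a\":1}"
split_stored_data('{"a":1}')[:version]         # => nil
```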
Fires the flex_columns notification named flex_columns.<name> around the supplied block. The payload is the data source’s notification hash, merged with any additional options.
    # File lib/flex_columns/contents/column_data.rb, line 256
    def instrument(name, additional = { }, &block)
      ::ActiveSupport::Notifications.instrument("flex_columns.#{name}", data_source.notification_hash_for_flex_column_data_source.merge(additional), &block)
    end
Parses JSON, adding exception handling that reports exactly which data failed and why. Also raises an error if the parsed JSON is valid but its top level is not an object (a Hash).
    # File lib/flex_columns/contents/column_data.rb, line 352
    def parse_json(json)
      out = begin
        JSON.parse(json, :allow_nan => true)
      rescue ::JSON::ParserError => pe
        raise FlexColumns::Errors::UnparseableJsonInDatabaseError.new(data_source, json, pe)
      end

      unless out.kind_of?(Hash)
        raise FlexColumns::Errors::InvalidJsonInDatabaseError.new(data_source, json, out)
      end

      out
    end
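parse_json distinguishes two failures: text that is not JSON at all, and JSON whose top level is not an object. The sketch below is a hypothetical standalone version. It uses ArgumentError in place of UnparseableJsonInDatabaseError and InvalidJsonInDatabaseError.

```ruby
require "json"

# Hypothetical standalone version of parse_json.
def parse_object(json)
  out = begin
    JSON.parse(json, :allow_nan => true)
  rescue JSON::ParserError => e
    raise ArgumentError, "unparseable JSON: #{e.message}"
  end
  # Valid JSON is not enough: the top level must be an object.
  raise ArgumentError, "expected a JSON object, got #{out.class}" unless out.kind_of?(Hash)
  out
end

parse_object('{"a":1}')["a"] # => 1
# parse_object('[1, 2]') raises ArgumentError ("expected a JSON object, got Array")
# parse_object('{"a":')  raises ArgumentError ("unparseable JSON: ...")
```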
Given a hash returned by parsing JSON, stores the data away in either @field_contents_by_field_name or @unknown_field_contents_by_key, depending on whether the data matches one of our fields or not.
    # File lib/flex_columns/contents/column_data.rb, line 368
    def store_fields!(parsed_hash)
      @field_contents_by_field_name = { }
      @unknown_field_contents_by_key = { }

      parsed_hash.each do |field_name, field_value|
        field = field_set.field_with_json_storage_name(field_name)

        if field
          @field_contents_by_field_name[field.field_name] = field_value
        else
          @unknown_field_contents_by_key[field_name] = field_value
        end
      end
    end
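The split performed by store_fields! can be sketched using a plain Hash in place of the FieldSet lookup. storage_to_field is a hypothetical mapping from JSON storage name to field name. In the real class, field_set.field_with_json_storage_name performs this lookup.

```ruby
# Hypothetical standalone version of store_fields!.
def split_fields(parsed_hash, storage_to_field)
  known = {}
  unknown = {}
  parsed_hash.each do |storage_name, value|
    field_name = storage_to_field[storage_name]
    if field_name
      known[field_name] = value     # keyed by field name, not storage name
    else
      unknown[storage_name] = value # kept verbatim under its JSON key
    end
  end
  [known, unknown]
end

known, unknown = split_fields({ "c" => "blue", "legacy" => 3 }, { "c" => :color })
known[:color]     # => "blue"
unknown["legacy"] # => 3
```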
Given a JSON string, returns the appropriate binary-storage string. This is the method that figures out whether we should compress the data or not and applies the binary header, if appropriate.
    # File lib/flex_columns/contents/column_data.rb, line 275
    def to_binary_storage(json_string)
      json_string = json_string.force_encoding(Encoding::BINARY) if json_string.respond_to?(:force_encoding)
      return json_string if (! binary_header)

      header = "FC:%02d," % FLEX_COLUMN_CURRENT_VERSION_NUMBER
      json_length = if json_string.respond_to?(:bytesize) then json_string.bytesize else json_string.length end

      if compress_if_over_length && json_length > compress_if_over_length
        compressed = compress(json_string)
        compressed.force_encoding(Encoding::BINARY) if compressed.respond_to?(:force_encoding)
        compressed = header + "1," + compressed
        compressed.force_encoding(Encoding::BINARY) if compressed.respond_to?(:force_encoding)
      end

      compressed_length = if compressed
        if compressed.respond_to?(:bytesize)
          compressed.bytesize
        else
          compressed.length
        end
      end

      if compressed_length && compressed_length < (MIN_SIZE_REDUCTION_RATIO_FOR_COMPRESSION * json_length)
        compressed
      else
        header + "0," + json_string
      end
    end
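The sketch below applies the same encoding decisions in a self-contained way, which makes it easy to see what actually lands in a binary column. HEADER_VERSION and RATIO are assumed values of FLEX_COLUMN_CURRENT_VERSION_NUMBER and MIN_SIZE_REDUCTION_RATIO_FOR_COMPRESSION. The keyword arguments stand in for the corresponding constructor options. Unlike the original, the sketch copies its input rather than changing the encoding of the caller's string.

```ruby
require "json"
require "stringio"
require "zlib"

HEADER_VERSION = 1 # assumed value of FLEX_COLUMN_CURRENT_VERSION_NUMBER
RATIO = 0.95       # assumed value of MIN_SIZE_REDUCTION_RATIO_FOR_COMPRESSION

# Gzip a string into a fresh binary buffer.
def gzip(string)
  io = StringIO.new(String.new)
  writer = Zlib::GzipWriter.new(io)
  writer.write(string)
  writer.close
  io.string
end

# Hypothetical standalone version of to_binary_storage.
def binary_storage(json_string, compress_if_over_length: nil, binary_header: true)
  json_string = json_string.dup.force_encoding(Encoding::BINARY)
  # Without a header there is no way to flag compression, so store raw JSON.
  return json_string unless binary_header

  header = format("FC:%02d,", HEADER_VERSION)

  if compress_if_over_length && json_string.bytesize > compress_if_over_length
    compressed = (header + "1," + gzip(json_string)).force_encoding(Encoding::BINARY)
    # Keep the compressed form only if it is meaningfully smaller.
    return compressed if compressed.bytesize < RATIO * json_string.bytesize
  end

  header + "0," + json_string
end

big = JSON.generate({ "tags" => ["flex"] * 200 })
binary_storage('{"a":1}')                               # => "FC:01,0,{\"a\":1}"
binary_storage(big, compress_if_over_length: 100)[0, 8] # => "FC:01,1,"
```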
Returns a Hash with exactly the key-to-value mappings that we’d store as JSON. That is, it uses the fields’ JSON storage aliases rather than their field names, and it omits unknown fields if unknown_fields == :delete.
    # File lib/flex_columns/contents/column_data.rb, line 242
    def to_json_hash
      json_hash = { }
      json_hash.merge!(unknown_field_contents_by_key) unless unknown_fields == :delete

      field_contents_by_field_name.each do |field_name, field_contents|
        storage_name = field_set.field_named(field_name).json_storage_name
        json_hash[storage_name] = field_contents
      end

      json_hash
    end
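The reverse direction, to_json_hash, can be sketched the same way. field_to_storage is a hypothetical stand-in for field_set.field_named(name).json_storage_name. The :preserve/:delete switch behaves as described for the :unknown_fields option.

```ruby
# Hypothetical standalone version of to_json_hash.
def json_hash_for(known, unknown, unknown_fields, field_to_storage)
  json_hash = {}
  # Unknown keys survive only when unknown_fields is :preserve.
  json_hash.merge!(unknown) unless unknown_fields == :delete
  known.each do |field_name, value|
    # Write each known field under its JSON storage name.
    json_hash[field_to_storage.fetch(field_name)] = value
  end
  json_hash
end

known = { :color => "blue" }
unknown = { "legacy" => 3 }
aliases = { :color => "c" }
json_hash_for(known, unknown, :preserve, aliases).keys.sort # => ["c", "legacy"]
json_hash_for(known, unknown, :delete, aliases).keys        # => ["c"]
```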
Given a field_name, ensures that it is in fact a valid field name and that we have been deserialized, and returns the field’s canonical name. Used for implementing [] and []=.
    # File lib/flex_columns/contents/column_data.rb, line 262
    def validate_and_deserialize_for_field(field_name)
      field = field_set.field_named(field_name)
      unless field
        raise FlexColumns::Errors::NoSuchFieldError.new(data_source, field_name, field_set.all_field_names)
      end

      deserialize_if_necessary!

      field.field_name
    end