class HexaPDF::Font::CMap

Represents a CMap, a mapping from character codes either to CIDs (character IDs) or to Unicode values.

See: PDF1.7 s9.7.5, s9.10.3; Adobe Technical Notes #5014 and #5411

Attributes

name[RW]

The name of the CMap.

ordering[RW]

The ordering part of the CMap version.

registry[RW]

The registry part of the CMap version.

supplement[RW]

The supplement part of the CMap version.

wmode[RW]

The writing mode of the CMap: 0 for horizontal, 1 for vertical writing.

Public Class Methods

create_to_unicode_cmap(mapping)

Returns a string containing a ToUnicode CMap that represents the given code to Unicode codepoint mapping.

See: Writer#create_to_unicode_cmap

# File lib/hexapdf/font/cmap.rb, line 83
def self.create_to_unicode_cmap(mapping)
  Writer.new.create_to_unicode_cmap(mapping)
end
for_name(name)

Creates a new CMap object by parsing a predefined CMap with the given name.

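Raises a HexaPDF::Error if no predefined CMap with the given name is found. Parsed CMaps are cached, so repeated calls with the same name return the same object.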

# File lib/hexapdf/font/cmap.rb, line 63
def self.for_name(name)
  return @cmap_cache[name] if @cmap_cache.key?(name)

  file = File.join(CMAP_DIR, name)
  if File.exist?(file)
    @cmap_cache[name] = parse(File.read(file, encoding: ::Encoding::UTF_8))
  else
    raise HexaPDF::Error, "No CMap named '#{name}' found"
  end
end
new()

Creates a new CMap object.

# File lib/hexapdf/font/cmap.rb, line 106
def initialize
  @codespace_ranges = []
  @cid_mapping = {}
  @cid_range_mappings = []
  @unicode_mapping = {}
end
parse(string)

Creates a new CMap object from the given string, which must contain a valid CMap file.

# File lib/hexapdf/font/cmap.rb, line 75
def self.parse(string)
  Parser.new.parse(string)
end
predefined?(name)

Returns true if the given name specifies a predefined CMap.

# File lib/hexapdf/font/cmap.rb, line 56
def self.predefined?(name)
  File.exist?(File.join(CMAP_DIR, name))
end

Public Instance Methods

add_cid_mapping(code, cid)

Adds an individual mapping from character code to CID.

# File lib/hexapdf/font/cmap.rb, line 166
def add_cid_mapping(code, cid)
  @cid_mapping[code] = cid
end
add_cid_range(start_code, end_code, start_cid)

Adds a CID range, mapping character codes from start_code to end_code to CIDs starting with start_cid.

# File lib/hexapdf/font/cmap.rb, line 172
def add_cid_range(start_code, end_code, start_cid)
  @cid_range_mappings << [start_code..end_code, start_cid]
end
add_codespace_range(first, *rest)

Adds a codespace range using an array of ranges for the individual bytes.

This means that the first range is checked against the first byte, the second range against the second byte and so on.

# File lib/hexapdf/font/cmap.rb, line 125
def add_codespace_range(first, *rest)
  @codespace_ranges << [first, rest]
end
add_unicode_mapping(code, string)

Adds a mapping from character code to Unicode string in UTF-8 encoding.

# File lib/hexapdf/font/cmap.rb, line 191
def add_unicode_mapping(code, string)
  @unicode_mapping[code] = string
end
read_codes(string)

Parses the string and returns all character codes.

Raises a HexaPDF::Error if the string contains a byte that is not covered by any codespace range, or if the string ends in the middle of a multi-byte code.

# File lib/hexapdf/font/cmap.rb, line 132
def read_codes(string)
  codes = []
  bytes = string.each_byte

  loop do
    byte = bytes.next
    code = 0

    found = @codespace_ranges.any? do |first_byte_range, rest_ranges|
      next unless first_byte_range.cover?(byte)

      code = (code << 8) + byte
      valid = rest_ranges.all? do |range|
        begin
          byte = bytes.next
        rescue StopIteration
          raise HexaPDF::Error, "Missing bytes while reading codes via CMap"
        end
        code = (code << 8) + byte
        range.cover?(byte)
      end

      codes << code if valid
    end

    unless found
      raise HexaPDF::Error, "Invalid byte while reading codes via CMap: #{byte}"
    end
  end

  codes
end
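
As a rough illustration of how codespace ranges decide where one code ends and the next begins, here is a standalone sketch that does not need HexaPDF. The method name split_codes, the constant CODESPACE_RANGES and the sample ranges are hypothetical and not taken from a real predefined CMap. The sketch also simplifies the loop above: it commits to the first entry whose first-byte range matches, and it raises ArgumentError instead of HexaPDF::Error.

```ruby
# Standalone sketch (no HexaPDF required). Each entry has the shape stored by
# add_codespace_range: [first_byte_range, [ranges for the following bytes]].
CODESPACE_RANGES = [
  [0x00..0x7F, []],            # hypothetical one-byte codes
  [0x81..0x9F, [0x40..0xFC]],  # hypothetical two-byte codes
].freeze

def split_codes(string, codespace_ranges = CODESPACE_RANGES)
  codes = []
  bytes = string.bytes
  index = 0
  while index < bytes.length
    first = bytes[index]
    # Simplification: use the first entry whose first-byte range matches.
    entry = codespace_ranges.find { |first_range, _| first_range.cover?(first) }
    raise ArgumentError, "Invalid byte: #{first}" unless entry

    rest_ranges = entry[1]
    rest = bytes[index + 1, rest_ranges.length]
    raise ArgumentError, "Missing bytes" if rest.length < rest_ranges.length
    unless rest_ranges.zip(rest).all? { |range, byte| range.cover?(byte) }
      raise ArgumentError, "Invalid byte sequence starting at #{first}"
    end

    # Combine the bytes, most significant first, into one integer code.
    codes << ([first] + rest).inject(0) { |code, byte| (code << 8) + byte }
    index += 1 + rest.length
  end
  codes
end

# One-byte "A", the two-byte code 0x8140, then one-byte "B".
p split_codes("A\x81\x40B".b)
```

With these sample ranges, the byte 0x81 can never be a code on its own; it always consumes the byte that follows it.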
to_cid(code)

Returns the CID for the given character code, or 0 if no mapping was found.

# File lib/hexapdf/font/cmap.rb, line 177
def to_cid(code)
  cid = @cid_mapping.fetch(code, -1)
  if cid == -1
    @cid_range_mappings.reverse_each do |range, start_cid|
      if range.cover?(code)
        cid = start_cid + code - range.first
        break
      end
    end
  end
  (cid == -1 ? 0 : cid)
end
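
The lookup order in the source above can be sketched in isolation: individual mappings are checked first, then ranges from the most recently added to the oldest, and finally CID 0 is returned as the fallback. The names CID_MAPPING, CID_RANGE_MAPPINGS and lookup_cid, as well as the sample values, are hypothetical and chosen only for illustration.

```ruby
# Standalone sketch (no HexaPDF required) of the lookup order used by to_cid.
CID_MAPPING = {0x20 => 1}.freeze  # individual code => CID entries
CID_RANGE_MAPPINGS = [
  [0x20..0x7E, 100],              # added first
  [0x41..0x5A, 500],              # added later, overlaps the first range
].freeze

def lookup_cid(code)
  # 1. An individual mapping wins over any range.
  return CID_MAPPING[code] if CID_MAPPING.key?(code)

  # 2. Otherwise the most recently added range covering the code wins.
  range, start_cid = CID_RANGE_MAPPINGS.reverse_each.find { |r, _| r.cover?(code) }

  # 3. CID 0 is the fallback when nothing matches.
  range ? start_cid + code - range.first : 0
end

p lookup_cid(0x20)  # individual mapping
p lookup_cid(0x41)  # later range: 500 + (0x41 - 0x41)
p lookup_cid(0x61)  # first range: 100 + (0x61 - 0x20)
p lookup_cid(0x80)  # unmapped
```

The offset within a range is code - range.first, so consecutive codes in a range map to consecutive CIDs.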
to_unicode(code)

Returns the Unicode string in UTF-8 encoding for the given character code, or nil if no mapping was found.

# File lib/hexapdf/font/cmap.rb, line 197
def to_unicode(code)
  unicode_mapping[code]
end
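
As a minimal sketch of the data behind add_unicode_mapping and to_unicode, the following standalone snippet stores UTF-8 strings in a plain hash keyed by character code. UNICODE_MAPPING and unicode_for are hypothetical names, and the sample codes are made up. The second entry assumes that a single code may stand for several characters, as with a ligature glyph.

```ruby
# Standalone sketch (no HexaPDF required): code => UTF-8 string.
UNICODE_MAPPING = {
  0x01 => [0x20AC].pack("U"),  # build a UTF-8 string from one codepoint
  0x02 => "ffi",               # assumption: one code, several characters
}.freeze

def unicode_for(code)
  # Hash#[] returns nil for unknown codes, matching to_unicode's contract.
  UNICODE_MAPPING[code]
end

p unicode_for(0x01)
p unicode_for(0x02)
p unicode_for(0x03)
```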
use_cmap(cmap)

Adds all codespace ranges and mappings from the given CMap to this CMap.

# File lib/hexapdf/font/cmap.rb, line 114
def use_cmap(cmap)
  @codespace_ranges.concat(cmap.codespace_ranges)
  @cid_mapping.merge!(cmap.cid_mapping)
  @cid_range_mappings.concat(cmap.cid_range_mappings)
  @unicode_mapping.merge!(cmap.unicode_mapping)
end
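
The merge semantics can be sketched with plain hashes and arrays. The helper merge_like_use_cmap and the sample data are hypothetical. Unlike the source above, which mutates the receiver with merge! and concat, the sketch returns a new structure to keep the example side-effect free.

```ruby
# Standalone sketch (no HexaPDF required) of what use_cmap does to the data.
def merge_like_use_cmap(own, used)
  {
    # Hash#merge: on duplicate codes the entry from the used CMap wins.
    cid_mapping: own[:cid_mapping].merge(used[:cid_mapping]),
    # Array#+: ranges from the used CMap are appended after the existing ones.
    cid_range_mappings: own[:cid_range_mappings] + used[:cid_range_mappings],
  }
end

own  = {cid_mapping: {0x41 => 10, 0x42 => 11}, cid_range_mappings: [[0x30..0x39, 200]]}
used = {cid_mapping: {0x42 => 99, 0x43 => 12}, cid_range_mappings: [[0x30..0x39, 300]]}

merged = merge_like_use_cmap(own, used)
p merged[:cid_mapping]              # code 0x42 now maps to 99
p merged[:cid_range_mappings].last  # the used CMap's range comes last
```

Because to_cid searches ranges from last to first, ranges appended by use_cmap take precedence over ranges that were added before the call. Mappings added after the call override those of the used CMap.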