class Subconv::Scc::Reader

SCC reader

Parse and render an SCC file sequentially into a background and a foreground grid, as a TV set would, and store the resulting closed captions as grid snapshots in an array whenever the foreground grid changes.

Only captions in data channel 1 are read. Bytes with invalid parity raise an error unless parity checking is disabled. The advanced recovery methods mentioned in CEA608 are not implemented, since the source is assumed to contain no errors (e.g. a DVD source).

Constants

EXTENDED_CHARACTER_MAPS

Extended character maps for Western European languages and box drawing

LINE_REGEXP

Regular expression for parsing one line of data

PREAMBLE_ADDRESS_CODE_ROW_MAP

Map of preamble address code high bytes to their corresponding base row numbers (counted from 0)

SPECIAL_CHARACTER_MAP

Map of special characters to Unicode codepoints. The transparent space (0x39) is handled separately since it is not a real character.

STANDARD_CHARACTER_MAP

Map of standard characters that do not match the standard ASCII codes to their corresponding Unicode characters

Attributes

captions[R]

Actual conversion result

Public Instance Methods

read(io, fps, check_parity = true)

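Read an SCC file from the IO object io for a video with a frame rate of fps. If check_parity is false, bytes with invalid parity are accepted instead of raising a ParityError.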

# File lib/subconv/scc/reader.rb, line 304
def read(io, fps, check_parity = true)
  # Initialize new grids for character storage
  @foreground_grid = Grid.new
  @background_grid = Grid.new
  # Initialize state
  @state = State.default
  @captions = []
  @now = Timecode.new(0, fps)
  @data_channel = 0

  update_active_grid

  # Use chomp rather than chomp!, which returns nil when there is nothing to strip
  magic = io.readline.chomp
  fail InvalidFormatError, 'File does not start with "' + Scc::FILE_MAGIC + '"' unless Scc::FILE_MAGIC == magic

  io.each_line do |line|
    line.chomp!
    # Skip empty lines between the commands
    next if line.empty?

    line_data = LINE_REGEXP.match(line)
    fail InvalidFormatError, "Invalid line \"#{line}\"" if line_data.nil?

    # Parse timecode
    old_time = @now
    timecode = Timecode.new(line_data[:timecode], fps)
    @now = timecode
    fail InvalidFormatError, 'New timecode is behind last time' if @now < old_time

    # Parse data words
    parse_data(line_data[:data], check_parity)
  end
end
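
As a rough illustration of the per-line step in read, the sketch below splits one SCC line into its timecode and its data words. EXAMPLE_LINE_REGEXP and split_scc_line are hypothetical stand-ins, since the real LINE_REGEXP is not shown on this page; only the group names timecode and data come from the code above. The pattern assumes a timecode, whitespace, and then four-digit hexadecimal words separated by single spaces.

```ruby
# Hypothetical stand-in for LINE_REGEXP (the real pattern is not shown here).
# Assumes "HH:MM:SS:FF" or "HH:MM:SS;FF", whitespace, then 4-digit hex words.
EXAMPLE_LINE_REGEXP = /\A(?<timecode>\d{2}:\d{2}:\d{2}[:;]\d{2})\s+(?<data>\h{4}(?: \h{4})*)\s*\z/

# Return the timecode string and the list of hex words, or nil for a bad line.
def split_scc_line(line)
  match = EXAMPLE_LINE_REGEXP.match(line.chomp)
  return nil if match.nil?

  { timecode: match[:timecode], words: match[:data].split(' ') }
end

parsed = split_scc_line("00:00:01:00\t9420 9420 94ae 94ae\n")
puts parsed[:timecode]
puts parsed[:words].join(', ')
```

The reader itself raises InvalidFormatError for a line that does not match; the sketch returns nil instead to stay self-contained.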

Private Instance Methods

correct_parity?(byte)

Check a byte for odd parity

# File lib/subconv/scc/reader.rb, line 593
def correct_parity?(byte)
  byte.ord.to_s(2).count('1').odd?
end
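
To make the parity rule concrete, here is a small sketch that applies the same bit-counting check to the two bytes of one SCC word and then clears the parity bit. The helpers odd_parity? and strip_parity are illustrative names that work on Integer bytes; they are not part of the class.

```ruby
# Same idea as correct_parity?, written for Integer bytes (illustrative helper).
def odd_parity?(byte)
  byte.to_s(2).count('1').odd?
end

# Clear bit 7 (the parity bit) to obtain the 7-bit value.
def strip_parity(byte)
  byte & 0x7f
end

# The word "9420" consists of the bytes 0x94 and 0x20.
word = ['9420'].pack('H*').bytes
parity_ok = word.all? { |byte| odd_parity?(byte) }
hi, lo = word.map { |byte| strip_parity(byte) }

puts parity_ok
puts format('%02x/%02x', hi, lo)
```

0x94 has three bits set and 0x20 has one, so both pass the check; after stripping, the word becomes 0x14/0x20, which handle_control_code treats as "resume caption loading".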
handle_character(byte)

Insert a CEA608 character into the grid at the current position, converting it to its Unicode representation

# File lib/subconv/scc/reader.rb, line 435
def handle_character(byte)
  # Ignore filler character
  return if byte.zero?

  char = STANDARD_CHARACTER_MAP[byte.chr]
  insert_character(char)
end
handle_control_code(hi, lo)

Process a miscellaneous control code

# File lib/subconv/scc/reader.rb, line 501
def handle_control_code(hi, lo)
  if hi == 0x14 && lo == 0x20
    # Resume caption loading
    @state.mode = :pop_on
    update_active_grid
  elsif hi == 0x14 && lo == 0x21
    # Backspace
    unless @state.column.zero? # Ignore in the first column
      @state.column -= 1
      # Delete character at cursor after moving one character back
      @active_grid[@state.row][@state.column] = nil
    end
  elsif hi == 0x14 && lo == 0x24
    # Delete to end of row
    (@state.column...GRID_COLUMNS).each do |column|
      @active_grid[@state.row][column] = nil
    end
  elsif hi == 0x14 && lo == 0x28
    # Flash on
    # Flash is a spacing character
    insert_character(' ')
    @state.style.flash = true
  elsif hi == 0x14 && lo == 0x29
    # Resume direct captioning
    @state.mode = :paint_on
    update_active_grid
    # elsif hi == 0x14 && lo == 0x2b
    # Resume text display -> command not described in spec
    # fail "RTD"
  elsif hi == 0x14 && lo == 0x2c
    # Erase displayed memory
    @foreground_grid = Grid.new
    post_frame
    update_active_grid
  elsif hi == 0x14 && lo == 0x2e
    # Erase non-displayed memory
    @background_grid = Grid.new
    update_active_grid
  elsif hi == 0x14 && lo == 0x2f
    # End of caption (flip memories)
    @foreground_grid, @background_grid = @background_grid, @foreground_grid
    # This also forces pop-on mode
    @state.mode = :pop_on
    post_frame

    update_active_grid
  elsif hi == 0x17 && lo >= 0x21 && lo <= 0x23
    # Tab offset
    # Bits 0 and 1 designate how many columns to go
    @state.column += (lo & 0x3)
  else
    puts "Ignoring unknown control code #{hi}/#{lo}"
  end
end
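
The editing commands above can be tried out on a simplified row. This sketch models one row as a plain Array in which nil means "no character"; the row width of 32 is an assumption, because the value of GRID_COLUMNS is not shown on this page.

```ruby
# Illustrative only: one row as an Array, nil meaning "no character".
# 32 columns is an assumed grid width.
row = 'HELLO'.chars + [nil] * 27
column = 5

# Backspace (0x14/0x21): move back one column, then clear that cell.
unless column.zero?
  column -= 1
  row[column] = nil
end
after_backspace = row.compact.join

# Delete to end of row (0x14/0x24): clear every cell from the cursor onwards.
column = 2
(column...row.length).each { |index| row[index] = nil }
after_delete = row.compact.join

# Tab offset (0x17/0x21..0x23): the two lowest bits give the column shift.
tab_columns = [0x21, 0x22, 0x23].map { |lo| lo & 0x3 }

puts after_backspace
puts after_delete
puts tab_columns.inspect
```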
handle_extended_character(map, byte)

Insert an extended character into the grid at the current position, converting it to its Unicode representation

# File lib/subconv/scc/reader.rb, line 457
def handle_extended_character(map, byte)
  char = EXTENDED_CHARACTER_MAPS.fetch(map).fetch(byte)
  # Extended characters include automatic backspace+overwrite
  @state.column -= 1
  insert_character(char)
  @state.char_replaced = true
end
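
The automatic backspace exists because, in CEA608, an extended character is preceded by a standard fallback character that basic decoders display instead. The following sketch shows the overwrite on a simplified row; the Array-based row, its width of 32, and the insert lambda are illustrative assumptions.

```ruby
# Illustrative only: one row as an Array with an assumed width of 32.
row = Array.new(32)
column = 0

# Write a character at the cursor and advance, like insert_character does.
insert = lambda do |char|
  row[column] = char
  column += 1
end

insert.call('e')        # fallback character sent first
column -= 1             # extended character: step back over the fallback ...
insert.call("\u00e9")   # ... and overwrite it with the accented character

puts row.compact.join
puts column
```

The cursor ends up one column past the replaced cell, exactly where it was after the fallback character.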
handle_mid_row_code(_hi, lo)

Process a mid-row code

# File lib/subconv/scc/reader.rb, line 557
def handle_mid_row_code(_hi, lo)
  # Mid-row codes are spacing characters
  insert_character(' ')
  # Low byte bit 0 indicates whether underlining is to be enabled
  @state.style.underline = ((lo & 1) == 1)
  # Low byte bits 1 to 3 are the color code
  color = (lo >> 1) & 0x7

  if color == 0x7
    @state.style.italics = true
  else
    # Color mid-row codes disable italics
    @state.style.italics = false
    @state.style.color = Color.for_value(color)
  end
  # All mid-row codes always disable flash
  @state.style.flash = false
end
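
The bit layout of the low byte can be checked in isolation. This sketch decodes it into a plain Hash; decode_mid_row is a hypothetical helper, and it reports the raw color code rather than a Color object because Color.for_value is not shown on this page.

```ruby
# Decode the low byte of a mid-row code into style changes (illustrative helper).
def decode_mid_row(lo)
  color = (lo >> 1) & 0x7
  {
    underline: (lo & 1) == 1,
    italics: color == 0x7,
    # nil means "leave the color unchanged": code 7 selects italics instead
    color_code: color == 0x7 ? nil : color
  }
end

plain = decode_mid_row(0x20)
italics_underlined = decode_mid_row(0x2f)

puts plain.inspect
puts italics_underlined.inspect
```

0x20 yields color code 0 without underline or italics, while 0x2f (all of bits 0 to 3 set) yields underlined italics and leaves the color alone.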
handle_preamble_address_code(hi, lo)

Set drawing position and style according to the information in a preamble address code

# File lib/subconv/scc/reader.rb, line 466
def handle_preamble_address_code(hi, lo)
  @state.row = PREAMBLE_ADDRESS_CODE_ROW_MAP.fetch(hi)
  # Low byte bit 5 adds 1 to the row number if set
  @state.row += 1 if lo & (1 << 5) != 0

  # Low byte bit 0 indicates whether underlining is to be enabled
  @state.style.underline = ((lo & 1) == 1)
  # Low byte bit 4 indicates whether it is an indent or a formatting code
  is_indent = (((lo >> 4) & 1) == 1)
  # Low byte bits 1 to 3 are the color or indent code, depending on is_indent
  color_or_indent = (lo >> 1) & 0x7

  # Reset style
  @state.style.flash = false
  @state.style.italics = false

  if is_indent
    # Indent code always sets white as color attribute
    @state.style.color = Color::WHITE
    # One indent equals 4 characters
    @state.column = color_or_indent * 4
  else
    # Style code always sets first column
    @state.column = 0
    if color_or_indent == 7
      # "color" 7 is white with italics
      @state.style.color = Color::WHITE
      @state.style.italics = true
    else
      @state.style.color = Color.for_value(color_or_indent)
    end
  end
end
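
As a worked example of the low byte layout, the sketch below derives the cursor position and the underline and italics flags from a base row and a low byte. decode_preamble_position is a hypothetical helper, the base row of 10 is an arbitrary value standing in for a PREAMBLE_ADDRESS_CODE_ROW_MAP lookup, and color handling is left out.

```ruby
# Decode position and basic style from a preamble address code low byte.
# base_row stands in for the PREAMBLE_ADDRESS_CODE_ROW_MAP lookup.
def decode_preamble_position(base_row, lo)
  row = base_row
  row += 1 if (lo & (1 << 5)) != 0          # bit 5: second row of the pair
  is_indent = ((lo >> 4) & 1) == 1          # bit 4: indent or style code
  value = (lo >> 1) & 0x7                   # bits 1 to 3: indent or color

  {
    row: row,
    column: is_indent ? value * 4 : 0,      # one indent step is 4 columns
    underline: (lo & 1) == 1,               # bit 0
    italics: !is_indent && value == 7       # style code 7 is white italics
  }
end

indent_one = decode_preamble_position(10, 0x52)
next_row = decode_preamble_position(10, 0x70)
italics = decode_preamble_position(10, 0x4f)

puts indent_one.inspect
puts next_row.inspect
puts italics.inspect
```

0x52 is an indent code with value 1, so the cursor moves to column 4; 0x70 sets bit 5 and therefore selects the row after the base row; 0x4f is a style code with value 7 and bit 0 set, giving underlined italics at column 0.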
handle_special_character(byte)

Insert a special character into the grid at the current position, or delete the current column in case of a transparent space.

# File lib/subconv/scc/reader.rb, line 445
def handle_special_character(byte)
  if byte == 0x39
    # Transparent space: Move cursor after deleting the current column to open up a hole
    @active_grid[@state.row][@state.column] = nil
    @state.column += 1
  else
    char = SPECIAL_CHARACTER_MAP.fetch(byte)
    insert_character(char)
  end
end
insert_character(char)

Insert one Unicode character into the grid at the current position and with the current style, then advance the cursor one column

# File lib/subconv/scc/reader.rb, line 429
def insert_character(char)
  @active_grid[@state.row][@state.column] = Character.new(char, @state.style.dup)
  @state.column += 1
end
parse_data(data, check_parity)

Parse one line of SCC data

# File lib/subconv/scc/reader.rb, line 341
def parse_data(data, check_parity)
  last_command = [0, 0]

  data.split(' ').each do |word_string|
    begin
      @state.start_new_frame

      # Decode hexadecimal word into two-byte string
      word = [word_string].pack('H*')
      # Check parity
      fail ParityError, "At least one byte in word #{word_string} has even parity, odd required" unless !check_parity || (correct_parity?(word[0]) && correct_parity?(word[1]))

      # Remove parity bit for further processing
      word = word.bytes.collect { |byte|
        # Unset 8th bit
        (byte & ~(1 << 7))
      }

      hi, lo = word

      # First check if the word contains characters only
      if hi >= 0x20 && hi <= 0x7f
        # Skip characters if last command was on different channel
        if @data_channel != 0
          puts 'Skipping characters on channel 2'
          next
        end

        [hi, lo].each do |byte|
          handle_character(byte)
        end

        # Reset last command
        last_command = [0, 0]
      else
        if word == last_command
          # Skip commands transmitted twice for redundancy
          # But don't skip the next time, too
          last_command = [0, 0]
          next
        end

        # Channel information is encoded in the 4th bit, read it out
        @data_channel = (hi >> 3) & 1
        if @data_channel != 0
          puts 'Skipping command on channel 2'
          next
          # If channel 2 processing is needed, parse the file two times and
          # change the above condition as needed, then unset the channel bit
          # for further processing.
        end

        # rubocop:disable Style/NumericPredicate

        if hi == 0x11 && (0x30..0x3f).cover?(lo)
          # Special character
          handle_special_character(lo)
        elsif (0x12..0x13).cover?(hi) && (0x20..0x3f).cover?(lo)
          # Extended character
          handle_extended_character(hi & 1, lo)
        elsif (0x10..0x17).cover?(hi) && (0x40..0xff).cover?(lo)
          # Preamble address code
          handle_preamble_address_code(hi, lo)
        elsif [0x14, 0x17].include?(hi) && (0x20..0x2f).cover?(lo)
          handle_control_code(hi, lo)
        elsif hi == 0x11 && (0x20..0x2f).cover?(lo)
          handle_mid_row_code(hi, lo)
        elsif hi == 0x00 && lo == 0x00
          # Ignore filler
        else
          puts "Ignoring unknown command #{hi}/#{lo}"
        end

        # rubocop:enable Style/NumericPredicate

        last_command = word
      end

      post_frame if @state.paint_on_mode?
    ensure
      # Advance one frame for each word read
      @now += 1
    end
  end
end
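
The decoding and dispatch steps of parse_data can be followed on a few sample words. The sketch below is a simplified stand-in: decode_word, drop_repeated_commands, classify and data_channel_bit are hypothetical helpers that mirror the ranges and the redundancy rule in the code above, without touching any grid or state.

```ruby
# Turn a 4-digit hex word into two 7-bit values (parity bit removed).
def decode_word(word_string)
  [word_string].pack('H*').bytes.map { |byte| byte & 0x7f }
end

# Commands are sent twice for redundancy: keep the first copy, drop the second.
def drop_repeated_commands(words)
  last_command = [0, 0]
  words.each_with_object([]) do |word, kept|
    if (0x20..0x7f).cover?(word[0])
      last_command = [0, 0]   # character words reset the comparison
      kept << word
    elsif word == last_command
      last_command = [0, 0]   # skip once, but not the time after that
    else
      last_command = word
      kept << word
    end
  end
end

# Classify a decoded word using the same ranges as the dispatch above.
def classify(hi, lo)
  return :characters if (0x20..0x7f).cover?(hi)
  return :special_character if hi == 0x11 && (0x30..0x3f).cover?(lo)
  return :extended_character if (0x12..0x13).cover?(hi) && (0x20..0x3f).cover?(lo)
  return :preamble_address_code if (0x10..0x17).cover?(hi) && (0x40..0xff).cover?(lo)
  return :control_code if [0x14, 0x17].include?(hi) && (0x20..0x2f).cover?(lo)
  return :mid_row_code if hi == 0x11 && (0x20..0x2f).cover?(lo)
  return :filler if hi.zero? && lo.zero?

  :unknown
end

# Bit 3 of a command's high byte: 0 for channel 1, 1 for channel 2.
def data_channel_bit(hi)
  (hi >> 3) & 1
end

words = %w[9420 9420 94ae 94ae c8e5].map { |word| decode_word(word) }
kept = drop_repeated_commands(words)
kinds = kept.map { |hi, lo| classify(hi, lo) }

puts kinds.inspect
puts data_channel_bit(decode_word('1c20').first)
```

Of the five sample words, the two doubled commands are each kept once and the character word passes through unchanged. The word 1c20 has the channel bit set, so the reader would skip it as channel 2 data.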
post_frame()

Insert the currently displayed foreground grid as a caption into the captions array. Must be called whenever the foreground grid is changed as a result of a command.

# File lib/subconv/scc/reader.rb, line 578
def post_frame
  # Only push a new caption if the grid has changed, but do not push out an empty grid initially
  return unless @foreground_grid != @last_grid && !(@last_grid.nil? && @foreground_grid.empty?)

  # Save space by not saving the grid if it is completely empty
  grid = @foreground_grid.empty? ? nil : @foreground_grid
  @captions.push(Caption.new(timecode: @now, grid: grid.clone, mode: @state.mode, char_replacement: @state.char_replaced))
  @last_grid = @foreground_grid.clone
end
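
The snapshot rule can be modelled without the real Grid and Caption classes. In this sketch, SnapshotRecorder is a hypothetical class in which a grid is an Array of row strings and an empty Array stands for an empty grid; it records a snapshot only when the display changes and never starts with an empty one.

```ruby
# Illustrative recorder: grids are Arrays of row strings, [] is an empty grid.
class SnapshotRecorder
  attr_reader :snapshots

  def initialize
    @snapshots = []
    @last_grid = nil
  end

  def post(time, grid)
    # Nothing to record if the display did not change
    return if grid == @last_grid
    # Do not start the list with an empty display
    return if @last_grid.nil? && grid.empty?

    # An empty display is stored as nil, marking the point where captions clear
    @snapshots << { time: time, grid: grid.empty? ? nil : grid.dup }
    @last_grid = grid.dup
  end
end

recorder = SnapshotRecorder.new
recorder.post(0, [])          # ignored: nothing displayed yet
recorder.post(10, ['HELLO'])  # recorded
recorder.post(11, ['HELLO'])  # ignored: unchanged
recorder.post(40, [])         # recorded with a nil grid

puts recorder.snapshots.length
```

Copying the grid before storing it matters for the same reason as the clone calls above: the live grid keeps changing after the snapshot is taken.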
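update_active_grid()

Select the grid that incoming characters are written to: the foreground grid in paint-on mode, the background grid otherwise
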
# File lib/subconv/scc/reader.rb, line 588
def update_active_grid
  @active_grid = @state.paint_on_mode? ? @foreground_grid : @background_grid
end