class Strelka::MultipartParser

A parser for extracting uploaded files and parameters from the body of a multipart/form-data request.

Synopsis

require 'strelka/multipartmimeparser'

parser = Strelka::MultipartMimeParser.new
files, params = parser.parse( io, '---boundary' )

Authors

Constants

BLANK_LINE_REGEXP

Pattern for matching a blank line

CRLF_REGEXP

Line-ending regexp. Supports UNIX line-endings for testing.

HEADER_FIELD_EOL

Line-ending for RFC5322 header fields; EOL not followed by a WSP char

Attributes

buffer[R]

The current buffer for unparsed data

fields[R]

Parsed form fields

Public Class Methods

new( io, boundary ) click to toggle source

Create a new Strelka::MultipartMimeParser

# File lib/strelka/multipartparser.rb, line 72
def initialize( io, boundary )
        io        = StringIO.new( io ) unless io.respond_to?( :read )
        boundary  = '--' + boundary # unless boundary.start_with?( '--' )

        @bufsize  = self.class.bufsize || self.class.defaults.spooldir
        @spooldir = self.class.spooldir || Pathname( self.class.defaults.spooldir )
        @io       = io
        @boundary = boundary
        @fields   = {}
        @buffer   = String.new( encoding: io.internal_encoding || io.external_encoding )

        # Ensure that the buffer can contain at least a whole boundary,
        # otherwise we can't scan for it.
        @bufsize  = @boundary.bytesize * 1.5 if @bufsize < @boundary.bytesize * 1.5
        @spooldir.mkpath
end

Public Instance Methods

parse() click to toggle source

Parse the form data from the IO and return it as a Hash.

# File lib/strelka/multipartparser.rb, line 102
def parse
        self.log.debug "Starting parse: %p" % [ self ]

        # Strip off the initial boundary
        self.strip_boundary or
                raise Strelka::ParseError, "No initial boundary"

        # Now scan until we see the ending boundary (the one with the trailing '--')
        self.scan_part until @buffer.start_with?( '--' )

        self.log.debug "Finished parse. %d fields" % [ self.fields.length ]
        return self.fields
end

Protected Instance Methods

read_at_least( bytecount ) click to toggle source

Read at least bytecount bytes from the io, appending the data onto the buffer.

# File lib/strelka/multipartparser.rb, line 302
def read_at_least( bytecount )
        # self.log.debug "Reading at least %d bytes from %p." % [ bytecount, @io ]

        if @io.eof?
                # self.log.debug "  input stream at EOF. Returning."
                return false
        end

        self.read_some_more until
                @buffer.bytesize >= bytecount || @io.eof?

        return true
end
read_chunk() click to toggle source

Read data from the state's IO until the buffer contains at least the number of bytes in the chunksize, or the IO is at EOF.

# File lib/strelka/multipartparser.rb, line 293
def read_chunk
        # self.log.debug "Reading a new chunk."
        self.read_at_least( @bufsize )
        # self.log.debug "  buffer is now: %p" % [ @buffer ]
end
read_some_more() click to toggle source

Try to read another chunk of data into the buffer of the given state, returning true unless the state's IO is at eof.

# File lib/strelka/multipartparser.rb, line 319
def read_some_more
        # self.log.debug "Reading more data from %p..." % [ @io ]
        return false if @io.eof?
        startsize = @buffer.bytesize

        @buffer << @io.read( @bufsize )
        # self.log.debug "  after reading, buffer has %d bytes." % [ @buffer.bytesize ]

        until @buffer.bytesize > startsize
                return false if @io.eof?
                Thread.pass
                @buffer << @io.read( @bufsize )
        end

        return true
end
scan_file_field( filename, headers ) click to toggle source

Scan the body of the current document part, spooling the data to a tempfile on disk and returning the resulting filehandle.

# File lib/strelka/multipartparser.rb, line 215
def scan_file_field( filename, headers )
        self.log.info "Parsing file '%s'" % [ filename ]

        io, size = self.spool_file_upload

        io.extend( FileInputField )
        io.filename       = filename
        io.content_type   = headers['content-type']
        io.content_length = size

        self.log.debug "Scanned file %p to: %s (%d bytes)" % [ io.filename, io.path, size ]
        return io
end
scan_headers() click to toggle source

Scan the buffer for MIME headers and return them as a Hash.

# File lib/strelka/multipartparser.rb, line 160
def scan_headers
        headerlines = ''

        @buffer.slice!( /^#{CRLF_REGEXP}/ )

        # Find the headers
        while headerlines.empty?
                if pos = @buffer.index( BLANK_LINE_REGEXP )
                        headerlines = @buffer.slice!( 0, pos )
                else
                        self.log.debug "Couldn't find a blank line in the first %d bytes (%p)" %
                                [ @buffer.bytesize, @buffer[0..100] ]
                        self.read_at_least( @bufsize ) or
                                raise Strelka::ParseError, "EOF while searching for headers"
                end
        end

        # put headers into a hash
        headers = headerlines.strip.split( HEADER_FIELD_EOL ).inject({}) {|hash, line|
                line.gsub!( CRLF_REGEXP, '' ) # Un-fold long headers
                key, val = line.split( /:\s*/, 2 )
                hash[ key.downcase ] = val
                hash
        }
        self.log.debug "Scanned headers: %p" % [headers]

        # remove headers from parse buffer
        @buffer.slice!( /^#{BLANK_LINE_REGEXP}/ )

        return headers
end
scan_part() click to toggle source

Scan a part from the buffer.

# File lib/strelka/multipartparser.rb, line 123
def scan_part
        headers = self.scan_headers
        disposition = headers['content-disposition']

        raise UnimplementedError, "don't know what to do with %p parts" % [ disposition ] unless
                disposition.start_with?( 'form-data' )
        key = disposition[ /\bname="(\S+)"/i, 1 ] or
                raise Strelka::ParseError, "no field name: %p" % [ disposition ]
        val = nil

        # :TODO: Support for content-type and content-transfer-encoding headers for parts.

        # If it's a file, spool it out to a tempfile
        if disposition =~ /\bfilename=/i
                file = disposition[ /\bfilename="(?:.*\\)?(.+?)"/, 1 ] or return nil
                self.log.debug "Parsing an uploaded file %p (%p)" % [ key, file ]
                val = self.scan_file_field( file, headers )

        # otherwise just read it as a regular parameter
        else
                self.log.debug "Parsing a form parameter (%p)" % [ key ]
                val = self.scan_regular_field( key )
        end

        # Convert the value to an Array if there are more than one
        if @fields.key?( key )
                @fields[ key ] = [ @fields[key] ] unless @fields[ key ].is_a?( Array )
                @fields[ key ] << val
        else
                @fields[ key ] = val
        end

        self.strip_boundary
end
scan_regular_field( key ) click to toggle source

Scan the value after the scan pointer for the specified metadata parameter.

# File lib/strelka/multipartparser.rb, line 195
def scan_regular_field( key )
        param = ''

        self.log.debug "Scanning form parameter: %p" % [key]
        while param.empty?
                if start = @buffer.index( @boundary )
                        self.log.debug "Found the end of the parameter."
                        param = @buffer.slice!( 0, start )
                else
                        self.read_some_more or raise Strelka::ParseError,
                                "EOF while scanning a form parameter"
                end
        end

        return param.chomp
end
spool_file_upload() click to toggle source

Scan the file data and metadata in the given scannner, spooling the file data into a temporary file. Returns the tempfile object and a hash of metadata.

# File lib/strelka/multipartparser.rb, line 233
def spool_file_upload
        self.log.debug "Spooling file from upload"
        tmpfile = Tempfile.open( 'filedata', @spooldir.to_s, encoding: 'ascii-8bit' )
        size = 0

        # :TODO: Use mmap(2) to map the resulting IOs from mongrel's spool file
        # rather than writing them all out to disk a second time.
        until tmpfile.closed?

                # look for end, store everything until boundary
                if start = @buffer.index( @boundary )
                        self.log.debug "Found the end of the file"
                        leavings = @buffer.slice!( 0, start )
                        leavings.slice!( -2, 2 ) # trailing CRLF
                        tmpfile.write( leavings )
                        size += leavings.length
                        tmpfile.close

                # not at the end yet, buffer this chunker to disk
                elsif @buffer.bytesize >= @bufsize
                        # make sure we're never writing a portion of the boundary
                        # out while we're buffering
                        buf = @buffer.slice!( 0, @buffer.bytesize - @bufsize )
                        # self.log.debug "  writing %d bytes" % [ buf.bytesize ]
                        tmpfile.print( buf )
                        size += buf.bytesize
                end

                # put some more data into the buffer
                unless tmpfile.closed?
                        self.read_some_more or
                                raise Strelka::ParseError, "EOF while spooling file upload"
                end
        end

        return tmpfile, size
end
strip( pat ) click to toggle source

Strip data from the head of the buffer that matches pat, returning it if successful, or returning nil if not. The matched data should fit within the parser's chunk size.

# File lib/strelka/multipartparser.rb, line 275
def strip( pat )
        self.read_chunk
        return nil unless @buffer.index( pat ) == 0
        @buffer.slice!( pat )
end
strip_boundary() click to toggle source

Strip the boundary that's at the front of the buffer, reading more data into it as necessary. Returns the boundary if successful, or nil if there wasn't a boundary in the buffer.

# File lib/strelka/multipartparser.rb, line 285
def strip_boundary
        self.log.debug "Stripping boundary:\n%p at:\n%p" % [ @boundary, @buffer[0,40] ]
        self.strip( @boundary )
end