H1P - a blocking HTTP/1 parser for Ruby

H1P is a blocking/synchronous HTTP/1 parser for Ruby with a simple and intuitive API. Its design lends itself to writing HTTP servers in a sequential style. As such, it might prove useful in conjunction with the fiber scheduler interface introduced in Ruby 3.0, but it works just as well in a normal thread-based server (see the example). H1P was originally written as part of Tipi, a web server running on top of Polyphony.

H1P is still a very young project and as such should be used with caution. It has not undergone any significant conformance or security testing, and its API is not yet stable.

Features

Installing

If you're using Bundler, just add it to your Gemfile:

source 'https://rubygems.org'

gem 'h1p'

You can then run bundle install to install it. Otherwise, just run gem install h1p.

Usage

Start by creating an instance of H1P::Parser, passing a connection instance:

require 'h1p'

parser = H1P::Parser.new(conn)

To read the next request from the connection, call #parse_headers:

loop do
  headers = parser.parse_headers
  break unless headers

  handle_request(headers)
end

The #parse_headers method returns a single hash containing the different HTTP headers. In case the client has closed the connection, #parse_headers will return nil (see the guard clause above).

In addition to the header keys and values, the resulting hash also contains “pseudo-headers”, whose keys begin with a colon (":method", ":path", ":protocol" and ":rx" in the example below).

The header keys are always lower-cased. Consider the following HTTP request:

GET /foo HTTP/1.1
Host: example.com
User-Agent: curl/7.74.0
Accept: */*

The request will be parsed into the following Ruby hash:

{
  ":method"     => "get",
  ":path"       => "/foo",
  ":protocol"   => "http/1.1",
  "host"        => "example.com",
  "user-agent"  => "curl/7.74.0",
  "accept"      => "*/*",
  ":rx"         => 78
}
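As a sketch of consuming this hash, the hypothetical `route` method below (not part of H1P) dispatches on the `:method` and `:path` pseudo-headers and reads the `host` header by its lower-cased name. It assumes only the hash shape shown above; the status codes and response bodies are illustrative.

```ruby
# Hypothetical router (not part of H1P) operating on a parsed headers hash.
# Returns an illustrative [status, body] pair.
def route(headers)
  verb = headers[':method'] # lower-cased by the parser, e.g. "get"
  path = headers[':path']
  host = headers['host']    # header names are lower-cased too

  if verb == 'get' && path == '/foo'
    [200, "foo on #{host}"]
  elsif verb == 'get'
    [404, 'not found']
  else
    [405, 'method not allowed']
  end
end

headers = {
  ':method' => 'get', ':path' => '/foo', ':protocol' => 'http/1.1',
  'host' => 'example.com', 'user-agent' => 'curl/7.74.0', 'accept' => '*/*',
  ':rx' => 78
}
route(headers)  #=> [200, "foo on example.com"]
headers['Host'] #=> nil, since keys are lower-cased
```

Comparing against `'get'` rather than `'GET'` matters here, because the parser lower-cases the method.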

Multiple headers with the same key are coalesced into a single entry whose value is an array of the corresponding values. For example, multiple Cookie headers will appear in the hash as a single "cookie" entry: { "cookie" => ['a=1', 'b=2'] }.
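The coalescing rule can be modelled in a few lines of plain Ruby. The `add_header` and `header_values` helpers below are hypothetical illustrations of the behaviour described above, not H1P's internal code: a first value is stored as a String, and a repeated key turns the entry into an Array. On the consuming side, `Kernel#Array` normalises both shapes (and a missing header) to an Array.

```ruby
# Hypothetical helpers illustrating the coalescing rule; not H1P's code.

# Stores a header under its lower-cased name. The first value is kept as a
# String; a repeated name turns the entry into an Array of all its values.
def add_header(headers, name, value)
  key = name.downcase
  existing = headers[key]
  headers[key] =
    case existing
    when nil   then value
    when Array then existing << value
    else            [existing, value]
    end
  headers
end

# Returns all values for a header as an Array, whether the hash holds a
# String, an Array, or nothing for that key.
def header_values(headers, name)
  Array(headers[name.downcase])
end

cookies = {}
add_header(cookies, 'Cookie', 'a=1')
add_header(cookies, 'Cookie', 'b=2')
header_values(cookies, 'cookie') #=> ["a=1", "b=2"]
header_values(cookies, 'accept') #=> []
```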

Handling of invalid requests

When an invalid request is encountered, the parser raises an H1P::Error exception. An incoming request may be considered invalid if an invalid character is encountered at any point while parsing it, or if any of its tokens has an invalid length. You can consult the limits used by the parser here.
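One way to act on this in a server is to treat H1P::Error as fatal for the connection: reply with `400 Bad Request` and close it. The sketch below assumes a parser and connection like those in the Usage section; `serve_connection` and `BAD_REQUEST` are hypothetical names, and the stub `H1P::Error` definition exists only so the snippet runs without the gem installed (it is skipped when the real class is already loaded).

```ruby
# Stand-in for the gem's exception class, defined only when h1p is not
# loaded so that this sketch can run on its own.
unless defined?(H1P::Error)
  module H1P
    class Error < StandardError; end
  end
end

BAD_REQUEST = "HTTP/1.1 400 Bad Request\r\n" \
              "Content-Length: 0\r\nConnection: close\r\n\r\n"

# Hypothetical per-connection loop: yields each request's headers hash
# until the client closes the connection (parse_headers returns nil) or
# sends a malformed request (H1P::Error), then closes the connection.
def serve_connection(conn, parser)
  loop do
    headers = parser.parse_headers
    break unless headers

    yield headers
  end
rescue H1P::Error
  conn.write(BAD_REQUEST) # malformed request: reject it and stop
ensure
  conn.close
end
```

Errors raised by the block itself are not rescued here, so application bugs still surface, while the `ensure` clause closes the connection in every case.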

Reading the request body

To read the request body use #read_body:

# read entire body
body = parser.read_body

The H1P parser can read request bodies framed either by a Content-Length header or by chunked transfer encoding. The method call returns once the entire body has been read. If the body is incomplete or malformed, the parser raises an H1P::Error exception.
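For reference, the chunked format is a series of hexadecimal size lines, each followed by that many bytes and a CRLF, and terminated by a zero-size chunk plus an empty line. The hypothetical `decode_chunked` method below is a simplified decoder over an IO-like object (here a `StringIO`), written only to illustrate the format as specified in RFC 9112, section 7.1; it is not H1P's implementation, and it discards trailer fields.

```ruby
require 'stringio'

# Simplified decoder for the chunked transfer coding (RFC 9112, section 7.1),
# for illustration only: this is not H1P's implementation. Reads from any
# IO-like object supporting #gets and #read; trailer fields are discarded.
def decode_chunked(io)
  body = String.new # binary buffer for the decoded body
  loop do
    size_line = io.gets("\r\n") or raise ArgumentError, 'missing chunk size'
    # The size is hexadecimal; anything after ';' is a chunk extension.
    size_str = size_line.chomp("\r\n").split(';', 2).first.to_s.strip
    raise ArgumentError, 'invalid chunk size' unless size_str.match?(/\A\h+\z/)

    size = size_str.to_i(16)
    break if size.zero?

    data = io.read(size)
    raise ArgumentError, 'truncated chunk' if data.nil? || data.bytesize < size
    raise ArgumentError, 'missing CRLF after chunk' unless io.read(2) == "\r\n"

    body << data
  end
  # Skip optional trailer fields up to the final empty line.
  loop do
    line = io.gets("\r\n") or raise ArgumentError, 'missing final CRLF'
    break if line == "\r\n"
  end
  body
end

decode_chunked(StringIO.new("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")) #=> "Wikipedia"
```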

You can also read a single chunk of the body by calling #read_body_chunk:

# read a body chunk
chunk = parser.read_body_chunk(false)

# read a chunk only from the buffer
chunk = parser.read_body_chunk(true)

If no more chunks are available, #read_body_chunk will return nil. To test whether the request is complete, you can call #complete?:

headers = parser.parse_headers
unless parser.complete?
  body = parser.read_body
end
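Because #read_body_chunk returns nil once no chunks remain, a body can be processed incrementally, for example to write a large upload to disk without holding it all in memory. `each_body_chunk` is a hypothetical helper built only on the #read_body_chunk(false) call shown above.

```ruby
# Hypothetical helper: yields each chunk of the request body in turn,
# stopping when #read_body_chunk returns nil (no more chunks).
def each_body_chunk(parser)
  while (chunk = parser.read_body_chunk(false))
    yield chunk
  end
end

# Possible usage with a real parser, streaming an upload to disk:
#
#   File.open('upload.bin', 'wb') do |file|
#     each_body_chunk(parser) { |chunk| file.write(chunk) }
#   end
```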

The #read_body and #read_body_chunk methods will return nil if no body is expected (based on the received headers).
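The text above does not spell out which headers drive this decision. A plausible reading, stated here as an assumption, follows the HTTP/1.1 request framing rules: a request has a body if it carries a Transfer-Encoding header or a Content-Length greater than zero. The hypothetical `body_expected?` below encodes that approximation; H1P's exact behaviour (for instance with `Content-Length: 0`) may differ.

```ruby
# Hypothetical approximation of HTTP/1.1 request framing (an assumption,
# not necessarily H1P's exact rule): a body is expected when the request
# has a Transfer-Encoding header or a positive Content-Length.
def body_expected?(headers)
  return true if headers.key?('transfer-encoding')

  # Array() also covers a coalesced (repeated) Content-Length header.
  length = Array(headers['content-length']).first
  !length.nil? && length.to_i.positive?
end
```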

Parsing from arbitrary transports

The H1P parser was built to read from any arbitrary transport or source, as long as it conforms to one of two alternative interfaces:

data = ['GET ', '/foo', " HTTP/1.1\r\n", "\r\n"]
parser = H1P::Parser.new { data.shift }
parser.parse_headers
#=> {":method"=>"get", ":path"=>"/foo", ":protocol"=>"http/1.1", ":rx"=>21}
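The block form works because the parser pulls data rather than having it pushed in: it asks its source for more only when the buffered data cannot satisfy the current read. The hypothetical `PullReader` below is a minimal sketch of that pull model for reading CRLF-terminated lines; it is not H1P's code, and it assumes the same block contract as the example above (return the next String, or nil once the source is exhausted).

```ruby
# Hypothetical pull-based reader (not H1P's implementation). The source is
# any block returning the next String, or nil when no more data exists.
class PullReader
  def initialize(&source)
    @source = source
    @buffer = +''
  end

  # Returns the next CRLF-terminated line without its CRLF, or nil if the
  # source runs dry before a complete line is available.
  def read_line
    until (idx = @buffer.index("\r\n"))
      data = @source.call # pull more data only when the buffer is short
      return nil if data.nil?

      @buffer << data
    end
    @buffer.slice!(0, idx + 2).chomp("\r\n")
  end
end

fragments = ['GET ', '/foo', " HTTP/1.1\r\n", "\r\n"]
reader = PullReader.new { fragments.shift }
reader.read_line #=> "GET /foo HTTP/1.1"
reader.read_line #=> ""
reader.read_line #=> nil
```

Here the request line is assembled from three separate fragments, yet the caller simply asks for a line and never deals with the fragmentation.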

Design

The H1P parser design is based on the following principles:

One of the unique aspects of H1P is that instead of the server needing to feed data to the parser, the parser itself reads data from its source whenever it needs more of it. If no data is available yet, the parser blocks until more data is received.

The different parts of the request are parsed one byte at a time, and once each token is considered complete, it is copied from the buffer into a new string, to be stored in the headers hash.
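The byte-at-a-time approach can be illustrated on a request line. The hypothetical `parse_request_line` below scans a complete buffer byte by byte, and whenever a space or CR ends a token it copies that token out with `String#byteslice`, producing a new String. It lower-cases the method and protocol to match the hash shown in the Usage section. It is a simplified illustration, not H1P's implementation, and it omits the character and length validation the real parser performs.

```ruby
# Hypothetical request-line tokenizer (not H1P's implementation): walks the
# buffer one byte at a time and copies each completed token into a new String.
def parse_request_line(buffer)
  tokens = []
  start = 0
  buffer.each_byte.with_index do |byte, i|
    next unless byte == 0x20 || byte == 0x0D # a space or CR ends a token

    tokens << buffer.byteslice(start, i - start) # copy token out of the buffer
    start = i + 1
    break if byte == 0x0D # end of the request line
  end
  raise ArgumentError, 'malformed request line' unless tokens.size == 3

  verb, path, protocol = tokens
  { ':method' => verb.downcase, ':path' => path, ':protocol' => protocol.downcase }
end

parse_request_line("GET /foo HTTP/1.1\r\n")
# returns a hash with ":method" => "get", ":path" => "/foo", ":protocol" => "http/1.1"
```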

Performance

The included benchmark (against http_parser.rb, which is based on the old Node.js HTTP parser) shows the H1P parser to be about 10-20% slower than http_parser.rb.

However, in a fiber-based environment such as Polyphony, H1P is slightly faster, because handling pipelined requests (which cause http_parser.rb to emit callbacks multiple times) adds significant overhead to http_parser.rb.

Roadmap

Here are some of the features and enhancements planned for H1P:

Contributing

Issues and pull requests will be gladly accepted. If you have found this gem useful, please let me know.