Feature: StringNG-Based HTTP Parser

  • Goal: Gain code clarity and performance in HTTP parsing using the infrastructure offered by StringNg, SBuf and SBufTokenizer

  • Status: On the drawing board

  • ETA: unknown

  • Version: 3.2

  • Priority: 2

  • Developer: FrancescoChemolli

  • Depends On: Features/BetterStringBuffer/StringNg

  • More:

  • Feature Branch: lp:~kinkie/squid/http-parser-ng

Details

One of the main expected gains from StringNg is increased clarity and performance in HTTP parsing. The HTTP parser implementation as of Squid-3.1 is somewhat byzantine and would also benefit from a makeover. The code shows that earlier attempts at a cleanup were started but never completed.

Code Architecture

Current

Parsing is handled by HttpMsg::parse. HttpMsg has a virtual parseFirstLine method, which is implemented by its subclasses HttpRequest and HttpReply. All of these methods return a boolean value; false signals that not enough data has been received yet to completely understand the request or reply. In that case the caller retries once it has more data.
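The "return false and retry" contract described above can be illustrated with a minimal sketch. This is not Squid code: std::string stands in for Squid's buffer types, and the names parseFirstLineSketch and feedUntilParsed are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the parseFirstLine contract: std::string replaces
// Squid's buffer types, and the function name is invented for illustration.
// Returns false when the buffer does not yet hold a complete first line.
bool parseFirstLineSketch(const std::string &buf, std::string &firstLine)
{
    const std::string::size_type eol = buf.find("\r\n");
    if (eol == std::string::npos)
        return false; // not enough data yet; caller retries after the next read
    firstLine = buf.substr(0, eol);
    return true;
}

// Caller side: accumulate reads and retry until the parse succeeds.
// Returns the number of chunks consumed, or 0 if the line never completed.
std::size_t feedUntilParsed(const char *const chunks[], std::size_t count,
                            std::string &firstLine)
{
    std::string buf;
    for (std::size_t i = 0; i < count; ++i) {
        assert(chunks[i] != nullptr);
        buf += chunks[i];
        if (parseFirstLineSketch(buf, firstLine))
            return i + 1;
    }
    return 0;
}
```

Note that this sketch re-scans the whole buffer on every retry. Avoiding that rescan requires the parser to remember how far it got, which is the kind of per-connection state a dedicated parser object would hold.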

HTTP request parsing call chain (very C-ish): ConnStateData::clientReadRequest -> clientParseRequest -> HttpParserInit and parseHttpRequest

HttpParser is a struct holding enough state to parse the request as needed.

Proposal for evolution

It's not clear to me what the purpose of HttpMsg is, and it makes the class hierarchy quite complex. Also, carving the parse functions out of the message objects themselves seems like a sensible thing to do.

I propose to implement a class hierarchy with this basic signature (to be extended as needed):

    class HttpMsg {
    public:
       HttpVersion ...;
       HttpHeadersList ...;
       // stuff common to requests and replies. NO parsing functions
    };
    class HttpRequest : public HttpMsg {
       // stuff specific to HTTP requests.
    };
    class HttpReply : public HttpMsg {
       // stuff specific to HTTP replies.
    };

    // One parser per connection, to handle state.
    // A parser pool may be used to avoid allocating and deallocating them, or
    // MemPools can perform that function.
    // One base parser class serves both requests and replies.
    class HttpBaseParser {
    public:
       void reset();

    protected:
       HttpParseState ...; // whatever is needed. Includes the request being parsed.

       // Not used by clients, but does stuff common to requests and replies.
       bool parseHeader(SBuf raw);
    };
    class HttpRequestParser : public HttpBaseParser {
    public:
       // Factory method. Returns NULL to request more data,
       // throws exceptions in case of parse errors.
       // Parses the first line, then hands off control
       // to HttpBaseParser for the headers, then assembles
       // the results.
       HttpRequest *parseRequest(SBuf raw);
    };
    class HttpReplyParser : public HttpBaseParser {
    public:
       // Factory method. Returns NULL to request more data,
       // throws exceptions in case of parse errors.
       // Parses the first line, then hands off control
       // to HttpBaseParser for the headers, then assembles
       // the results.
       HttpReply *parseReply(SBuf raw);
    };
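To make the proposed factory contract concrete (null result means "need more data", an exception means "parse error"), here is a minimal, self-contained sketch. It rests on several assumptions: std::string stands in for SBuf, RequestSketch and RequestParserSketch are hypothetical names, std::unique_ptr replaces the raw pointer of the proposal to keep ownership explicit, and header parsing is left out entirely.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical, simplified request object; a real HttpRequest holds far more.
struct RequestSketch {
    std::string method;
    std::string uri;
    std::string version;
};

// Hypothetical parser following the proposed contract:
//  - returns a null pointer when more data is needed,
//  - throws std::runtime_error on a malformed request line,
//  - returns a new object once the first line parses.
// std::string stands in for SBuf; header parsing is omitted.
class RequestParserSketch {
public:
    std::unique_ptr<RequestSketch> parseRequest(const std::string &raw)
    {
        const std::string::size_type eol = raw.find("\r\n");
        if (eol == std::string::npos)
            return nullptr; // incomplete: ask the caller for more data

        const std::string line = raw.substr(0, eol);
        const std::string::size_type sp1 = line.find(' ');
        const std::string::size_type sp2 = line.rfind(' ');
        // Need a non-empty method and two distinct separators.
        if (sp1 == std::string::npos || sp1 == sp2 || sp1 == 0)
            throw std::runtime_error("malformed request line");
        assert(sp1 < sp2);

        std::unique_ptr<RequestSketch> req(new RequestSketch);
        req->method = line.substr(0, sp1);
        req->uri = line.substr(sp1 + 1, sp2 - sp1 - 1);
        req->version = line.substr(sp2 + 1);
        return req;
    }
};
```

A caller would keep appending received bytes to its buffer and call parseRequest again while the result is null, treating an exception as grounds for an error response.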


CategoryFeature

Features/StringNgHttpParser (last edited 2010-02-19 10:37:18 by FrancescoChemolli)