Feature: StringNG-Based HTTP Parser
Goal: Gain code clarity and performance in HTTP parsing using the infrastructure offered by StringNg, SBuf and SBufTokenizer
Status: On the drawing board
ETA: unknown
Version: 3.2
Priority: 2
Developer: FrancescoChemolli
Depends On: Features/BetterStringBuffer/StringNg
More:
Feature Branch: lp:~kinkie/squid/http-parser-ng
Details
One of the main expected gains from StringNg is increased clarity and performance in HTTP parsing. The HTTP parser implementation as of Squid-3.1 is somewhat byzantine and might also benefit from a makeover. The code shows that attempts at this have been made in the past but were never completed.
Code Architecture
Current
Parsing is handled by HttpMsg::parse. HttpMsg has a virtual parseFirstLine method which is implemented by its subclasses HttpRequest and HttpReply. All those methods return a boolean value, which is false when not enough data has been received to completely parse the request or reply. In that case the caller will retry once it has more data.
HTTP request parsing call chain (very C-ish): ConnStateData::clientReadRequest -> clientParseRequest -> HttpParserInit and parseHttpRequest
HttpParser is a struct holding enough state to parse the request as needed.
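The "return false and retry" contract described above can be illustrated with a minimal sketch. This is not Squid code: `Msg`, `Reader` and `feed` are hypothetical names, `std::string` stands in for the real buffer type, and only the first line is handled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a message object that parses itself.
struct Msg {
    std::string firstLine;

    // Returns false when the buffer does not yet hold a complete first line,
    // which tells the caller to come back with more data.
    bool parse(const std::string &buf) {
        const std::size_t eol = buf.find("\r\n");
        if (eol == std::string::npos)
            return false; // not enough data yet
        firstLine = buf.substr(0, eol);
        return true;
    }
};

// Hypothetical caller: accumulates bytes as they arrive and retries the parse.
struct Reader {
    std::string buf;
    Msg msg;

    // Appends a chunk read from the connection and retries parsing.
    bool feed(const std::string &chunk) {
        buf += chunk;
        return msg.parse(buf);
    }
};
```

Feeding `"GET / HT"` and then `"TP/1.1\r\n"` makes the first call return false and the second return true. Note that a boolean result alone cannot distinguish "incomplete" from "malformed", so errors need a separate channel.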
Proposal for evolution
It's not clear to me what the purpose of HttpMsg is, and it makes the class hierarchy quite complex. Also, carving the parse functions out of the objects themselves seems like a sensible thing to do.
I propose to implement a class hierarchy with this basic signature (to be extended as needed):
    class HttpMsg {
        HttpVersion ...;
        HttpHeadersList ...;
        // stuff common to requests and replies. NO parsing functions
    };

    class HttpRequest : public HttpMsg {
        // stuff specific to HTTP requests
    };

    class HttpReply : public HttpMsg {
        // stuff specific to HTTP replies
    };

    // One parser per connection, to handle state.
    // A parser pool may be used to avoid allocating and deallocating them, or
    // MemPools can perform that function.
    // One parser class for both requests and replies.
    class HttpBaseParser {
    public:
        void reset();

    protected:
        HttpParseState ...; // whatever is needed; includes the message being parsed

        // Not used by clients, but does the work common to requests and replies.
        bool parseHeader(SBuf raw);
    };

    class HttpRequestParser : public HttpBaseParser {
    public:
        // Factory method. Returns NULL to request more data,
        // throws exceptions in case of parse errors.
        // Parses the first line, then hands off control
        // to HttpBaseParser for the headers, then assembles
        // the results.
        HttpRequest *parseRequest(SBuf raw);
    };

    class HttpReplyParser : public HttpBaseParser {
    public:
        // Factory method. Returns NULL to request more data,
        // throws exceptions in case of parse errors.
        // Parses the first line, then hands off control
        // to HttpBaseParser for the headers, then assembles
        // the results.
        HttpReply *parseReply(SBuf raw);
    };
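To make the proposed contract concrete, here is a small self-contained sketch of the request side. It is only an illustration under several assumptions: `std::string` replaces SBuf, `std::unique_ptr` replaces the raw pointer, the member names (`method`, `uri`, `version`, `headers`) are invented, and the parser re-parses from the start of the buffer on every call rather than keeping state between calls.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical message types: plain data, no parsing functions.
struct HttpMsg {
    std::string version;
    std::map<std::string, std::string> headers;
};
struct HttpRequest : HttpMsg {
    std::string method;
    std::string uri;
};

class HttpBaseParser {
protected:
    // Parses "Name: value" lines starting at pos, up to the blank line.
    // Returns false if the terminating blank line has not arrived yet;
    // throws on a line without a colon.
    bool parseHeader(const std::string &raw, std::size_t pos, HttpMsg &msg) {
        for (;;) {
            const std::size_t eol = raw.find("\r\n", pos);
            if (eol == std::string::npos)
                return false; // need more data
            if (eol == pos)
                return true;  // blank line: end of headers
            const std::string line = raw.substr(pos, eol - pos);
            const std::size_t colon = line.find(':');
            if (colon == std::string::npos)
                throw std::runtime_error("malformed header line");
            const std::size_t value = line.find_first_not_of(" \t", colon + 1);
            msg.headers[line.substr(0, colon)] =
                (value == std::string::npos) ? std::string() : line.substr(value);
            pos = eol + 2;
        }
    }
};

class HttpRequestParser : public HttpBaseParser {
public:
    // Factory method: returns a null pointer to request more data,
    // throws std::runtime_error on a parse error.
    std::unique_ptr<HttpRequest> parseRequest(const std::string &raw) {
        const std::size_t eol = raw.find("\r\n");
        if (eol == std::string::npos)
            return nullptr;
        std::unique_ptr<HttpRequest> req(new HttpRequest);
        std::istringstream firstLine(raw.substr(0, eol));
        std::string extra;
        // Expect exactly three whitespace-separated tokens.
        if (!(firstLine >> req->method >> req->uri >> req->version) ||
            (firstLine >> extra))
            throw std::runtime_error("malformed request line");
        if (!parseHeader(raw, eol + 2, *req))
            return nullptr;
        return req;
    }
};
```

With this shape the caller can tell the three outcomes apart: a null result means "read more and call again", an exception means the input is invalid, and a non-null result is a fully assembled request. A reply parser would follow the same pattern with a status line in place of the request line.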
