Squid Web Cache wiki

Squid Web Cache documentation

🔗 Feature: Faster HTTP parser

Goal: Improve non-caching Squid3 performance by 20+%
Version: 3.6
Status: started
ETA: 2016
Priority: 1
Developer: AmosJeffries and FrancescoChemolli
Feature Branch: lp:~squid/squid/parser-ng (old: lp:~kinkie/squid/http-parser-ng)

🔗 Details

Avoid parsing the same HTTP header several times. Implement incremental header parsing.

One of the main expected gains from this and StringNg is increased clarity and performance in HTTP parsing. The HTTP parser implementation as of Squid-3.1 (see “the baseline situation” below) is somewhat byzantine and would benefit from a makeover. The code shows that attempts have been made in the past but were never completed.

🔗 Code Architecture

Parsing is handled by an Http::Parser child class, which has an SBuf buffer and a virtual parse method that splits the buffer content into message segments for follow-up processing.
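The shape described above can be sketched roughly as follows. This is an illustration only, not Squid's code: std::string stands in for SBuf, and the names SketchParser, FirstLineParser, feed, segment and remaining are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for an Http::Parser-style base class.
// std::string is used here in place of Squid's SBuf.
class SketchParser {
public:
    virtual ~SketchParser() = default;

    // Append newly read bytes, then try to make progress.
    // Returns true once a whole message segment has been isolated.
    bool feed(const std::string &bytes) {
        buf_ += bytes;
        return parse();
    }

    const std::string &segment() const { return segment_; }
    const std::string &remaining() const { return buf_; }

protected:
    virtual bool parse() = 0;   // each child class decides how to split buf_
    std::string buf_;           // unparsed bytes
    std::string segment_;       // the isolated segment, once found
};

// Child class that isolates the first LF-terminated line.
class FirstLineParser : public SketchParser {
protected:
    bool parse() override {
        if (done_)
            return true;
        assert(scanned_ <= buf_.size()); // checkpoint never passes the buffer end
        const std::size_t lf = buf_.find('\n', scanned_);
        if (lf == std::string::npos) {
            scanned_ = buf_.size();      // remember how far the scan got
            return false;
        }
        segment_ = buf_.substr(0, lf);
        if (!segment_.empty() && segment_.back() == '\r')
            segment_.pop_back();
        buf_.erase(0, lf + 1);           // leave the rest for follow-up processing
        done_ = true;
        return true;
    }

private:
    std::size_t scanned_ = 0;
    bool done_ = false;
};
```

Feeding "GET / HT" and then "TP/1.1\r\nHost: x\r\n" to a FirstLineParser yields the segment "GET / HTTP/1.1" and leaves "Host: x\r\n" in the buffer for the next stage.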

Parsing of the mime header block is (for now) handled as char* strings by HttpMsg objects, which in turn use HttpHeader objects outside the Parser hierarchy. This object and all the logic it uses need to be refactored to operate on the SBuf presented by the Http::One::Parser method mimeHeaders.

The HttpMsg hierarchy objects are currently overloaded with two purposes:

as general purpose HTTP message state storage objects
as HTTP and ICAP response message parsing objects

🔗 going forward

Under review:

conversion of ICAP I/O read buffer to SBuf

Underway:

add HTTP/2 frame parser
add ICAP response parser

TODO:

use the parsed ICAP response to interpret how the ICAP payload segments need to be parsed instead of attempting (badly) to auto-detect by throwing the HttpMsg parser at it.
code using the HttpMsg parser needs to be refactored to use the Http::Parser API instead and the duplicate parser removed from Squid.
refactor the HttpHeader parsing logic to use SBuf and the ::Parser::Tokenizer API. Possibly run by the new Parser child classes.
refactor ChunkedDecoder::parse to use SBuf and ::Parser::Tokenizer.
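To give a feel for what a tokenizer-driven chunk decoder might look like, here is a minimal sketch. It does not use Squid's ::Parser::Tokenizer or SBuf: MiniTokenizer and decodeOneChunk are hypothetical names, std::string stands in for SBuf, and chunk extensions, trailers and size overflow are deliberately ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, much-reduced tokenizer over a std::string.
class MiniTokenizer {
public:
    explicit MiniTokenizer(const std::string &s) : buf_(s) {}

    // Consume a run of hex digits into value (no overflow check).
    bool hex(std::uint64_t &value) {
        std::size_t i = pos_;
        std::uint64_t v = 0;
        while (i < buf_.size() && hexValue(buf_[i]) >= 0) {
            v = v * 16 + static_cast<std::uint64_t>(hexValue(buf_[i]));
            ++i;
        }
        if (i == pos_)
            return false;
        pos_ = i;
        value = v;
        return true;
    }

    // Consume an exact literal.
    bool skip(const std::string &lit) {
        assert(pos_ <= buf_.size());
        if (buf_.compare(pos_, lit.size(), lit) != 0)
            return false;
        pos_ += lit.size();
        return true;
    }

    // Consume exactly n bytes.
    bool take(std::uint64_t n, std::string &out) {
        if (n > buf_.size() - pos_)
            return false;
        out = buf_.substr(pos_, static_cast<std::size_t>(n));
        pos_ += static_cast<std::size_t>(n);
        return true;
    }

    std::size_t consumed() const { return pos_; }

private:
    static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
    const std::string &buf_;
    std::size_t pos_ = 0;
};

// Decodes one complete chunk from the front of buf, appending its data to
// payload. Returns false and leaves buf untouched if the chunk is
// incomplete or malformed (this sketch does not tell the two apart).
bool decodeOneChunk(std::string &buf, std::string &payload, bool &last) {
    MiniTokenizer tok(buf);
    std::uint64_t size = 0;
    std::string data;
    if (!tok.hex(size) || !tok.skip("\r\n"))
        return false;
    if (!tok.take(size, data) || !tok.skip("\r\n"))
        return false;
    payload += data;
    last = (size == 0);
    buf.erase(0, tok.consumed());
    return true;
}
```

Note that this sketch restarts from the beginning of an incomplete chunk on every call; a real decoder would also checkpoint after the chunk-size line.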

🔗 current state

After initial structural updates to the Http::Parser hierarchy.

the stack is asynchronous, now with incremental parse checkpoints resumed after read operations.

The request parsing system Http1::RequestParser::parse in Squid-3.6+ is as follows:

scan to skip over garbage prefix
incremental checkpoint wherever it halts (start of request-line or empty buffer)
scan to find method
incremental checkpoint at end of method
scan to find URI and version
- in relaxed parser scan to find LF then work backwards
- in strict parser scan for SP delimiters with extra checkpoint after URI
- incremental checkpoint at end of request-line
char* loop scan for end of header chunk (Http1::Parser::findMimeBlock / headersEnd)
incremental checkpoint at end of mime headers block
strcmp / scanf / char* loops for parsing URL (urlParse)
char* loop scan for end of each header line (HttpHeader::parse)
strcmp scan for : delimiter on header name and generate header objects
strListGet scan for parse of header content options

The response parsing system Http1::ResponseParser::parse in Squid-3.6+ is as follows:

scan for message version field
- accepting both “HTTP/1.x” and “ICY” protocol versions
- if necessary generates a fake HTTP/0.9 reply and terminates parsing.
incremental checkpoint at end of version label
scan for message status code field
incremental checkpoint at end of status code
scan for end of first line
incremental checkpoint at end of line
char* loop scan for end of header chunk (Http1::Parser::findMimeBlock / headersEnd)
incremental checkpoint at end of mime headers block
char* loop scan for end of header chunk (HttpMsg::httpMsgIsolateStart)
strcmp scan for : delimiter on header name and generate header objects (HttpHeader::parse)
strListGet scan for parse of header content options
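The first-line handling above, including the HTTP/0.9 fallback, might be sketched like this. It is not Http1::ResponseParser: parseStatusLine and StatusLineSketch are hypothetical names, the input is assumed to be a complete first line with the CRLF already removed, and the 200 status given to the fake HTTP/0.9 reply is an assumption of this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

struct StatusLineSketch {
    std::string version;  // "HTTP/1.x", "ICY", or "HTTP/0.9" when faked
    int status = 0;       // 0 means the status code could not be parsed
    bool faked = false;   // true when no known version label was found
};

StatusLineSketch parseStatusLine(const std::string &line) {
    StatusLineSketch out;
    const std::size_t sp = line.find(' ');
    const std::string label = line.substr(0, sp);

    // Stage 1: version label, accepting "HTTP/1.x" and "ICY".
    const bool http1 = label.size() == 8 &&
                       label.compare(0, 7, "HTTP/1.") == 0 &&
                       std::isdigit(static_cast<unsigned char>(label[7]));
    const bool icy = (label == "ICY");
    if (!http1 && !icy) {
        // No status line at all: treat the bytes as an HTTP/0.9 body.
        out.version = "HTTP/0.9";
        out.status = 200;   // assumption made for this sketch
        out.faked = true;
        return out;
    }
    out.version = label;

    // Stage 2: three status digits after the SP.
    if (sp == std::string::npos)
        return out;
    int code = 0;
    for (std::size_t i = sp + 1; i < sp + 4; ++i) {
        if (i >= line.size() || !std::isdigit(static_cast<unsigned char>(line[i])))
            return out;     // status stays 0
        code = code * 10 + (line[i] - '0');
    }
    assert(code >= 0 && code <= 999);
    out.status = code;
    return out;
}
```

In this sketch a line that does not begin with a recognised label terminates first-line parsing immediately, mirroring the "generates a fake HTTP/0.9 reply and terminates parsing" step.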

The ICAP response parsing system Adaptation::Icap::ResponseParser::parse in parser-ng-icap-pt2 branch is as follows:

class inherits from the Http1::ResponseParser parser, but replaces the stage 1 version scan with an ICAP specific scan.

scan for message version field
- accepting “ICAP/1.0” protocol version only
- incremental checkpoint at end of version label
scan for message status code field
incremental checkpoint at end of status code
scan for end of first line
incremental checkpoint at end of line
char* loop scan for end of header chunk (Http1::Parser::findMimeBlock / headersEnd)
incremental checkpoint at end of mime headers block
char* loop scan for end of header chunk (HttpMsg::httpMsgIsolateStart)
strcmp scan for : delimiter on header name and generate header objects (HttpHeader::parse)
strListGet scan for parse of header content options
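The inheritance arrangement described above, where only the version scan is replaced, could look roughly like the following. The class and method names (ResponseLineSketch, IcapResponseLineSketch, acceptVersion) are hypothetical and the input is assumed to be a complete first line without CRLF; the real classes work incrementally on an SBuf.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

class ResponseLineSketch {
public:
    virtual ~ResponseLineSketch() = default;

    bool parse(const std::string &line) {
        status_ = 0;
        const std::size_t sp = line.find(' ');
        // Stage 1: version label, decided by the (overridable) hook.
        if (sp == std::string::npos || !acceptVersion(line.substr(0, sp)))
            return false;
        // Stage 2 (shared by all children): three status digits after the SP.
        int code = 0;
        for (std::size_t i = sp + 1; i < sp + 4; ++i) {
            if (i >= line.size() || !std::isdigit(static_cast<unsigned char>(line[i])))
                return false;
            code = code * 10 + (line[i] - '0');
        }
        assert(code >= 0 && code <= 999);
        status_ = code;
        return true;
    }

    int status() const { return status_; }

protected:
    // HTTP flavour: accepts "HTTP/1.x" and "ICY".
    virtual bool acceptVersion(const std::string &label) const {
        if (label == "ICY")
            return true;
        return label.size() == 8 && label.compare(0, 7, "HTTP/1.") == 0 &&
               std::isdigit(static_cast<unsigned char>(label[7]));
    }

private:
    int status_ = 0;
};

// ICAP flavour: same status-code stage, different version scan.
class IcapResponseLineSketch : public ResponseLineSketch {
protected:
    bool acceptVersion(const std::string &label) const override {
        return label == "ICAP/1.0";
    }
};
```

Because the status-code stage lives in the base class, the ICAP child only has to state which version label it accepts.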

NOTE: Parsing of ICAP response messages and payload segments still uses the old HttpMsg API documented below for HTTP responses. When the payload segment is a request, it uses the HttpMsg::parse request-line code paths.

🔗 the baseline situation

Saved for comparison.

Initial analysis of the request parsing systems in Squid-3 showed the parser stack to be as follows:

The entire stack is asynchronous, with a full reset to step 1 after any read operation where the message was incompletely received.

scan to skip over garbage prefix
parse request line to find LF, and invalid CR and NIL (HttpParser::parseRequestLine)
- discard prior parse information !!
and again, parse request line to find SP positions (HttpParser::parseRequestLine)
- discard prior parse information !!
parse inside each request-line token to check method/URL/version syntax (HttpParser::parseRequestLine)
- discard prior parse information !!
char* loop scan for end of header chunk (headersEnd)
sscanf re-scan and sanity check request line (HttpRequest::sanityCheck)
- incomplete, duplicates step 2 and 3, partially duplicates step 5.
strcmp parse out request method,url,version (HttpRequest::parseFirstLine)
- duplicates step 3 and 4
strcmp / scanf / char* loops for parsing URL (urlParse)
char* loop scan for end of each header line (headersEnd)
strcmp scan for : delimiter on header name and generate header objects
strListGet scan for parse of header content options
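To illustrate why the full reset after each incomplete read is costly, the following sketch counts how many bytes a search for the end of the header block covers when a message arrives in several reads, once restarting from the beginning and once resuming from a checkpoint. scanCost is a hypothetical helper and the byte count is a simplified cost model, not a measurement of Squid.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Counts the bytes a delimiter search covers while looking for the blank
// line (CRLF CRLF) that ends a header block, as data arrives read by read.
// resume == false models a full reset after every read;
// resume == true models resuming from the last checkpoint.
std::size_t scanCost(const std::vector<std::string> &reads, bool resume) {
    std::string buf;
    std::size_t checkpoint = 0; // bytes already known not to end the block
    std::size_t cost = 0;
    for (const std::string &chunk : reads) {
        buf += chunk;
        // Back up 3 bytes so a delimiter split across two reads is still found.
        const std::size_t start = (resume && checkpoint > 3) ? checkpoint - 3 : 0;
        assert(start <= buf.size());
        const std::size_t pos = buf.find("\r\n\r\n", start);
        const bool found = (pos != std::string::npos);
        cost += (found ? pos + 4 : buf.size()) - start;
        if (found)
            break;
        checkpoint = buf.size();
    }
    return cost;
}
```

For a 27-byte request delivered as "GET / HTTP/1.1\r\n", "Host: a\r\n" and "\r\n", this model covers 68 bytes with a full reset and 33 bytes with checkpoints. The gap widens with more reads, and the baseline stack repeats several such scans per message.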

The parse sequences join at header line parsing (step 6), with some crossover at sanity checks (step 3). Response parsing is as follows:

i. processReplyHeader calls HttpMsg::parse
- discarding all previous parse information !!
1. char* loop scan for end of header chunk (headersEnd)
2. sscanf re-scan and sanity check first line (HttpReply::sanityCheck)
  - on fail skip to stage ii below
3. strcspn scan for end of header line
4. char* loop scan for end of header chunk (HttpMsg::httpMsgIsolateStart)
5. strcmp parse out response version, status message (HttpReply::parseFirstLine)
6. strcspn scan for end of header line
7. char* loop scan for end of header chunk (wow 6 in a row!) (HttpMsg::httpMsgIsolateStart)
8. strcmp scan for : delimiter on header name and generate header objects
9. strListGet scan for parse of header content options
ii. check for special case missing “HTTP” and “ICY” protocol versions
- generates a fake HTTP/0.9 reply
- packs it into a buffer
- parses the fake reply !!
  - discarding all previous parse information !!
  - repeat all of stage i
iii. char* loop scan for end of header chunk (headersEnd)
- because we seem not to have scanned enough times in stage i

TODO: document the ICAP response parsing sequence. Despite visible efforts to make it simple, it is even worse than HTTP response parsing, because it needs to run the whole of the response AND request parsing chains above on payloads to auto-detect which will succeed.

Categories: WantedFeature