ClientStreams provides an API for retrieving and manipulating data from Squid, from inside Squid. Squid's ClientSide processing uses ClientStreams to fulfill standard client HTTP requests.

What follows is a very slightly edited transcript (with permission) of an IRC chat about ClientStreams. It still needs to be cleaned up and better organised.

14:48 nicholas Hi. I'm working on bug 1160 (analyze HTML to prefetch embedded objects). I can't figure out why, but even though it
14:49 lifeless I'd use the same mechanism ESI does.
14:49 nicholas Ok, that's client streams.
14:49 lifeless the fwdState api is on the wrong side of the store
14:49 nicholas doh!
14:49 lifeless so it doesn't have any of the required logic - cachability, vary handling, updates of existing objects...
14:50 lifeless things like store digests just haven't been updated to use client streams yet.
14:50 nicholas What, concisely, is a store digest?
14:51 lifeless a bitmap that lossily represents the contents of an entire squid cache, biased to hits.
14:51 lifeless uses a thing called a bloom filter
14:52 lifeless it lets squid predict that another cache will have a given object, for predictive routing (as opposed to ICP which is
14:52 nicholas That's strange, but ok. I suppose it's necessary for performance when you have a large number of cached objects.
14:52 lifeless well its an optional feature.
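
For readers who have not met the idea, a Bloom filter can be sketched roughly as below. This is a generic illustration and not Squid's store digest code: the class name `TinyBloom`, the 1024-bit size and the two salted `std::hash` values are all assumptions made for the example. The property it shows is the one that matters for digests: a lookup may wrongly answer "probably present", but it never answers "absent" for a key that was added.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical illustration only; not Squid's digest implementation.
class TinyBloom {
public:
    // Record a key by setting the bit chosen by each of the two hashes.
    void add(const std::string &key) {
        bits_.set(slot(key, 0));
        bits_.set(slot(key, 1));
        assert(mayContain(key)); // an added key is never reported absent
    }

    // false: the key was definitely never added.
    // true:  the key was probably added (false positives are possible).
    bool mayContain(const std::string &key) const {
        return bits_.test(slot(key, 0)) && bits_.test(slot(key, 1));
    }

private:
    static const std::size_t kBits = 1024;

    // Derive two bit positions by salting the key before hashing it.
    static std::size_t slot(const std::string &key, int which) {
        const std::string salted = key + (which == 0 ? "#a" : "#b");
        return std::hash<std::string>()(salted) % kBits;
    }

    std::bitset<kBits> bits_;
};
```

A cache would call `add()` for each stored URL and publish the bitmap; a peer would call `mayContain()` before deciding where to route a request.
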
14:53 nicholas Ok, I tried to track a standard request through the code and it runs through http.cc. Http.cc uses the store, but it
14:53 lifeless right.
14:54 lifeless http.cc just retrieves http objects, like ftp.cc retrieves ftp objects.
14:54 nicholas I'm on the wrong side of the fence. Gotcha.
14:54 nicholas Honestly, I spent about 5 days trying to understand the client stream API.
14:55 nicholas Concisely, what is a client stream? I suggested that they're a chain of observers to the results of a request. Is that
14:56 lifeless http://www.squid-cache.org/Doc/FAQ/FAQ-16.html
14:57 lifeless client streams..
14:57 lifeless they are a 'chain of responsibility' pattern.
14:58 lifeless sortof.
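
As a point of reference, the 'chain of responsibility' pattern mentioned here can be sketched in a few lines. The handler names below are hypothetical and are not Squid classes; the sketch only shows the shape of the pattern, where each node either answers a request itself or passes it to the next node.

```cpp
#include <cassert>
#include <string>

// Hypothetical names throughout; this only illustrates the pattern.
struct Handler {
    Handler *next = nullptr;
    virtual ~Handler() {}

    // Default behaviour: pass the request to the next node, if there is one.
    virtual std::string handle(const std::string &request) {
        assert(next != this); // a node must not be linked to itself
        return next ? next->handle(request) : "unhandled";
    }
};

// Answers requests it knows about, forwards everything else.
struct CacheHandler : Handler {
    std::string handle(const std::string &request) override {
        if (request == "cached-url")
            return "served from cache";
        return Handler::handle(request);
    }
};

// Last node in the chain: always produces an answer.
struct OriginHandler : Handler {
    std::string handle(const std::string &) override {
        return "fetched from origin";
    }
};
```

Linking a `CacheHandler` in front of an `OriginHandler` gives a two-node chain in which the caller only ever talks to the first node.
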
14:58 lifeless the clientStream code was started in C in the squid 2.6 timeframe, it needs an overhaul badly, now we can actually
14:59 nicholas Right, I noticed that the code is in flux. Might I add that I don't like CBDATA either ... not that I'm offering to do
15:00 nicholas For a ClientStreamData, I'm supposed to create my own Data class which is derived from, er, Refcountable? Then let the
15:01 nicholas See, I don't really understand what my callbacks are really supposed to do, since I only want "default" behaviour. As
15:02 lifeless well you don't want that.
15:02 lifeless because you don't want to parse requests.
15:02 lifeless ClientSocketContext is likely to be the closest thing to what you want though.
15:03 lifeless so your readfunc needs to eat all the data it receives.
15:04 lifeless you can throw it away.
15:04 lifeless your detach function can just call clientStreamDetach(node, http);
15:04 nicholas so do I add my function into ClientSocketContext's read function?
15:04 lifeless see clientSocketDetach
15:04 nicholas or do I add another node in the clientStream?
15:04 lifeless no, you should have all your stuff in its own .cc file.
15:04 lifeless you'll construct a new clientStream to service your requests.
15:05 nicholas Oh it is, but somebody has to enter my .cc file at some point, right?
15:05 lifeless right, you should have that already written though - whatever is doing the parsing should already be a clientStream
15:06 nicholas Nope. I just hacked it into http.cc.
15:06 lifeless if its not, then don't worry for now, get it working is the first step.
15:06 nicholas Not that I can't move it pretty easily.
15:06 nicholas Everything works, except that it doesn't cache what it fetches. And now I know why.
15:06 lifeless your Status calls should always return prev()->status()
15:07 lifeless the callback call is the one that is given the data, it too should throw it away.
15:08 nicholas and someone else will cache it?
15:08 lifeless yes
15:08 nicholas ok, I assume you're talking about just the fetching part?
15:08 lifeless I'm talking about the clientStream node you need to implement.
15:09 nicholas so when I know a URL that I want to prefetch, I create my clientStream with this one node that you just described.
15:10 lifeless ESIInclude.cc shows this well
15:10 nicholas I've spent a lot of time reading it, but since I didn't understand clientStreams, I never managed to quite figure it
15:11 lifeless ok, start with ESIInclude::Start
15:11 lifeless this calls clientBeginRequest
15:12 nicholas esiBufferRecipient seems to do a lot of work, including checking whether the HTTP stream succeeded or failed, and
15:12 lifeless it passes in the clientStream callbacks - esiBufferRecipient, esiBufferDetach, the streamdata (stream.getRaw()), the
15:12 nicholas oh right, this code. Yes, I cut'n'pasted this in, but I never got it working for me.
15:12 lifeless esiBuffer recipient copies the object back into the ESI master document.
15:12 lifeless so it has to do a bunch more work than you'll need to.
15:13 nicholas stream.getRaw() is a pointer to the node, yes? I found the code around that confusing.
15:14 lifeless stream is a ESIStreamContext which is a clientStream node that pulls data from a clientstream, instances of which are
15:14 lifeless (different instances, but the logic is shared by composition)
15:14 lifeless that is passed into ESIInclude::Start because ESI includes have a primary include and an 'alternate' include.
15:16 lifeless so all you need to start the chain is:
15:16 nicholas I see. I won't need to worry about any of that.
15:16 lifeless HttpHeader tempheaders(hoRequest);
15:17 lifeless if (clientBeginRequest(METHOD_GET, url, aBufferRecipient, aBufferDetach, aStreamInstance, &tempheaders,
15:17 lifeless {
15:17 lifeless /* handle failure */
15:17 lifeless }
15:17 lifeless httpHeaderClean (&tempheaders);
15:18 lifeless that will cause callbacks to aBufferRecipient, aBufferDetach to occur
15:19 lifeless then in the buffer recipient you throw them away, just check for status codes etc.
15:19 lifeless and I've given you the skeleton for detach above.
15:20 lifeless aStreamInstance is just a cbdata class that has your context.
15:20 lifeless i.e.
15:21 lifeless class myStream {
15:21 lifeless public:
15:21 lifeless static void BufferData (clientStreamNode *, ClientHttpRequest *, HttpReply *, StoreIOBuffer);
15:21 lifeless static void Detach (clientStreamNode *, ClientHttpRequest *);
15:22 lifeless private:
15:22 lifeless CBDATA_CLASS2(myStream);
15:22 lifeless void bufferData (clientStreamNode *, ClientHttpRequest *, HttpReply *, StoreIOBuffer);
15:22 lifeless void detach (clientStreamNode *, ClientHttpRequest *);
15:22 lifeless };
15:23 lifeless then in your .cc file...
15:23 lifeless CBDATA_CLASS_INIT(myStream);
15:23 nicholas the cbdata init line, i presume?
15:23 lifeless those CBDATA macros setup new and delete to do the right thing.
15:23 lifeless then your static functions are just
15:23 nicholas i don't need to write my own void *operator new?
15:24 lifeless no, you don't.
15:24 lifeless void
15:24 nicholas phew. :)
15:24 lifeless myStream::BufferData (clientStreamNode *node, ClientHttpRequest *, HttpReply *, StoreIOBuffer)
15:24 lifeless {
15:24 lifeless if (!cbdataReferenceValid(node->data)) {
15:25 lifeless /* something weird has happened - your data has been freed, but a callback has still been issued. deal here */ return; }
15:25 lifeless static_cast<myStream *>(node->data)->bufferData(node, ...);
15:25 lifeless }
15:25 lifeless and likewise for the Detach static method
15:26 lifeless is this making sense ?
15:27 nicholas yes, but just let me reread a little.
15:27 lifeless ok, theres one more important thing :)
15:27 nicholas "static_cast<myStream *>(node->data)->bufferData(node, ...)" calls myStream::BufferData doesn't it? So why am I calling
15:28 lifeless lowercase bufferData :)
15:28 nicholas oh man, i thought that was just a typo. now i have to reread all of it!
15:28 lifeless the static functions (denoted with the initial Capital) are thunks into the actual instance methods.
15:29 nicholas which makes sense. yes.
15:29 lifeless http://www.squid-cache.org/~robertc/squid-3-style.txt
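
The static-thunk convention described above can be tried outside Squid with a small stand-alone sketch. `MockNode` and `MyStream` are hypothetical stand-ins: the real callback takes `clientStreamNode *`, `ClientHttpRequest *`, `HttpReply *` and `StoreIOBuffer`, and the real validity check is `cbdataReferenceValid`, for which a plain null check is substituted here.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for Squid's clientStreamNode; only the field used here.
struct MockNode {
    void *data;
};

class MyStream {
public:
    MyStream() : calls(0), bytesSeen(0) {}

    // Static thunk (initial capital): has a plain-function signature, so it
    // can be registered as a callback. It recovers the instance from
    // node->data and forwards to the instance method.
    static void BufferData(MockNode *node, const char *buf, std::size_t len) {
        assert(node != nullptr);
        if (node->data == nullptr)
            return; // stand-in for the "data has been freed" case
        static_cast<MyStream *>(node->data)->bufferData(node, buf, len);
    }

    int calls;             // how many callbacks reached the instance
    std::size_t bytesSeen; // total payload size handed to the instance

private:
    // Instance method (lowercase): does the work with access to the state.
    void bufferData(MockNode *, const char *, std::size_t len) {
        ++calls;
        bytesSeen += len;
    }
};
```

The caller only ever sees `MyStream::BufferData`; the lowercase method stays private, which is why the two names differ only by case.
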
15:29 nicholas but what does bufferData actually do? let's see if i do understand this ...
15:29 nicholas ... it'll receive the contents of the page that I requested from clientBeginRequest, so I just discard them. check?
15:29 lifeless bufferData needs to do two things. it needs to check the status of node->next()
15:30 lifeless and on everything other than error or end-of-stream, it needs to issue a new read.
15:30 nicholas hm, ok.
15:31 lifeless if something like a 404 occurs, you'll get that as the HttpReply in the first call to bufferData.
15:31 nicholas and it will already be (negatively) entered into the cache for me
15:31 nicholas so i just ... don't do anything.
15:31 lifeless exactly.
15:31 lifeless just swallow the data until node->next()->status() returns an error.
15:32 nicholas if it was a successful read, but the connection is still open, i read more.
15:32 nicholas ok.
15:32 nicholas now let me ask you about the other half: analyzing pages that come in.
15:32 lifeless if its not an error, to swallow more data you call ->readfunc()
15:32 lifeless you'll need a buffer area in your class instance.
15:32 lifeless (although to be tricky you could use a static buffer in your class, as you don't care about the data)
15:33 nicholas (ah, nice trick! didn't think of that.)
15:33 nicholas I told you earlier that I just hacked my analyzer into http.cc. While this works for me, is there a better place to put
15:34 lifeless but I wouldn't worry about that - just have a HTTP_REQBUF_SZ char array in your private data.
15:34 nicholas I was using SM_PAGE_SIZE.
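
The "swallow data until the status says stop" behaviour can be sketched as follows. Everything here is a hypothetical stand-in: `MockUpstream`, `MockStatus` and `Prefetcher` are not Squid types, the 16-byte buffer is arbitrary, and the real code is callback driven (it issues one read and returns, then gets called again) rather than looping synchronously as this simplification does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-ins; Squid's real node, status and buffer types differ.
enum MockStatus { MOCK_MORE, MOCK_COMPLETE, MOCK_FAILED };

// Pretend upstream node that hands out a reply body piece by piece.
class MockUpstream {
public:
    explicit MockUpstream(const std::string &body)
        : broken(false), body_(body), offset_(0) {}

    MockStatus status() const {
        if (broken)
            return MOCK_FAILED;
        return offset_ < body_.size() ? MOCK_MORE : MOCK_COMPLETE;
    }

    // Copy up to `size` bytes into `buf` and return how many were copied.
    std::size_t read(char *buf, std::size_t size) {
        assert(buf != nullptr && size > 0);
        const std::size_t n = std::min(size, body_.size() - offset_);
        std::memcpy(buf, body_.data() + offset_, n);
        offset_ += n;
        return n;
    }

    bool broken; // set to true to simulate an upstream error

private:
    std::string body_;
    std::size_t offset_;
};

class Prefetcher {
public:
    Prefetcher() : discarded(0) {}

    // Keep reading while upstream has more; stop on completion or failure.
    MockStatus drain(MockUpstream &up) {
        while (up.status() == MOCK_MORE)
            discarded += up.read(scratch, sizeof(scratch));
        return up.status();
    }

    std::size_t discarded; // bytes read and thrown away

private:
    // The contents are never inspected, so one static buffer can be shared
    // by every instance (the "tricky" option from the chat).
    static char scratch[16];
};

char Prefetcher::scratch[16];
```

Swapping the static array for a per-instance member array gives the simpler option recommended in the chat, at the cost of one buffer per prefetch.
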
15:35 lifeless ok, where to put the analyzer ? we've got some rework we want to do in the request flow that would make this a lot
15:35 lifeless I think that the right place for now, is exactly where esi goes, and after esi in the chain.
15:35 lifeless the problem with where you are is that ftp pages won't be analysed. and if its an esi upstream then the urls could be
15:35 nicholas http requests that come in from clients have a client stream chain?
15:36 lifeless yup
15:36 nicholas hunh. i didn't even notice.
15:36 lifeless client_side_reply.cc line 1927
15:36 nicholas who installs ESIs ...
15:36 lifeless #if ESI
15:36 lifeless if (http->flags.accel && rep->sline.status != HTTP_FORBIDDEN &&
15:36 lifeless !alwaysAllowResponse(rep->sline.status) &&
15:36 lifeless esiEnableProcessing(rep)) {
15:36 lifeless debug(88, 2) ("Enabling ESI processing for %s\n", http->uri);
15:36 lifeless clientStreamInsertHead(&http->client_stream, esiStreamRead,
15:36 lifeless esiProcessStream, esiStreamDetach, esiStreamStatus, NULL);
15:36 lifeless }
15:36 lifeless #endif
15:36 nicholas yep, i've got the code up here.
15:37 nicholas clientStreamInsertHead. awesome.
15:37 lifeless this says - if it's an accelerated request that isn't a deny-error page, and it's a response that is amenable to
15:37 nicholas Nod. For me, I just need to know whether the mime-type is HTML or not.
15:38 lifeless you'll want to add your head before esi, so that you come after esi in the processing.
15:38 nicholas So the headers need to be complete and processed before I know whether to add myself.
15:38 lifeless so right before that #if ESI line.
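
The ordering rule is easy to get backwards, so here is a toy model of it. It assumes that inserting "at the head" places the new node directly after the source node, and that data flows from the source towards the client socket; the list of names and the `insertHead` helper are hypothetical and stand in for the real node list.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical model: the front of the list is the data source side,
// the back is the client socket side.
typedef std::list<std::string> MockChain;

// Mimics insert-at-head: the new node goes directly after the source.
inline void insertHead(MockChain &chain, const std::string &name) {
    assert(!chain.empty()); // the source node must already exist
    MockChain::iterator afterSource = chain.begin();
    ++afterSource;
    chain.insert(afterSource, name);
}
```

Calling `insertHead(chain, "analyzer")` and then `insertHead(chain, "esi")` leaves ESI closer to the source, so the analyzer sees data only after ESI has processed it.
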
15:39 nicholas Oh, I see it has the body at this point already?
15:39 nicholas Or does it just have a partial body?
15:39 lifeless it may have some body, but it definitely has the reply metadata
15:39 nicholas Because my code is rigged to work with partial data.
15:39 nicholas ok, good.
15:39 nicholas Then that's *exactly* right.
15:39 lifeless so you can just look in rep-> to get the headers already parsed.
15:39 nicholas yep.
15:40 lifeless and you'll get called with whatever data is available in your buffer function.
15:40 nicholas Perfect.
15:40 lifeless your buffer function should analyse, then call node->next()->callback(node->next(), ...)
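
An "analyse, then hand the data on unchanged" node might look roughly like this outside Squid. `AnalyzingStage` and its `Downstream` callback type are hypothetical, and counting `href=` is only a toy analysis standing in for real HTML parsing.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical pass-through stage: inspect each chunk, then forward it.
class AnalyzingStage {
public:
    typedef std::function<void(const char *, std::size_t)> Downstream;

    explicit AnalyzingStage(Downstream next) : anchors(0), next_(next) {}

    // Called with whatever data is currently available.
    void onData(const char *buf, std::size_t len) {
        assert(buf != nullptr || len == 0);
        analyse(buf, len);
        if (next_)
            next_(buf, len); // downstream sees exactly what we received
    }

    std::size_t anchors; // number of "href=" occurrences seen so far

private:
    // Toy analysis: count "href=" within this chunk only. A real analyzer
    // must also cope with tokens split across two chunks.
    void analyse(const char *buf, std::size_t len) {
        const std::string chunk(buf, len);
        std::size_t pos = chunk.find("href=");
        while (pos != std::string::npos) {
            ++anchors;
            pos = chunk.find("href=", pos + 5);
        }
    }

    Downstream next_;
};
```

Because the stage never modifies or withholds the buffer, nodes further along the chain behave exactly as if it were not there.
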
15:41 lifeless when a read is issued, there is one complication :
15:41 nicholas So that ESI or whomever can do it.
15:41 nicholas s/it/their thing/
15:41 lifeless if the client wants a range request, the read issued to you may be for partial data.
15:41 nicholas Will there be a flag on those? So I can avoid them?
15:42 lifeless so you have a choice. like ESI you can force ranges off for what you request, and filter out what you supply according
15:42 lifeless alternatively, and for you I think better, just don't add yourself to the chain at all if its a range request.
15:42 nicholas Well, what I request will never be ranged. But, what I analyze isn't necessarily what I requested.
15:43 nicholas It will normally be the request from the user agent. That's the point.
15:43 lifeless in your if block in client_side_reply just check http->request->range
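
Putting the two conditions from this part of the chat together (HTML replies only, and no range requests), the decision could be sketched as below. `MockExchange` is a hypothetical reduction of the request and reply; in Squid the inputs would come from the parsed reply headers and from `http->request->range`. The media type comparison is a case-sensitive prefix match, which is a simplification.

```cpp
#include <string>

// Hypothetical reduced view of a request/reply pair; not Squid's types.
struct MockExchange {
    std::string contentType; // value of the reply's Content-Type header
    bool hasRange;           // true when the client asked for a byte range
};

// Join the chain only for HTML replies to non-range requests.
inline bool shouldAnalyse(const MockExchange &x) {
    if (x.hasRange)
        return false; // partial data: stay out of the chain entirely
    // Prefix match so that parameters such as "; charset=utf-8" are ignored.
    return x.contentType.compare(0, 9, "text/html") == 0;
}
```

Only when this returns true would the analyzer node be inserted, immediately before the ESI block.
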

ClientStreams (last edited 2008-05-18 19:38:55 by localhost)