ClientStreams provides an API to retrieve and manipulate data from Squid, from inside Squid. Squid's ClientSide processing uses ClientStreams to fulfill standard client HTTP requests.

What follows is a very slightly edited transcript (with permission) of an IRC chat about ClientStreams. It still needs to be cleaned up and better organised.

   1 14:48 < nicholas> Hi. I'm working on bug 1160 (analyze HTML to prefetch embedded objects). I can't figure out why, but even though it 
   2                   fetches the pages, it doesn't cache the result! The fetch is initiated with "fwdState(-1, entry, request);".
   3 14:49 < lifeless> I'd use the same mechanism ESI does.
   4 14:49 < nicholas> Ok, that's client streams.
   5 14:49 < lifeless> the fwdState api is on the wrong side of the store
   6 14:49 < nicholas> doh!
   7 14:49 < lifeless> so it doesn't have any of the required logic - cacheability, vary handling, updates of existing objects...
   8 14:50 < lifeless> things like store digests just haven't been updated to use client streams yet.
   9 14:50 < nicholas> What, concisely, is a store digest?
  10 14:51 < lifeless> a bitmap that lossily represents the contents of an entire squid cache, biased to hits.
  11 14:51 < lifeless> uses a thing called a bloom filter
  12 14:52 < lifeless> it lets squid predict that another cache will have a given object, for predictive routing (as opposed to ICP which is 
  13                   reactive)
  14 14:52 < nicholas> That's strange, but ok. I suppose it's necessary for performance when you have a large number of cached objects.
  15 14:52 < lifeless> well its an optional feature.
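
To make the store digest idea above concrete, here is a minimal Bloom filter sketch in C++. It is not Squid's digest code: the class name `TinyBloom`, the 1024-bit size, and the two `std::hash`-based hash functions are all assumptions chosen for illustration. It only shows why such a bitmap is lossy and why a "no" answer is reliable while a "yes" answer is only probable.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical illustration only; Squid's real digest code differs.
class TinyBloom {
public:
    // Record a key by setting the bit chosen by each hash function.
    void add(const std::string &key) {
        bits_.set(h1(key));
        bits_.set(h2(key));
    }

    // false: the key was definitely never added.
    // true:  the key was probably added (false positives are possible).
    bool mayContain(const std::string &key) const {
        return bits_.test(h1(key)) && bits_.test(h2(key));
    }

private:
    static constexpr std::size_t kBits = 1024; // assumed size

    static std::size_t h1(const std::string &key) {
        return std::hash<std::string>()(key) % kBits;
    }

    // Second hash derived by salting the key; an assumption, not a tuned choice.
    static std::size_t h2(const std::string &key) {
        return std::hash<std::string>()(key + "#salt") % kBits;
    }

    std::bitset<kBits> bits_;
};
```

A peer holding such a bitmap for another cache can ask `mayContain(url)` before forwarding a request, which is the predictive routing described above, in contrast to asking the other cache reactively.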
  16 14:53 < nicholas> Ok, I tried to track a standard request through the code and it runs through http.cc. Http.cc uses the store, but it 
  17                   doesn't actually insert the object into the cache?
  18 14:53 < lifeless> right.
  19 14:54 < lifeless> http.cc just retrieves http objects, like ftp.cc retrieves ftp objects.
  20 14:54 < nicholas> I'm on the wrong side of the fence. Gotcha.
  21 14:54 < nicholas> Honestly, I spent about 5 days trying to understand the client stream API.
  22 14:55 < nicholas> Concisely, what is a client stream? I suggested that they're a chain of observers to the results of a request. Is that 
  23                   accurate?
  24 14:56 < lifeless> http://www.squid-cache.org/Doc/FAQ/FAQ-16.html
  25 14:57 < lifeless> client streams..
  26 14:57 < lifeless> they are a 'chain of responsibility' pattern.
  27 14:58 < lifeless> sortof.
  28 14:58 < lifeless> the clientStream code was started in C in the squid 2.6 timeframe, it needs an overhaul badly, now we can actually 
  29                   write this sort of code more cleanly.
  30 14:59 < nicholas> Right, I noticed that the code is in flux. Might I add that I don't like CBDATA either ... not that I'm offering to do 
  31                   better.
  32 15:00 < nicholas> For a ClientStreamData, I'm supposed to create my own Data class which is derived from, er, Refcountable? Then let the 
  33                   ClientStreamData's internal pointer point to my object, then upcast it when my callbacks are called?
  34 15:01 < nicholas> See, I don't really understand what my callbacks are really supposed to do, since I only want "default" behaviour. As 
  35                   in, whatever squid normally does to cache/handle a request, except that there's no sender to send it to.
  36 15:02 < lifeless> well you don't want that.
  37 15:02 < lifeless> because you don't want to parse requests.
  38 15:02 < lifeless> ClientSocketContext is likely to be the closest thing to what you want though.
  39 15:03 < lifeless> so your readfunc needs to eat all the data it receives.
  40 15:04 < lifeless> you can throw it away.
  41 15:04 < lifeless> your detach function can just call clientStreamDetach(node, http);
  42 15:04 < nicholas> so do I add my function into ClientSocketContext's read function?
  43 15:04 < lifeless> see clientSocketDetach
  44 15:04 < nicholas> or do I add another node in the clientStream?
  45 15:04 < lifeless> no, you should have all your stuff in its own .cc file.
  46 15:04 < lifeless> you'll construct a new clientStream to service your requests.
  47 15:05 < nicholas> Oh it is, but somebody has to enter my .cc file at some point, right?
  48 15:05 < lifeless> right, you should have that already written though - whatever is doing the parsing should already be a clientStream
  49 15:06 < nicholas> Nope. I just hacked it into http.cc.
  50 15:06 < lifeless> if its not, then don't worry for now, get it working is the first step.
  51 15:06 < nicholas> Not that I can't move it pretty easily.
  52 15:06 < nicholas> Everything works, except that it doesn't cache what it fetches. And now I know why.
  53 15:06 < lifeless> your Status calls should always return prev()->status()
  54 15:07 < lifeless> the callback call is the one that is given the data, it too should throw it away.
  55 15:08 < nicholas> and someone else will cache it?
  56 15:08 < lifeless> yes
  57 15:08 < nicholas> ok, I assume you're talking about just the fetching part?
  58 15:08 < lifeless> I'm talking about the clientStream node you need to implement.
  59 15:09 < nicholas> so when I know a URL that I want to prefetch, I create my clientStream with this one node that you just described.
  60 15:10 < lifeless> ESIInclude.cc shows this well
  61 15:10 < nicholas> I've spent a lot of time reading it, but since I didn't understand clientStreams, I never managed to quite figure it 
  62                   out.
  63 15:11 < lifeless> ok, start with ESIInclude::Start
  64 15:11 < lifeless> this calls clientBeginRequest
  65 15:12 < nicholas> esiBufferRecipient seems to do a lot of work, including checking whether the HTTP stream succeeded or failed, and 
  66                   loading it into the store  (maybe, I'm not clear on the store API either).
  67 15:12 < lifeless> it passes in the clientStream callbacks - esiBufferRecipient, esiBufferDetach, the streamdata (stream.getRaw()), the 
  68                   http headers its synthetic request needs.
  69 15:12 < nicholas> oh right, this code. Yes, I cut'n'pasted this in, but I never got it working for me.
  70 15:12 < lifeless> esiBuffer recipient copies the object back into the ESI master document.
  71 15:12 < lifeless> so it has to do a bunch more work than you'll need to.
  72 15:13 < nicholas> stream.getRaw() is a pointer to the node, yes? I found the code around that confusing.
  73 15:14 < lifeless> stream is a ESIStreamContext which is a clientStream node that pulls data from a clientstream, instances of which are 
  74                   used by both the master esi document and includes
  75 15:14 < lifeless> (different instances, but the logic is shared by composition)
  76 15:14 < lifeless> that is passed into ESIInclude::Start because ESI includes have a primary include and an 'alternate' include.
  77 15:16 < lifeless> so all you need to start the chain is:
  78 15:16 < nicholas> I see. I won't need to worry about any of that.
  79 15:16 < lifeless> HttpHeader tempheaders(hoRequest);
  80 15:17 < lifeless> if (clientBeginRequest(METHOD_GET, url, aBufferRecipient, aBufferDetach, aStreamInstance, &tempheaders, 
  81                   aStreamInstance->buffer->buf, HTTP_REQBUF_SZ)) 
  82 15:17 < lifeless>   {
  83 15:17 < lifeless>   /* handle failure */
  84 15:17 < lifeless> }
  85 15:17 < lifeless> httpHeaderClean (&tempheaders);
  86 15:18 < lifeless> that will cause callbacks to aBufferRecipient, aBufferDetach to occur
  87 15:19 < lifeless> then in the buffer recipient you throw them away, just check for status codes etc.
  88 15:19 < lifeless> and I've given you the skeleton for detach above.
  89 15:20 < lifeless> aStreamInstance is just a cbdata class that has your context.
  90 15:20 < lifeless> i.e.
  91 15:21 < lifeless> class myStream {
  92 15:21 < lifeless> public
  93 15:21 < lifeless> :
  94 15:21 < lifeless> static void BufferData (clientStreamNode *, ClientHttpRequest *, HttpReply *, StoreIOBuffer);
  95 15:21 < lifeless> static void Detach (clientStreamNode *, ClientHttpRequest *);
  96 15:22 < lifeless> private:
  97 15:22 < lifeless> CBDATA_CLASS2(myStream);
  98 15:22 < lifeless> void bufferData (clientStreamNode *, ClientHttpRequest *, HttpReply *, StoreIOBuffer);
  99 15:22 < lifeless> void detach (clientStreamNode *, ClientHttpRequest *);
 100 15:22 < lifeless> }
 101 15:22 < lifeless> ;
 102 15:23 < lifeless> then in your .cc file...
 103 15:23 < lifeless> CBDATA_CLASS_INIT(myStream);
 104 15:23 < nicholas> the cbdata init line, i presume?
 105 15:23 < lifeless> those CBDATA macros setup new and delete to do the right thing.
 106 15:23 < lifeless> then your static functions are just
 107 15:23 < nicholas> i don't need to write my own void *operator new?
 108 15:24 < lifeless> no, you don't.
 109 15:24 < lifeless> void
 110 15:24 < nicholas> phew. :)
 111 15:24 < lifeless> myStream::BufferData (clientStreamNode *node, ClientHttpRequest *, HttpReply *, StoreIOBuffer)
 112 15:24 < lifeless> {
 113 15:24 < lifeless> if (!cbdataReferenceValid(node->data))
 114 15:25 < lifeless>  return; /* something weird has happened - your data has been freed, but a callback has still been issued. deal here */
 115 15:25 < lifeless> static_cast<myStream *>(node->data)->bufferData(node, ...);
 116 15:25 < lifeless> }
 117 15:25 < lifeless> and likewise for the Detach static method
 118 15:26 < lifeless> is this making sense ?
 119 15:27 < nicholas> yes, but just let me reread a little.
 120 15:27 < lifeless> ok, theres one more important thing :)
 121 15:27 < nicholas> "static_cast<myStream *>(node->data)->bufferData(node, ...)" calls myStream::BufferData doesn't it? So why am I calling 
 122                   myself?
 123 15:28 < lifeless> lowercase bufferData :)
 124 15:28 < nicholas> oh man, i thought that was just a typo. now i have to reread all of it!
 125 15:28 < lifeless> the static functions (denoted with the initial Capital) are thunks into the actual instance methods.
 126 15:29 < nicholas> which makes sense. yes.
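
The thunk convention described above (a capitalised static function forwarding to a lowercase instance method) can be sketched without any Squid headers. In this sketch `FakeNode` is a hypothetical stand-in for `clientStreamNode`, the callback signature is simplified to a buffer and a length, and a plain null check stands in for `cbdataReferenceValid`; none of this is Squid's actual API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for clientStreamNode: it only carries the context pointer.
struct FakeNode {
    void *data;
};

class MyStream {
public:
    MyStream() : bytesSeen(0) {}

    // Static thunk (initial capital): has a plain function signature, so it
    // can be registered as a callback. It recovers the instance and forwards.
    static void BufferData(FakeNode *node, const char *buf, std::size_t len) {
        if (node == nullptr || node->data == nullptr)
            return; // no live context to dispatch to
        static_cast<MyStream *>(node->data)->bufferData(node, buf, len);
    }

    std::size_t bytesSeen;

private:
    // Instance method (lowercase): does the real work with access to members.
    void bufferData(FakeNode *, const char *, std::size_t len) {
        bytesSeen += len; // the data itself is thrown away, only counted
    }
};
```

The caller only ever sees `MyStream::BufferData`; the object is reached through the `void *` stored on the node, which is why the validity check has to happen before the cast.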
 127 15:29 < lifeless> http://www.squid-cache.org/~robertc/squid-3-style.txt
 128 15:29 < nicholas> but what does bufferData actually do? let's see if i do understand this ...
 129 15:29 < nicholas> ... it'll receive the contents of the page that I requested from clientBeginRequest, so I just discard them. check?
 130 15:29 < lifeless> bufferData needs to do two things. it needs to check the status of node->next()
 131 15:30 < lifeless> and on everything other than error or end-of-stream, it needs to issue a new read.
 132 15:30 < nicholas> hm, ok.
 133 15:31 < lifeless> if something like a 404 occurs, you'll get that as the HttpReply in the first call to bufferData.
 134 15:31 < nicholas> and it will already be (negatively) entered into the cache for me
 135 15:31 < nicholas> so i just ... don't do anything.
 136 15:31 < lifeless> exactly.
 137 15:31 < lifeless> just swallow the data until node->next()->status() returns an error.
 138 15:32 < nicholas> if it was a successful read, but the connection is still open, i read more.
 139 15:32 < nicholas> ok.
 140 15:32 < nicholas> now let me ask you about the other half: analyzing pages that come in.
 141 15:32 < lifeless> if its not an error, to swallow more data you call ->readfunc()
 142 15:32 < lifeless> you'll need a buffer area in your class instance.
 143 15:32 < lifeless> (although to be tricky you could use a static buffer in your class, as you don't care about the data)
 144 15:33 < nicholas> (ah, nice trick! didn't think of that.)
 145 15:33 < nicholas> I told you earlier that I just hacked my analyzer into http.cc. While this works for me, is there a better place to put 
 146                   it? Especially if I want you devs to accept the patch?
 147 15:34 < lifeless> but I wouldn't worry about that - just have a HTTP_REQBUF_SZ char array in your private data.
 148 15:34 < nicholas> I was using SM_PAGE_SIZE.
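
The "swallow the data until the status says stop" behaviour discussed above can be sketched as follows. Everything here is hypothetical: `FakeUpstream`, `DiscardingSink`, the `StreamStatus` values and the 4096-byte buffer are stand-ins, and a simple loop replaces the real cycle of "callback arrives, check status, issue another read". It only illustrates the control flow, not the ClientStreams API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these names are Squid's.
enum StreamStatus { STREAM_MORE, STREAM_COMPLETE };

class FakeUpstream {
public:
    explicit FakeUpstream(const std::vector<std::string> &chunks)
        : chunks_(chunks), next_(0) {}

    // More data is available until every chunk has been handed out.
    StreamStatus status() const {
        return next_ < chunks_.size() ? STREAM_MORE : STREAM_COMPLETE;
    }

    // Copy the next chunk into buf and return the number of bytes written.
    // Must only be called while status() is STREAM_MORE. In this sketch a
    // chunk larger than cap is simply truncated.
    std::size_t read(char *buf, std::size_t cap) {
        const std::string &chunk = chunks_[next_++];
        const std::size_t n = chunk.size() < cap ? chunk.size() : cap;
        chunk.copy(buf, n);
        return n;
    }

private:
    std::vector<std::string> chunks_;
    std::size_t next_;
};

class DiscardingSink {
public:
    DiscardingSink() : reads(0), bytes(0) {}

    // Keep issuing reads until the upstream reports it has nothing more.
    void drain(FakeUpstream &up) {
        while (up.status() == STREAM_MORE) {
            bytes += up.read(buf_, sizeof(buf_));
            ++reads;
        }
    }

    std::size_t reads;
    std::size_t bytes;

private:
    char buf_[4096]; // assumed size; contents are overwritten on every read
};
```

Because the sink never looks at the buffer contents, reusing one buffer for every read is safe, which is the point of the static buffer remark above.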
 149 15:35 < lifeless> ok, where to put the analyzer ? we've got some rework we want to do in the request flow that would make this a lot 
 150                   easier to answer.
 151 15:35 < lifeless> I think that the right place for now, is exactly where esi goes, and after esi in the chain.
 152 15:35 < lifeless> the problem with where you are is that ftp pages won't be analysed. and if its an esi upstream then the urls could be 
 153                   wrong (for instance)
 154 15:35 < nicholas> http requests that come in from clients have a client stream chain?
 155 15:36 < lifeless> yup
 156 15:36 < nicholas> hunh. i didn't even notice.
 157 15:36 < lifeless> client_side_reply.cc line 1927
 158 15:36 < nicholas> who installs ESIs ...
 160 15:36 < lifeless> #if ESI
 161 15:36 < lifeless>     if (http->flags.accel && rep->sline.status != HTTP_FORBIDDEN &&
 162 15:36 < lifeless>             !alwaysAllowResponse(rep->sline.status) &&
 163 15:36 < lifeless>             esiEnableProcessing(rep)) {
 164 15:36 < lifeless>         debug(88, 2) ("Enabling ESI processing for %s\n", http->uri);
 165 15:36 < lifeless>         clientStreamInsertHead(&http->client_stream, esiStreamRead,
 166 15:36 < lifeless>                                esiProcessStream, esiStreamDetach, esiStreamStatus, NULL);
 167 15:36 < lifeless>     }
 168 15:36 < lifeless> #endif
 169 15:36 < nicholas> yep, i've got the code up here.
 170 15:37 < nicholas> clientStreamInsertHead. awesome.
 171 15:37 < lifeless>  this says - if its an accelerated request that isn't a deny-error page, and its a response that is amenable to 
 172                   processing, and it passes the esi logic checks.. then add a new head.
 173 15:37 < nicholas> Nod. For me, I just need to know whether the mime-type is HTML or not.
 174 15:38 < lifeless> you'll want to add your head before esi, so that you come after esi in the processing.
 175 15:38 < nicholas> So the headers need to be complete and processed before I know whether to add myself.
 176 15:38 < lifeless> so right before that #if ESI line.
 177 15:39 < nicholas> Oh, I see it has the body at this point already?
 178 15:39 < nicholas> Or does it just have a partial body?
 179 15:39 < lifeless> it may have some body, but it definitely has the reply metadata
 180 15:39 < nicholas> Because my code is rigged to work with partial data.
 181 15:39 < nicholas> ok, good.
 182 15:39 < nicholas> Then that's *exactly* right.
 183 15:39 < lifeless> so you can just look in rep-> to get the headers already parsed.
 184 15:39 < nicholas> yep.
 185 15:40 < lifeless> and you'll get called with whatever data is available in your buffer function.
 186 15:40 < nicholas> Perfect.
 187 15:40 < lifeless> your buffer function should analyse, then call node->next()->callback(node->next(), ...)
 188 15:41 < lifeless> when a read is issued, there is one complication :
 189 15:41 < nicholas> So that ESI or whomever can do it.
 190 15:41 < nicholas> s/it/their thing/
 191 15:41 < lifeless> if the client wants a range request, the read issued to you may be for partial data.
 192 15:41 < nicholas> Will there be a flag on those? So I can avoid them?
 193 15:42 < lifeless> so you have a choice. like ESI you can force ranges off for what you request, and filter out what you supply according 
 194                   to what is requested from you.
 195 15:42 < lifeless> alternatively, and for you I think better, just don't add yourself to the chain at all if its a range request.
 196 15:42 < nicholas> Well, what I request will never be ranged. But, what I analyze isn't necessarily what I requested.
 197 15:43 < nicholas> It will normally be the request from the user agent. That's the point.
 198 15:43 < lifeless> in your if block in client_side_reply just check http->request->range
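
The placement advice above (insert the analyser head before ESI inserts its own, and skip range requests entirely) can be sketched with a toy chain. `FakeReply`, `FakeRequest`, `FakeChain`, `shouldAnalyse` and `runExample` are all hypothetical names; the content-type test is a simplified, case-sensitive prefix match, and data is assumed to flow from the head of the chain toward the tail. None of this is the real `clientStreamInsertHead` API.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <string>

// Hypothetical stand-ins for the reply and request state discussed above.
struct FakeReply {
    std::string contentType;
};

struct FakeRequest {
    bool hasRange;
};

// Each node sees a chunk of body data and returns what it passes onward.
typedef std::function<std::string(const std::string &)> NodeFn;

struct FakeChain {
    std::list<NodeFn> nodes;

    // The most recently inserted node becomes the new head.
    void insertHead(const NodeFn &fn) { nodes.push_front(fn); }

    // Data flows from the head of the chain toward the tail.
    std::string deliver(std::string data) const {
        for (std::list<NodeFn>::const_iterator it = nodes.begin();
             it != nodes.end(); ++it)
            data = (*it)(data);
        return data;
    }
};

// Only analyse HTML replies to non-range requests.
bool shouldAnalyse(const FakeReply &rep, const FakeRequest &req) {
    return !req.hasRange && rep.contentType.compare(0, 9, "text/html") == 0;
}

// Builds a chain and returns the order in which nodes saw the data.
std::string runExample(const FakeReply &rep, const FakeRequest &req,
                       bool esiEnabled) {
    std::string trace;
    FakeChain chain;
    if (shouldAnalyse(rep, req)) // inserted first, so it ends up after ESI
        chain.insertHead([&trace](const std::string &d) -> std::string {
            trace += "analyser,";
            return d; // pass the data through unchanged
        });
    if (esiEnabled)
        chain.insertHead([&trace](const std::string &d) -> std::string {
            trace += "esi,";
            return d;
        });
    chain.deliver("<html></html>");
    return trace;
}
```

Inserting the analyser first means the later ESI insert lands in front of it, so the analyser sees the document after ESI has processed it; for a range request the analyser is never added and the chain behaves as if it did not exist.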

ClientStreams (last edited 2008-05-18 19:38:55 by localhost)