See Features/StoreID
... let's figure out what the hell is going on with Google Video and YouTube stuff so we can cache the current setup.
-- AdrianChadd 2008-09-11 08:17:06
A way to show you what dynamic content really is:
http://www.youtube.com/watch?v=gGJvDEDN9mE
http://freevideolectures.com/Course/2712/Human-Computer-Interaction-Seminar-2009-2010/9
-- Eliezer Croitoru 2013-05-23 07:07:44
Caching YT is now impossible with Squid alone.
Here is the explanation why.
Look at these examples:
This is the same 5-second video piece, fetched several times over the course of a day.
Note: all the video IDs are unique, but this is the same clip.
Question: Does anybody see a permanent part common to all the URLs?
Answer: No. The Google CDN generates a unique ID for the streaming servers (googlevideo) in the JSON of the clip's starting page. This ID stays unchanged while the clip is being watched, but changes on the next run. Also, audio and video are now delivered separately, as independent streams. The ID is decoded by the HTML5 JS-based player, which is delivered to the client browser when the session starts. The player changes, as does its crypto algorithm, every week or two. So most YT caching solutions on the market are fake. They can't REALLY cache YT.
In theory, it is possible to write a special store ID rewrite helper for YT. All we need is to associate the external video ID in youtube/watch/v=abcdefg with the temporary session stream ID used by the following googlevideo requests, and save the object in the cache with the real ID substituted. Just extract the generated ID from the JSON structure of the starting page and replace the backend servers' ID before storing files.
In practice, the best solution I have found for today is NOT caching the googlevideo.com domain and NOT caching youtube.com/watch/v= pages (try it and see why). The only thing left is caching images/css/js from YT with store ID. If Google goes back to static video IDs, we can cache YT video again. But right now it is impossible by any means.
-- YuriVoinov 2015-08-16 01:17:00
Knowing what to cache
My example is my favorite band;
http://www.youtube.com/watch?v=pNL7nHWhMh0&feature=PlayList&p=E5F2BD7B040088AA&index=0
The video file and header below.
http://www.youtube.com/get_video?video_id=pNL7nHWhMh0&t=OEgsToPDskJNnO0O5GuQtKoNgB-xSmhH
Date: Thu, 11 Sep 2008 16:03:46 GMT
Server: Apache
Expires: Tue, 27 Apr 1971 19:44:06 EST
Cache-Control: no-cache
Location: http://v19.cache.googlevideo.com/get_video?video_id=pNL7nHWhMh0&origin=ash-v98.ash.youtube.com&signature=8CF859579781C2A297786C0433EFD3D0DA77985A.907C75B4F75160E1B33A82CB1B294D462B2324D9&ip=125.60.228.22&ipbits=2&expire=1221159826&key=yt1&sver=2
Keep-Alive: timeout=300
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8

The header above is a redirect, and it should not be cached. The Cache-Control: no-cache header ensures that. Now we follow the redirect and get the file. The reply header is shown below; this is the file we need to cache.
Expires: Thu, 11 Sep 2008 17:03:50 GMT
Cache-Control: max-age=86400
Content-Type: video/flv
Accept-Ranges: bytes
Etag: "1903944549"
Content-Length: 7949664
Server: lighttpd/1.4.18
Last-Modified: Thu, 09 Aug 2007 16:18:19 GMT
Connection: close
Date: Thu, 11 Sep 2008 16:03:50 GMT
To cache that content:
add this to squid.conf
# The keywords for all YouTube video files are "get_video?", "videodownload?" and "videoplayback" plus the id.
acl store_rewrite_list urlpath_regex \/(get_video\?|videodownload\?|videoplayback.*id)

[ UPDATE: if you still have the "cache deny QUERY" line, go and follow ConfigExamples/DynamicContent first. ]
and the storeurl feature
storeurl_access allow store_rewrite_list
storeurl_access deny all
storeurl_rewrite_program /usr/local/etc/squid/storeurl.pl
storeurl_rewrite_children 1
storeurl_rewrite_concurrency 10
and refresh pattern
#youtube's videos
refresh_pattern (get_video\?|videoplayback\?|videodownload\?) 5259487 99999999% 5259487 override-expire ignore-reload ignore-private negative-ttl=0
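To get a feel for which requests the store_rewrite_list ACL above selects, the pattern can be tried outside Squid. This is only a rough Python sketch and not part of the original setup: Squid does its own regex matching, and the assumption here is that Python's re module treats this particular pattern the same way. The function name and the sample paths are made up for illustration.

```python
import re

# Same pattern as the store_rewrite_list ACL (urlpath_regex sees path plus query string).
STORE_REWRITE = re.compile(r"\/(get_video\?|videodownload\?|videoplayback.*id)")


def selected(url_path: str) -> bool:
    """Return True if the URL path (including the query string) matches the ACL pattern."""
    return STORE_REWRITE.search(url_path) is not None


# Hypothetical sample paths, shortened for readability.
samples = [
    "/get_video?video_id=pNL7nHWhMh0&t=abc",
    "/videoplayback?itag=34&id=0123456789abcdef",
    "/watch?v=pNL7nHWhMh0",
    "/videoplayback?itag=34",
]

for path in samples:
    print(path, "->", selected(path))
```

Note that a videoplayback request is only selected when "id" appears somewhere after "videoplayback", so the watch page and an id-less videoplayback URL are left alone.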
The storeurl script (for setups where concurrency is > 0), i.e. the storeurl.pl referenced above. Setting concurrency to 10 is faster than running 10 children.
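#!/usr/bin/perl
# Put your perl location on the line above; mine is /usr/bin/perl.
$|=1;    # unbuffered output, so Squid sees every reply immediately
while (<>) {
    @X = split;          # $X[0] is the concurrency channel ID, $X[1] is the URL
    $x = $X[0] . " ";
    if ($X[1] =~ /(youtube|google).*videoplayback\?/){
        # Keep only the parameters that identify the video; drop host, signature, etc.
        @itag = m/[&?](itag=[0-9]*)/;
        @id = m/[&?](id=[^\&\s]*)/;      # \s added so a trailing id= stops at the end of the URL
        @range = m/[&?](range=[^\&\s]*)/;
        print $x . "http://video-srv.youtube.com.SQUIDINTERNAL/@id&@itag@range\n";
    } else {
        # Anything else keeps its original URL as the store URL.
        print $x . $X[1] . "\n";
    }
}

[UPDATE: &range means the request is for partial content. You may rewrite such requests without "&range=xxx-xxx" to cache the whole content.]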
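The rewriting idea can also be tried in isolation. Below is a minimal Python sketch of the same normalisation the Perl helper performs, not a drop-in replacement for it: it does not speak the helper protocol, and the keep_range parameter, the function name and the example URLs are assumptions made up for illustration.

```python
import re

# Internal pseudo-host used as the store URL prefix, as in the Perl helper.
INTERNAL_PREFIX = "http://video-srv.youtube.com.SQUIDINTERNAL/"


def store_url(url: str, keep_range: bool = True) -> str:
    """Map a videoplayback URL to a stable store URL; return other URLs unchanged."""
    if not re.search(r"(youtube|google).*videoplayback\?", url):
        return url

    def first(pattern: str) -> str:
        # Return the first captured parameter, or an empty string if it is absent.
        match = re.search(pattern, url)
        return match.group(1) if match else ""

    video_id = first(r"[&?](id=[^&\s]*)")
    itag = first(r"[&?](itag=[0-9]*)")
    # Hypothetical switch: dropping the range stores one key per video and format.
    byte_range = first(r"[&?](range=[^&\s]*)") if keep_range else ""
    return f"{INTERNAL_PREFIX}{video_id}&{itag}{byte_range}"


# Two made-up URLs for the same clip served from different cache hosts.
a = "http://v1.cache.googlevideo.com/videoplayback?id=abc123&itag=34&signature=AAA"
b = "http://v7.cache.googlevideo.com/videoplayback?itag=34&signature=BBB&id=abc123"
print(store_url(a))
print(store_url(b))
```

Both example URLs end up with the same store URL, which is what lets a later request hit the object cached for an earlier one.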
The bug
It happens when the redirect response has no Cache-Control: no-cache header.
http://www.youtube.com/watch?v=mfHlA3fmJG0&feature=related

http://www.youtube.com/get_video?video_id=mfHlA3fmJG0&t=OEgsToPDskK2_KHdgtTJ7LFT8pxWayTb
Date: Thu, 11 Sep 2008 15:33:23 GMT
Server: Apache
Expires: Tue, 27 Apr 1971 19:44:06 EST
Cache-Control: no-cache
Location: http://v18.cache.googlevideo.com/get_video?video_id=mfHlA3fmJG0&origin=sjl-v120.sjl.youtube.com&signature=046AAA380AE72BD92666F04FE5E6421EEAA8C035.B87EDB4B5C2F7731E25DE61B0C81937A0134ADD1&ip=125.60.228.22&ipbits=2&expire=1221158003&key=yt1&sver=2
Keep-Alive: timeout=300
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8

http://v18.cache.googlevideo.com/get_video?video_id=mfHlA3fmJG0&origin=sjl-v120.sjl.youtube.com&signature=046AAA380AE72BD92666F04FE5E6421EEAA8C035.B87EDB4B5C2F7731E25DE61B0C81937A0134ADD1&ip=125.60.228.22&ipbits=2&expire=1221158003&key=yt1&sver=2
Location: http://208.117.253.103/get_video?video_id=mfHlA3fmJG0&origin=sjl-v120.sjl.youtube.com&signature=046AAA380AE72BD92666F04FE5E6421EEAA8C035.B87EDB4B5C2F7731E25DE61B0C81937A0134ADD1&ip=125.60.228.22&ipbits=2&expire=1221158003&key=yt1&sver=2
Expires: Thu, 11 Sep 2008 15:48:25 GMT
Cache-Control: public,max-age=900
Connection: close
Date: Thu, 11 Sep 2008 15:33:25 GMT
Server: gvs 1.0

http://208.117.253.103/get_video?video_id=mfHlA3fmJG0&origin=sjl-v120.sjl.youtube.com&signature=046AAA380AE72BD92666F04FE5E6421EEAA8C035.B87EDB4B5C2F7731E25DE61B0C81937A0134ADD1&ip=125.60.228.22&ipbits=2&expire=1221158003&key=yt1&sver=2
Expires: Thu, 11 Sep 2008 16:33:26 GMT
Cache-Control: public,max-age=3600
Content-Type: video/flv
Accept-Ranges: bytes
Etag: "765088821"
Content-Length: 10357890
Server: lighttpd/1.4.18
Last-Modified: Sat, 13 Oct 2007 10:58:26 GMT
Connection: close
Date: Thu, 11 Sep 2008 15:33:26 GMT
This is the response that causes the problem. It is a redirect without no-cache:
http://v18.cache.googlevideo.com/get_video?video_id=mfHlA3fmJG0&origin=sjl-v120.sjl.youtube.com&signature=046AAA380AE72BD92666F04FE5E6421EEAA8C035.B87EDB4B5C2F7731E25DE61B0C81937A0134ADD1&ip=125.60.228.22&ipbits=2&expire=1221158003&key=yt1&sver=2
Location: http://208.117.253.103/get_video?video_id=mfHlA3fmJG0&origin=sjl-v120.sjl.youtube.com&signature=046AAA380AE72BD92666F04FE5E6421EEAA8C035.B87EDB4B5C2F7731E25DE61B0C81937A0134ADD1&ip=125.60.228.22&ipbits=2&expire=1221158003&key=yt1&sver=2
Expires: Thu, 11 Sep 2008 15:48:25 GMT
Cache-Control: public,max-age=900
Connection: close
Date: Thu, 11 Sep 2008 15:33:25 GMT
Server: gvs 1.0
And the result is
http://www.youtube.com/get_video?video_id=mfHlA3fmJG0&t=OEgsToPDskL6YzrwgHy6u70-jZ1DC_el
Location: http://208.117.253.103/get_video?video_id=mfHlA3fmJG0&origin=sjl-v120.sjl.youtube.com&signature=2E57B84A8F23742666E884CF3B2C51A4277EBB2C.126363C8AFBDD2DBD3312BB8911EA2364F723561&ip=125.60.228.22&ipbits=2&expire=1221157983&key=yt1&sver=2
Expires: Thu, 11 Sep 2008 15:48:03 GMT
Cache-Control: public,max-age=900
Date: Thu, 11 Sep 2008 15:33:03 GMT
Server: gvs 1.0
Age: 5356
Content-Length: 0
X-Cache: HIT from Server
Connection: keep-alive
Proxy-Connection: keep-alive

http://208.117.253.103/get_video?video_id=mfHlA3fmJG0&origin=sjl-v120.sjl.youtube.com&signature=2E57B84A8F23742666E884CF3B2C51A4277EBB2C.126363C8AFBDD2DBD3312BB8911EA2364F723561&ip=125.60.228.22&ipbits=2&expire=1221157983&key=yt1&sver=2
Location: http://208.117.253.103/get_video?video_id=mfHlA3fmJG0&origin=sjl-v120.sjl.youtube.com&signature=2E57B84A8F23742666E884CF3B2C51A4277EBB2C.126363C8AFBDD2DBD3312BB8911EA2364F723561&ip=125.60.228.22&ipbits=2&expire=1221157983&key=yt1&sver=2
Expires: Thu, 11 Sep 2008 15:48:03 GMT
Cache-Control: public,max-age=900
Date: Thu, 11 Sep 2008 15:33:03 GMT
Server: gvs 1.0
Age: 5356
Content-Length: 0
X-Cache: HIT from Server
Connection: keep-alive
Proxy-Connection: keep-alive
The object being cached is the redirect response, which is empty. On a hit it also redirects back to itself, so the client loops.
If only we could keep replies that carry a Location header away from storeurl, that would solve the problem. As an additional performance tweak, we could pass only the bigger files to storeurl.
Temporary work around
Change this in your squid.conf:
minimum_object_size 512 bytes
This makes the cache ignore content smaller than 512 bytes, which covers the redirect response because it is smaller than that. The disadvantage is that all other content below 512 bytes is kept out of your cache too.
If you have another idea that could help, please email me at chudy_fernandez@yahoo.com.
Right, so you need to deny caching the temporary redirect from Google so you can always hit your local cache for the initial URL? The problem is that the store URL stuff is rewriting the URL on the -request-. It's pointless to rewrite the store URL on -reply- because you'd not be able to handle a cache hit that way.
This could be done separately from the store URL stuff. What's needed is a way to set the cacheability of something based on a -reply- ACL.
That way you could match on the HTTP status code and the Location URL; and just say "don't bother caching this"; the client would then request the redirected URL (which is presumably the video) from you.
Do you think that'd be enough?
-- AdrianChadd 2008-09-14 09:20:00
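The reply-based check suggested in the comment above could look roughly like the following Python sketch. It only illustrates the decision itself and is not an existing Squid feature: the function name, the path pattern and the plain dict of headers are all assumptions made up for this example.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern for the video endpoints named elsewhere on this page.
VIDEO_PATH = re.compile(r"/(get_video|videodownload|videoplayback)")


def cacheable_reply(status: int, headers: dict) -> bool:
    """Decide cacheability from the reply: skip 3xx redirects that point at a video URL."""
    # Assumes header names use the canonical "Location" spelling.
    location = headers.get("Location")
    if 300 <= status < 400 and location is not None:
        if VIDEO_PATH.search(urlsplit(location).path):
            return False  # "don't bother caching this"; the client follows the redirect
    return True


redirect = {"Location": "http://208.117.253.103/get_video?video_id=mfHlA3fmJG0"}
print(cacheable_reply(302, redirect))
print(cacheable_reply(200, {"Content-Type": "video/flv"}))
```

With a rule like this the empty redirect is never stored, so only the final video response can produce a cache hit.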
Fixed
Diff file below..
Index: src/client_side.c
===================================================================
--- src/client_side.c (revision 134)
+++ src/client_side.c (working copy)
@@ -2408,6 +2408,17 @@
                 is_modified = 0;
             }
         }
+        /* bug fix for 302 moved_temporarily loop bug when using storeurl*/
+        if (mem->reply->sline.status >= 300 && mem->reply->sline.status < 400) {
+            if (httpHeaderHas(&e->mem_obj->reply->header, HDR_LOCATION))
+                if (!strcmp(http->uri,httpHeaderGetStr(&e->mem_obj->reply->header, HDR_LOCATION))) {
+                    debug(33, 2) ("clientCacheHit: Redirect Loop Detected: %s\n",http->uri);
+                    http->log_type = LOG_TCP_MISS;
+                    clientProcessMiss(http);
+                    return;
+                }
+        }
+        /* bug fix end here*/
         stale = refreshCheckHTTPStale(e, r);
         debug(33, 2) ("clientCacheHit: refreshCheckHTTPStale returned %d\n", stale);
         if (stale == 0) {
Squid version: squid-2.HEAD-20081105; it also works on the 2.7 series.
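The condition the patch adds can be restated outside Squid. This is a small Python sketch of the same test, assuming the cached reply's status and Location value are already at hand; the function names and the shortened URL are made up for illustration, and the real logic lives in clientCacheHit in src/client_side.c.

```python
from typing import Optional


def is_redirect_loop(request_uri: str, status: int, location: Optional[str]) -> bool:
    """True when a cached 3xx reply points back at the very URI being requested."""
    if not 300 <= status < 400:
        return False
    if location is None:
        return False
    return location == request_uri


def serve(request_uri: str, cached_status: int, cached_location: Optional[str]) -> str:
    """Treat a self-referencing cached redirect as a miss, as the patch does."""
    if is_redirect_loop(request_uri, cached_status, cached_location):
        return "MISS"
    return "HIT"


# Shortened, made-up version of the looping request shown earlier on this page.
uri = "http://208.117.253.103/get_video?video_id=mfHlA3fmJG0"
print(serve(uri, 302, uri))
print(serve(uri, 200, None))
```

The comparison is an exact string match, just like the strcmp in the patch, so a redirect to any other URL is still served from the cache.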
Good luck!