YouTube Watch Stats Collection tools

This helper is part of a suite that analyzes requests and schedules the download of a specific video into a VOD solution. The helper receives request URLs as they flow through Squid and increments a counter for the matching video ID. External tools can then fetch the stats from a local or remote Redis server and decide whether it is worth caching or downloading a specific video from YouTube or another site.
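
As a rough illustration of the counting step described above, the following Ruby sketch maps a request URL to the Redis key whose counter would be incremented. The regular expressions and key prefixes follow the yt-counter.rb listing; the names `URL_PATTERNS` and `counter_key` are hypothetical, and the Redis call is left out so the snippet runs without a server.

```ruby
# Hypothetical lookup table: key prefix => pattern whose first capture group is the ID.
URL_PATTERNS = {
  'imdb-title-'    => %r{\Ahttps?://[.0-9a-zA-Z\-_]+\.imdb\.com/title/([a-zA-Z0-9\-_]+)/},
  'yt-videoid-'    => %r{\Ahttps?://www\.youtube\.com/watch\?.*v=([a-zA-Z0-9\-_]+)},
  'ytimg-videoid-' => %r{\Ahttps?://[a-zA-Z0-9\-_]+\.ytimg\.com/vi/([a-zA-Z0-9\-_]+)/}
}.freeze

# Returns the Redis key for a URL, or nil when no pattern matches.
def counter_key(url)
  URL_PATTERNS.each do |prefix, pattern|
    match = pattern.match(url)
    return prefix + match[1] if match
  end
  nil
end

puts counter_key('https://www.youtube.com/watch?v=8UM6Pc0LkDw')    # yt-videoid-8UM6Pc0LkDw
puts counter_key('https://i.ytimg.com/vi/80GtXgCSYJw/default.jpg') # ytimg-videoid-80GtXgCSYJw
p counter_key('https://example.com/')                              # nil
```

In the real helper the returned key would be passed to the Redis INCR command; URLs that match no pattern are simply not counted.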

The code was formatted with an Atom editor plugin that uses RuboCop, so the syntax may look a bit unusual. If you have any questions, send me an email at eliezer@ngtech.co.il or post a question to the squid-users list (squid-users@lists.squid-cache.org).

yt-counter.rb

#!/usr/bin/ruby
# encoding: utf-8

=begin
license note
Copyright (c) 2017, Eliezer Croitoru
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
=end

require 'rubygems'
require 'open-uri'
require 'redis'
require 'syslog'

# Matches the URL against the known patterns and increments the Redis
# counter for the extracted ID. A line starting with "quit" ends the helper.
def statsTest(requestUrl)
  case requestUrl
  when /^https?\:\/\/[\.0-9a-zA-Z\-\_]+\.imdb\.com\/title\/([a-zA-Z0-9\-\_]+)\//
    if Regexp.last_match(1)
      key = 'imdb-title-' + Regexp.last_match(1)
      STDERR.puts "Incrementing ID => #{key}" if $debug
      $redisdbconn.incr(key)
      STDERR.puts("Current stats for: #{key} => #{$redisdbconn.get(key)}") if $debug
    end
  when /^https?\:\/\/www\.youtube\.com\/watch\?.*(v)\=([a-zA-Z0-9\-\_]+)/
    if Regexp.last_match(2)
      key = 'yt-videoid-' + Regexp.last_match(2)
      STDERR.puts "Incrementing ID => #{key}" if $debug
      $redisdbconn.incr(key)
      STDERR.puts("Current stats for: #{key} => #{$redisdbconn.get(key)}") if $debug
    end
  when /^https?\:\/\/[a-zA-Z0-9\-\_]+\.(ytimg)\.com\/vi\/([a-zA-Z0-9\-\_]+)\//
    if Regexp.last_match(2)
      key = 'ytimg-videoid-' + Regexp.last_match(2)
      STDERR.puts "Incrementing ID => #{key}" if $debug
      $redisdbconn.incr(key)
      STDERR.puts("Current stats for: #{key} => #{$redisdbconn.get(key)}") if $debug
    end
  when /^quit.*/
    exit 0
  else
    ''
  end
end

def log(msg)
  Syslog.log(Syslog::LOG_ERR, '%s', msg)
end

# Reads the first request and detects whether Squid runs the helper with
# concurrency, i.e. whether each line starts with a numeric channel ID.
def evalulateConc
  request = gets
  if request && (request.match /^[0-9]+\ /)
    conc(request)
    return true
  else
    noconc(request)
    return false
  end
end

# Concurrent mode: field 0 is the channel ID, field 1 is the URL.
def conc(request)
  return unless request
  request = request.split
  if request[0] && request[1]
    log("original request [#{request.join(' ')}].") if $debug
    statsTest(request[1])
    puts request[0] + ' ERR'
  else
    log('original request [had a problem].') if $debug
    puts 'ERR'
  end
end

# Non-concurrent mode: field 0 is the URL.
def noconc(request)
  return unless request
  request = request.split
  if request[0]
    log("Original request [#{request.join(' ')}].") if $debug
    statsTest(request[0])
    puts 'ERR'
  else
    log('Original request [had a problem].') if $debug
    puts 'ERR'
  end
end

# Accepts only ASCII lines with a valid encoding.
def validr?(request)
  if request.ascii_only? && request.valid_encoding?
    true
  else
    STDERR.puts("erroneous line: #{request}")
    # sleep 2
    false
  end
end

def main
  Syslog.open('yt-counter.rb', Syslog::LOG_PID)
  log('Started')
  redishost = 'localhost'
  redisdb = '0'
  redisport = 6379
  $redisdbconn = Redis.new(host: redishost, port: redisport)
  $redisdbconn.select redisdb

  c = evalulateConc

  if c
    while request = gets
      conc(request) if validr?(request)
    end
  else
    while request = gets
      noconc(request) if validr?(request)
    end
  end
end

$debug = false
STDOUT.sync = true
main
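
The helper above handles two input shapes: when Squid runs a helper with concurrency, each line starts with a numeric channel ID that has to be echoed back in the reply; without concurrency the line starts with the URL. The sketch below isolates that distinction. The names `parse_helper_line` and `reply_for` are hypothetical, and the `ERR` reply simply mirrors what the listing prints.

```ruby
# Splits one helper input line into [channel_id, url].
# channel_id is nil when the line does not start with a numeric channel ID.
def parse_helper_line(line)
  fields = line.to_s.split
  return [nil, nil] if fields.empty?
  if fields[0] =~ /\A[0-9]+\z/
    [fields[0], fields[1]]
  else
    [nil, fields[0]]
  end
end

# Builds the reply line; the channel ID, when present, is echoed first.
def reply_for(channel_id)
  channel_id ? "#{channel_id} ERR" : 'ERR'
end

channel, url = parse_helper_line("7 https://www.youtube.com/watch?v=gA1WcPP9uLk\n")
puts url                # https://www.youtube.com/watch?v=gA1WcPP9uLk
puts reply_for(channel) # 7 ERR
```

Unlike the listing, which decides the mode once from the first line, this sketch checks every line; that is a simplification for illustration only.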

stats-collector.cgi

#!/usr/bin/env ruby
# encoding: utf-8

# license note
# Copyright (c) 2017, Eliezer Croitoru
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
# 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
# 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

require 'rubygems'
require 'open-uri'
require 'redis'
require 'syslog'
require 'yaml'
require 'json'
require 'cgi'

$cgi = CGI.new

$params = $cgi.params

def log(msg)
  Syslog.log(Syslog::LOG_ERR, '%s', msg)
end

def main
  Syslog.open('stats-collector.rb', Syslog::LOG_PID)
  log('Started')
  redishost = 'localhost'
  redisdb = '0'
  redisport = 6379
  $redisdbconn = Redis.new(host: redishost, port: redisport)
  $redisdbconn.select redisdb

  # Collect every counter, grouped by key prefix; the prefix is cut off
  # so that only the ID remains as the hash key.
  statsCollection = {}
  statsCollection['youtube-videos-ids'] = {}

  $redisdbconn.scan_each(match: 'yt-videoid-*') do |key_name|
    res = $redisdbconn.get(key_name)
    statsCollection['youtube-videos-ids'][key_name[11..-1]] = res.to_i
  end

  statsCollection['youtube-img-videos-ids'] = {}

  $redisdbconn.scan_each(match: 'ytimg-videoid-*') do |key_name|
    res = $redisdbconn.get(key_name)
    statsCollection['youtube-img-videos-ids'][key_name[14..-1]] = res.to_i
  end

  statsCollection['imdb-title-ids'] = {}

  $redisdbconn.scan_each(match: 'imdb-title-*') do |key_name|
    res = $redisdbconn.get(key_name)
    statsCollection['imdb-title-ids'][key_name[11..-1]] = res.to_i
  end

  # Output is YAML unless the request asks for format=json.
  output = ''
  outputFileExtention = 'yaml'
  outputFormat = 'application/x-yaml'
  case $params['format'][0]
  when nil
    output = statsCollection.to_yaml(Indent: 4, UseHeader: true, UseVersion: true)
  when 'json'
    outputFormat = 'application/json'
    outputFileExtention = 'json'
    output = JSON.pretty_generate(statsCollection)
  else
    output = statsCollection.to_yaml(Indent: 4, UseHeader: true, UseVersion: true)
  end
  output += "\n"

  # With a "text" parameter the stats are shown inline as plain text;
  # otherwise they are offered as a file download.
  # Content-Length is counted in bytes, hence bytesize rather than size.
  if $params['text'] && $params['text'][0]
    print $cgi.header('type' => 'text/plain',
                      'expires' => Time.now - (3 * 24 * 60 * 60),
                      'Cache-Control' => 'no-cache',
                      'Content-Length' => output.bytesize)
  else
    print $cgi.header('type' => outputFormat,
                      'expires' => Time.now - (3 * 24 * 60 * 60),
                      'Cache-Control' => 'no-cache',
                      'Content-Length' => output.bytesize,
                      'Content-Disposition' => "attachment; filename=\"stats.#{outputFileExtention}\"")

  end
  print output
end

$debug = false
STDOUT.sync = true
main
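
The CGI script above reads two query parameters: `format` (`json` selects JSON, anything else falls back to YAML) and `text` (any value switches the response to `text/plain` without the attachment header). A client could build its request URL along the following lines. The base URL is a placeholder for wherever the script is deployed, and `stats_url` is a hypothetical helper name.

```ruby
require 'uri'

# Placeholder location of the CGI script; adjust to the real deployment.
BASE_URL = 'http://proxy.example.com/cgi-bin/stats-collector.cgi'

# Builds the stats URL. format: :yaml or :json; inline: true asks for text/plain.
def stats_url(format: :yaml, inline: false)
  query = {}
  query['format'] = 'json' if format == :json
  query['text'] = '1' if inline
  return BASE_URL if query.empty?
  "#{BASE_URL}?#{URI.encode_www_form(query)}"
end

puts stats_url
puts stats_url(format: :json, inline: true)
```

The first call prints the bare URL, which yields a YAML download; the second appends `?format=json&text=1`, which yields JSON shown inline.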

yt-counter.go

package main

/*
license note
Copyright (c) 2017, Eliezer Croitoru
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
import (
        "bufio"
        "flag"
        "fmt"
        "github.com/monnand/goredis"
        "os"
        "regexp"
        "strings"
        "sync"
)

var debug *bool
var db_address *string
var db_port *string
var active *string
var database goredis.Client
var err error
var world = []byte("session")
var re [256]*regexp.Regexp

// process_request handles one helper line of the form "<channel-id> <url> ...":
// it increments the Redis counter for a matching URL and always answers ERR.
func process_request(line string, wg *sync.WaitGroup) {
        defer wg.Done()
        answer := "ERR comment=yt-counter"

        lparts := strings.Split(strings.TrimRight(line, "\n"), " ")
        if len(lparts) > 1 && len(lparts[0]) > 0 && len(lparts[1]) > 0 {
                if *debug {
                        fmt.Fprintln(os.Stderr, "ERRlog: Processing request => \""+strings.TrimRight(line, "\n")+"\"")
                }
                switch {
                case re[0].MatchString(lparts[1]):
                        if *debug {
                                fmt.Fprintln(os.Stderr, "URL Match for", re[0])
                        }
                        regexpRes := re[0].FindAllStringSubmatch(lparts[1], -1)
                        id := "imdb-title-" + regexpRes[0][1]
                        res, err := database.Incr(id)
                        if err != nil {
                                fmt.Fprintln(os.Stderr, err)
                        }
                        if *debug {
                                fmt.Fprintln(os.Stderr, "Current Counter state for URL", lparts[1], ", id =>", id, ", Counter =>", res)
                        }
                case re[1].MatchString(lparts[1]):
                        if *debug {
                                fmt.Fprintln(os.Stderr, "URL Match for", re[1])
                        }
                        regexpRes := re[1].FindAllStringSubmatch(lparts[1], -1)
                        id := "yt-videoid-" + regexpRes[0][2]
                        res, err := database.Incr(id)
                        if err != nil {
                                fmt.Fprintln(os.Stderr, err)
                        }
                        if *debug {
                                fmt.Fprintln(os.Stderr, "Current Counter state for URL", lparts[1], ", id =>", id, ", Counter =>", res)
                        }
                case re[2].MatchString(lparts[1]):
                        if *debug {
                                fmt.Fprintln(os.Stderr, "URL Match for", re[2])
                        }
                        regexpRes := re[2].FindAllStringSubmatch(lparts[1], -1)
                        id := "ytimg-videoid-" + regexpRes[0][2]
                        res, err := database.Incr(id)
                        if err != nil {
                                fmt.Fprintln(os.Stderr, err)
                        }
                        if *debug {
                                fmt.Fprintln(os.Stderr, "Current Counter state for URL", lparts[1], ", id =>", id, ", Counter =>", res)
                        }
                default:
                        if *debug {
                                fmt.Fprintln(os.Stderr, "No Match for URL", lparts[1])
                        }
                }

        }

        // Echo the channel ID so Squid can match the answer to its request.
        fmt.Println(lparts[0] + " " + answer)
}

func main() {
        fmt.Fprintln(os.Stderr, "ERRlog: Starting yt-counter.go")

        debug = flag.Bool("d", false, "Enable debug output on stderr")
        db_address = flag.String("b", "127.0.0.1", "Db ip address")
        db_port = flag.String("p", "6379", "DB tcp port")

        flag.Parse()

        re[0] = regexp.MustCompile("^https?\\:\\/\\/[\\.0-9a-zA-Z\\-\\_]+\\.imdb\\.com\\/title\\/([a-zA-Z0-9\\-\\_]+)")
        re[1] = regexp.MustCompile("^https?\\:\\/\\/www\\.youtube\\.com\\/watch\\?.*(v)\\=([a-zA-Z0-9\\-\\_]+)")
        re[2] = regexp.MustCompile("^https?\\:\\/\\/[a-zA-Z0-9\\-\\_]+\\.(ytimg)\\.com\\/vi\\/([a-zA-Z0-9\\-\\_]+)\\/")

        database.Addr = *db_address + ":" + *db_port
        var wg sync.WaitGroup
        reader := bufio.NewReader(os.Stdin)
        // Each input line is handled in its own goroutine; the loop ends on EOF.
        for {
                line, err := reader.ReadString('\n')
                if err != nil {
                        // You may check here if err == io.EOF
                        break
                }
                wg.Add(1)
                go process_request(line, &wg)
        }
        wg.Wait()
}

stats output example

yaml

---
youtube-videos-ids:
  -RSe8aOuZMQ: 1
  8UM6Pc0LkDw: 1
  80GtXgCSYJw: 1
  gA1WcPP9uLk: 1
youtube-img-videos-ids:
  jfjGA8TyWXw: 1
  weeI1G46q0o: 1
  VD1ftHpJSu4: 1
  WsDDhm0dAkU: 1
  YoOXmuCRiQU: 1
  XuMjCRlAmjc: 1
  hvS8pjM8YVg: 1
  aatr_2MstrI: 1
  ax9ge-ymWIQ: 1
  C311vyA1Ta0: 1
  Y9LHxGuMb2A: 1
  LoXubLsml4A: 1
  KcLORGq2OiA: 1
  fyaI4-5849w: 1
  tdHSPnKDMZU: 1
  8UM6Pc0LkDw: 2
  80GtXgCSYJw: 5
  byYBGEE8NCM: 1
  her0dWH3svI: 1
  RgKAFK5djSk: 1
  dqsWzI4Hoss: 1
  EKbWvGLC97Q: 1
  N6-_gkIpL1E: 1
  BxuY9FET9Y4: 1
  87gWaABqGYs: 1
  OvtwV1vXnfc: 1
  kt3BrBYWUhs: 1
  -RSe8aOuZMQ: 4
  UVsRlX_skeU: 1
  QtxvPRev3I8: 1
  K0ibBPhiaG0: 1
  i_yLpCLMaKk: 1
  igNVdlXhKcI: 1
  gA1WcPP9uLk: 4
  Q6rTY4XH6TU: 2
  3VT3VIRQPKA: 1
  ejqrzU64dYQ: 1
  dMaUNdXs6-w: 1
  1UQzJfsT2eo: 1
  3AtDnEC4zak: 1
  jHMJrjkFwbI: 1
  rIPgru7JiYQ: 1
  TIrxyVt4jvQ: 1
  Ey_K97x15ek: 1
  GMwO-k8f9Hg: 1
  nfs8NYg7yQM: 1
  Gc15rdaxGMA: 1
  v-Dur3uXXCQ: 1
  TPxuhhC5OXE: 1
  5YEqcrtsdR0: 1
  JGwWNGJdvx8: 1
  lp-EO5I60KA: 1
  nSDgHBxUbVQ: 1
  RuNnmzGi4Ao: 1
  _dK2tDK9grQ: 1
  MZX_2sczkmo: 1
imdb-title-ids: {}
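
To illustrate how an external tool might act on this output, here is a minimal Ruby sketch that parses a YAML document of the same shape and lists the IDs whose counter reaches a threshold. The threshold value, the `popular_ids` name and the shortened sample data are assumptions made for the example and are not part of the suite.

```ruby
require 'yaml'

# Shortened sample in the same shape as the output above.
SAMPLE = <<~DOC
  ---
  youtube-videos-ids:
    8UM6Pc0LkDw: 1
    80GtXgCSYJw: 1
  youtube-img-videos-ids:
    80GtXgCSYJw: 5
    gA1WcPP9uLk: 4
    jfjGA8TyWXw: 1
  imdb-title-ids: {}
DOC

# Returns the IDs in one section whose counter is at least min_hits,
# most requested first.
def popular_ids(stats, section, min_hits)
  (stats[section] || {})
    .select { |_id, hits| hits >= min_hits }
    .sort_by { |_id, hits| -hits }
    .map { |id, _hits| id.to_s }
end

stats = YAML.safe_load(SAMPLE)
p popular_ids(stats, 'youtube-img-videos-ids', 3) # ["80GtXgCSYJw", "gA1WcPP9uLk"]
```

A downloader could feed the resulting list into its queue. An ID made up only of digits would be read by the YAML parser as a number, which is why the sketch converts each ID back to a string.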

EliezerCroitoru/Helpers/YT-Watch-Stats (last edited 2017-07-13 02:04:30 by Eliezer Croitoru)