
Spiral of Hope
Better software is possible.

Polipo

http://www.pps.jussieu.fr/~jch/software/polipo/

docs mailing list

A small and fast caching web proxy (a web cache, an HTTP proxy, a proxy server).


Review

1.0.4

Tested 2009-08-12 on Unity Linux 0.99-alpha2, updated 2009-08-12

Firefox's SkipScreen extension has issues on http://rapidshare.com. I regularly get an error message. It happens on different files or on the same file, on and off.

504 Connect to <url>:80 failed: Connection reset by client

The following error occurred while trying to access <url>:

504 Connect to url.rapidshare.com:80 failed: Connection reset by client
Generated Wed, 12 Aug 2009 01:14:25 PDT by Polipo on polipo:8123.

SkipScreen will download your file as soon as it is available!

Tested earlier than 2009-08-12 on Unity Linux 0.99-alpha1, updated at various times.

Works just fine.

Tested 2009-07-16 on Slackware 12.2, updated 2009-07-16.

While it seems to run, I'm having issues getting Firefox to actually use it properly. This needs further troubleshooting.

Tested 2009-05-08 on PCLinuxOS 2007, updated 2009-04

Installation

  1. Download
  2. Uncompress
  3. make
  4. ./polipo
Configure your browser:

Firefox 3.0.8:

  1. Top menu: Edit > Preferences
  2. Advanced tab > Network sub-tab
  3. Settings button
  4. Manual proxy configuration
    1. HTTP Proxy: localhost
    2. Port: 8123
    3. Clear the "No proxy for" field (by default it holds localhost, 127.0.0.1). I prefer this for a local server; do as you wish.
  5. Save
  6. Visit about:config and set:
network.http.pipelining                             true
network.http.proxy.pipelining                       true
network.http.pipelining.maxrequests                 15
network.http.max-persistent-connections-per-proxy   16
browser.cache.disk.enable                           false
Check these:
network.http.proxy.version                          1.1
network.http.proxy.keep-alive                       true
network.http.use-cache                              false
Update: I have some websites which do not work in polipo, and so I do keep cache enabled in Firefox for them.
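
If you would rather not click through about:config, the same preferences can live in a user.js file in the Firefox profile directory. This is only a sketch: the preference names and values are the ones listed above, while PREFS, user_pref_line and render_user_js are names made up for this example.

```python
# Preferences from the list above, with the values given there.
PREFS = {
    "network.http.pipelining": True,
    "network.http.proxy.pipelining": True,
    "network.http.pipelining.maxrequests": 15,
    "network.http.max-persistent-connections-per-proxy": 16,
    "browser.cache.disk.enable": False,
}


def user_pref_line(name, value):
    """Format one preference as a user.js line."""
    if isinstance(value, bool):   # test bool first: bool is a subclass of int
        rendered = "true" if value else "false"
    elif isinstance(value, int):
        rendered = str(value)
    else:
        rendered = '"%s"' % value  # anything else is written as a quoted string
    return 'user_pref("%s", %s);' % (name, rendered)


def render_user_js(prefs):
    """Return the text of a user.js file, one preference per line, sorted by name."""
    return "\n".join(user_pref_line(n, v) for n, v in sorted(prefs.items()))
```

Printing render_user_js(PREFS) gives the lines to paste into user.js. Firefox reads that file at startup, so restart the browser afterwards.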

Usage

I start it with:

mkdir -p "/tmp/polipo/"
# I'm noticing too many broken websites..
# pmmSize=8192 \
polipo daemonise=true \
  censorReferer=maybe \
  censoredHeaders="From, Accept-Language" \
  pidFile="/tmp/polipo.pid" \
  diskCacheRoot="/tmp/polipo/" \
  chunkCriticalMark=0 \
  chunkLowMark=0 \
  disableConfiguration=true \
  disableIndexing=false \
  disableLocalInterface=true \
  cacheIsShared=false
I added this in my /etc/rc.d/rc.local
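
For anyone who prefers to keep the options in one editable place, here is a rough Python sketch of the same start command. The option names and values are copied from the command above; OPTIONS, build_command and start_polipo are hypothetical names, and it assumes polipo is on the PATH.

```python
import os
import subprocess

# Options copied from the start command above; edit to taste.
OPTIONS = [
    ("daemonise", "true"),
    ("censorReferer", "maybe"),
    ("censoredHeaders", "From, Accept-Language"),
    ("pidFile", "/tmp/polipo.pid"),
    ("diskCacheRoot", "/tmp/polipo/"),
    ("chunkCriticalMark", "0"),
    ("chunkLowMark", "0"),
    ("disableConfiguration", "true"),
    ("disableIndexing", "false"),
    ("disableLocalInterface", "true"),
    ("cacheIsShared", "false"),
]


def build_command(options, binary="polipo"):
    """Return the argument list: the binary followed by one name=value per option."""
    return [binary] + ["%s=%s" % (name, value) for name, value in options]


def start_polipo(options=OPTIONS, cache_dir="/tmp/polipo/"):
    """Create the cache directory (like mkdir -p), then launch polipo."""
    os.makedirs(cache_dir, exist_ok=True)
    return subprocess.call(build_command(options))
```

Because the arguments are passed as a list, values containing spaces (such as censoredHeaders) need no shell quoting.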

Information:

  • http://localhost:8123/polipo/config? - You can also reconfigure it while it's running.
  • http://localhost:8123/polipo/status?
  • http://localhost:8123/polipo/servers?
  • http://localhost:8123/polipo/index?http://www.example.com/
  • The pages starting with ‘http://localhost:8123/polipo/recursive-index?’ contain recursive indices of various servers. This functionality is disabled by default, and can be enabled by setting the variable disableIndexing to false.
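
The information pages above can also be fetched from a script. A minimal sketch, assuming Polipo listens on localhost:8123 as in the rest of this page and that its local interface has not been disabled; BASE, info_url and fetch are names invented for this example.

```python
from urllib.request import ProxyHandler, build_opener

BASE = "http://localhost:8123/polipo/"


def info_url(page, target=""):
    """Build an information URL, e.g. info_url("index", "http://www.example.com/")."""
    return "%s%s?%s" % (BASE, page, target)


def fetch(page, target="", timeout=5):
    """Fetch one information page and return its body as text."""
    # An empty ProxyHandler ignores any system-wide proxy settings,
    # so the request goes straight to Polipo itself.
    opener = build_opener(ProxyHandler({}))
    with opener.open(info_url(page, target), timeout=timeout) as response:
        return response.read().decode("utf-8", "replace")
```

For example, fetch("status") should return the HTML of the status page while Polipo is running.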
Stop it with:
kill `cat "/tmp/polipo.pid"`
Reload the forbidden URLs file (SIGUSR1). This will:
  • write out all the in-memory data to disk (without discarding it)
  • reopen the log file
  • reload the forbidden URLs file
kill -s SIGUSR1 `cat "/tmp/polipo.pid"`
Flush the memory cache (SIGUSR2). This will:
  • write out all the in-memory data to disk
  • discard as much of the memory cache as possible
  • reopen the log file
  • reload the forbidden URLs file
kill -s SIGUSR2 `cat "/tmp/polipo.pid"`
Purge the disk cache:
kill -USR1 `cat "/tmp/polipo.pid"`
sleep 1
polipo -x diskCacheRoot="/tmp/polipo/"
kill -USR2 `cat "/tmp/polipo.pid"`
I added this in my /etc/rc.d/rc.local
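
The signal commands above could be wrapped in a small Python helper, sketched here. It assumes the pid file and cache directory used on this page and a Unix system; read_pid, send and purge_disk_cache are hypothetical names, not part of Polipo.

```python
import os
import signal
import subprocess
import time

PID_FILE = "/tmp/polipo.pid"
CACHE_ROOT = "/tmp/polipo/"


def read_pid(pid_file=PID_FILE):
    """Return the process ID stored in the pid file as an integer."""
    with open(pid_file) as handle:
        return int(handle.read().strip())


def send(sig, pid_file=PID_FILE):
    """Send a signal to the running polipo, like kill with the pid file's contents."""
    os.kill(read_pid(pid_file), sig)


def purge_disk_cache(pid_file=PID_FILE, cache_root=CACHE_ROOT):
    """Mirror the four purge steps above."""
    send(signal.SIGUSR1, pid_file)   # write the in-memory data to disk
    time.sleep(1)                    # give polipo a moment to finish writing
    subprocess.call(["polipo", "-x", "diskCacheRoot=" + cache_root])
    send(signal.SIGUSR2, pid_file)   # discard the memory cache
```

A cron job could call purge_disk_cache() periodically, which is one way to handle the cache cleaning mentioned in the todo list.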

Logging is controlled by logFacility, and the log location is set with logFile, which defaults to /var/log/polipo.

todo

forbiddenFile (defaults to ~/.polipo-forbidden or /etc/polipo/forbidden, whichever exists) specifies the set of URLs that should never be fetched. If forbiddenFile is a directory, it will be recursively searched for files with forbidden URLs.

Every line in a file listing forbidden URLs can either be

  1. a domain name (a string that doesn't contain any of ‘/’, ‘*’ or ‘\’), or
  2. a POSIX extended regular expression.
Blank lines are ignored, as are those that start with a hash sign ‘#’.
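
The rules above can be sketched as a small classifier. This only illustrates the file format as described here, not Polipo's actual parser. The function name classify is made up, and stripping surrounding whitespace is an assumption of this sketch.

```python
def classify(line):
    """Return 'ignored', 'domain' or 'regex' for one line of a forbidden file."""
    text = line.strip()  # assumption: surrounding whitespace is not significant
    if not text or text.startswith("#"):
        return "ignored"  # blank line or comment
    if any(ch in text for ch in "/*\\"):
        return "regex"    # contains '/', '*' or '\': treated as a regular expression
    return "domain"       # anything else is taken as a domain name
```

For instance, classify("ads.example.com") gives "domain", while a line containing a slash, such as "example.com/banners/", gives "regex".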

  • easylist won't work as a forbiddenFile as-is.
    • Change a leading "!" to "#".
    • Anything without a "/", "*" or "\" should be OK as-is, but everything else must be converted to a proper regular expression.
  • Get Firefox's type-and-search feature going again. It won't work on single words. =/
  • Somewhere on the Polipo website there is a mention of a smarter cache-cleaning tool. Get and use it. Otherwise, rig a cron job to purge things.

local web server

http://localhost:8123/
Uses the variable localDocumentRoot, which defaults to /usr/share/polipo/www/

notes

A number of websites incorrectly mark variable resources as cachable; such issues can be worked around in polipo by manually marking given categories of objects as uncachable. If dontCacheCookies is true, all pages carrying HTTP cookies will be treated as uncachable. If dontCacheRedirects is true, all redirects (301 and 302) will be treated as uncachable. Finally, if everything else fails, a list of uncachable URLs can be given in the file specified by uncachableFile, which has the same format as the forbiddenFile (see Internal forbidden list). If not specified, its location defaults to ‘~/.polipo-uncachable’ or ‘/etc/polipo/uncachable’, whichever exists.

sites which won't work with it

nonoba.com, groups.google.com, linuxquestions.org

Ad-Blocking

Polipo is a fast local proxy that does on-disk caching (by default, at least). Privoxy is another local proxy, with a focus on privacy and ad-blocking. Due to the nature and purpose of Privoxy, it has to buffer portions of the page (to check for content it should block) before serving it to the browser. This makes it a bit slower than Polipo. You could always use Polipo in front of Privoxy (see here, middle of the page), but that is a bit much if you don't need fancy filtering and simply want a domain/regex blocklist.

You could build up the blocklist yourself, by hand, but that would be a pain. Instead, we'll just convert an adblock filterset to a format that Polipo can understand. Since Polipo is blocking by matching the URL only, we don't have the same fine-grained control as Adblock rules, but I personally don't require that level of control.

First, grab an adblock filterset (e.g., easylist.txt). Next, grab either adblock2polipo.py (Python) or adblock2polipo.rb (Ruby). Then run whichever script you downloaded with the filterset file as the first parameter. The script dumps the rewritten rules to the console, so they can be inspected or redirected to the file Polipo loads its blocking rules from (~/.polipo-forbidden or /etc/polipo/forbidden on *nix systems). Restart Polipo and the new blocking rules should take effect.
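
To see what the conversion does to a single rule, here is a condensed sketch of the same rewriting steps the scripts at the end of this page perform. The function name convert_rule and the sample rules in the note below are invented for illustration. Unlike the scripts, it strips the line first and returns None for skipped rules.

```python
import re

# Everything from the first "$" onward (the adblock options) is dropped.
DOLLAR_RE = re.compile(r"(.*?)\$.*")


def convert_rule(rule):
    """Rewrite one adblock rule as a regular expression; None if it is skipped."""
    rule = rule.strip()
    if not rule:
        return None
    # Section headers, comments, exceptions and element-hiding rules are skipped.
    if rule[0] in "[!~#@" or rule.startswith("/adverti") or "##" in rule:
        return None
    rule = DOLLAR_RE.sub(r"\1", rule)
    rule = rule.replace("|", "")       # drop anchors
    rule = rule.replace(".", r"\.")    # escape literal dots first
    rule = rule.replace("*", ".*")     # adblock wildcard becomes regex wildcard
    rule = rule.replace("?", r"\?")
    rule = rule.replace("^", r"[\/:\.=&\?\+\-\ ]+")  # adblock separator placeholder
    return rule
```

With this sketch, the made-up rule ||ads.example.com^$third-party becomes ads\.example\.com[\/:\.=&\?\+\-\ ]+, and a comment line starting with "!" is dropped.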

PS. If you're like me, you'd prefer that Polipo serve up a blank page for blocked URLs rather than a 403 error page. To accomplish this, edit the Polipo config file and add a forbiddenUrl option pointing to an empty image (the command below fetches one). One good way to do this: make sure localDocumentRoot is either set to a real path (such as /usr/share/polipo/www) or commented out completely. Then download an empty image into that directory:

sudo wget -O /usr/share/polipo/www/empty.gif \
http://upload.wikimedia.org/wikipedia/commons/4/4b/Empty.gif

Then point forbiddenUrl to http://127.0.0.1:8123/empty.gif. Restart Polipo. That's it. :)

NB. Firefox users will need to add port 8123 to the allowed ports list, or they'll get an equally annoying error message from Firefox instead of a blank page. To do this, you need to open about:config in a new tab/window. Right-click and go to New->String, and for the property name put network.security.ports.banned.override, and for the value put 8123. It should work properly after that.

adblock2polipo.py

#!/usr/bin/python
# convert adblock ruleset into polipo-forbidden format

import os
import re
import sys

if __name__ == "__main__":
    if len(sys.argv) == 1:
        sys.exit("Usage: %s FILTERSET_FILE" % os.path.basename(sys.argv[0]))
    if not os.path.exists(sys.argv[1]):
        sys.exit("The rules file (%s) doesn't exist" % sys.argv[1])

    # open() works on both Python 2 and 3 (file() is Python 2 only)
    with open(sys.argv[1]) as fhandle:
        lines = fhandle.readlines()

    # drops everything from the first "$" onward (the adblock options)
    dollar_re = re.compile(r"(.*?)\$.*")

    for line in lines:
        if line:
            # skip section headers, comments, exceptions and element-hiding rules
            if (line[0] in ("[", "!", "~", "#", "@") or
                    line.startswith("/adverti") or
                    "##" in line):
                continue
            line = dollar_re.sub(r"\1", line)
            # line = line.replace("|http://", "")
            line = line.replace("|", "")
            line = line.replace("||", "")  # no-op after the line above
            line = line.replace(".", r"\.")
            line = line.replace("*", ".*")
            line = line.replace("?", r"\?")
            line = line.replace("^", r"[\/:\.=&\?\+\-\ ]+")
            # line = line.replace("&", r"\&")
            # line = line.replace("+", r"\+")
            # line = line.replace("-", r"\-")
            # line = line.replace(";", r"\;")
            # line = line.replace("=", r"\=")
            # line = line.replace("/", r"\/")
            print(line.strip())
    print("")

adblock2polipo.rb

#!/usr/bin/ruby
# convert adblock ruleset into polipo-forbidden format

if __FILE__ == $0
  # abort prints the message to stderr and exits with a failure status
  # (exit does not take a message string)
  if ARGV.length == 0
    abort("Usage: #{File.basename($0)} FILTERSET_FILE")
  end
  if not File.exist?(ARGV[0])
    abort("The rules file (#{ARGV[0]}) doesn't exist")
  end

  # drops everything from the first "$" onward (the adblock options)
  dollar_re = Regexp.new(/(.*?)\$.*/)

  File.readlines(ARGV[0]).each { |line|
    unless line.empty?
      # line[0, 1] is a one-character string on every Ruby version
      if (["[", "!", "~", "#", "@"].include?(line[0, 1]) or
          line[0, 8] == "/adverti" or
          line.include?("##"))
        next
      end
      line = line.gsub(dollar_re, "\\1")
      # line = line.gsub("|http://", "")
      line = line.gsub("|", "")
      line = line.gsub("||", "")  # no-op after the line above
      line = line.gsub(".", "\\.")
      line = line.gsub("*", ".*")
      line = line.gsub("?", "\\?")
      line = line.gsub("^", "[\\/:\\.=&\\?\\\\+\\-\\ ]+")
      # line = line.gsub("&", "\\&")
      # line = line.gsub("+", "\\+")
      # line = line.gsub("-", "\\-")
      # line = line.gsub(";", "\\;")
      # line = line.gsub("=", "\\=")
      # line = line.gsub("/", "\\/")
      puts(line.strip)
    end
  }
  puts("")
end