Polipo
http://www.pps.jussieu.fr/~jch/software/polipo/
A small and fast caching web proxy (a web cache, an HTTP proxy, a proxy server).
Review
1.0.4
Tested 2009-08-12 on Unity Linux 0.99-alpha2, updated 2009-08-12
Firefox's SkipScreen extension has issues on http://rapidshare.com. I regularly get the error message below. It happens on and off, sometimes on different files and sometimes on the same file.
504 Connect to <url>:80 failed: Connection reset by client
The following error occurred while trying to access <url>:
504 Connect to url.rapidshare.com:80 failed: Connection reset by client
Generated Wed, 12 Aug 2009 01:14:25 PDT by Polipo on polipo:8123.
SkipScreen will download your file as soon as it is available!
Tested earlier than 2009-08-12 on Unity Linux 0.99-alpha1, updated at various times.
Works just fine.
Tested 2009-07-16 on Slackware 12.2, updated 2009-07-16.
While it seems to run, I'm having issues getting Firefox to actually use it properly. It needs further troubleshooting.
Tested 2009-05-08 on PCLinuxOS 2007, updated 2009-04
Installation
- Download
- Uncompress
- make
- ./polipo
Firefox 3.0.8:
- Top menu: Edit > Preferences
- Advanced tab > Network sub-tab
- Settings button
- Manual proxy configuration
- HTTP Proxy: localhost
- Port: 8123
- Change "No proxy for" from "localhost, 127.0.0.1" to blank. I prefer this for a local server; do as you wish.
- Save
- Visit about:config
network.http.pipelining true
network.http.proxy.pipelining true
network.http.pipelining.maxrequests 15
network.http.max-persistent-connections-per-proxy 16
browser.cache.disk.enable false
Check these:
network.http.proxy.version 1.1
network.http.proxy.keep-alive true
network.http.use-cache false
Update: I have some websites which do not work in polipo, and so I do keep cache enabled in Firefox for them.
Usage
I start it with:
mkdir -p "/tmp/polipo/"
I added this in my /etc/rc.d/rc.local:
polipo daemonise=true \
  censorReferer=maybe \
  censoredHeaders="From, Accept-Language" \
  pidFile="/tmp/polipo.pid" \
  diskCacheRoot="/tmp/polipo/" \
  chunkCriticalMark=0 \
  chunkLowMark=0 \
  disableConfiguration=true \
  disableIndexing=false \
  disableLocalInterface=true \
  cacheIsShared=false
- I'm noticing too many broken websites..
- pmmSize=8192 \
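The start-up command is just a series of variable=value arguments, so it can be assembled from a table. This is a minimal Python sketch, not part of Polipo: the helper name polipo_command is made up for illustration, and the option values are the ones from the command above.

```python
import shlex

# Option values copied from the start-up command above.
OPTIONS = {
    "daemonise": "true",
    "censorReferer": "maybe",
    "censoredHeaders": "From, Accept-Language",
    "pidFile": "/tmp/polipo.pid",
    "diskCacheRoot": "/tmp/polipo/",
    "chunkCriticalMark": "0",
    "chunkLowMark": "0",
    "disableConfiguration": "true",
    "disableIndexing": "false",
    "disableLocalInterface": "true",
    "cacheIsShared": "false",
}


def polipo_command(options, binary="polipo"):
    """Return the argument list for starting polipo (hypothetical helper)."""
    # Each variable becomes one name=value argument, in dictionary order.
    return [binary] + ["%s=%s" % (name, value) for name, value in options.items()]


# Print a shell-quoted version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in polipo_command(OPTIONS)))
```

Passing the returned list to subprocess.call() would start Polipo without going through a shell, which avoids quoting problems with values that contain spaces, such as censoredHeaders.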
Information:
- http://localhost:8123/polipo/config? - You can also reconfigure it while it's running.
- http://localhost:8123/polipo/status?
- http://localhost:8123/polipo/servers?
- http://localhost:8123/polipo/index?http://www.example.com/
- The pages starting with ‘http://localhost:8123/polipo/recursive-index?’ contain recursive indices of various servers. This functionality is disabled by default, and can be enabled by setting the variable disableIndexing to false.
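The information pages listed above can also be read from a script. This is a rough Python sketch, assuming Polipo listens on localhost:8123 and its local interface is enabled; the helper names polipo_page and fetch are made up for illustration.

```python
import urllib.request


def polipo_page(page, argument="", host="localhost", port=8123):
    """Build the URL of one of Polipo's local pages (hypothetical helper)."""
    return "http://%s:%d/polipo/%s?%s" % (host, port, page, argument)


def fetch(url, timeout=5):
    """Fetch a page directly, ignoring any proxy set in the environment."""
    # An empty ProxyHandler stops urllib from routing the request
    # through a proxy named in http_proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With Polipo running, print(fetch(polipo_page("status"))) should show the status page, and polipo_page("index", "http://www.example.com/") builds the disk cache index URL for that site.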
Stop:
kill `cat "/tmp/polipo.pid"`
Reload the forbidden URLs file:
- write out all the in-memory data to disk (but won't discard them)
- reopen the log file
- reload the forbidden URLs file
kill -s SIGUSR1 `cat "/tmp/polipo.pid"`
Restart:
- write out all the in-memory data to disk
- discard as much of the memory cache as possible
- reopen the log file
- reload the forbidden URLs file
kill -s SIGUSR2 `cat "/tmp/polipo.pid"`
Purge the disk cache:
kill -USR1 `cat "/tmp/polipo.pid"`
sleep 1
polipo -x diskCacheRoot="/tmp/polipo/"
kill -USR2 `cat "/tmp/polipo.pid"`
I added this in my /etc/rc.d/rc.local
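The same signals can be sent from Python instead of kill. This is a small sketch that assumes the pidFile location used above; read_pid and signal_polipo are made-up helper names.

```python
import os
import signal


def read_pid(pid_file="/tmp/polipo.pid"):
    """Return the process ID stored in Polipo's pidFile."""
    with open(pid_file) as handle:
        return int(handle.read().strip())


def signal_polipo(signum, pid_file="/tmp/polipo.pid"):
    """Send a signal to the process named in the pidFile."""
    os.kill(read_pid(pid_file), signum)


# Examples, left commented out so nothing is signalled by accident:
# signal_polipo(signal.SIGUSR1)  # write out data, reopen log, reload forbidden file
# signal_polipo(signal.SIGUSR2)  # as above, and discard the memory cache
# signal_polipo(signal.SIGTERM)  # what a plain kill sends
```

Sending signal number 0 only checks that the process exists, which is a cheap way to find out whether the PID in the file is stale.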
Logging is specified by logFacility, and the log location is set with logFile, which defaults to /var/log/polipo.
todo
forbiddenFile (defaults to ~/.polipo-forbidden or /etc/polipo/forbidden, whichever exists) specifies the set of URLs that should never be fetched. If forbiddenFile is a directory, it will be recursively searched for files with forbidden URLs.
Every line in a file listing forbidden URLs can be either
- a domain name (a string that doesn't contain any of ‘/’, ‘*’ or ‘\’),
- or a POSIX extended regular expression.
- [1] easylist - won't work as a forbiddenFile as-is.
- Convert a leading "!" to "#".
- Anything without a "/", "*" or "\" should be OK as-is, but everything else must be converted to a proper regular expression.
- get Firefox's type-and-search feature going again.. it won't work on single words. =/
- Somewhere in the Polipo website there is a mention of a smarter cache cleaning tool. Get/use it. Otherwise rig a cron job to purge things.
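The forbiddenFile line format described above can be sketched as a small classifier. It is only an illustration of the rule and not how Polipo parses the file. The function name is made up, and treating blank lines and lines starting with "#" as ignored is an assumption based on the "!" to "#" conversion note.

```python
def classify_forbidden_line(line):
    """Classify one forbiddenFile line as 'ignored', 'domain' or 'regex'."""
    text = line.strip()
    # Assumption: blank lines and "#" comment lines carry no rule.
    if not text or text.startswith("#"):
        return "ignored"
    # A domain name is any string without "/", "*" or "\".
    if not any(char in text for char in "/*\\"):
        return "domain"
    # Everything else has to be a POSIX extended regular expression.
    return "regex"


for sample in ["ads.example.com", r"^http://[^/]*/ads/", "# a comment"]:
    print("%-22s %s" % (sample, classify_forbidden_line(sample)))
```

Note that an escaped domain such as ads\.example\.com contains a backslash, so it counts as a regular expression and not as a domain name.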
local web server
- It can be used as an external web server too!
http://localhost:8123/
Uses the variable localDocumentRoot, which defaults to /usr/share/polipo/www/
notes
A number of websites incorrectly mark variable resources as cachable; such issues can be worked around in polipo by manually marking given categories of objects as uncachable. If dontCacheCookies is true, all pages carrying HTTP cookies will be treated as uncachable. If dontCacheRedirects is true, all redirects (301 and 302) will be treated as uncachable. Finally, if everything else fails, a list of uncachable URLs can be given in the file specified by uncachableFile, which has the same format as the forbiddenFile (see Internal forbidden list). If not specified, its location defaults to ‘~/.polipo-uncachable’ or ‘/etc/polipo/uncachable’, whichever exists.
sites which won't work with it
- http://nonoba.com - inconsistent issues with the style showing up and with the flash portion of a game page not appearing.
- http://groups.google.com - inconsistent issues with refreshing into a blank page, and with the style not showing up on cached versions.
- http://www.linuxquestions.org - inconsistent failure to load pages. I got an infinite redirect loop.
nonoba.com, groups.google.com, linuxquestions.org
Ad-Blocking
- [2] Ad Blocking with Polipo
You could build up the blocklist yourself, by hand, but that would be a pain. Instead, we'll just convert an adblock filterset to a format that Polipo can understand. Since Polipo is blocking by matching the URL only, we don't have the same fine-grained control as Adblock rules, but I personally don't require that level of control.
Firstly, grab an adblock filterset (e.g., easylist.txt). Next, grab either adblock2polipo.py (python) or adblock2polipo.rb (ruby). Then run whichever script you downloaded with the filterset file as the first parameter. The script dumps the rewritten rules to the console, so they can be inspected or redirected to the file Polipo loads its blocking rules from (~/.polipo-forbidden or /etc/polipo/forbidden on *nix systems). Restart Polipo and the new blocking rules should take effect.
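To see what the conversion does to individual rules, here is a short Python sketch that applies the same substitutions as adblock2polipo.py to one rule at a time. The function name convert_rule and the sample rules are made up for illustration.

```python
import re

# Adblock options follow a "$"; Polipo cannot use them, so they are cut off.
DOLLAR_RE = re.compile(r"(.*?)\$.*")


def convert_rule(rule):
    """Return a Polipo regex for one Adblock rule, or None if it is skipped."""
    rule = rule.strip()
    # Headers, comments, exceptions and element-hiding rules are dropped.
    if (not rule or rule[0] in "[!~#@"
            or rule.startswith("/adverti") or "##" in rule):
        return None
    rule = DOLLAR_RE.sub(r"\1", rule)
    rule = rule.replace("|", "")      # anchors have no URL-only equivalent
    rule = rule.replace(".", r"\.")   # literal dots
    rule = rule.replace("*", ".*")    # Adblock wildcard
    rule = rule.replace("?", r"\?")   # literal question marks
    rule = rule.replace("^", r"[\/:\.=&\?\+\-\ ]+")  # separator placeholder
    return rule


for sample in ["||ads.example.com^$third-party", "/banner/*/img", "! comment"]:
    print(repr(sample), "->", repr(convert_rule(sample)))
```

The options after "$" (here third-party) are simply thrown away, which is the loss of fine-grained control mentioned above.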
PS. If you're like me, you'd prefer Polipo to serve up a blank page for blocked URLs rather than a 403 error page. To accomplish this you need to edit the Polipo config file and add a forbiddenUrl option pointing to an empty image. One good option for this is the following: make sure localDocumentRoot is either set to a real path (such as /usr/share/polipo/www) or commented out completely. Then put an empty image in that directory:
sudo wget -O /usr/share/polipo/www/empty.gif \
  http://upload.wikimedia.org/wikipedia/commons/4/4b/Empty.gif
Then point forbiddenUrl to http://127.0.0.1:8123/empty.gif. Restart Polipo. That's it. :)
NB. Firefox users will need to add port 8123 to the allowed ports list, or they'll get an equally annoying error message from Firefox instead of a blank page. To do this, you need to open about:config in a new tab/window. Right-click and go to New->String, and for the property name put network.security.ports.banned.override, and for the value put 8123. It should work properly after that.
adblock2polipo.py
#!/usr/bin/python
# Convert an Adblock ruleset into polipo-forbidden format.
import os
import re
import sys

if __name__ == "__main__":
    if len(sys.argv) == 1:
        # The argument placeholder name (<rules-file>) is an assumption.
        sys.exit("Usage: %s <rules-file>\n"
                 " - convert adblock ruleset into polipo-forbidden format"
                 % os.path.basename(sys.argv[0]))
    if not os.path.exists(sys.argv[1]):
        sys.exit("The rules file (%s) doesn't exist" % sys.argv[1])

    # open() replaces the Python 2-only file() builtin.
    with open(sys.argv[1]) as fhandle:
        lines = fhandle.readlines()

    # Adblock options start at "$"; everything from there on is dropped.
    dollar_re = re.compile(r"(.*?)\$.*")
    for line in lines:
        if line:
            # Skip headers, comments, exceptions and element-hiding rules.
            if (line[0] in ("[", "!", "~", "#", "@")
                    or line.startswith("/adverti")
                    or "##" in line):
                continue
            line = dollar_re.sub(r"\1", line)
            # line = line.replace("|http://", "")
            line = line.replace("|", "")  # also removes "||"
            line = line.replace(".", r"\.")
            line = line.replace("*", ".*")
            line = line.replace("?", r"\?")
            line = line.replace("^", r"[\/:\.=&\?\+\-\ ]+")
            # line = line.replace("&", r"\&")
            # line = line.replace("+", r"\+")
            # line = line.replace("-", r"\-")
            # line = line.replace(";", r"\;")
            # line = line.replace("=", r"\=")
            # line = line.replace("/", r"\/")
            print(line.strip())
    print("")
adblock2polipo.rb
#!/usr/bin/ruby
# Convert an Adblock ruleset into polipo-forbidden format.
if __FILE__ == $0
  if ARGV.length == 0
    # abort prints the message and exits; exit does not accept a string.
    # The argument placeholder name (<rules-file>) is an assumption.
    abort("Usage: #{File.basename($0)} <rules-file>\n" \
          " - convert adblock ruleset into polipo-forbidden format")
  end
  if not File.exist?(ARGV[0])
    abort("The rules file (#{ARGV[0]}) doesn't exist")
  end

  # Adblock options start at "$"; everything from there on is dropped.
  dollar_re = /(.*?)\$.*/
  File.readlines(ARGV[0]).each { |line|
    unless line.empty?
      # Skip headers, comments, exceptions and element-hiding rules.
      if (["[", "!", "~", "#", "@"].include?(line[0]) or
          line[0, 8] == "/adverti" or
          line.include?("##"))
        next
      end
      line = line.gsub(dollar_re, "\\1")
      # line = line.gsub("|http://", "")
      line = line.gsub("|", "")  # also removes "||"
      line = line.gsub(".", "\\.")
      line = line.gsub("*", ".*")
      line = line.gsub("?", "\\?")
      line = line.gsub("^", "[\\/:\\.=&\\?\\\\+\\-\\ ]+")
      # line = line.gsub("&", "\\&")
      # line = line.gsub("+", "\\+")
      # line = line.gsub("-", "\\-")
      # line = line.gsub(";", "\\;")
      # line = line.gsub("=", "\\=")
      # line = line.gsub("/", "\\/")
      puts(line.strip)
    end
  }
  puts("")
end