See also search engine optimization (SEO)
- After various validations pass, and I hand-check things according to [1], add Viewable With Your Favorite Browser into the footer.
- Later, write an essay on this topic and link to it from the footer, and instead link to anybrowser.org from the essay.
- RSS
- http://www.w3.org/RDF/Validator/
- http://rss.scripting.com/
- http://feedvalidator.org/
HTML
HTML Tidy
The executable, not the Ruby library.
I pass everything through HTML Tidy. It's actually extremely difficult to build everything from scratch and be compliant. There are a number of things I do which are clumsy right now. HTML Tidy is a cheat.
I build everything to be XHTML 1.0 Strict.
Tested and works on Unity Linux 64bit rc1 as of 2010-04-25:
cvs -d:pserver:anonymous@tidy.cvs.sourceforge.net:/cvsroot/tidy login
- press enter
cvs -z3 -d:pserver:anonymous@tidy.cvs.sourceforge.net:/cvsroot/tidy co -P tidy
cd tidy/build/gmake
make
su
smart install libxslt-proc
make install
The command I use is:
system(
  'tidy',
  '-clean', '-quiet', '-omit', '-asxhtml', '-access', '-modify',
  '--drop-empty-paras', 'true',
  '--indent', 'true',
  '--indent-spaces', '2',
  '--keep-time', 'true',
  '--wrap', '0',
  '--force-output', 'true',
  '--show-errors', '0',
  '--show-warnings', 'false',
  '--break-before-br', 'true',
  '--tidy-mark', 'false',
  '--output-encoding', 'utf8',
  '--escape-cdata', 'false',
  '--indent-cdata', 'true',
  '--hide-comments', 'true',
  '--join-classes', 'true',
  '--join-styles', 'true',
  source_file_full_path
)
For additional options, check out tidy -help-config
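A long argument list like this is easier to maintain when the `--name value` pairs are kept in a Hash. The following is only a sketch: `tidy_config_args` and `tidy_status` are hypothetical helper names, and the option subset is taken from the call above. The status mapping follows Tidy's usual exit codes (0 for clean, 1 for warnings, 2 for errors).

```ruby
# Hypothetical helper: turn a Hash of Tidy config options into argv entries.
def tidy_config_args(options)
  options.flat_map { |name, value| ["--#{name}", value.to_s] }
end

# Hypothetical helper: name Tidy's exit status (0 clean, 1 warnings, 2 errors).
def tidy_status(exit_code)
  case exit_code
  when 0 then :ok
  when 1 then :warnings
  when 2 then :errors
  else :unknown
  end
end

# A subset of the options from the call above, kept as data.
config = {
  'indent'          => true,
  'indent-spaces'   => 2,
  'wrap'            => 0,
  'tidy-mark'       => false,
  'output-encoding' => 'utf8'
}

args = ['tidy', '-quiet', '-asxhtml', '-modify', *tidy_config_args(config)]
p args

# To actually run it (source_file_full_path as in the call above):
#   system(*args, source_file_full_path)
#   puts tidy_status($?.exitstatus)
```

Because the original call passes `--force-output true`, output is written even when Tidy reports errors, so checking the exit status is the only way such a wrapper would notice them.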
CSS
http://jigsaw.w3.org/css-validator/
I use the Mozilla-only rounded corners CSS, so I fail validation.
Links
I don't understand why their checker doesn't allow checking of its own links. It's probably for traffic reasons. I have no robots.txt disallow for them.
JavaScript
See JavaScript for the list of features used with it.
My knowledge of JavaScript is sorely lacking. There are a few pretty critical things I've copied from elsewhere which are not standards-compliant. I just don't know enough to fix or replace what I'm doing.
A HREF function-links
JavaScript function-links are not actually valid. I've tried all sorts of stuff, but the best I can do is to wrap such things inside of <script type="text/javascript"> so that it'll only appear when JavaScript is enabled. Example link:
<a href="javascript:toggle('styles')">
To fix the validation issue, I started doing:
<a accesskey="t" href="/javascript.html#s0" onclick="toggle('styles');return false">Styles</a>
This forces me to have a real link target, though. While this isn't really what I wanted, it does give an opportunity to link to another page explaining what JavaScript would have allowed the user to do.
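Since the pages are generated, the fallback-link pattern can be emitted by one helper instead of being typed by hand each time. This is a minimal sketch, not the engine's actual code: `js_link` and `h` are hypothetical names. It writes the attribute as lowercase `onclick`, which XHTML requires, and omits the `javascript:` prefix, which is not needed inside an event handler attribute.

```ruby
XHTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Escape text for use in XHTML content or a double-quoted attribute value.
def h(text)
  text.to_s.gsub(/[&<>"]/, XHTML_ESCAPES)
end

# Hypothetical helper: a link that runs js_call when JavaScript is enabled
# and otherwise falls back to following fallback_href.
def js_link(label, fallback_href, js_call, accesskey: nil)
  attrs = {}
  attrs['accesskey'] = accesskey if accesskey
  attrs['href'] = fallback_href
  attrs['onclick'] = "#{js_call};return false"
  rendered = attrs.map { |name, value| %(#{name}="#{h(value)}") }.join(' ')
  "<a #{rendered}>#{h(label)}</a>"
end

puts js_link('Styles', '/javascript.html#s0', "toggle('styles')", accesskey: 't')
# prints: <a accesskey="t" href="/javascript.html#s0" onclick="toggle('styles');return false">Styles</a>
```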
Using HTML with document.write
JavaScript document.write technically shouldn't have HTML opening tags within it. HTML Tidy will escape any forward slashes in opening tags, ruining the code. If I put only ending HTML tags inside the JavaScript - which is valid - and the other text outside, I still get validation errors. I have found no way around this. Example code:
<script type="text/javascript"><!--
var heredoc = (<r><![CDATA[
<p>some example text</p>
]]></r>).toString();
document.write(heredoc);
//--></script>
<noscript>
</noscript>
Robots.txt
- http://spiralofhope.com/robots.txt
- http://en.wikipedia.org/wiki/Robots_exclusion_standard
- http://www.robotstxt.org/robotstxt.html
- Those with a Google Account can use their Webmaster Tools to test robots.txt, see [2]
- live.com users have a similar tool in their Webmaster Center. But holy shit are the URLs fugly: [3] [4]
- http://tool.motoricerca.info/robots-checker.phtml
- http://www.invision-graphics.com/robotstxt_validator.html
- http://www.targetable.com/scripts/robotstxt.html
Notes on specific robots
Internet Archive
- http://archive.org
- info: http://www.alexa.com/help/webmasters
- User-agent: ia_archiver
Google
- http://google.com, http://www.google.com/imghp, http://video.google.com/, etc: http://www.google.com/intl/en/options/
- http://search.aol.com/, http://search.aol.com/aol/imagehome, http://video.aol.com/
- info: http://en.wikipedia.org/wiki/Googlebot
- I can't find direct information, try around here: [5]
- submit: http://www.google.com/addurl/
- User-agent: Googlebot
Bing
- http://www.bing.com / http://live.com
- http://yahoo.com
- http://altavista.com
- info: http://en.wikipedia.org/wiki/Msnbot
- I can't even give a direct URL to official information on it.
- submit: http://www.bing.com/webmaster/SubmitSitePage.aspx
- User-agent: msnbot
Yahoo
- http://yahoo.com
- http://altavista.com
- info: http://help.yahoo.com/l/us/yahoo/search/webcrawler/index.html
- submit (requires registration): http://search.yahoo.com/info/submit.html
- Yahoo directory submit: https://ecom.yahoo.com/dir/submit/intro/
- User-agent: Slurp
Cuil
- http://cuil.com
- info: http://www.cuil.com/info/webmaster_info/
- submit: http://www.cuil.com/info/contact_us/feedback/crawl_me
- User-agent: Twiceler
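The User-agent tokens noted above are what a robots.txt group is matched against. As a rough local check before reaching for the online validators, here is a simplified sketch: `parse_robots` and `allowed?` are hypothetical names, the sample rules are invented for illustration, and only `User-agent` and `Disallow` lines are handled (no `Allow`, no wildcards, and agent tokens are matched exactly rather than by substring as real crawlers do).

```ruby
# Parse robots.txt text into { downcased agent token => [disallowed path prefixes] }.
def parse_robots(text)
  rules = Hash.new { |hash, key| hash[key] = [] }
  current_agents = []
  in_rules = false # true once the current group has seen a rule line

  text.each_line do |line|
    line = line.sub(/#.*/, '').strip # drop comments and surrounding whitespace
    next if line.empty?

    field, value = line.split(':', 2).map(&:strip)
    next if value.nil? # not a "field: value" line

    case field.downcase
    when 'user-agent'
      # A User-agent line that follows rule lines starts a new group.
      current_agents = [] if in_rules
      in_rules = false
      current_agents << value.downcase
      rules[value.downcase] # register the agent even if it has no rules
    when 'disallow'
      in_rules = true
      # An empty Disallow value means "nothing is disallowed".
      current_agents.each { |agent| rules[agent] << value unless value.empty? }
    end
  end
  rules
end

# True if path is not covered by a Disallow prefix for this agent.
# Falls back to the "*" group when the agent has no group of its own.
def allowed?(rules, agent, path)
  key = agent.downcase
  group = rules.key?(key) ? rules[key] : rules.fetch('*', [])
  group.none? { |prefix| path.start_with?(prefix) }
end

# Invented sample rules, for illustration only.
sample = <<~ROBOTS
  User-agent: ia_archiver
  Disallow: /

  User-agent: *
  Disallow: /private/
ROBOTS

rules = parse_robots(sample)
puts allowed?(rules, 'ia_archiver', '/index.html') # false
puts allowed?(rules, 'Googlebot', '/index.html')   # true
puts allowed?(rules, 'Googlebot', '/private/x')    # false
```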
Sitemap
I have a plain HTML sitemap file, but there is also a sitemap XML standard:
- http://en.wikipedia.org/wiki/Sitemaps
- http://www.google.com/support/webmasters/bin/topic.py?topic=8476
Sitemap: http://example.com/sitemap.xml
Multiple lines are allowed for multiple sitemaps.
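For the XML form, a minimal file is just a `urlset` element in the sitemaps.org 0.9 namespace with one `url`/`loc` pair per page. The sketch below assumes the engine already has a list of absolute page URLs. `sitemap_xml` and `xml_escape` are hypothetical names, the URLs are placeholders, and the optional elements (`lastmod`, `changefreq`, `priority`) are left out.

```ruby
XML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;'
}.freeze

# Escape the characters that are not allowed raw inside XML text.
def xml_escape(text)
  text.gsub(/[&<>"']/, XML_ESCAPES)
end

# Build a minimal sitemap document from a list of absolute URLs.
def sitemap_xml(urls)
  entries = urls.map { |url| "  <url><loc>#{xml_escape(url)}</loc></url>" }
  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    *entries,
    '</urlset>'
  ].join("\n") + "\n"
end

# Placeholder URLs, for illustration only.
puts sitemap_xml(['http://example.com/', 'http://example.com/page?a=1&b=2'])
```

Escaping matters mostly for query strings: a raw `&` in a `loc` value makes the file invalid XML.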
RSS
(Not implemented yet)
Server
Since this engine doesn't really care about the functionality of the server, I don't have much to say about it.
Misc. notes:
- Custom error documents
- Logging and statistics
- Security settings
I'm not using self-hosted email right now.