How to statically mirror a website

If you want to copy a website really, really fast, use HTTrack with the following switches:

httrack http://website.com/ -K --sockets=50 --disable-security-limits --max-rate=0

It works on Linux (there’s a native Debian package) and Windows.

Check the documentation for other options. The -K option is very important because it controls how links are written in the mirrored pages. If you want to keep the original links (without .html extensions), use it as above.

K option cheatsheet:

-K0  foo.cgi?q=45 -> foo4B54.html?q=45 (relative URI, default)
-K   foo.cgi?q=45 -> http://www.foobar.com/folder/foo.cgi?q=45 (absolute URL) (--keep-links[=N])
-K3  foo.cgi?q=45 -> /folder/foo.cgi?q=45 (absolute URI)
-K4  foo.cgi?q=45 -> foo.cgi?q=45 (original URL)
-K5  foo.cgi?q=45 -> http://www.foobar.com/folder/foo4B54.html?q=45 (transparent proxy URL)
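The three modes that keep the original file name (-K, -K3, -K4) differ only in how much of the URL they keep. Below is a rough illustration using Python's urllib.parse, assuming a page at http://www.foobar.com/folder/index.html that contains the link foo.cgi?q=45. The page URL and the link_forms helper are hypothetical, and this is not HTTrack's code: the hashed names produced by -K0 and -K5 (foo4B54.html) come from HTTrack itself and are not reproduced here.

```python
from urllib.parse import urljoin, urlsplit


def link_forms(page_url: str, link: str) -> dict:
    """Hypothetical helper: the non-renaming -K link forms for one link."""
    # -K: resolve the link against the page to get a full absolute URL
    absolute_url = urljoin(page_url, link)
    parts = urlsplit(absolute_url)
    # -K3: same target, but only path + query (an absolute URI, no host)
    absolute_uri = parts.path + ("?" + parts.query if parts.query else "")
    # -K4: the link exactly as it was written in the page
    return {"K": absolute_url, "K3": absolute_uri, "K4": link}


if __name__ == "__main__":
    forms = link_forms("http://www.foobar.com/folder/index.html", "foo.cgi?q=45")
    for mode, value in forms.items():
        print(f"-{mode}: {value}")
```

For this input the three values match the -K, -K3 and -K4 rows of the cheatsheet.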

And last but not least…

If you want the original links to be served from the static .html files, use these Apache mod_rewrite rules:

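RewriteEngine On

# Internally serve /a/, /a/b/ and /a/b/c/ from a.html, a/b.html and a/b/c.html
# (a RewriteCond only applies to the RewriteRule directly after it)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/$ /$1.html [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/([^/]+)/$ /$1/$2.html [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/([^/]+)/([^/]+)/$ /$1/$2/$3.html [L]

# 301-redirect extensionless URLs without a trailing slash (/a) to /a/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}|/)$
RewriteRule ^(.*)$ /$1/ [R=301,L]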
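To check what these rules do to a given URL, here is a rough Python emulation of their regex logic. It is a sketch, not Apache: resolve, files and dirs are hypothetical names, files and dirs stand in for the -f and -d checks, the !-f check is treated as applying to every rewrite rule, paths are given without the leading slash (as the pattern sees them in .htaccess context), and only a single rewrite pass is modelled.

```python
import re

# Patterns copied from the RewriteRule lines; templates mirror their substitutions.
SLASH_RULES = [
    (re.compile(r"^([^/]+)/$"), "/{}.html"),
    (re.compile(r"^([^/]+)/([^/]+)/$"), "/{}/{}.html"),
    (re.compile(r"^([^/]+)/([^/]+)/([^/]+)/$"), "/{}/{}/{}.html"),
]
# Same test as: RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}|/)$
EXT_OR_SLASH = re.compile(r"(\.[a-zA-Z0-9]{1,5}|/)$")


def resolve(path: str, files: set, dirs: set) -> tuple:
    """Return (action, target) for a request path such as "blog/post/"."""
    if path not in files:
        # Internal rewrite: a/ -> /a.html, a/b/ -> /a/b.html, a/b/c/ -> /a/b/c.html
        for pattern, template in SLASH_RULES:
            match = pattern.match(path)
            if match:
                return "rewrite", template.format(*match.groups())
    request_uri = "/" + path
    if path not in files and path not in dirs and not EXT_OR_SLASH.search(request_uri):
        # External 301: /a -> /a/ (the browser then requests the slash form)
        return "redirect", request_uri + "/"
    # Anything else is left alone and served as requested
    return "serve", request_uri


if __name__ == "__main__":
    files = {"about.html", "blog/post.html", "style.css"}
    dirs = {"blog"}
    for p in ["about", "about/", "blog/post/", "style.css", "blog"]:
        print(p, "->", resolve(p, files, dirs))
```

So a link like /about is first redirected to /about/, and /about/ is then served internally from about.html, while real files such as style.css pass through untouched.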
