Wget is a powerful command-line utility that can download an entire website, with CSS, JavaScript, images, and internal links intact. After the download, the website can be viewed locally.
Installation via Cygwin
To run wget on Windows, you will need to install Cygwin (unless you already have it installed). Cygwin is a collection of tools that provide a Linux look-and-feel environment on Windows.
Download the setup.exe file and save it somewhere on your hard drive. Run it to install Cygwin for the first time, and to update it at any time in the future.
After you have double-clicked the setup.exe file, follow the on-screen instructions. After a couple of screens, you will be presented with the “Select Packages” screen.
Expand the “Web” option, and scroll down until you find the “wget” package. Click on it until it says “Install”.
Complete the installation.
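Once installation finishes, it may be worth confirming from a Cygwin prompt that wget is available (the exact version reported will vary with your Cygwin packages):

```shell
# Confirm wget is installed and on the PATH; prints the version banner.
wget --version
```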
Usage
Open Cygwin and navigate to a folder where you want to save a website, for example:
cd /cygdrive/c/dev/testfolder
/cygdrive is Cygwin’s mount point for your Windows drives, and c represents your C:\ drive.
Tip: hit TAB to auto-complete file and folder names.
Tip: type ls and press Enter to see the contents of a folder.
The basic syntax for a wget website download:
wget --recursive --level=2 --convert-links http://www.ie6death.com/
--recursive turns on recursive retrieval, with a default maximum depth of 5.
--level specifies the maximum recursion depth.
--convert-links converts the links in the downloaded documents, after the download is complete, to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, and hyperlinks to non-HTML content.
There are many more options available; they are all listed on the Wget Manual page.
Do a test run first to make sure you have the correct options before you proceed with the whole website download; this can save you time, since a full download can take a while. Set the retrieval level to 1 or 2 to limit the number of downloaded files.
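For example, a shallow trial run against the sample site from above fetches only the start page and the pages it links to directly:

```shell
# Trial run: depth 1 keeps the download small while you verify your options.
wget --recursive --level=1 --convert-links http://www.ie6death.com/
```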
Tip: hit the UP and DOWN arrow keys to scroll through the command history and repeat a previously typed command.
If the website blocks the wget spider
Sometimes a website that you are trying to download will block the Wget spider. You can work around that by setting a referer and a user-agent, pretending to be a regular web browser:
--referer="http://www.google.com" --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"
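Put together with the recursive options from earlier, the full invocation might look like this (using the same example site; any mainstream browser’s user-agent string would do):

```shell
# Recursive download while presenting a Firefox user-agent and a Google referer.
wget --recursive --level=2 --convert-links \
     --referer="http://www.google.com" \
     --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" \
     http://www.ie6death.com/
```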