Wget is a powerful command-line utility that can download an entire website, with CSS, JavaScript, images, and internal links intact. After the download, the website can be viewed locally.
Installation via Cygwin
To run wget on Windows, you will need to install Cygwin (unless you already have it installed). Cygwin is a collection of tools that provide a Linux look-and-feel environment on Windows.
Download the setup.exe file and save it somewhere on your hard drive. Run it to install Cygwin for the first time, and to update it at any time in the future.
After you have double-clicked the setup.exe file, follow the on-screen instructions. After a couple of screens, you will be presented with the “Select Packages” screen.
Expand the “Web” option, and scroll down until you find the “wget” package. Click on it until it says “Install”.
Complete the installation.
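Once installation finishes, it may be worth confirming from a Cygwin prompt that wget is available (the exact version reported will vary with your Cygwin packages):

```shell
# Confirm wget is installed and on the PATH; prints the version banner.
wget --version
```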
Usage
Open Cygwin and navigate to a folder where you want to save a website, for example:
cd /cygdrive/c/dev/testfolder
/cygdrive is Cygwin’s mount point for your Windows drives, and c represents your C:\ drive.
Tip: hit TAB to auto-complete file and folder names.
Tip: type ls and press Enter to see the contents of a folder.
The basic syntax for a wget website download:
wget --recursive --level=2 --convert-links http://www.ie6death.com/
--recursive turns on recursive retrieval, with a default maximum depth of 5.
--level specifies the maximum recursion depth.
--convert-links converts the links in the downloaded documents, after the download is complete, to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, and hyperlinks to non-HTML content.
There are many more options available; they are all listed on the Wget Manual page.
Do a test run first to make sure you have the correct options before you proceed with the whole website download; this can save you time, since a full download can take a while. Set the retrieval level to 1 or 2 to limit the number of downloaded files.
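For example, a shallow trial run against the sample site from above fetches only the start page and the pages it links to directly:

```shell
# Trial run: depth 1 keeps the download small while you verify your options.
wget --recursive --level=1 --convert-links http://www.ie6death.com/
```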
Tip: hit the UP and DOWN arrow keys to scroll through the command history and repeat a previously typed command.
If the website blocks the wget spider
Sometimes a website that you are trying to download will block the Wget spider. You can work around that by setting a referer and a user-agent, pretending to be a regular web browser:
--referer="http://www.google.com" --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"
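Put together with the recursive options from earlier, the full invocation might look like this (using the same example site; any mainstream browser’s user-agent string would do):

```shell
# Recursive download while presenting a Firefox user-agent and a Google referer.
wget --recursive --level=2 --convert-links \
     --referer="http://www.google.com" \
     --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" \
     http://www.ie6death.com/
```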