How to Archive an Entire Web Site

To archive a single Web site, use the `-m' ("mirror") option. It turns on recursive retrieval with no depth limit, so everything reachable by links on the site is downloaded. It also turns on timestamping, so files are saved with the timestamps of the originals where possible. To specify the number of retries to use when an error occurs in retrieval, use the `-t' option with a numeric argument. `-t3' is usually good for safely retrieving across the net. Use `-t0' to specify an infinite number of retries; this is good when a network connection is really bad but you really want to archive something, regardless of how long it takes. Finally, use the `-o' option with a file name as an argument to write a progress log to that file. Examining the log can be useful if something goes wrong during the archiving. Once the archival process is complete and you've determined that it was successful, you can delete the log file.

To mirror the Web site at http://www.bloofga.org/, giving up to three retries for retrieving files and putting messages in a log file called `mirror.log', type:

$ wget -m -t3 http://www.bloofga.org/ -o mirror.log RET
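The advice to keep the log until you've confirmed the archive succeeded can be partly automated by checking Wget's exit status. In recent versions of Wget this is 0 only when no problems occurred; for example, 4 signals a network failure and 8 a server error response such as a 404. The following is a minimal sketch, not a definitive tool. The function name `archive_site' and its default log name are hypothetical, and it treats any non-zero status as a reason to keep the log.

```shell
#!/usr/bin/env bash
# archive_site URL [LOGFILE]   (hypothetical helper)
# Mirrors URL into the current directory with up to three retries,
# writing Wget's messages to LOGFILE (default: mirror.log).
# The log is deleted only when wget exits with status 0; otherwise it is
# kept so you can see what went wrong.
archive_site() {
  local url=$1
  local log=${2:-mirror.log}
  local status

  if [ -z "$url" ]; then
    echo "usage: archive_site URL [LOGFILE]" >&2
    return 1
  fi

  wget -m -t3 -o "$log" "$url"
  status=$?

  if [ "$status" -eq 0 ]; then
    rm -f -- "$log"
    echo "Mirror of $url completed; removed $log"
  else
    # Any non-zero status, e.g. 4 (network failure) or 8 (server error response).
    echo "wget exited with status $status; see $log" >&2
  fi
  return "$status"
}
```

Run it from the directory that should hold the archive, for example `archive_site http://www.bloofga.org/'. A non-zero status leaves `mirror.log' in place for you to examine.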

This command makes an archive of the Web site at `www.bloofga.org' in a subdirectory called `www.bloofga.org' in the current directory. Log messages are written to a file in the current directory called `mirror.log'.

To continue an archive that you've left off, use the `-nc' ("no clobber") option; it doesn't retrieve files that have already been downloaded. For this option to work the way you want it to, be sure that you are in the same directory that you were in when you originally began archiving the site. Current versions of Wget refuse to combine `-nc' with `-m', because `-m' turns on timestamping (`-N'); use `-r -l inf' (recursive retrieval with no depth limit) in its place. Alternatively, rerunning the original `-m' command also works, since timestamping only retrieves files that are newer than the local copies.

To continue an interrupted mirror of the Web site at http://www.bloofga.org/ without downloading files that already exist, giving up to three retries for retrieval of files and putting messages in a log file called `mirror.log', type:

$ wget -nc -r -l inf -t3 http://www.bloofga.org/ -o mirror.log RET
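Checking that you are in the right directory before resuming can be scripted. The following is a minimal sketch. The function name `resume_site' is hypothetical, and it assumes the URL carries no port number or user:password part, so that Wget's subdirectory is named after the host alone. It uses `-nc -r -l inf' rather than `-nc -m', since current Wget versions reject `-nc' together with the timestamping that `-m' implies.

```shell
#!/usr/bin/env bash
# resume_site URL [LOGFILE]   (hypothetical helper)
# Continues an interrupted recursive download of URL without re-fetching
# files that already exist locally. It refuses to run unless the
# host-named subdirectory Wget created (e.g. www.bloofga.org) is present in
# the current directory.
# Assumption: URL has no port or user:password part, so the directory
# name is just the host name.
resume_site() {
  local url=$1
  local log=${2:-mirror.log}
  local host

  host=${url#*://}   # drop the scheme, e.g. "http://"
  host=${host%%/*}   # drop everything after the host name

  if [ -z "$host" ] || [ ! -d "$host" ]; then
    echo "No '$host' directory here; cd to where the mirror was started." >&2
    return 1
  fi

  # -nc skips files already on disk; -r -l inf recurses with no depth limit.
  wget -nc -r -l inf -t3 -o "$log" "$url"
}
```

For example, `resume_site http://www.bloofga.org/' run from the directory containing `www.bloofga.org' picks up where the earlier archive left off, and prints an error instead of starting a fresh download anywhere else.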

Posted on: 17/12/2009