Copyright 2001-2004, Brian Johnson, Department of Architecture, University of Washington, Seattle
What it does
This script acts as a web crawler, or spider. Starting with a particular URL, it retrieves the web page, scans it for links, and then attempts to retrieve every file linked from that page. This behavior repeats for each file retrieved and continues until one of several stop criteria is reached:
- The number of hops (links) from the start page reaches the limit.
- A link goes "up" rather than "down".
- A link goes to a disallowed server.
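The three stop criteria can be sketched as a single test applied to each link before it is retrieved. This is an illustrative Python sketch, not TheArchivist's own code. The function name `should_follow`, its parameters, and the reading of "up" as "outside the directory that holds the start page" are all assumptions made for the example.

```python
import posixpath
from urllib.parse import urlsplit


def should_follow(link, start_url, hops, max_hops, allowed_hosts):
    """Return True if a crawler should retrieve `link` (hypothetical helper).

    `hops` is the number of links followed from the start page to reach
    `link`. All URLs are assumed to be absolute.
    """
    # Criterion 1: too many hops from the start page.
    if hops > max_hops:
        return False

    target = urlsplit(link)

    # Criterion 3: the link points at a server that is not allowed.
    if target.hostname not in allowed_hosts:
        return False

    # Criterion 2 (assumed meaning): the link goes "up", i.e. outside
    # the directory that contains the start page.
    start_path = urlsplit(start_url).path or "/"
    start_dir = posixpath.dirname(start_path).rstrip("/") + "/"
    target_path = posixpath.normpath(target.path or "/")
    return (target_path + "/").startswith(start_dir)
```

With a start page of `http://example.org/docs/index.html`, a link to `http://example.org/docs/a/b.html` passes the test, while a link to `http://example.org/index.html` is rejected as going "up".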
If desired, the application will rewrite absolute URLs relative to the download hierarchy, producing a completely self-sufficient archive.
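One way to picture this rewriting step is shown below. It is a Python sketch under stated assumptions, not the application's actual method: it assumes each downloaded file is stored under a folder named after its host, that URLs ending in `/` are saved under a default file name, and that all URLs are absolute. The names `local_path`, `localize`, and `default_name` are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def local_path(url, default_name="index.html"):
    """Map an absolute URL to a path inside the download hierarchy.

    Assumed layout: /<host>/<path>, with `default_name` appended to
    URLs that name a directory.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += default_name
    return posixpath.join("/", parts.hostname, path.lstrip("/"))


def localize(link, page_url):
    """Rewrite absolute `link` relative to the page that contains it."""
    target = local_path(link)
    page_dir = posixpath.dirname(local_path(page_url))
    return posixpath.relpath(target, start=page_dir)
```

For example, a link to `http://example.org/docs/img/logo.gif` found on `http://example.org/docs/guide/intro.html` would become `../img/logo.gif`, so the archive no longer depends on the original server.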
Easy to use!
TheArchivist should be quite easy to install and use in a variety of ways on your desktop Mac. Just download and unstuff the archive. On Mac OS 9, drop the Dialog Director (0.7) and Tanaka's (1.3) OSAXen in your System Folder if you don't already have them. Finally, double-click TheArchivist to start.
- January 8, 2005 - version 2.1x (b10)
- February 10, 2004 - version 2.1x
Rewritten for Mac OS X. Better speed, better crawl status display, better handling of some legal-but-oddly-formed URLs.
- January 8, 2003 - version 2.1
Fixed problems when processing complex relative links, and a problem with quotes.
- April 20, 2002 - version 2.0
- March 21, 2001 - version 1.6
Corrects handling of URLs containing single quotes. Provides a dialog box for specifying the default file name for walkable servers. Fixes a bug when 'localizing' very short URLs. Displays status info while processing in the foreground.
- January, 2001 - version 1.5
This software is provided as "postcardware". You may download and use the software without cost so long as you register your use. You may register by email to firstname.lastname@example.org. You may not redistribute this software, nor include it in any collection. All installations should be made using a fresh download from http://www.caup.washington.edu/software/.
This software is provided as-is. By downloading and installing it you indicate that you accept all risks associated with use of the software and agree not to hold the author or the University of Washington liable in any fashion.
Last updated: January 8, 2005