Goodbye Datacloud and How to Put Up a Permanent Static HTML Archive Using HTTrack

Johndan announced the end of Datacloud today. I've been reading Datacloud regularly, so I'll be sad to see it go. Hopefully he'll get back online with another incarnation so that we can continue to read his insightful thoughts.

In his post, Johndan mentions that he will "be keeping the archives online, at least for the near future." I would encourage Johndan and everyone else who intends to stop writing on their blog to keep those archives available over the long term. Many of these are valuable resources and/or a good read that shouldn't go into link rot oblivion.

However, one obviously does not want to keep installing security fixes and other updates for a software-driven weblog for years; an old weblog can fall into disrepair or become a target for hackers. Here's a suggestion for how to avoid that: use HTTrack to create a static HTML mirror of your site, and either replace the current site with it or put up the archive in a new spot. HTTrack is a GPL-licensed software tool that spiders an existing site and saves a copy of it as static HTML pages on your computer for offline viewing. For that to work, HTTrack rewrites all internal URLs so that they are relative, which also makes the mirror suitable for posting elsewhere online.
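For those who prefer a script to the HTTrack GUI, here is a minimal sketch of a basic mirror run. It only assembles the command line; the blog URL and output folder are placeholders, and the function names are my own. The `-O` (output path) option, the `+` scan filter, and `-v` come from HTTrack's command-line options, but check `httrack --help` for your installed version before relying on them.

```python
import shutil
import subprocess
from urllib.parse import urlparse


def build_httrack_command(site_url, output_dir):
    """Return the argument list for a basic HTTrack mirror of site_url."""
    host = urlparse(site_url).netloc
    if not host:
        raise ValueError(f"not an absolute URL: {site_url!r}")
    return [
        "httrack",
        site_url,
        "-O", output_dir,   # folder where the static mirror is written
        f"+{host}/*",       # scan filter: only follow links on the same host
        "-v",               # verbose progress output
    ]


def mirror_site(site_url, output_dir):
    """Run HTTrack; assumes the httrack binary is installed and on PATH."""
    if shutil.which("httrack") is None:
        raise RuntimeError("httrack not found on PATH")
    subprocess.run(build_httrack_command(site_url, output_dir), check=True)


if __name__ == "__main__":
    # Placeholder URL and folder; this only prints the command.
    print(" ".join(build_httrack_command("http://blog.example.org/", "./blog-archive")))
```

Calling `mirror_site(...)` with your real weblog address would start the crawl; the resulting folder is what you upload in place of the old site.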

This is a very easy process, and something even those with more limited technological skills can do. In addition to running the software and creating the mirror, I also recommend that the weblog owner prep the site initially by

  • Turning off any interactive elements or other dynamically generated content that won't make sense in an archive of the original site: for example, a login link, which will no longer work in the static version; a list of recent referrers, which will no longer be accurate; or a list of recently read books that links to Amazon.
  • Posting a message to the weblog explaining that the site has been permanently archived, so that site visitors understand that the weblog will not be updated. Even better is to put the archive notice in a sidebar block on each page.
  • Including a link to the new location on the old site, and leaving it up for a while, if the archive needs to move to a newer, more permanent home on the web--such as another domain. This way search engine spiders will find the new location.

Incidentally, this is being done with Drupal class sites at Purdue University. Since class sites are not used beyond a single semester, there was no sense in continuing to run and maintain Drupal for older sites; furthermore, this reduces the load on the server. Instructions on how teachers are asked to do it with Drupal--each teacher produces the old archive--are available on the Professional Writing Program site at Purdue. See an example of an archived class site from the summer of 2005 which was originally hosted on my cyberdash domain. Not much visible difference between that and the original, and HTTrack got over 99% of the links to continue to work properly.
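If you want to check how many links survived in your own mirror, a rough sketch like the following can help. It is not how the figure above was measured; it simply walks a mirror folder, collects the `href` of every `<a>` tag, and reports local links whose target file is missing. The function and class names are hypothetical, and it assumes the mirror's pages end in `.html`.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def check_local_links(mirror_dir):
    """Return (total, broken); broken is a list of (page, href) pairs."""
    root = Path(mirror_dir)
    total = 0
    broken = []
    for page in sorted(root.rglob("*.html")):
        collector = LinkCollector()
        collector.feed(page.read_text(encoding="utf-8", errors="replace"))
        for href in collector.links:
            parsed = urlparse(href)
            # Skip external links, mailto: links, and pure #fragment links.
            if parsed.scheme or parsed.netloc or not parsed.path:
                continue
            total += 1
            path = unquote(parsed.path)
            if path.startswith("/"):
                # Root-relative link: resolve against the mirror folder.
                target = root / path.lstrip("/")
            else:
                target = page.parent / path
            if not target.exists():
                broken.append((str(page.relative_to(root)), href))
    return total, broken
```

For example, `total, broken = check_local_links("./blog-archive")` gives the number of local links checked and a list of the ones that point nowhere, which you can fix by hand before publishing the archive.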

So Johndan and everyone else who decides to end a weblog, don't disappoint your present and future readers. Keep those posts alive using HTTrack :-)

Samantha's picture

I am sad to see datacloud

I am sad to see datacloud go. I'm wondering what the next iteration is gonna be. I'm anxious to see what Johndan is working on now!

Dr. B.'s Blog