!!!! Need help restoring HTF files to our website from another.

Discussion in 'Computers' started by Ronald Epstein, Jan 17, 2006.

    Here's our situation....

    Back in October of 2005 we had a major hard drive
    crash that resulted in us losing a lot of HTF history.

    This history included chat transcripts, meet photos,
    Ron's reviews and feature articles. There was so much
    HTF history lost between 1997-2004 in one single act.

    One of our Moderators was able to use a software
    program to rescue all the files that were on that
    hard disc. That's the good news. The bad news
    is that whenever a hard drive goes defunct (and
    possibly through the rescue process), every single
    file gets renamed to something obscure.

    In other words, we have the files -- but it's
    one big soupy mess where everything has been
    renamed so we can't easily match the photo files
    to the corresponding coded HTML webpage.


    I think I may have stumbled onto something rather
    incredible as far as restoring all the data we lost
    from that hard drive crash.

    Many of you may be aware of the Internet's WAYBACK
    MACHINE that archives website material.

    Looking around I was able to find these links:

    HTF CIRCA October 2004


    HTF MEETS PAGE (minus new meet photos)

    Ron's Reviews

    In fact, you can reference as much as you want of HTF
    over the past 6-7 years by CLICKING HERE

    You'll notice some of the links work. Some do not.

    At least 75% of what we lost has been
    archived on that website.

    The challenge for us is to figure out the
    easiest way to import all that stuff onto
    our server. If there were a simple cut & paste
    method of transferring the text and photos
    to identical webpages on our server, we would
    jump at the opportunity to do it today.

    At least we know the material is out there
    and once we figure out how best to do it, we
    will begin importing all that stuff back.

    It's a shame that the MEET PHOTOS after 2003
    have been lost. On the other hand, the files
    are on our rescued hard drive and perhaps we
    can do some reconstruction from that.

    One of our members, Linda Thompson, offered this
    possible solution that looks very promising....

    If you have a collection of the relevant links
    you've found at archive.org, and if they seem
    to be stored in a common location there (or if
    there's at least some recognizable pattern to
    the locations of the stored archives), perhaps
    some fishing with a program like HTTrack Website
    Copier or one of the Teleport products might help
    to gather them up for you and allow you to "import"
    or mirror them on the new server.

    Worth a shot? Maybe??

    If you're not familiar with the programs:

    http://www.httrack.com (free)

    http://www.tenmax.com/ (requires license purchase)

    I honestly don't think you've got anything to lose
    by at least trying HTTrack, since it's free,
    if the file locations at archive.org have
    enough of a structure or recognizable pattern to
    make it feasible (i.e. "target-able")... Who
    knows what it might be able to retrieve (and
    HELP organize)? And, if the process itself
    shows promise, perhaps the more sophisticated
    Teleport products might be able to handle
    anything that HTTrack misses.

    If anybody has any ideas on how we can best
    accomplish our task, please feel free to
    contribute them.

    Thank You!
    The pages look fairly simple in design. Assuming you're not talking about actual forum content (PHP), why not just recode from scratch?

    It will take a couple of weeks if you do several pages a day, but it's a sure-fire solution. I'd start by sorting out the images into different directories.
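
    For the image-sorting step, here is a rough sketch of one way to do it. Since the rescued files have lost their names (and presumably their extensions), it looks at the first few bytes of each file instead. The function names, folder names and the short list of signatures are my own assumptions, not anything from the rescue software, and it copies rather than moves so the rescued set stays untouched.

```python
import shutil
from pathlib import Path

# Well-known leading bytes ("magic numbers") for common web image formats.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"\x89PNG\r\n\x1a\n": "png",
}


def guess_kind(head: bytes) -> str:
    """Guess a file's type from its first bytes (hypothetical helper)."""
    for magic, kind in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    lowered = head.lstrip().lower()
    if lowered.startswith(b"<!doctype html") or lowered.startswith(b"<html"):
        return "html"
    return "unknown"


def sort_recovered(src: Path, dest: Path) -> dict:
    """Copy every file in src into dest/<kind>/ and return a count per kind."""
    counts = {}
    for path in sorted(src.iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            head = fh.read(64)
        kind = guess_kind(head)
        target = dest / kind
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target / path.name)
        counts[kind] = counts.get(kind, 0) + 1
    return counts
```

    You would call it as something like sort_recovered(Path("rescued"), Path("sorted")), where both folder names are placeholders. HTML pages that start with a comment or other markup will land in "unknown" and need a second look.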
    Those links all have the format:


    where TIMESTAMP seems to be of the form YYYYMMDDTTTTTT. My guess is that they're timestamping to hundredths of a second, or that they're encoding a numeric time zone.
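
    A small sketch for pulling such links apart, in case it helps with sorting them. The exact format line did not survive in this post, so the sketch assumes the layout commonly seen on Wayback Machine links: a "/web/" segment, a 14-digit timestamp, then the original address. As far as I know the six trailing digits are hours, minutes and seconds rather than fractions of a second, and the code reads them that way. The function name and the example.com address are made up for illustration.

```python
import re
from datetime import datetime

# Assumed layout: .../web/<14-digit timestamp>/<original URL>
WAYBACK_RE = re.compile(r"/web/(\d{14})/(.+)$")


def split_archive_link(link):
    """Return (capture time, original URL) for one archive link."""
    match = WAYBACK_RE.search(link)
    if match is None:
        raise ValueError(f"not a recognised archive link: {link}")
    stamp, original = match.groups()
    # Read the timestamp as YYYYMMDDhhmmss.
    return datetime.strptime(stamp, "%Y%m%d%H%M%S"), original


# Placeholder link, not one of the real HTF captures.
captured, original = split_archive_link(
    "http://web.archive.org/web/20041010120000/"
    "http://www.example.com/meets/index.html"
)
print(captured.isoformat(), original)
```

    With the original address separated out, the captures can be grouped by directory to see which parts of the old site were archived and when.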

    Browsing web.archive.org, I found a message thread that talks about reconstructing Web sites from the Internet Archive and search engine caches. I don't know how useful the tools and papers mentioned there are, but it might be worth a look.

