Well-Actually War Trall Ishad Nha Posted December 13, 2010

One problem that I frequently hit is the 404 Not Found error: a site is defunct and can't be accessed at all. There are several solutions. One is Google's cache, which I know nothing about. Another is the Wayback Machine, a site that will resurrect a dead link; see: http://web.archive.org/collections/web.html

Right-click a dead link, choose Copy Shortcut, and paste the address into the Wayback Machine's "Enter Web Address:" box. You will first need to delete the pre-existing "http://". Facebook pages, it seems, are usually blocked by "robots.txt". The Wayback Machine will show all the major revisions to a web site. You may or may not hit an error when trying to access the very latest revision; if so, choose the second-latest one.
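If you would rather look up an archived copy programmatically, the Internet Archive also exposes a simple "availability" endpoint at http://archive.org/wayback/available, which returns the closest snapshot it holds for a given address. A minimal sketch in Python, using only the standard library (the example.com lookup is just a placeholder, not a link from this thread):

import json
import urllib.parse
import urllib.request

def find_snapshot(dead_url):
    """Ask the Wayback Machine for its closest archived copy of a URL."""
    query = urllib.parse.urlencode({"url": dead_url})
    with urllib.request.urlopen(
            "http://archive.org/wayback/available?" + query) as response:
        data = json.load(response)
    # "archived_snapshots" comes back empty when nothing was captured,
    # e.g. when the site is blocked by robots.txt as described above.
    closest = data.get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest else None

print(find_snapshot("example.com"))

If a snapshot exists, the returned URL points straight at the Wayback Machine's copy, which saves the copy-and-paste step described above.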
Hatchling Cockatrice Alorael at Large Posted December 14, 2010

Robots.txt is a file that can be added to a website to give instructions to the various bots and crawlers that trawl the internet for information. Compliance is strictly voluntary, but most reputable bot operators, including Google and the Wayback Machine, honor requests not to archive a website's contents.

—Alorael, who believes Google caches are just a byproduct of the search engine. In order to know what's on the internet, Google's spiders copy and store information. It's actually the caches that are often searched when you enter a query. Since they're around and potentially useful, Google kindly makes them available as well.
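For illustration, a robots.txt that shuts out the Internet Archive's crawler while only fencing off one directory from everyone else might look like the sketch below (ia_archiver is the user-agent the Archive has historically honored; the /private/ path is an invented example):

User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/

A blanket block, presumably what Facebook uses, would just be "User-agent: *" followed by "Disallow: /", which is why the Wayback Machine shows nothing for those pages.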