Sunday 18 December 2016

Using Wayback Machine's Wayback Save function appears to be really slow

I tried putting together a function that uses the Wayback Machine's "save" URL to save multiple pages as part of a recursive crawling script. The idea is that the script visits the "save" URL, then waits five seconds and checks whether the page was actually saved. If not, it waits five more seconds and checks again.

The thing is, it takes up to two minutes to get a proper signal that the page has actually been saved, which is bizarre, since using the Wayback Machine's save function in an actual browser only takes a few seconds.

    function savePage(page) {
        console.log("Saving", page);
        request("http://ift.tt/MV4y8z" + page, function (err, response, body) {
            if (err != null) throw err;
            else checkPage();
        });

        function checkPage() {
            setTimeout(function () {
                request("http://ift.tt/2hygcZI" + page, function (err, response, body) {
                    if (err != null) throw err;
                    else if (response.statusCode == 200) processResponse(response);
                    else if (response.statusCode == 404 || !response.request.uri.href.includes("web/2016")) {
                        // Double check to see if the page is actually saved
                        checkPage();
                    }
                    else throw new Error("Unknown status code: " + response.statusCode + " " + response.statusMessage);
                });
            }, 5000);
        }
    }

If I put a console.log under the "is status code 404?" clause, I do in fact see the message pop up several times.

What am I doing wrong?
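For anyone who wants to experiment with the retry logic on its own, here is a minimal sketch of the same five-second polling loop with a cap on the number of attempts. It is not a fix for the slowness. The names classifyResponse, pollUntilSaved and fetchStatus are hypothetical, and fetchStatus is an assumed wrapper around whatever HTTP call the script makes; it reports an error, a status code and the final URL to its callback.

```javascript
// Sketch only: these helpers are hypothetical and not part of any Wayback Machine API.

// Mirror the three branches of the original check: 200 means saved,
// 404 or a URL without the marker means "try again", anything else is an error.
function classifyResponse(statusCode, finalUrl, marker) {
    if (statusCode === 200) return "saved";
    if (statusCode === 404 || !finalUrl.includes(marker)) return "retry";
    return "error";
}

// Poll with a fixed delay until the page looks saved or maxAttempts is reached.
// fetchStatus(callback) is assumed to call callback(err, statusCode, finalUrl).
function pollUntilSaved(fetchStatus, options) {
    const { intervalMs = 5000, maxAttempts = 24, marker = "web/2016" } = options || {};
    return new Promise(function (resolve, reject) {
        let attempts = 0;

        function attempt() {
            attempts += 1;
            fetchStatus(function (err, statusCode, finalUrl) {
                if (err) return reject(err);
                const verdict = classifyResponse(statusCode, finalUrl, marker);
                if (verdict === "saved") return resolve(attempts);
                if (verdict === "error") {
                    return reject(new Error("Unknown status code: " + statusCode));
                }
                if (attempts >= maxAttempts) {
                    return reject(new Error("Gave up after " + attempts + " attempts"));
                }
                // Not saved yet: wait and check again.
                setTimeout(attempt, intervalMs);
            });
        }

        // Wait once before the first check, as in the original script.
        setTimeout(attempt, intervalMs);
    });
}
```

With the defaults (24 attempts, five seconds apart) the loop gives up after roughly two minutes of waiting. Errors are reported through the returned promise rather than thrown inside the callback, so the caller can decide whether to skip the page or stop the crawl.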

Submitted December 19, 2016 at 01:54AM by DoomTay
