Sunday Quicky #5: A Handy Bookmarklet for Archive.org Analysis

I’m not going to lie, I hadn’t heard of bookmarklets until earlier this year at one of the SANS summits. They were quite the revelation: their potential for automating the collection and analysis of data was obvious and powerful. However, until today I hadn’t come across a compelling reason to make one. If you already know what bookmarklets are, skip the next section to see what I made.

What are bookmarklets anyway?

Bookmarklets let you store JavaScript code in a bookmark, so that clicking the bookmark runs the code and automates some action. That’s a very brief summary, and it’s not a great deal more complicated than that, but for a fuller explanation and a few more examples, visit this freeCodeCamp.org article on bookmarklets.

What bookmarklet did I make?

Essentially, I made a bookmarklet to automate visiting a URL I came across while preparing for an upcoming certification attempt (more on that soon, hopefully). The URL asks archive.org for a list of all URLs it has indexed for a particular domain. It is long and cumbersome, and nobody should be expected to mess around with it by hand:

https://web.archive.org/cdx/search?url=example.com&matchType=prefix&collapse=urlkey&fl=timestamp,original&limit=100000

When visited, it returns a two-column listing of all URLs archive.org has indexed for that domain: column one is the timestamp of the capture and column two is the URL. I’m not sure whether it lists the first or the most recent time each URL was cached, and that’s beyond the scope of this post, but the results look as follows:

Figure 1: Sample of archive.org listing of URLs indexed at domain example.com
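If you want to work with the results rather than just read them, the two-column listing is straightforward to parse. The sketch below assumes that each line of the plain-text response holds the two fields requested with fl=timestamp,original, separated by whitespace, and that timestamps use a 14-digit YYYYMMDDhhmmss form interpreted as UTC. The function name parseCdxListing is hypothetical and not part of any archive.org library, and the sample lines are made up for illustration.

```javascript
// Hypothetical helper: parse the two-column CDX listing (fl=timestamp,original).
// Assumptions: one record per line, fields separated by whitespace, and
// 14-digit UTC timestamps of the form YYYYMMDDhhmmss.
function parseCdxListing(text) {
  const records = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines, e.g. a trailing newline
    const [timestamp, original] = trimmed.split(/\s+/, 2);
    const m = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(timestamp);
    if (!m || !original) continue; // skip lines that don't match the expected shape
    const [, year, month, day, hour, minute, second] = m.map(Number);
    records.push({
      captured: new Date(Date.UTC(year, month - 1, day, hour, minute, second)),
      url: original,
    });
  }
  return records;
}

// Made-up sample input in the assumed format.
const sample = "20020120142510 http://example.com:80/\n20170301001212 https://www.example.com/index.html\n";
const rows = parseCdxListing(sample);
console.log(rows.length); // 2
console.log(rows[0].captured.toISOString()); // 2002-01-20T14:25:10.000Z
```

Lines that don’t fit the assumed shape are skipped rather than raising an error, which keeps the sketch tolerant of blank lines or unexpected output.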

The archive.org address used for this query calls the API made available by archive.org. As you can see, there are a number of options; again, they’re beyond the scope of this post, but more detail can be found on the GitHub Wayback CDX Server API page. The only one I want to highlight is the limit option, which has a maximum value of 150000.
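Because the query is just an endpoint plus a handful of parameters, it can also be assembled programmatically. This sketch uses the standard URL and URLSearchParams APIs, which percent-encode the values (the comma in fl becomes %2C, which the server should decode back to a comma). The function name buildCdxUrl is hypothetical; the parameter names are taken from the URL above, and the 150000 cap simply mirrors the maximum mentioned in the text.

```javascript
// Hypothetical helper: assemble the Wayback CDX query used in this post.
// Parameter names come from the URL above; MAX_LIMIT mirrors the 150000
// maximum mentioned in the text.
const CDX_ENDPOINT = "https://web.archive.org/cdx/search";
const MAX_LIMIT = 150000;

function buildCdxUrl(domain, limit = 100000) {
  const url = new URL(CDX_ENDPOINT);
  url.searchParams.set("url", domain);
  url.searchParams.set("matchType", "prefix");
  url.searchParams.set("collapse", "urlkey");
  url.searchParams.set("fl", "timestamp,original");
  // Clamp the requested limit into the range 1..MAX_LIMIT.
  const safeLimit = Math.min(Math.max(1, Math.floor(limit)), MAX_LIMIT);
  url.searchParams.set("limit", String(safeLimit));
  return url.toString();
}

console.log(buildCdxUrl("example.com"));
// https://web.archive.org/cdx/search?url=example.com&matchType=prefix&collapse=urlkey&fl=timestamp%2Coriginal&limit=100000
```

Letting URLSearchParams do the encoding means a domain containing characters such as & cannot accidentally inject extra parameters into the query.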

Why do I need a Bookmarklet to visit that URL?

Well, obviously, you don’t need it. However, rather than having to find the URL, copy and paste it into the address bar, and then change the target domain from example.com to the actual target, wouldn’t it be better to have a bookmark you click that prompts you for the domain and then takes you straight there? Yes, yes it would. And that’s why you will want this bookmarklet.

Cut to the chase already!

Okay, okay, the bookmarklet is below:

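javascript: (() => {
  /* Stop if the prompt is cancelled (null) or left empty; encode the input so special characters can't break the query */
  const domain = prompt("Please enter the domain you want to query on archive.org", "example.com");
  if (!domain) return;
  window.location.href = "https://web.archive.org/cdx/search?url=" + encodeURIComponent(domain) + "&matchType=prefix&collapse=urlkey&fl=timestamp,original&limit=100000";
})();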

To use it, paste the code above into the URL field of a new bookmark. When you click the bookmark, you will be prompted to enter a target domain (example.com is pre-populated) and then taken to the archive.org listing of URLs for that domain. Voilà! You’re done.
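Some browsers may strip line breaks when you paste multi-line text into a bookmark’s URL field. That is harmless for the bookmarklet above, because every statement ends with a semicolon, but it would break any // line comment, since everything after it would end up commented out. One way to sidestep this is to collapse the source to a single line before saving it. The helper below is a hypothetical sketch of that step; it assumes the source uses only /* */ comments and contains no string literals whose line breaks or runs of whitespace matter.

```javascript
// Hypothetical helper: turn multi-line bookmarklet source into the single-line
// form typically pasted into a bookmark's URL field.
// Assumptions: no "//" line comments, and no string literals containing
// line breaks or runs of whitespace that must be preserved.
function toBookmarkletUrl(source) {
  const body = source
    .replace(/^\s*javascript:\s*/i, "") // drop an existing scheme prefix, if any
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .join(" ");
  return "javascript:" + body;
}

const source = `javascript: (() => {
  const domain = prompt("Please enter the domain you want to query on archive.org", "example.com");
  if (!domain) return;
  window.location.href = "https://web.archive.org/cdx/search?url=" + encodeURIComponent(domain) + "&matchType=prefix&collapse=urlkey&fl=timestamp,original&limit=100000";
})();`;

console.log(toBookmarkletUrl(source));
```

The helper only trims and joins lines; it is not a minifier, so anything beyond simple bookmarklets is better handled by a proper JavaScript minifier.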

But it doesn’t work on the New Tab page!

Yes, I discovered this during testing too, so I did what most people do when they run into such issues: I went to Stack Overflow, where I found this question and answer. It led to this newer version of the bookmarklet:

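data:text/html,
<script>
/* Stop if the prompt is cancelled (null) or left empty; encode the input so special characters can't break the query */
const domain = prompt("Please enter the domain you want to query on archive.org", "example.com");
if (domain) {
  window.location.href = "https://web.archive.org/cdx/search?url=" + encodeURIComponent(domain) + "&matchType=prefix&collapse=urlkey&fl=timestamp,original&limit=100000";
}
</script>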

It does exactly the same thing as the first version, except that it also works on the New Tab page. Apparently JavaScript is disabled on the New Tab page for security reasons, and since this version works around that restriction, it may not be advisable to use it. You have been warned!

The Stack Overflow answer links to a discussion of the security issue; feel free to read it and make your own decision about which version to use. That discussion is beyond the scope of this post.
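One fragile point of the data: version is that the HTML sits in the URL unencoded, so a # or % inside the script would be treated as URL syntax (a fragment start or an escape prefix) rather than as code. The usual remedy is to percent-encode the markup with encodeURIComponent when generating the data: URL. The sketch below does that; the helper name toDataHtmlUrl is hypothetical, and the point about # and % reflects general URL parsing rules rather than anything stated in the Stack Overflow answer.

```javascript
// Hypothetical helper: wrap HTML in a data: URL, percent-encoding it so that
// characters like "#" (fragment start) and "%" (escape prefix) survive intact.
function toDataHtmlUrl(html) {
  return "data:text/html," + encodeURIComponent(html);
}

const page = `<script>
const domain = prompt("Please enter the domain you want to query on archive.org", "example.com");
if (domain) {
  window.location.href = "https://web.archive.org/cdx/search?url=" + encodeURIComponent(domain) +
    "&matchType=prefix&collapse=urlkey&fl=timestamp,original&limit=100000";
}
</script>`;

const bookmark = toDataHtmlUrl(page);
console.log(bookmark.startsWith("data:text/html,%3Cscript%3E")); // true
```

The encoded string is less readable than the hand-written version, but it can be pasted into a bookmark’s URL field as a single line with no risk of part of the script being cut off.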
