I learned a cool trick over the weekend: recovering images from the browser cache, specifically Google Chrome’s. That may seem a little obscure, but believe me, it can be necessary. Right now you’re probably wondering, “Why would anyone need to do that?” Valid question.
The ability to recover images from cache becomes a pretty valuable skill when you COMPLETELY ERASE A WEBSITE WITH NO BACKUP.
Who would do such a thing? Me. On Friday. In the course of testing a WordPress backup plugin I lost our new blog and all of the content – the irony of this is not lost on me. But enough about how we got here, let’s learn something from it.
But before you get too judgmental or start shaking your head, I offer up this link on StackExchange by none other than Jeff Atwood himself:
Unfortunately, our hosting provider experienced 100% data loss, so I’ve lost all content for two hosted blog websites:
- http://blog.stackoverflow.com
- http://www.codinghorror.com
(Yes, yes, I absolutely should have done complete offsite backups. Unfortunately, all my backups were on the server itself. So save the lecture; you’re 100% absolutely right, but that doesn’t help me at the moment. Let’s stay focused on the question here!)
Luckily, I had a copy of all the text content in a WordPress backup XML file that I had generated with the standard Tools -> Export feature. I also had a database backup, made by following WordPress’s own backup guidance. That covered restoring blog posts, comments, etc., but I was still left without any of the images or galleries.
The options at this point revolve around cache copies, of which there are two main repositories (for me at least): Google’s search engine cache and browser cache.
To see what Google has cached for a specific page, append the page’s address to the following URL:
http://webcache.googleusercontent.com/search?q=cache:
For this blog, that looks like this:
http://webcache.googleusercontent.com/search?q=cache:www.sparxeng.com/blog
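If you have a whole list of lost pages to check, building these URLs by hand gets tedious. Here is a minimal Python sketch that does it, assuming the prefix shown above still responds. The helper name `google_cache_url` is my own invention for illustration and is not part of any Google API.

```python
from urllib.parse import quote

# Prefix taken from the post; whether Google still serves it is an assumption.
CACHE_PREFIX = "http://webcache.googleusercontent.com/search?q=cache:"


def google_cache_url(page: str) -> str:
    """Return the cache lookup URL for a page address (hypothetical helper)."""
    # Drop the scheme so the result matches the form used in the post.
    for scheme in ("http://", "https://"):
        if page.startswith(scheme):
            page = page[len(scheme):]
            break
    # Percent-encode anything unusual, but leave normal URL punctuation alone.
    return CACHE_PREFIX + quote(page, safe="/:?=&")


print(google_cache_url("http://www.sparxeng.com/blog"))
```

Loop over your list of post addresses and open each result in the browser to see what survived.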
Unfortunately, Google didn’t have much content for me to scrape. My last option was browser cache.
To see your cache files in Chrome, type the following in the address bar:
about:cache
or
chrome://cache/
Google Chrome stores all cached data files in raw form, with the HTTP headers intact. This means that the content you want is there, but you can’t just right-click the link and save the file. What you see when you click the link is an HTML report that is human-readable but isn’t the final binary. The example graphic below is from a blog post by the guys at Frozax Games:
Luckily, there is one more free and easy online tool available to convert these HTML report files back into their original binary form. The folks at Senseful Solutions have provided a browser-based conversion tool for rebuilding the binary file. You can copy/paste the HTML page into their web form and the file will magically appear below. Fantastic tool. Thanks guys.
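If you would rather do the conversion locally, the idea can be sketched in a few lines of Python. This is not the Senseful Solutions code, just my own rough take. It assumes the report shows the file body as a hex dump whose lines look like `00000000: 89 50 4e 47 ...  .PNG...` (an eight-digit offset, a colon, up to 16 space-separated hex bytes, then an ASCII column), and that you have pasted only the body dump, not the header dump, into a string. The function name `hexdump_to_bytes` is hypothetical.

```python
import re

# Assumed line shape: 8-hex-digit offset, a colon, then up to 16 byte pairs,
# each preceded by exactly one space. The ASCII column that follows is ignored.
HEX_LINE = re.compile(r"^\s*[0-9a-fA-F]{8}:((?: [0-9a-fA-F]{2}){1,16})")


def hexdump_to_bytes(dump_text: str) -> bytes:
    """Rebuild the original bytes from a pasted hex dump (hypothetical helper)."""
    out = bytearray()
    for line in dump_text.splitlines():
        match = HEX_LINE.match(line)
        if match is None:
            continue  # skip anything that is not a dump line
        # bytes.fromhex ignores the spaces between the byte pairs
        out.extend(bytes.fromhex(match.group(1)))
    return bytes(out)


# Tiny made-up sample: the PNG signature plus the start of a header chunk.
sample = (
    "00000000: 89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52  .PNG........IHDR\n"
    "00000010: 00 00                                            ..\n"
)
data = hexdump_to_bytes(sample)
print(data[:8] == b"\x89PNG\r\n\x1a\n")
```

To save a recovered image, write the result to disk in binary mode, for example `open("recovered.png", "wb").write(data)`, where the file name is whatever you like.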
tl;dr
I erased all our blog images and restored them using browser cache and this link.
Back up your stuff!
8 Responses
i went through a similar problem before the days of google chrome. Luckily google’s search engine also keeps a pretty decent cache as well that I used for backing up things
if the site has been archived / crawled in any way you could try the “Wayback Machine” located at http://archive.org/
cachecopy is your friend.
thank you very very much for this helpful information. I really appreciate it :)))))))))
This is great info, I just spent an hour trying to find a cached image in Google after having deleted it from a forum post… Luckily I had it in my Chrome cache…
Dont use such a light colored text, with the white in the back it’s really irritating to read.
Agreed on the light colored text – VERY hard to read.
Also, the senseful solutions thing no longer works, in case anyone’s wondering.
Thanks so much for this article :D!!! Found an image that I lost thanks to your “webcache.googleusercontent…” tip :)!