All Posts Tagged With: "archive"

How To Easily Archive Web Pages Using MHT Files

If there is any universal, immutable truth about the internet, it’s that things vanish from it all the time. Those pages you bookmarked last year? They may be gone. Those forum posts that contained a wealth of useful information? They may be gone as well.

There are several different ways to archive web pages.

You could use ScreenGrab for Firefox. But the problem is that you can’t text-search anything in an image.

You could use PDF Creator and "print" pages to PDF. This does allow text searching, but the PDF rarely looks anything like the original page and any images present look "off."

What truly works are MHT files. I’ve mentioned this before but have a few extra goodies to make it even easier.

What’s the difference between an MHT and a regular "Save Page As.."? The MHT is an actual single-file archive that contains all the code and images. It’s a great way to archive web pages that contain information you want to save.
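For the curious, an MHT is just a MIME message (the same format email uses) with the page's HTML as one part and each image as another. Here is a minimal Python sketch of that layout using only the standard library. The function name build_mht and the example.com URLs are made up for illustration, and real browsers write more headers than this, so treat it as a picture of what's inside the file rather than a replacement for the browser's save feature.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mht(page_url, html, images):
    """Pack a page's HTML and its images into one MIME document.

    images maps each image URL referenced by the HTML to its raw PNG bytes.
    (PNG-only is a simplification for this sketch.)
    """
    # "multipart/related" is the container type used for MHT files.
    archive = MIMEMultipart("related", type="text/html")
    archive["Subject"] = "Archived page"

    # First part: the page itself, tagged with the address it came from.
    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = page_url
    archive.attach(page)

    # Remaining parts: one per image, each tagged with its original URL
    # so a viewer can match it to the <img> tags in the HTML.
    for url, data in images.items():
        part = MIMEImage(data, _subtype="png")
        part["Content-Location"] = url
        archive.attach(part)

    return archive.as_bytes()


if __name__ == "__main__":
    mht = build_mht(
        "http://example.com/",
        "<html><body><img src='http://example.com/a.png'></body></html>",
        {"http://example.com/a.png": b"\x89PNG\r\n\x1a\n"},  # placeholder bytes, not a real image
    )
    with open("example.mht", "wb") as handle:
        handle.write(mht)
```

Everything ends up in the one example.mht file, which is the whole point: there is no subfolder of loose images to lose track of.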

Firefox has no native ability to read or save MHT files, but with the UnMHT add-on it can. UnMHT will even read MHTs saved by Internet Explorer, and IE will read MHTs saved by Firefox. In addition, UnMHT can save all open tabs at once, something IE 8 doesn’t do.

See video below for details on how it all works.

What’s The Best Way To Save A Web Page?

People save web pages so they can retrieve the information later without having to load it from the internet. It’s also a way of keeping a copy of a page in case the original web site has an outage or goes offline for whatever reason.

There are two basic ways of saving web pages: via the browser or by "printing" them to a PDF.

Via the browser

The browser with the absolute best web page save feature is Internet Explorer 8, because it can save entire web pages as a "Web Archive." When you click File/Save As (if you don’t see that in your IE 8, press ALT on your keyboard to bring up the menu), you’ll see it as a save option:

image

When you choose to save it will "crunch" everything into a single file:

image

Why is this the best? Because it’s a single file that contains everything (and that’s why it’s labeled as an archive): all the text, all the images, everything. If you load it afterward, it looks exactly the way it did originally. It is, to the best of my knowledge, the only browser that does it right.
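If you want to see for yourself that everything really is inside that one file, a web archive can be opened with any MIME parser. Here is a rough Python sketch that lists the parts of a saved archive. The function name list_mht_parts and the file name page.mht are my own placeholders, and I'm assuming the file is a standard MIME multipart document, which is how the format is defined.

```python
import email


def list_mht_parts(raw_bytes):
    """Return (content type, original URL, size in bytes) for each part."""
    message = email.message_from_bytes(raw_bytes)
    parts = []
    for part in message.walk():
        if part.is_multipart():
            continue  # skip the container, keep only the real content
        payload = part.get_payload(decode=True) or b""
        parts.append(
            (part.get_content_type(), part.get("Content-Location"), len(payload))
        )
    return parts


if __name__ == "__main__":
    # "page.mht" is a hypothetical file name; point this at your own archive.
    with open("page.mht", "rb") as handle:
        for content_type, location, size in list_mht_parts(handle.read()):
            print(f"{content_type:<20} {size:>8} bytes  {location}")
```

On a typical page you should see one text/html part followed by a part for each image, style sheet and so on, all living in the same file.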

Other browsers, such as Firefox, save as "Web page, complete" and it’s nothing but a huge mess. An HTML file will be saved which is the web page, but a subfolder will also be created with all the images, JavaScript files, etc. You can literally get 20+ files out of a single web page save.
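To see how big the mess is for a page you already saved this way, you can count the files. This is a small Python sketch, and it assumes the companion folder sits next to the HTML file and is named after it with "_files" on the end. That naming and the function name count_saved_files are assumptions on my part, so adjust them if your browser does it differently.

```python
from pathlib import Path


def count_saved_files(html_path):
    """Count the HTML file itself plus everything in its companion folder."""
    html_file = Path(html_path)
    # Assumption: "My Page.htm" is saved alongside a folder "My Page_files".
    companion = html_file.with_name(html_file.stem + "_files")
    if not companion.is_dir():
        return 1  # only the HTML file, no companion folder found
    extras = [path for path in companion.rglob("*") if path.is_file()]
    return 1 + len(extras)


if __name__ == "__main__":
    # Hypothetical file name; use the page you actually saved.
    print(count_saved_files("My Page.htm"), "files for a single web page")
```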

Love or hate IE 8, it rules the roost when it comes to web page archiving.

Drawbacks:

  • Only one – it’s proprietary to IE 8. Otherwise it’s the best way to archive a web page.

Via PDF Creator

If you don’t use IE 8 and want a way to save web pages as single files that include images and so on, the best way to do this is to use PDF Creator to create PDF files. This is free software that installs a virtual print driver and can be used from your web browser of choice.

Once installed, go to any web page, load it, then click File/Print or press CTRL+P. 

Choose PDF Creator from the window that appears:

image

...then click OK.

The page will be crunched and made ready for PDF rendering:

image

You’ll see this:

image

Click the Save button at bottom right. You’ll be asked to name the file and where you want to save it to. Once done, the page is archived as a PDF.

Drawbacks:

  • Many times PDF Creator will default to a serif font (Times New Roman) instead of the font seen on the original web page.
  • Any links in the web page will not work in the PDF.

These drawbacks are usually acceptable, since it’s the text you care about most when it comes to a web page. Any images on the page will be embedded in the PDF, and all text is searchable as well.

In addition, the PDF created even for very large web pages will be small in file size, suitable for sending in email if you want to send it off to a friend.

Via ScreenGrab

This is for Firefox only.

ScreenGrab is a Firefox add-on. It allows you to save a PNG or JPEG screen shot of any web page, but does so far better than ALT+PrintScreen. ScreenGrab will take an image of the entire page, including its full length. The screen shot taken will look identical to what you see on-screen.

Drawbacks:

  • Since the output file is an image, none of the text can be searched and links won’t work either.
  • The default output file is a PNG. If the web page you save is very long, the file saved will be enormous.
  • On very large web pages it can cause Firefox to freeze up when attempting to take a full screen shot, particularly on slower computers.

You can make ScreenGrab’s screen shot smaller by purposely not maximizing the browser window, because yes, ScreenGrab captures everything, including all the white space on the sides.
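Some back-of-the-envelope arithmetic shows why the window width matters. The sketch below works out the raw (uncompressed) pixel data for a full-page capture as width × height × bytes per pixel. The pixel dimensions are made-up examples, and a PNG compresses that data, so the file on disk will be smaller than these figures. Also keep in mind that a narrower window can make the page reflow and get longer, which this simple estimate ignores.

```python
def raw_screenshot_bytes(width_px, height_px, bytes_per_pixel=4):
    """Uncompressed size of a capture, assuming 4 bytes (RGBA) per pixel."""
    return width_px * height_px * bytes_per_pixel


# Example figures only: a 20,000 pixel long page at two window widths.
maximized = raw_screenshot_bytes(1920, 20000)
narrow = raw_screenshot_bytes(1024, 20000)

print(f"maximized window: {maximized / 1_000_000:.1f} MB of raw pixels")  # 153.6 MB
print(f"narrow window:    {narrow / 1_000_000:.1f} MB of raw pixels")     # 81.9 MB
```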

To use ScreenGrab, install the add-on, then on any web page, right-click and choose ScreenGrab:

image 

"Complete Page/Frame" will save the entire page, length and all.

"Visible portion" only captures what the browser is displaying at that moment.

"Selection" allows you to select what you want captured.

"Window" acts like ALT+PrintScreen does.

Choosing to Save will save the file. Choosing to Copy will copy the image to the clipboard buffer where you can paste into another program such as an image editor, Word, etc.

21 Windows Apps – 7-Zip

7-Zip is a file archiving application. Remember WinZIP or PKZIP? Think of 7-Zip like that.

This app is not pretty but it sure is easy. In fact, it’s so easy that it sometimes confuses people. When you install it, all you have to do to create a ZIP or 7z file is just right-click a file, a selection of files or a folder, choose 7-Zip from the context menu and create your ZIP file. That’s all there is to it.

7-Zip will easily open files created by other archive programs such as WinZIP. It will also recognize archive types from Linux such as TAR and GZIP (very handy).
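If you ever need to do the same sort of thing from a script instead of the right-click menu, here is a rough Python sketch. To be clear, this does not use 7-Zip at all: it uses Python's built-in zipfile and tarfile modules, and the function names zip_folder and list_archive are my own. It creates a ZIP from a folder and lists what's inside a ZIP or a TAR/GZIP archive.

```python
import tarfile
import zipfile
from pathlib import Path


def zip_folder(folder, zip_path):
    """Create a ZIP holding every file under folder.

    Save the ZIP outside of folder so it doesn't try to include itself.
    """
    folder = Path(folder)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Store paths relative to the folder, with forward slashes.
                archive.write(path, arcname=path.relative_to(folder).as_posix())


def list_archive(path):
    """List the member names of a ZIP or a TAR (plain or gzip-compressed)."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return archive.namelist()
    if tarfile.is_tarfile(path):
        # The default read mode detects gzip compression on its own.
        with tarfile.open(path) as archive:
            return archive.getnames()
    raise ValueError(f"not a ZIP or TAR archive: {path}")
```

For example, zip_folder("Documents/pages", "pages.zip") followed by list_archive("pages.zip") would show the files that went in (both paths are placeholders).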

The only archive type others use that 7-Zip cannot create (to the best of my knowledge) is RAR, although it can extract RAR files. WinRAR is available for creating them (but it’s not free).