Paperless home

THINGS easily own you! Having too many of them will inevitably make you lose order, and you need mental indexes to keep track of them. One of the things with the biggest mental footprint is PAPERS!

First:

You can’t get rid of all paper. For legal reasons you must of course still keep agreements with ink signatures on them: scan them, but keep the originals. Likewise, if the texture or physical form of a document has special value in itself, you must keep it. This is a guide ONLY for the information part of documents.

Second:

Prerequisites: I want to scan documents quickly and efficiently into a local folder structure. The documents must be stored as space-efficient PDFs, and they must be OCR-processed with the recognized text embedded in the file. Metadata tagging is also key. I run Windows, so that is also a prerequisite. Finally, document management features such as sharing workflows, online access and other enterprise/large-group features are well beyond my scope and needs.

Hardware

To be efficient, get a scanner with a document feeder. The scanning itself must be FAST and support double-sided (duplex) scanning.

I got myself a Canon MX895. I don’t need the fax, and the ink is criminally expensive for such minuscule cartridges, but both the scanner and the printer are great. There are of course faster scanners, and all-in-one units are generally not recommended (if one part breaks, you lose all the other features as well).

Software

I’m a Windows guy, so I look for Windows solutions. I envy my friend Stefan Görling, who in this post (in Swedish) describes his process using DEVONthink for Mac, but that’s not an option for me.

Making documents available on the web used to be a special feature, but to be honest Dropbox, Google Drive or Microsoft SkyDrive make that totally redundant. (I tried Bitcasa, but it never really worked for me, so I am not including it.) There is no need to pay extra for this: scan into the synced folder of one of these services and you’re good to go.

Evernote – The web service is the key feature. Quite strangely, the optional desktop client has no scanning feature, but you can scan into Evernote using other scanner programs. Metadata works well. My key complaint is that OCR is done on the server and that you don’t really work with a set of local copies. The free service limits how much you can upload, but once documents are uploaded there is no cap on the total volume. The reason you still need to pay is that you want the documents OCR-processed.

Adobe Acrobat – State-of-the-art scanning and state-of-the-art OCR. No document management system, no metadata tagging, and it’s also REALLY expensive.

Paperport – An industry oldie, now at version 14. Scanning works great, OCR as well, and you can manage the folder structure you want to work with. The metadata is REALLY rudimentary: the file name, plus a window after scanning where you can fill in a few keywords. It’s quite bloated, and you need to keep an eye on the installation, as it wants to install a lot of things you can easily live without.

Sohodox – A shockingly good program. Scanning is easy, and the metadata and tagging system is really second to none. Marking a document as an invoice lets you fill in key fields, including the invoice amount. I was then annoyed that search didn’t work as expected, until I realised that the scan quality is poor and the OCR is disastrously bad. Looking at the text it thinks it has recognised makes it clear that this part of the program is a joke.

Scanitto – A very lightweight program that can scan and run OCR. It is not workflow-oriented; it targets scanning and OCR and nothing else. Even at that, there is no automatic choice between colour and black-and-white, and no automatic image restoration. You can save as PDF (including OCR, but there is no “despeckle” option), as multipage TIFF, or use “Recognize to text” (which saves to RTF, far from what I wanted). Once scanned, the document leaves your view, and you need to manage and search it by other means. For some reason, it also refused to use my feeder unless I selected Duplex.

FileCenter – In many ways a really good program! The GUI follows Windows conventions closely, so the look and feel is spot on that of a very boring Windows program with ribbon navigation. It works really well with a very small footprint and very few useless extras; it is exactly to the point. Manage (including scanning and file management), Edit and Search are the three categories, and that is EXACTLY what I want to do. It has great features for file-naming templates. What annoyed me was that even the Pro OCR didn’t let me select Swedish, which is a KEY restriction when almost all my documents are in Swedish. Metadata is as poor as in Paperport, and despite looking like a slick and fairly cheap program, it actually costs the same as Paperport, which gives you additional features (that you admittedly might not care about).
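FileCenter’s own template syntax isn’t described here, so the following is only a minimal Python sketch of the general idea behind a file-naming template, not FileCenter’s feature. The template `{date}_{sender}_{doctype}.pdf` and the names `slugify` and `build_scan_filename` are hypothetical. Putting the ISO date first makes files sort chronologically in any folder view, and collapsing characters Windows rejects in file names (such as `:`, `?` and `/`) into hyphens keeps the name usable while still allowing Swedish letters like å, ä and ö.

```python
import re
from datetime import date

# Hypothetical naming template: ISO date first so files sort chronologically.
TEMPLATE = "{date}_{sender}_{doctype}.pdf"


def slugify(text: str) -> str:
    """Turn free text into a file-name fragment."""
    # \W matches anything that is not a letter, digit or underscore. For str
    # patterns it is Unicode-aware, so Swedish letters (å, ä, ö) are kept,
    # while characters such as : ? / and spaces become a single hyphen.
    cleaned = re.sub(r"\W+", "-", text.strip().lower())
    return cleaned.strip("-_")


def build_scan_filename(scanned: date, sender: str, doctype: str) -> str:
    """Fill the hypothetical naming template for a freshly scanned document."""
    return TEMPLATE.format(
        date=scanned.isoformat(),
        sender=slugify(sender),
        doctype=slugify(doctype),
    )


if __name__ == "__main__":
    print(build_scan_filename(date(2014, 3, 5), "Example Energi AB", "Elräkning"))
    # → 2014-03-05_example-energi-ab_elräkning.pdf
```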

eDOC Organizer – The user interface is very “photoshoppish”. Scanning is straightforward (again, not really automatic, but that is available in my printer’s TWAIN driver, so I don’t need the program’s own logic here). OCR is REALLY slow and cannot do Swedish. The tagging is quite innovative and works well: flexible yet easy to understand, even though I think they oversell the value of the coloured-square category marking. It does categorization, but not metadata at the level of Sohodox, where an invoice can also hold, for example, the invoiced amount. Still, it’s WAY ahead of Paperport and FileCenter on this. It can import documents from other folders, including a whole folder with all its subfolders, so you can easily scan and organize with Paperport and then let eDOC chew on the result. The key disadvantage is that it uses its own structure: importing means it stores the documents in its own folder, flat and with alien names. So forget a nice sync to the cloud for easy access, as you would only see a set of files with excessively long names.

Summary

None of these work for my purpose. Sohodox is SO close to being really, really good; with a different scanning engine it would be my champ by a mile. eDOC is very nice and sufficiently feature-rich, but the lack of Swedish OCR, and the fact that it stores its PDF files in an internal structure that rules out cloud syncing, is such a shame. FileCenter is also killed by not having Swedish OCR. So I simply had to go for Paperport 14 Pro. I still seriously hope Sohodox can get its act together on the scanning and OCR side; if they do, they have a winner. For now Paperport will have to do, but next I will look for a separate tagging engine. eDOC Organizer could have been it, if it had just allowed me to store the documents in their original location with their original names.

Next, I guess I have to look at file-tagging software. Tabbles looks nice in the video, but I don’t understand it when running it. I’ll be back when I have figured this out. 😉
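The wish above is a tagging layer that leaves documents in their original folders with their original names, so cloud sync keeps working. The following is only a minimal Python sketch of that idea, not modelled on Tabbles or any of the programs reviewed here. It assumes a hypothetical `tags.json` index file at the root of the scan folder, keyed by each document’s path relative to that root; the `TagIndex` class and its methods are illustrative names, not any product’s API.

```python
import json
import tempfile
from pathlib import Path


class TagIndex:
    """Hypothetical sidecar tag store: documents stay put, tags live in one JSON file."""

    def __init__(self, root: Path, index_name: str = "tags.json") -> None:
        self.root = root
        self.index_path = root / index_name
        # Load previously saved tags, or start with an empty index.
        if self.index_path.exists():
            self.tags = json.loads(self.index_path.read_text(encoding="utf-8"))
        else:
            self.tags = {}

    def _key(self, document: Path) -> str:
        # Paths are stored relative to the root with forward slashes, so the
        # index remains valid if the synced folder is opened on another machine.
        return document.relative_to(self.root).as_posix()

    def add(self, document: Path, *tags: str) -> None:
        """Attach tags to a document without moving or renaming it."""
        key = self._key(document)
        self.tags[key] = sorted(set(self.tags.get(key, [])) | set(tags))

    def find(self, *tags: str) -> list[str]:
        """Return the relative paths of documents that carry all given tags."""
        wanted = set(tags)
        return sorted(path for path, have in self.tags.items() if wanted <= set(have))

    def save(self) -> None:
        # ensure_ascii=False keeps Swedish tag names readable in the JSON file.
        self.index_path.write_text(
            json.dumps(self.tags, ensure_ascii=False, indent=2), encoding="utf-8"
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # The PDFs themselves are never touched; only their paths are recorded.
        index = TagIndex(root)
        index.add(root / "2014" / "2014-03-05_example-energi-ab_faktura.pdf", "faktura", "el")
        index.save()
        print(TagIndex(root).find("faktura"))
        # → ['2014/2014-03-05_example-energi-ab_faktura.pdf']
```

Because the index is a plain file next to the documents, it syncs to Dropbox or Google Drive together with them. The trade-off is that renaming or moving a PDF outside the tool leaves a stale entry in the index.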

 

2 Comments on “Paperless home”

  1. Hi,
    first of all this is an interesting and competent analysis that looks beyond the usual mainstream.

    I know that this article was published almost two years ago, but the quest to find the right document management software is still highly relevant. I’m no expert in this field, but so far I haven’t found a tool that fully met my expectations and feature requests. Well, until I found Sohodox. Apart from being expensive, it seems to be a really good piece of software, and it is reduced to the core features I need. Searching the web for some background information about it, I found this article here.

    Have you by any chance tried one of the newer versions of Sohodox? Version 9 was released a few weeks ago, and I guess at least version 8 has been released within the last two years as well. If you have had a chance to look at one of the newer versions, could you offer an opinion on the scan engine? Has the scan quality improved?

    I would be delighted to get an answer, even though the post is a bit older.

    Thanks a lot,
    rfuellner

  2. I haven’t tested it, but I am keen to. They must have improved the scanning part for me to find it relevant, as that is where it fell short. Looking at the release notes, I don’t see any change in this respect. Did you download the evaluation version, and did you get acceptable scan and OCR results? Please keep in mind that I also use Swedish, which makes OCR more challenging: you can’t just assume that 7-bit ASCII will cover everything (å, ä and ö are not in it).
