This is a picture of my desk. Depressing, isn’t it? I have so many loose documents lying around that it’s just about impossible to find anything. In the old days (i.e., Windows), I’d just use some nifty scanning software to scan, OCR, and save the documents as PDFs on my hard drive. Then, using a desktop search tool (such as Google Desktop Search), it’d be easy to find what I need. Where’s that invoice to Widgets Corp from November? Search "Widgets Corp invoice November" and most likely it’d be the first result. Nifty, huh? Ah, organizational nirvana for disorganized people like me …
However, since switching almost exclusively to Ubuntu, I haven’t found any good scanning and archiving software. Yes, there are programs such as Xsane for scanning, and other programs such as gocr for optical character recognition, but nothing that’s even close to integrated. Therefore, the mess on the desk just keeps getting taller and taller.
Till now. I found gscan2pdf, a great little utility written by Jeffrey Ratcliffe (ra28145 at users dot sf dot net). Basically, it’s a nice GUI shell around a variety of different UNIX programs that lets you just press a button to scan a document, OCR it, and save it. Plus, it embeds the OCR text into the file’s metadata, which, in English, means existing indexing tools (such as Beagle or, my personal favorite, Tracker) can search it. It also supports an optional program called unpaper, which can clean up your document scans.
It’s not perfect yet. My main complaint is that you can’t control the level of compression for PDF files, so sometimes the files get fairly large. However, it’s being worked on as we speak.
If you have a scanner, and you need to archive your paper documents, check it out. Better yet, if you like the program, give Jeff a little donation. It’s well worth it.