Paperless

As I’ve written before, I maintain an almost exclusively digital workflow. About 8 years ago I banished pads, pens, and pencils from my desk and started taking notes and doing other record-keeping chores on my computers and iOS devices. I write very few checks and, other than signing charge slips when Apple Pay isn’t available, I hardly ever write anything.

Somewhere along the line, I replaced my slow and tedious flatbed scanner with a snappy Fujitsu ScanSnap S1500M scanner and even allowed a pen back on my desk for those rare checks. Almost all our bills are paid online through our bank. Whatever paper we do get is scanned and shredded.

In my post linked above, I point to a post by Steve Losh in which he describes how he deals with his scanned documents. His idea is to run OCR on the documents, throw them all into a single directory, and use his system’s search capabilities to find the ones he wants. This is simple and has the advantage that everything but the actual scanning can be automated. My ScanSnap takes care of the OCR for me but I still have to deal with filing the scanned documents. Because a lot of my scanned documents are tax-related, I like to keep them filed by tax year. Apart from a few specialized documents, the rest of the scans go into a scanned-documents directory.

Having more than one destination for the scans makes it hard to automate the task as Losh did. It turns out there’s a nice Emacs solution for this. Anthony Green has the Paperless app (also available on MELPA) that almost automates the filing process. You dump the scanned documents in a staging directory and Paperless gives you a list of documents and target destinations. You can display a document, if you need to, before choosing its destination.
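To make the staging-directory idea concrete, here is a minimal Python sketch of the same workflow outside Emacs. It is not how Paperless is implemented (that is an Emacs package; its README has the real details). The function names `staged_documents`, `candidate_targets`, and `file_document`, and the directory layout they assume, are hypothetical and exist only for illustration.

```python
from pathlib import Path
from typing import List, Optional
import shutil


def staged_documents(staging: Path) -> List[Path]:
    """Return the PDFs waiting in the staging directory, sorted by name."""
    return sorted(
        p for p in staging.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def candidate_targets(root: Path) -> List[Path]:
    """Return root plus every directory beneath it as a possible destination."""
    return [root] + sorted(p for p in root.rglob("*") if p.is_dir())


def file_document(doc: Path, target: Path, new_name: Optional[str] = None) -> Path:
    """Move doc into target, optionally renaming it; never overwrite a file."""
    dest = target / (new_name or doc.name)
    if dest.exists():
        raise FileExistsError(dest)
    target.mkdir(parents=True, exist_ok=True)
    shutil.move(str(doc), str(dest))
    return dest
```

The choice of destination is deliberately left to the caller. That is the part that resists automation when tax documents go one place and everything else goes another.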

My only complaint is that all the target destinations have to live in a single hierarchy in which every directory is a possible target. That’s not an insurmountable problem, of course, even for someone like me who has an existing setup. If you’re interested in this sort of thing, take a look at the README on GitHub to get the details. This is, I think, a really nice solution and moves one more chore into Emacs. What’s not to like?
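As one of the comments below suggests, symbolic links are one way to fit an existing layout under a single root without moving anything. A minimal sketch in Python, assuming a system that supports symlinks; `link_into_root` and its arguments are hypothetical names chosen for illustration.

```python
from pathlib import Path


def link_into_root(root: Path, existing: Path, name: str) -> Path:
    """Create root/name as a symlink to an existing directory.

    The original directory stays where it is; only the link is new.
    """
    root.mkdir(parents=True, exist_ok=True)
    link = root / name
    if link.is_symlink() or link.exists():
        raise FileExistsError(link)
    link.symlink_to(existing.resolve(), target_is_directory=True)
    return link
```

Whether Paperless follows such links when it builds its list of targets is an assumption worth checking against the README before relying on it.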

UPDATE [2017-02-15 Wed 19:49]: Karl Voit, who I think can be fairly characterized as a researcher in digital workflow, has a comment below that contains some links to his solutions to the problems discussed in this post. I’ve read most of these before and can testify that they’re well worth your time.

UPDATE [2017-02-22 Wed 15:01]: you’re → your.

  • About the single hierarchy: I use symbolic links on Linux for this, so the files can stay wherever I had them before and are also reachable from where the new configuration wants to find them.

    • jcs

      I haven't spent much time thinking about the matter but this idea is much better than what I was tentatively considering.

  • I use this https://smilesoftware.com/pdfpenpro with my slow archaic scanner but the workflow still rocks. Thanks for posting the link: that scanner does 25 pages duplex or 50 pages single-sided. Awesome. Also an awesome idea about letting it OCR the document and relying on OS text searching.

    • jcs

      Yup. It's a bit pricey and I had to gulp before I ordered it but I'm really glad I did every time I use it. The big thing for me is that it sits on my desk and is always ready. With my old flatbed, I had to get it out, plug it in and spend what seemed like a huge amount of time doing the scanning. Because it was so much trouble, I'd let the scanning pile up, which only made things worse.

      If you do a lot of scanning and can afford the freight it's a great tool and I recommend it unhesitatingly.

      • I'm going to start saving. AirPrint makes it easier but yes it is a flatbed and it is slow. It is usually faster to take a picture of it with my phone instead but not ideal.

    • Alan Shutko

      The Scansnap scanners are awesome. I got mine seven or eight years ago and it's going strong.

      One extra tweak on the Mac is an app called Hazel (https://www.noodlesoft.com) which I use to autoname scans. It sees the PDF, can run rules based on the PDF contents, pull information like dates, and rename appropriately. It isn't an Emacs solution, but having all my bank statements automatically named and filed is pretty great.

      • jcs

        Yes, Hazel is what Steve Losh uses to automate his scan processing. If you're on macOS, it looks like a really nice solution.

        I concur, completely, about the ScanSnap. It eliminated virtually all of the scanning friction in my workflow.

  • Karl

    Thanks for this great blog posting. I also had to make the same decisions, learned a lot on the way, coded some tools to make my digital life as easy as possible. So here is my input mostly in form of links to my blog since this is the stuff I am writing about:

    I wrote a thesis on the topic of organizing files in folders vs. tags: http://karl-voit.at/tagstore/downloads/Voit2012b.pdf

    You can even try out my research tool I used to generate navigational paths from tags: tagstore: http://karl-voit.at/tagstore/

    I blogged about my process of digitizing many hundreds of documents: http://www.karl-voit.at/2015/04/05/digitizing-paper/

    My personal method to manage files (not only scanner files) and a description of a set of command line tools to ease the process: http://www.karl-voit.at/managing-digital-photographs/

    I built a set of tools that do the job of Hazel (mentioned in the comments) mostly independent of the operating system:

    My tool to semi-automatically generate file names: https://github.com/novoid/guess-filename.py

    I maintain a rather simple guess-targetfolder.sh file with regular expression matching to move files to their standard directories: https://gist.github.com/novoid/c4a239abc4027ecfd14e9904da88e6a1

    • jcs

      Thanks for the links. I've added an update to the original post pointing to this comment.

      • Karl

        Thanks for the flowers (again) ;-)

  • atgreen

    Thanks for mentioning paperless. I use a ScanSnap as well -- a great product.