Indexing Org Files with a Database

The other day, I wrote about how Karl Voit uses an Org file to index his other Org files. It’s a nice scheme that’s lightweight and easy to understand, and it fits in well with his Memacs system. John Kitchin has a similar problem. He uses Org mode for writing papers, taking meeting notes, writing letters of recommendation, taking notes on papers he’s reading, keeping TODO lists, maintaining help files on software, writing lecture notes, running his courses, and many other chores.

Kitchin has about 5 years of Org files that are scattered across Dropbox, Google Drive, Git repos, and his local file system. It’s hard for him to locate a particular file, especially if he hasn’t accessed it in a while. His solution is to index everything in an SQLite database. He indexes the headlines, properties, tags, links, and even (provisionally) content. He uses EmacSQL to access the data and has hooks that (re)index any Org file he opens.
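
To make the idea concrete, here is a minimal sketch of that kind of index. It is not Kitchin's code: his system is written in Emacs Lisp on top of EmacSQL, while this uses Python's built-in sqlite3 module as a stand-in. The table layout, the function name index_org_text, and the regular expressions are all assumptions made up for illustration, and the link pattern only recognizes bracketed links of the form [[type:path]].

```python
import re
import sqlite3

# Hypothetical schema for illustration only; not Kitchin's actual tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS headlines (filename TEXT, level INTEGER, title TEXT, tags TEXT);
CREATE TABLE IF NOT EXISTS links (filename TEXT, type TEXT, path TEXT);
"""

# Leading stars, a title, and an optional trailing tag string such as :work:urgent:
HEADLINE_RE = re.compile(r"^(\*+)\s+(.*?)(?:\s+(:[\w@:]+:))?\s*$")
# Simplified: bracketed links only, e.g. [[file:notes.org][notes]]
LINK_RE = re.compile(r"\[\[([^\]:]+):([^\]]+)\]")


def index_org_text(conn, filename, text):
    """(Re)index the headlines and links of one Org file."""
    conn.executescript(SCHEMA)
    # Drop stale rows first so re-indexing a file never duplicates entries.
    conn.execute("DELETE FROM headlines WHERE filename = ?", (filename,))
    conn.execute("DELETE FROM links WHERE filename = ?", (filename,))
    for line in text.splitlines():
        match = HEADLINE_RE.match(line)
        if match:
            stars, title, tags = match.groups()
            conn.execute(
                "INSERT INTO headlines VALUES (?, ?, ?, ?)",
                (filename, len(stars), title, tags or ""),
            )
        for link_type, path in LINK_RE.findall(line):
            conn.execute(
                "INSERT INTO links VALUES (?, ?, ?)",
                (filename, link_type, path),
            )
    conn.commit()
```

Calling index_org_text every time a file is opened or saved would play roughly the role of Kitchin's hooks, since the delete-then-insert step keeps the index current without piling up duplicates.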

The system is still in the proof-of-concept stage but appears to be working well. Kitchin has stopped indexing the content because it slows things down too much and generates about half a gigabyte of data. Even so, he can write amazingly fine-grained queries. For example, it’s trivial to find all his files that cite a certain paper. His post has other examples of queries like that.
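
A query like "which files cite this paper" might look something like the following sketch. Again, this is Python with sqlite3 rather than Kitchin's EmacSQL, and the links table, the files_citing function, the file names, and the citation keys are all invented for the example. It assumes citations are stored as links of type cite whose path is the citation key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and made-up sample rows, purely for illustration.
conn.execute("CREATE TABLE links (filename TEXT, type TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO links VALUES (?, ?, ?)",
    [
        ("notes/reading.org", "cite", "example-key-2015"),
        ("papers/draft.org", "cite", "example-key-2015"),
        ("papers/draft.org", "cite", "other-key-2012"),
        ("papers/draft.org", "file", "figures/plot.png"),
    ],
)
conn.commit()


def files_citing(conn, key):
    """Return the sorted, de-duplicated list of files that cite KEY."""
    rows = conn.execute(
        "SELECT DISTINCT filename FROM links "
        "WHERE type = 'cite' AND path = ? ORDER BY filename",
        (key,),
    )
    return [row[0] for row in rows]


print(files_citing(conn, "example-key-2015"))
```

Because links live in their own table, the same pattern works for any link type: swapping the type in the WHERE clause would find, say, every file that links to a given image or URL.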

If you have a huge collection of Org files, indexing them in a database as Kitchin does might be a good strategy. As usual, his code is available, and although you probably won’t want his exact database schema, it’s a good jumping-off point for your own needs. Adapting his system requires some knowledge of SQL, but so does simply using it: writing even a non-trivial query takes basic SQL, so it’s definitely not for casual users.

This post is yet another example of Kitchin leveraging the power of Org mode to get his primary job as a Chemical Engineering researcher and teacher done. As always, his work is full of useful ideas that many of us can adopt or adapt.

  • Kim Allamandola

    My personal solution is simply to root all Org files somewhere and index them with Recoll. Integration in Emacs is done via counsel (snippets at http://oremacs.com/2015/07/27/counsel-recoll/; my semi-identical version is here: https://paste2.org/NBLjLFwF). It's super fast and well integrated...

    • JohnKitchin

      I have tried recoll too. It is a good full text search tool, but I couldn't find a way to get it to search for only headlines or links.