Tabula

Back in early July, I wrote a rant disagreeing with Robert Zaremba about retiring the use of PDFs. Zaremba believes that PDFs are no longer a good fit for today’s devices and that we should stop using them. I strongly disagreed. In the comments to my post, Mike Zamansky zinged me by noting that “PDFs are where data goes to die.” His point, of course, is that it’s pretty hard to get data out of a PDF.

Now, finally, I have a rejoinder: Tabula, a tool for extracting PDF tables into CSV data. Once you’ve got it in that format, it’s easy to convert it into others such as an Org mode table. That should be especially handy for researchers who like to write their papers in Org mode.
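For the CSV-to-Org step, here is a minimal sketch using Org's built-in org-table-import. The function name my/insert-csv-as-org-table is hypothetical, and I'm assuming Tabula's output has already been saved as a comma-separated file.

```elisp
(require 'org)
(require 'org-table)

;; Hypothetical helper: insert a CSV file at point as an Org table.
(defun my/insert-csv-as-org-table (file)
  "Insert the comma-separated FILE at point as an Org table."
  (interactive "fCSV file: ")
  ;; A separator of '(4) tells Org to split fields on commas.
  (org-table-import file '(4)))
```

Call it with M-x my/insert-csv-as-org-table from an Org buffer and point at the exported file.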

The problem of extracting table data from a PDF turns out to be surprisingly difficult. Follow the Tabula link to read about some of the problems. Regardless, Tabula can (usually) do it and help researchers capture data from a PDF in a relatively painless and accurate way.

Posted in General | Tagged , , | Leave a comment

Cash and Smart Phones in China

Back in May, I wrote about the central role that Smart Phones play in the day-to-day life of those in mainland China. In particular, you are considered odd if you try to pay for just about anything with cash. Now, The New York Times has an article that serves as a nice followup.

According to the Times, cash in urban China is rapidly becoming obsolete. This evolution has been remarkably swift: three years ago, cash was used for most transactions. Now, as I said, its use is considered odd. The secret to the fast uptake and near universal acceptance of digital payments is the use of QR codes. Instead of expensive card readers, merchants simply provide a print out of a QR code that the customer scans with his smart phone. That means that even very small merchants—even street musicians—can accept digital payments with virtually no cost other than the transaction fee.

The author of the article says that, so far, this trend has not moved outside of mainland China, not even to Hong Kong. I look forward to seeing its use spread to the West, although as I said in my original post, I hope it’s not through a single portal like WeChat that has privacy implications.


Goodhart's Law

Kontra has a useful reminder:

We see applications of this all the time. One famous example is the notion that we should use standardized tests to measure teacher/school effectiveness and base compensation/budgets appropriately. It’s one of those ideas that sounds good when you first hear it—who isn’t for rewarding more effective teachers, after all—but in reality those being measured soon learn how to game the system and we end up with “teaching to the test.”

We see it in our own industry every time some bean counter decides to evaluate programmer productivity by counting lines of code committed or some equally silly measure. Whatever the metric, programmers soon learn to maximize it even if the result is less and poorer quality work getting done.

The idea that making some measurement a target results in the measurement becoming useless is so common that it even has a name, Goodhart’s Law. It’s one of those things that everyone knows but always fails to take into consideration when starting out to measure some human activity with the aim of affecting policy in some way that matters to those being measured.


John Wiegley and Sacha Chua on use-package

A couple years ago, John Wiegley and Sacha Chua made a video about Wiegley’s use-package package. I’ve mentioned it a couple of times in passing but never written about it. Recently, I stumbled across it again and rewatched it.

There’s a lot of interesting material in the video and I decided that even though it’s old it was worth writing about. If you aren’t already using use-package, you should definitely take a look. The package really does make your configuration simpler and more logical. It can even be used to speed up Emacs’ start time: Wiegley reports that he has over 200 packages and his Emacs starts in about a third of a second.

Even if, like me, you’re already using use-package, there’s still a good reason to spend a half hour watching the video. Wiegley discusses some things I didn’t know about. First is the macrostep package. Even if you only use it to see exactly what use-package is doing, it’s worth installing. The use-package macro confuses many people. If you watch the video and then expand an invocation of the macro in your configuration, you’ll easily understand what’s going on. Macrostep is really easy to use and, if you bind a key sequence to invoke it[1], it won’t be loaded until you need it.
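Here is a minimal sketch of that kind of lazy setup. The C-c e binding is my own arbitrary choice, not something from the video, and :ensure assumes you have a package archive such as Melpa configured.

```elisp
;; Binding a key creates an autoload, so macrostep itself is not
;; loaded until the first time the key is pressed.
(use-package macrostep
  :ensure t
  :bind (:map emacs-lisp-mode-map
              ;; Assumed key; pick whatever is free in your setup.
              ("C-c e" . macrostep-expand)))
```

With point on a use-package form, pressing the key expands the macro in place so you can see what it does.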

Another thing I learned is that you can fold the text in a buffer so that any line indented n or more columns isn’t shown. Oddly, this is built-in functionality (bound to Ctrl+x $) so you don’t need to install anything to use it.
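The command behind that key is set-selective-display. A small sketch of calling it from Lisp, with 4 as an arbitrary example column:

```elisp
;; Hide every line indented 4 or more columns in the current buffer.
;; Interactively this is C-u 4 C-x $.
(set-selective-display 4)

;; Show everything again. Interactively this is C-x $ with no argument.
(set-selective-display nil)
```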

Once you start using use-package, you’ll automatically pull in bind-key so you’ll have describe-personal-keybindings available. When you invoke it, you’ll get a nice list of all the keybindings you’ve defined along with what, if anything, they replaced. That can be really handy for organizing your configuration.
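As a rough illustration, any binding made through bind-key (or through the :bind keyword of use-package) shows up in that list. The key and command below are just examples I picked.

```elisp
(require 'bind-key)

;; Bindings made with bind-key are recorded so that
;; M-x describe-personal-keybindings can list them later.
(bind-key "C-c i" #'imenu)
```

Bindings made with plain global-set-key are not recorded, so they won't appear in the list.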

Finally, one of the nice things about use-package is that you can configure it to record how long it takes to load and configure the packages it loads. That can really be handy for tracking down the hot spots in your configuration load time. It all goes into the *Messages* buffer so you can ignore it except when you need it.
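A minimal sketch of turning that reporting on, assuming the settings are placed before your use-package declarations. The 0.1 second threshold is just an example value.

```elisp
;; Report loading and configuration activity in *Messages*.
(setq use-package-verbose t
      ;; Only report steps that take longer than this many seconds.
      use-package-minimum-reported-time 0.1)
```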

I really like use-package and have converted my entire init.el to use it. If you aren’t already using it, you should start and the video will go a long way towards convincing you of that.

UPDATE [2017-07-21 Fri 11:58]: Added link to video.

Footnotes:

[1] Or you can ask use-package to delay loading it explicitly.


A New Blog Post Searching Protocol

It’s odd how discovering some small Emacs feature can completely change your workflow. Yesterday, I wrote about counsel-git-grep and how it made searching entire repositories easy. Almost without me being aware of it, it changed my blogging workflow and allowed me to bring one more task into Emacs.

When writing a post, I often want to refer to one or more older posts that are related in some way. My normal way of doing that was to use the search feature of Irreal to locate the posts and then leave them up in my browser so I could review them and then link to them from within Emacs. That meant I had to leave Emacs to interact with the blog itself and, of course, it meant I had to be on-line.

That procedure’s a bit silly because I write my posts in Org mode and have the original source immediately available to me in Emacs. Still, searching the source wasn’t really that convenient and I’d end up with a lot of buffers open I didn’t need. Counsel-git-grep changed that. I do the search and can see the results in the minibuffer. I can open the one(s) I want easily, review the contents, and then link to them from within my current post.

I use two methods for linking to the old posts. The first makes use of irreal-link, shown below, to add a link without bothering to open the old post in Irreal. When org2blog publishes a new blog post, one of the things it does is put the page ID into the source buffer. That makes it easy to link to the post by referring to its ID (see the irreal-link code). Notice that there’s no browser interaction involved with this process at all.

Sometimes, it’s convenient to have the actual page up in the browser. For those occasions, I use irreal-by-id, which opens the page in the browser from within Emacs. I send the results to Safari but I could just as easily open it in EWW. Since it is open in Safari, I can add the link in the way I described in the “link to them from within Emacs” link above.

Here’s the code for irreal-link and irreal-by-id. As you can see they are trivial but make linking to the old post easy and convenient.

(defun irreal-link (id label)
  "Make a link to an Irreal post given its ID number and a LABEL."
  (interactive "sID: \nsLabel: ")
  (insert (concat "[[http://irreal.org/blog/?p=" id "][" label "]]")))

(defun irreal-by-id (id)
  "Open an Irreal page by its page ID."
  (interactive "sID: ")
  (browse-url (concat "http://irreal.org/blog/?p=" id)))

Video on counsel-git-grep

After rewatching abo-abo’s refactoring video I decided to check out his other videos. I’ve seen most of them, of course, but mostly forgotten the details. One video, counsel git grep demo, is really informative and worth your time.

If you have the Ivy/Counsel/Swiper suite installed, you already have counsel-git-grep. It’s a really handy way of grepping through a git repository. That may sound a bit limited but it’s not. What it does is grep through all the files in the current git repository where “current git repository” means the repository that has a .git directory in the current directory hierarchy. It works by calling git grep so in a sense it’s nothing new but it’s really handy to be able to invoke from within Emacs without having to shell out. All the results are put in the minibuffer in the usual Ivy way so it’s easy to find the right match or to put the results in a separate buffer by calling ivy-occur and deal with the matches one-by-one or even en masse as described in the refactoring video.

After watching the video, I added bindings for counsel-git-grep to my configuration so that I can call it easily. The video is only 4 minutes so you can watch it twice while the coffee’s brewing.
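Something along these lines would do it. The key is an assumption on my part, so substitute whatever is free in your setup.

```elisp
;; Assumed key choice; counsel-git-grep is autoloaded by the counsel
;; package, so nothing needs to be required first.
(global-set-key (kbd "C-c j") #'counsel-git-grep)
```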


Changing the Recentering Order

I was trolling through some of abo-abo’s old posts over at (or emacs and came upon this gem concerning the order of positions for recenter-top-bottom. I use this command all the time; it’s really handy when a search result leaves the target at the bottom of the screen. A simple Ctrl+l and the line is repositioned to the middle of the screen. Another Ctrl+l and it’s positioned at the top. A third invocation returns it to the bottom.

As abo-abo says, this is a bit counterintuitive and not as handy as you’d like it to be. In the common case of searching for a function, you’d like to position it at the top so as much of the function as possible is visible. Of course you can do that by pressing Ctrl+l twice but it would make more sense for it to start at the top.

Being Emacs, the behavior is, of course, completely configurable. You can set the order to be anything you like. If, as abo-abo says, you believe in gravity and want as much of the result showing as possible, you can set the order to be top-middle-bottom with

(setq recenter-positions '(top middle bottom))

I like that better but others may disagree. The beauty of Emacs, as always, is that you can have it your way.


Collapsing Empty Dired Paths

One annoying feature of many modern development systems and sometimes even single programs is that they hide files or directories in a long, nested series of otherwise empty directories. That is, each directory other than the last contains only a single entry. GitHub has a nice solution for this: they collapse the entire path of otherwise empty directories into a single path. That makes it easy to get to the target without navigating through a bunch of essentially unused directories.

Fuco1, who has little tolerance for such infelicities, has introduced dired-collapse-mode to bring the same functionality to dired. Follow the link to Fuco1’s post to see an example of the mode in action. It’s on Melpa so it’s easy to install and try out. If you have a lot of these nested directories ending in a single target, you should give dired-collapse-mode a try. You have nothing to lose but your frustration.
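Once the package is installed, a one-line sketch like the following should enable the mode in every dired buffer. I'm assuming the package is installed under the name dired-collapse and provides autoloads for the mode.

```elisp
;; Turn on dired-collapse-mode whenever a dired buffer is opened.
(add-hook 'dired-mode-hook #'dired-collapse-mode)
```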


Claude Shannon

Like most tech people, I know of Claude Shannon but not much about him. We all know he is the father of Information Theory but how many know anything about the man himself? Jimmy Soni and Rob Goodman have written A Mind at Play: How Claude Shannon Invented the Information Age, a biography of Shannon and his work.

The IEEE Spectrum has a nice article based on excerpts from the book. I always appreciated that Shannon was smart, but he actually had a genius-level intellect. He, along with Barney Oliver and John Pierce, was legendary at the Bell Labs headquarters in Manhattan during the Second World War. One interesting fact is that although Shannon slaved away on war-related projects during the day, at night he was working on his groundbreaking theories about information. Virtually no one knew anything about his work or the results he had obtained until he published them after a decade of labor.

When he published his work, scientists were amazed. No one had anticipated his results or even thought very much about exactly what “information” is. Read the Spectrum article and you will have a new appreciation for how groundbreaking his work was and what a genius he was.


Removing Org Source Block Results

Grant Rettke over at Wisdom and Wonder has posted a nice bit of Elisp that runs a function on every source block in an Org file. His use case for this is getting rid of the results from running the source blocks. As he says, sometimes you accidentally evaluate the source blocks, usually as a result of exporting, and then have (potentially) large results cluttering up your document.

Rettke solves that by running his source block code with a function that calls org-babel-remove-result. It’s a cute trick and introduced me to org-babel-remove-result, which I wasn’t familiar with. Hop on over and take a look if you have a need for cleaning up your results or need to run a function on all your source blocks.
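This is not Rettke's code, but the idea can be sketched with Org's own org-babel-map-src-blocks macro. The function name my/org-remove-all-results is hypothetical.

```elisp
(require 'org)
(require 'ob-core)

;; Hypothetical helper, not Rettke's implementation.
(defun my/org-remove-all-results ()
  "Remove the results of every source block in the current buffer."
  (interactive)
  ;; A nil FILE argument means operate on the current buffer; the body
  ;; runs once per source block with point at the start of the block.
  (org-babel-map-src-blocks nil
    (org-babel-remove-result)))
```

Swapping the body for some other form gives you a way to run any function on all the source blocks.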
