Red Meat Friday: Give Nothing, Expect Nothing

Joe Brockmeier over at Dissociated Press makes a point about free software that may make some folks uncomfortable but that I’ve long felt is true: Give nothing, expect nothing. The TL;DR is that if you are using a free service such as GitLab you really don’t have much standing to complain when those services are modified or even eliminated. The companies providing these services are for profit. How else are they going to pay for their infrastructure and engineers?

My longstanding policy is to pay for any service I find useful. That’s not because I’m an especially virtuous person but because if the service is useful to me, I want it to continue to be available. It’s fine to try out a service for free but as soon as it becomes vital to your company or workflow, you better start paying for it to ensure it will still be available tomorrow.

If you don’t pay, then you don’t get to whine when the service is restricted or eliminated. Even if you can’t contribute financially, you can help by providing code or documentation or otherwise supporting the product. Regardless, don’t be an entitled nitwit who thinks the companies providing these services owe you something. They don’t.

Posted in General | Tagged | Leave a comment

Chris Wellons on Hashing

The first data structure I learned and really internalized was the hash table. I’ve been using them my entire career and, given when I started, that meant implementing them myself. This was before I learned C and even C doesn’t have a built-in hash table so it’s always been a matter of build it or do without.

These days, it’s hard to find a (newer) language that doesn’t have a built-in hash table or dictionary data type. There’s a downside to that, however. For many younger engineers, hash tables are mysterious entities that seem like magic. There are, in fact, many strategies for implementing a hash table, each with its own tradeoffs, and which is best depends on the particular application. The problem with a built-in type is that it’s one size fits all.

Chris Wellons writes mostly in C so he has implemented scores of hash tables. He’s settled on his own implementation method which is basically open addressing with double hashing. You can check his post for what exactly that means but it’s one of the main implementation methods. I’ve always preferred the chaining method, which is usually faster but takes more storage. Like Wellons, I’ve done scores of implementations.
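For readers who haven’t met the technique, here is a minimal Python sketch of open addressing with double hashing. It is not Wellons’ code (his is in C and his post has the real details); the class name, the mixing constant, and the fixed-capacity, no-deletion design are all assumptions made to keep the example short. One hash value supplies both the starting slot and an odd step size, and because the capacity is a power of two, an odd step is guaranteed to visit every slot.

```python
class DoubleHashTable:
    """Fixed-size open-addressing hash table probed with double hashing.

    Illustrative only: no deletion and no resizing.
    """

    _MIX = 0x9E3779B97F4A7C15   # odd 64-bit constant used to scramble hash bits
    _MASK64 = (1 << 64) - 1

    def __init__(self, exponent=4):
        # A power-of-two capacity means any odd step visits every slot.
        self.capacity = 1 << exponent
        self.slots = [None] * self.capacity  # each slot: None or (key, value)
        self.count = 0

    def _probe(self, key):
        """Yield each slot index once, in this key's probe order."""
        h = (hash(key) * self._MIX) & self._MASK64
        mask = self.capacity - 1
        index = h & mask               # first hash: starting slot (low bits)
        step = ((h >> 32) & mask) | 1  # second hash: odd step from the high bits
        for _ in range(self.capacity):
            yield index
            index = (index + step) & mask

    def put(self, key, value):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:           # empty slot: key is new, insert here
                self.slots[i] = (key, value)
                self.count += 1
                return
            if slot[0] == key:         # key already present: overwrite
                self.slots[i] = (key, value)
                return
        raise RuntimeError("table is full")

    def get(self, key, default=None):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:           # hit an empty slot: key is absent
                return default
            if slot[0] == key:
                return slot[1]
        return default                 # probed every slot without a match


table = DoubleHashTable(exponent=3)    # 8 slots
table.put("emacs", 1)
table.put("vim", 2)
table.put("emacs", 3)                  # overwrites the earlier value
print(table.get("emacs"), table.get("vim"), table.get("nano"))  # 3 2 None
```

A chained table would instead keep a list of entries per slot, which avoids the "table is full" case at the cost of extra storage for the lists.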

Why do that? After all, most languages have a built-in type and even C probably has a library that implements hash tables. Lots of people think there IS no point but I disagree: if you know how things work, you know how to pick the best implementation strategy for your particular application.

Take a look at Wellons’ post for some of those details. Section 6.4 of Volume 3 of Knuth’s TAOCP has all the details on the various methods and their tradeoffs. If you’re serious about software engineering, it will repay your attention.


More On Google’s False Accusation

I thought I was finished with the story of Google falsely accusing one of their users of child molestation but Ben Thompson of Stratechery has his own take on the matter that he describes in a long blog post on the controversy.

He’s ambivalent on the matter of whether or not Google should be scanning their users’ photos. He wonders if the harm that child pornography does to children justifies Google’s spying on people. He’s not sure. I am: nothing justifies the spying and the argument that child pornography does is too easily extended to other public safety issues and eventually to any reason at all.

Thompson is sure of one thing: Google should have restored the user’s account and files and he’s astounded that they haven’t. It’s hard to understand why not. Perhaps they’re afraid of admitting guilt in case there’s future legal action.

In every story like this the apologists always say that Google or whoever is, after all, a private company and they can do as they like as long as it’s not illegal. Google, Facebook, Twitter, Apple, and the other big players have worked hard to become essential utilities and they’ve largely succeeded. But if they want to be like the electric or telephone utilities then they should operate under the same sorts of rules. Yes, they are private companies but they have civic responsibilities. They shouldn’t be able, for example, to refuse service to someone because they don’t like what they say or do as long as it’s legal. It’s time to stop letting these companies have it both ways.


Mastering Eshell

Mickey has (re)posted his article on Eshell. I think he’s updated it a bit because he talks about Emacs 28 but regardless, it’s a great article and every Emacser should read it. The problem with Eshell is that there’s not much documentation so Mickey’s article is especially useful.

There’s too much information packed into the article to give a reasonable précis here but there are a couple of capabilities (one that I use and another that I keep promising myself I’ll learn and use) that are worth calling out. The first is the ability to cd to a directory on a remote machine when you’re in Eshell. It almost seems like magic and works completely transparently except, of course, that you have to use the standard Tramp notation for describing a remote file/directory.

The second really great feature is that Eshell implements zsh’s argument predicates. The later Apple OSs now use zsh by default so there’s even less excuse for me not to become adept at their use. The idea is easy; it’s just a matter of learning what the modifiers and predicates are. Fortunately, there are a couple of Emacs commands, eshell-display-predicate-help and eshell-display-modifier-help, that pop up a window with lists of the predicates and modifiers. The commands put focus in the window so it’s easy to quit out of it with q. This is especially handy for me because I run Eshell as a full-frame window. As Mickey says, if you use argument predicates a lot it’s probably worthwhile adding an alias for these commands.

There is much more information in the article so be sure to take a look.


Quick Capture of Project Notes

Ben Simon has a nice example of leveraging Emacs to solve a problem and reduce his workflow friction. During his video calls he would bring up a notepad file to capture his notes and share them with the other callers as a sort of whiteboard. The problem was that he ended up with a mishmash of files with ambiguous names that made organizing his notes difficult.

Like me, one day he realized this process was silly and he decided to move the task to Emacs. He determined he should file his notes in a file hierarchy along the lines of notes/customer/project/date.md. Simon’s desired process is to invoke a single function that will bring up a file in the desired hierarchy. That means he needs to determine

  1. His current customer
  2. That customer’s project
  3. Today’s date

Simon shows how he determines those parameters. The date, of course, is easy but the others are a little bit tricky, especially since he’s using both Git and Subversion to host his project files. Still, it takes just a little bit of Elisp to bring up the required file. His post is worth taking a look at just to see how he interacts with his file hierarchy to get the information needed to accomplish the task.
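Simon’s solution is in Elisp and you should read his post for it. Purely to make the notes/customer/project/date.md layout concrete, here is a rough Python sketch. The function names and the ISO date format are my assumptions, and the customer and project are simply passed in as arguments rather than discovered from Git or Subversion the way Simon does it.

```python
from datetime import date
from pathlib import Path


def notes_path(root, customer, project, day=None):
    """Build root/customer/project/YYYY-MM-DD.md (the date format is an assumption)."""
    day = day or date.today()
    return Path(root) / customer / project / f"{day.isoformat()}.md"


def open_notes(root, customer, project, day=None):
    """Create the directories and the notes file if missing; return the path."""
    path = notes_path(root, customer, project, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch(exist_ok=True)
    return path


# Hypothetical customer and project names, for illustration only.
print(notes_path("notes", "acme", "website", date(2022, 8, 30)))
```

The interesting part of Simon’s version is exactly what this sketch leaves out: working out the customer and project from wherever he happens to be in his file hierarchy.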

Update [2022-08-30 Tue 16:57]: Added link to Simon’s post.


New Federal Open Access Policy

Finally some sense from the US government. According to this White House news release, research funded by the government must be available to the public without cost or delay. In particular, the current policy of a one year embargo on published research will be discontinued. Of course, being the government, the new policy will not be fully in force until 2026.

As I’ve written many times, it’s immoral to ask the American public to pay for research and then lock up the results of that research behind a paywall. Worse, subscriptions to a single journal can run into the hundreds or even thousands of dollars per year.

The “normal” citizen will have no interest in access to these journals of course but there are plenty of people with the ability and interest to read and use the research but who don’t have access to a university library or other source of the journals. And, of course, researchers from third world countries who are fully capable of using and extending the research have no access at all.

The completely normalized, but illegal, solution is sites like Sci-Hub that curate the papers and make them available for free. This infuriates the publishers whose copyrights are being violated but most of the scientific community—fed up with the publishers’ greed and refusal to remedy the situation—seem fine with Sci-Hub and other pirate sites.

Added before publication

Here are a couple more articles on the change, one from Science and the other from Ars Technica. These articles look briefly at some of the consequences of the new rule.


Refiling Org Headline Nodes

Mario Jason Braganza has a useful post that considers moving a headline node from one org file to another. His use case is moving items from his TODO file to his current task file as he acts on TODO items.

Org mode, of course, has an easy way of doing that. You can move nodes to another location from within the current file or to another file altogether using the org-refile command. It’s bound to Ctrl+c Ctrl+w so it’s easy to invoke.

The problem is that it can be a bit fiddly to set up. The reason for that is that you have to specify potential targets for the refiling. There are two aspects to that:

  1. Possible files to contain the refiled node
  2. Headings within the target file to contain the refiled node

Braganza explains how to set all this up. Oddly, he arrives at the exact configuration I have except that I consider more subheadings in the target file than he does. I’ve had it set up for so long I no longer remember configuring it. Lately, I’ve been using it more and more instead of just cutting and pasting nodes.

If you sometimes move nodes from your Org files and want to do something more sensible than cutting and pasting, take a look at Braganza’s post to see how to set up org-refile.


Red Meat Friday: Knuth versus McIlroy

One of our cherished stories, or at least one of my cherished stories, is the account of Don Knuth and Doug McIlroy solving the same problem. The problem was

Given a text file and integer k, print the k most common words in the file (and the number of their occurrences) in decreasing frequency.

The TL;DR is that Knuth wrote a 10 page WEB program to solve the problem while McIlroy solved the same problem with a 6 line Unix shell script.

You can imagine the lessons that were drawn but they all omit a crucial bit of context: Knuth was asked to demonstrate literate programming by solving the problem so McIlroy’s solution wasn’t really on-point.

These days whenever the story comes up the usual reaction is to cry foul because of that missing context—here’s a representative example—but I draw a different conclusion and always have. The two solutions represent two ways of approaching a problem: (1) write a program de novo to arrive at a solution or (2) leverage existing software to solve it.

Many of us—me included—tend to reach for Knuth’s solution first and even think of the quick and easy shell solution as cheating. That’s silly, of course. The point is to solve the problem not to write code. Yes, in the particular case of the story, Knuth’s answer was the best one but usually, absent other special circumstances, McIlroy’s is clearly superior. It’s worth remembering that we have tools other than the hammer of writing code and that not every problem is a nail requiring that hammer.
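As a small illustration of the second approach, here is how little is needed when you lean on what a standard library already provides. This is neither Knuth’s program nor McIlroy’s pipeline, just a Python sketch of the problem statement; the function name and the decision to treat a word as a run of ASCII letters, compared case-insensitively, are my assumptions.

```python
import re
from collections import Counter


def top_words(text, k):
    """Return the k most common words in text as (word, count) pairs."""
    # Assumption: a "word" is a run of ASCII letters, compared case-insensitively.
    words = re.findall(r"[a-z]+", text.lower())
    # most_common sorts by decreasing count; ties keep first-seen order.
    return Counter(words).most_common(k)


print(top_words("The cat and the hat and the bat", 2))
# [('the', 3), ('and', 2)]
```

All the real work is done by code someone else wrote and debugged, which is the point.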


An Afterword to Yesterday’s Post on Google’s False Accusation

After I wrote yesterday’s post on Google falsely accusing an innocent man of child molestation, I saw this post on the story by John Gruber. My first thought was that I would fold whatever Gruber had to say into my post.

I noticed when reading the post, however, that there were a couple of problems with it so I decided on a separate Irreal post. The first problem was annoying. Despite the fact that the user lost a decade’s worth of email, photos, his cellular plan, his email address, and other valuable assets, Gruber fails to draw the obvious conclusion: Don’t commit anything valuable to Google’s care. In fact, don’t use Google’s services at all if you don’t want to lose your data and run the risk of Google informing on you to the police. Your best bet is to tell Google to “lose your number”.

The second problem was infuriating. Gruber, while acknowledging that it was a terrible story and an injustice, excuses Google on the grounds of “good intentions”. You see that a lot whenever child pornography—or other viscerally disturbing subjects, or even more mundane reasons for spying on users—is discussed. “Yes, it’s terrible that innocent people got caught up in Google’s (or whomever’s) surveillance net but it’s okay because Google had good intentions.” The idea is that it’s okay to spy on users in the service of combating the scourge of child pornography.

It’s not. It’s saying, “We don’t have any reason to think you’ve done anything wrong but we’re going to spy on you just to make sure.” In an earlier time we used to call those people nosey parkers and shunned them. Or perhaps bloodied their noses or had them arrested. That’s extreme but there was a lot less gratuitous snooping into other people’s business.


Google Falsely Accuses a Father of Molesting His Son

What!? You’re still using Google services? Haven’t you been listening to anything I’ve told you? I’m sorry but I’m over being polite. If you’re still using Gmail, Google Docs, or Google Photos, then you’re being naively foolish and deserve whatever bad thing happens to you as a result.

Google, apparently not content with simply arbitrarily closing users’ accounts and seizing their data when one of their automated scanning tools finds something they disapprove of, have started reporting them to the police. Read this post from Cory Doctorow on how Google closed a user’s account and reported him to the police for child abuse and, even after he was cleared of wrongdoing, refused to restore his data.

The TL;DR is that the user’s son had swollen genitals and when the parents consulted his doctor, the doctor asked them to take a picture of the boy’s genitals and send it to him because he wasn’t seeing patients in person due to COVID. Since the photo was automatically synced to Google Photos, it was scanned and Google, clutching their pearls, was sure a crime had been committed and notified the police, giving them access to all of the user’s files.

The police soon realized that there was no crime or dubious behavior but Google is still refusing to admit they were wrong and won’t restore the data. Other than the loss of data—and other problems described in the post—you could, I guess, say the story had a happy ending but it’s not at all hard to imagine that it might not have.

If you deal with Google you’re putting yourself at risk of not just losing your data but of serious legal difficulties. Stop being stupid. There are plenty of good alternatives.
