path: root/content/writings/hands-on-unix-philosophy.md
author Johannes Herman <johannes.herman@gmail.com> 2026-03-16 11:14:21 +0100
committer Johannes Herman <johannes.herman@gmail.com> 2026-03-16 11:14:21 +0100
commit caa30dd45d3ae76405a7f4ecc9641521c1fb8895
tree 5641956935e8ef6cb487d47456b884443e5118e8 /content/writings/hands-on-unix-philosophy.md
initial
Diffstat (limited to 'content/writings/hands-on-unix-philosophy.md')
-rw-r--r-- content/writings/hands-on-unix-philosophy.md 105
1 files changed, 105 insertions, 0 deletions
diff --git a/content/writings/hands-on-unix-philosophy.md b/content/writings/hands-on-unix-philosophy.md
new file mode 100644
index 0000000..1619e1d
--- /dev/null
+++ b/content/writings/hands-on-unix-philosophy.md
@@ -0,0 +1,105 @@
+; Hands on Unix Philosophy
+; tech
+; Tue Mar 10 12:29:22 PM CET 2026
+
+# Hands on Unix Philosophy
+
+I sometimes find myself in a position of trying to explain to someone why I use the software that I do. Two common examples are Linux and "all those text programs". These questions might come from a curious person who has no idea what Unix or a terminal is, or from a technically savvy person who thinks people who use the terminal are hipsters trying to impress themselves. In this post I will defend my choice of software by quickly explaining the Unix philosophy and demonstrating it through a practical example.
+
+## The Philosophy
+
+I won't wade too deep into my explanation of the Unix philosophy. There are countless other, better resources available on the internet for anyone who wants a deeper understanding. But for those who only want to dip their toes in the water, here is the gist of it.
+
+Programs are more useful when they are simple, small in scope, and allow for easy interoperability with other programs. Simple programs are good because they have fewer bugs, are easier to maintain, and probably won't become resource-hungry behemoths. This goes hand in hand with being small in scope. Programs that have a small, well-defined function are easier to reason about, which makes them easier to use. The last piece of the puzzle is where the magic happens: connecting these small programs. [Malcolm Douglas McIlroy](https://www.cs.dartmouth.edu/~doug/) of Unix heritage elegantly captured this idea when he said:
+
+"""
+This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.
+"""
+
+"Do one thing and do it well" is perhaps the phrase most associated with the Unix philosophy. I like it because it implies the existence of software that does a lot of things poorly, which sure seems to describe a lot of software.[^1]
+
+## Practical Example
+
+The other day I found myself doing a repetitive task I had done hundreds of times before. I came across a script that used an option flag that I was unfamiliar with, and started wondering what it did. I used the program **man**, which shows the manual page for a program[^2], searched for the unfamiliar option flag, and read about it. This is a trivial operation, easily done in a couple of seconds, yet it can be surprisingly frustrating at times. Options are often referenced before their own section, or are a substring of another option, so finding the right spot can involve pressing '''n''' multiple times before reaching the correct place.
+
+My go-to solution for these small inconveniences is usually to just go about my day. Then it happens again, and I think, "I have been here before," and then it happens again and again, until eventually I make a mental note that I could improve my experience. For a problem as trivial as this one, I also know immediately what the solution is. I want to make a program that takes a program and an option and prints the section of the manual describing that option.
+
+On a Unix-y operating system, writing a script that does this is fairly trivial. All we need are two small programs, one that prints the entire manual, and one that filters it. The first program has already been mentioned, **man**, which prints the manual page in plain text formatting.[^3] The next program should read this output (a text stream) and filter it the way we want. I chose **awk** for this task, which allows you to take text input and transform it in various ways by filtering and rearranging it into a new text. The resulting shell script ended up looking like this:
+
+'''
+#!/bin/sh
+
+set -eu
+
+[ "$#" -ne 2 ] && echo "usage: ${0##*/} PAGE OPT" >&2 && exit 1
+
+man "$1" | awk -v opt="$2" '
+ /^[[:space:]]*-/ {
+ hit = ($0 ~ "(^|[ ,])" opt "([ ,=[]|$)")
+ }
+ hit
+'
+'''
+
+There is some extra fluff before the core functionality of the script, which I will go over in detail in the next part. The important part, however, is that by taking two programs, both using plaintext as their interoperable interface, we have created a new program which solves the problem. By spending 15 minutes writing a script, I have been able to improve my workflow, saving me a bit of time, and more importantly, the cognitive effort of manually scanning the page for the correct section.
+
+Importantly, I would argue that this could not have been done as elegantly or easily using other programs that do not follow the Unix philosophy. Perhaps this use case is common enough that a fully fledged manual viewer would include it, but for more niche use cases, that might not be true. One edge I know my new program has over the theoretical manual viewer is that it outputs a text stream, which makes it extensible just like **man** and **awk**.
+
+## Script Specifics
+
+For those interested, I want to go over some of the decisions I made when writing the script (I call it **boy**; get it?). Here’s a line-by-line explanation of the script:
+
+'''
+#!/bin/sh
+'''
+
+I try to write most, if not all, of my scripts in POSIX-compliant shell for portability and simplicity's sake. This is fairly trivial in small scripts like this, which don't require much from the shell other than running programs.
+
+A simple '''/bin/sh''' shebang is preferable to '''/usr/bin/env sh''' for POSIX scripts. Strictly speaking, POSIX does not mandate the location of **sh**, but '''/bin/sh''' exists on practically every Unix-like system. Adding another dependency via **env** would therefore be unnecessary, and if the system's '''/bin/sh''' either does not exist or is not POSIX-compliant, there are larger problems afoot.
+
+'''
+set -eu
+'''
+
+Setting exit on error ('''-e''') and error on unset variables ('''-u''') is usually good practice, although in this specific script neither will do much. The only program that can realistically fail is **man**, but the exit status of a pipeline is that of its last command, so '''-e''' does not catch a failure on the left side of the pipe and the script won't exit if **man** fails. We are also checking for the required arguments on the next line, so '''-u''' will never actually trip on an unset variable. I've included them mainly in case I edit the script in the future, and because, as mentioned, it is good practice.
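+
+To see the pipeline caveat in isolation, here is a minimal sketch (not part of **boy**; the function name '''failing_pipeline''' is made up for illustration). The first command in the pipe fails, yet the script carries on:
+
+```shell
+#!/bin/sh
+set -eu
+
+# Hypothetical helper: the first command fails, the last one succeeds.
+failing_pipeline() {
+    false | cat
+}
+
+# The pipeline's status comes from cat, so set -e does not abort here.
+failing_pipeline
+echo "pipeline status: $?"
+
+# A failing command substitution in a plain assignment does trigger -e,
+# so capturing the page first would make a failing man fatal:
+#   page=$(man "$1")
+```
+
+Run as a script, this prints '''pipeline status: 0''' instead of exiting early.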
+
+'''
+[ "$#" -ne 2 ] && echo "usage: ${0##*/} PAGE OPT" >&2 && exit 1
+'''
+
+This line checks whether the correct number of arguments is given to the program (2 in this case). If it is not 2, the usage message is printed to stderr, and the program exits with status code 1, indicating an error. I try to include a usage message in most of my scripts, as it provides helpful documentation for the future. The '''${0##*/}''' part is a useful piece of shell syntax called pattern removal, which removes everything up to and including the last '''/''' in the '''$0''' variable. This is an alternative to using the **basename** command, e.g., '''$(basename "$0")'''.
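+
+A small sketch of the expansion next to its relatives (the path is made up; the comments show what each line prints):
+
+```shell
+#!/bin/sh
+
+# Hypothetical path, only for illustration.
+path=/usr/local/bin/boy
+
+echo "${path##*/}"   # longest prefix matching */ removed  -> boy
+echo "${path#*/}"    # shortest prefix matching */ removed -> usr/local/bin/boy
+echo "${path%/*}"    # shortest suffix matching /* removed -> /usr/local/bin
+```
+
+One difference from **basename**: with a trailing slash, as in '''dir/''', the expansion yields an empty string while **basename** prints '''dir'''. That should not matter for '''$0''', which normally names a file.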
+
+'''
+man "$1" | awk -v opt="$2" '
+ /^[[:space:]]*-/ {
+ hit = ($0 ~ "(^|[ ,])" opt "([ ,=[]|$)")
+ }
+ hit
+'
+'''
+
+This pipeline is the main part of the script. It runs the **man** command and pipes the output into **awk**, which is passed the second argument as '''opt'''. The awk script can look intimidating if you’re not familiar with awk or regular expressions, but I will try to explain it without getting bogged down in language-specific syntax.[^4] The script can be read as follows: for every line that starts with optional whitespace followed by a '''-''', check if '''opt''' appears as a word in that line, and store the result in '''hit'''. Lines that do not start this way leave '''hit''' untouched. Every line, of either kind, is printed while '''hit''' is true.
+
+Summarized, the logic sets '''hit''' to true when a section for the given '''opt''' begins, and sets '''hit''' to false when a section for some other option begins. In between, each line is printed if '''hit''' is true.
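+
+To watch the flag flip, the same awk program can be fed a few lines by hand. This is only a sketch: the sample text is invented to resemble a man page, and the function name '''filter_opt''' is hypothetical.
+
+```shell
+#!/bin/sh
+
+# Hypothetical wrapper around the awk program from the script:
+# reads text on stdin, prints the section for the option given as $1.
+filter_opt() {
+    awk -v opt="$1" '
+        /^[[:space:]]*-/ {
+            hit = ($0 ~ "(^|[ ,])" opt "([ ,=[]|$)")
+        }
+        hit
+    '
+}
+
+# Invented sample input.
+filter_opt -l <<'EOF'
+       -a, --all
+              do not ignore entries starting with .
+
+       -l     use a long listing format
+
+       -t     sort by time
+EOF
+```
+
+Only the '''-l''' line and the blank line after it are printed: '''hit''' turns on at '''-l''' and off again at '''-t'''.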
+
+An unintended side effect of this script is that it allows for regular expressions in the '''opt''' input via **awk**, e.g.:
+
+'''
+$ boy ls '--time.*'
+ --time=WORD
+ select which timestamp used to display or sort; access time (-u): atime, access, use; metadata change time (-c):
+ ctime, status; modified time (default): mtime, modification; birth time: birth, creation; with -l, WORD determines
+ which time to show; with --sort=time, sort by WORD (newest first)
+
+ --time-style=TIME_STYLE
+ time/date format with -l; see TIME_STYLE below
+
+ -t sort by time, newest first; see --time
+'''
+
+As a last aside, I am sure that there exist some edge cases that this script will not handle correctly. Man pages can be formatted in all sorts of different ways, and there is no standard for how options should be documented. The point of this post is to demonstrate how following the Unix philosophy allows for useful extensibility, with only a small amount of effort.
+
+[^1]: I am sometimes "forced" to use Microsoft Teams for university collaborations. It has mediocre file sharing, poor document editing, and frustrating formatting in Microsoft Word, all while being so horribly slow that it takes 20 seconds to launch, and stresses my CPU cores just trying to open a folder.
+[^2]: In a hidden example of the Unix philosophy, **man** does not actually display the manual page in a scrollable, searchable interface. It simply prints the contents of the man page and hands it off to another program, in most cases **less**, which provides an interactive interface to the wall of text.
+[^3]: The term "plain text" hides a lot of assumptions, which can lead to a lot of harmful consequences. [Here](https://www.youtube.com/watch?v=_mZBa3sqTrI) is an entertaining talk all about it. Luckily for our purposes we can be quite naïve, and just assume that text is text.
+[^4]: For a nice introduction to awk, I recommend [this](https://www.grymoire.com/Unix/Awk.html) old tutorial / introduction. Some of it might be a bit dated, but I think you are better off knowing some of the old idiosyncrasies/intricacies of the Unix world.