blogstrapping

Write Filters

Filters -- more specifically, command line filters -- are underrated. Many software developers who use them every day do not even know "command line filter" is a type of program. The Unix pipeline is built around the use of filters, and anyone who understands the Unix philosophy or has read Eric Raymond's The Art of Unix Programming (aka TAOUP) knows the Unix pipeline is a brilliant little bit of simplicity that enables a heck of a lot of software power from a collection of small parts.

A command line filter is, at its simplest, a program that reads a text stream from STDIN (standard input) and writes a text stream to STDOUT (standard output). This is not the place for defining STDIN and STDOUT, but it seems worth sharing some examples of filters that you may find on most, if not all, Unix-like systems:

This list is not exhaustive. It barely scratches the surface of tools in various software management system archives, on code hosting sites, or written by people for their own use and never shared with the wider world. There could be many, many more filter programs out there if more people knew how to create them, instead of hard-coding a script to only operate on text from a specific, local source, and write output to a specific, local destination.

Conveniently, writing filters is easy. Many programming languages come with a simple idiom for reading from STDIN, and have another (often print) for writing to STDOUT. Use a loop or recursion to sequentially operate on incoming data between reading and writing, and you have yourself a filter.

The following code samples offer examples of how to write a filter in different languages:

Perl

#!/usr/bin/env perl

while (<>) {
  print;
}

In Perl, the while (<>) { ... } idiom will automatically take input from a Unix pipe, open an interactive STDIN stream so you can enter lines of text one at a time, or read from files specified as arguments. In this case, the program never has to name a variable to hold the relevant data: the loop implicitly stores each line of input in Perl's default variable, $_, and print with no arguments prints $_.

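The while loop continues as long as it can keep pulling data from its input. The <> here is Perl's "diamond" operator (an empty filehandle) rather than a file descriptor: it reads from the files named on the command line, or from STDIN when none are given.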

Ruby

#!/usr/bin/env ruby

$<.each do |line|
  print line
end

Much like the Perl example, this Ruby version will take input from a Unix pipe, will open an interactive STDIN stream so you can enter lines of text one at a time, or will read from files specified as arguments, automatically.

The each block in this example uses a block variable named line to represent each line of text it reads from input, with $< being a built-in alias for ARGF, the stream that reads from files named as arguments or, when there are none, from STDIN. The block exits when it no longer reads any data.

A real-world Ruby filter implementation is the redrug command line utility, part of the RedRug project. Ignoring a lot of other code (including command line option handling and the implementation of the RedRug library itself, which the command line utility uses), the filter implementation itself looks something like this:

#!/usr/bin/env ruby

content = String.new

$<.each {|line| content << line }

puts RedRug.to_html content, options

You can see the full source of the redrug command line utility online, at https://fossrec.com/u/apotheon/redrug/index.cgi/artifact?ci=tip&filename=bin/redrug.

Filter Operations

The most obvious use of this idiom, for many, will be to transform text with a regular expression before printing to standard output. One could also perform operations in batch, rather than on an ongoing stream, by first collecting all the input at once, operating on it as a whole, then producing output. Implementations of wc commonly take a hybrid of these two approaches: they read the input a piece at a time, increment the counted values, and print a numerical report all at once when input has been exhausted, based on the counts they accumulated.
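To make those three approaches a little more concrete, here is a minimal Ruby sketch of each. Everything named in it is made up for illustration: the method names squeeze_spaces, sort_lines, and count_text, the idea of picking a mode with the first argument, and the file name filters.rb used in the usage line. The counting method only loosely imitates wc, which has its own rules for what counts as a line. Each method takes its input and output as parameters so it can be tried out on something other than the real STDIN and STDOUT.

```ruby
#!/usr/bin/env ruby

# Streaming: transform each line with a regular expression as it arrives.
# This one collapses runs of spaces and tabs into a single space.
def squeeze_spaces(input, output)
  input.each_line do |line|
    output.print line.gsub(/[ \t]+/, ' ')
  end
end

# Batch: collect all the input first, operate on it as a whole, then
# produce output. Sorting cannot start until the last line has been read.
def sort_lines(input, output)
  lines = input.readlines.map(&:chomp)
  lines.sort.each { |line| output.puts line }
end

# Hybrid (wc-style): read a line at a time, keep running counts, and
# print one report when the input is exhausted.
def count_text(input, output)
  lines = words = bytes = 0
  input.each_line do |line|
    lines += 1
    words += line.split.size
    bytes += line.bytesize
  end
  output.puts "#{lines} #{words} #{bytes}"
end

# Hypothetical dispatch: the first argument picks a mode. Any remaining
# arguments are treated as input files by $<; with none, it reads STDIN.
# With no recognized mode, the script does nothing.
case ARGV.shift
when 'squeeze' then squeeze_spaces($<, $stdout)
when 'sort'    then sort_lines($<, $stdout)
when 'count'   then count_text($<, $stdout)
end
```

Saved as filters.rb, something like printf 'pear\napple\n' | ruby filters.rb sort would run the batch version on piped input. Note that ARGV.shift has to happen before the first read from $<, so the mode name is not mistaken for a file to open.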

The more you work in the command line, and learn about whatever programming languages you use that make this kind of operation easy, the more opportunities you may find to make your life easier by writing a simple filter. This also helps inexperienced programmers practice the craft of programming with an easy, low-pressure, simple approach to finding useful things to do with code.