Using Linux, it seems, is bound to slowly – but steadily – improve your command-line-fu. As evidence, today I want to share a little piece of shell acolyte’s magic that I managed to craft without much trouble. It’s about counting lines in files – lines of code in code files, to be specific.
For a single file, getting the number of lines is very simple:
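A minimal sketch (the file name is made up):

```shell
# A throwaway three-line file
printf 'one\ntwo\nthree\n' > example.txt

# wc -l prints the line count followed by the file name
wc -l example.txt
```

Strictly speaking, wc -l counts newline characters, so a file whose last line lacks a trailing newline comes out one short.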
Although the name wc comes from “word count”, the -l switch changes its mode of operation to counting lines. The flexibility of this little program doesn’t end there; for example, it can also accept piped input (via stdin):
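For instance (the text being piped is arbitrary):

```shell
# With no file operand, wc -l counts whatever arrives on stdin
printf 'alpha\nbeta\n' | wc -l
```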
as well as multiple files:
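Something like this, with made-up file names; each file gets its own count, plus a grand total:

```shell
# Two throwaway files with known line counts
printf 'a\n' > one.txt
printf 'b\nc\n' > two.txt

# One count per file, followed by a "total" line
wc -l one.txt two.txt
```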
or even wildcards, such as wc -l *.file. With these we could rather easily count the number of lines of code in our project:
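In zsh this can look as below; in bash the ** glob needs the globstar option. Everything about the layout, including the .py extension, is made up for illustration:

```shell
# bash needs globstar for ** to recurse; zsh does this out of the box
shopt -s globstar

# A tiny made-up project
mkdir -p src
printf 'x = 1\n' > main.py
printf 'y = 2\nz = 3\n' > src/util.py

# Count lines in .py files in the current directory and below
wc -l **/*.py
```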
Unfortunately, the exact interpretation of the **/* wildcard seems to vary between shells. In zsh it works as shown above, but in bash I had it omit files from the current directory. While that might make some sense here (as it would give a total without the setup script and tests), I’m sure it won’t be the case for all projects.
And so we need something smarter.
A reliable way to list all files matching a given property (e.g. name) in the current directory and (recursively) its subdirectories is to use the find command:
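For example (the project layout and the .py extension are just an illustration):

```shell
# A made-up project layout
mkdir -p src tests
printf 'x = 1\n' > src/main.py
printf 'assert True\n' > tests/test_main.py

# The quoted pattern reaches find intact instead of being expanded by the shell
find . -name '*.py'
```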
Can we feed such a list into a command that takes multiple files, like wc? As it turns out, it is perfectly possible, and the utility that allows us to do this is called xargs. Numerous are its features, of course, but the simplest usage is totally option-less; we only need to supply the target command and pipe our list into xargs’ standard input:
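A sketch with a made-up layout – here the target command is cat, so every line of every matched file gets printed:

```shell
# Throwaway input files
mkdir -p src
printf 'a = 1\n' > src/one.py
printf 'b = 2\nc = 3\n' > src/two.py

# xargs reads the file list from stdin and passes it as arguments to cat
find . -name '*.py' | xargs cat
```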
This is how we get all the lines printed, so counting them is trivial now:
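Appending wc -l to the same pipeline (layout again made up):

```shell
# Throwaway input files
mkdir -p src
printf 'a = 1\n' > src/one.py
printf 'b = 2\nc = 3\n' > src/two.py

# All lines from all matched files, funnelled into a single count
find . -name '*.py' | xargs cat | wc -l
```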
The real power of this technique lies in the fact that we can inject additional modifiers, or filters, at any stage. We can, for instance, eliminate some files we are not interested in by using grep -v:
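For example, skipping anything whose path mentions tests (directory names are made up):

```shell
# Throwaway layout: real code plus a test file we want excluded
mkdir -p src tests
printf 'a = 1\n' > src/one.py
printf 'assert True\nassert False\n' > tests/test_one.py

# grep -v drops test files from the list before it reaches xargs
find . -name '*.py' | grep -v tests | xargs cat | wc -l
```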
Likewise, we can get rid of comment lines if we push the output of cat through another regex-based filter:
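A sketch assuming Python-style # comments (any language's comment marker would slot into the regex):

```shell
# One made-up file mixing a comment with real code
printf '# a comment\nx = 1\ny = 2\n' > code.py

# The second filter drops lines that start with optional whitespace and #
find . -name '*.py' | xargs cat | grep -v '^[[:space:]]*#' | wc -l
```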
Obviously, both greps can be present at once:
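Putting it together (layout and comment syntax are, as before, assumptions for the sketch):

```shell
# Throwaway layout: code with a comment, plus a test file
mkdir -p src tests
printf '# setup\nx = 1\n' > src/one.py
printf 'assert True\n' > tests/test_one.py

# First grep -v prunes the file list; second prunes comment lines
find . -name '*.py' | grep -v tests | xargs cat | grep -v '^[[:space:]]*#' | wc -l
```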
The complexity of this command likely exceeds that of many typical use cases for find or grep, although seasoned shell hackers may think otherwise. In any case, I think the power of this technique is quite evident, and not only for counting lines.
“or even wildcards, such as wc -l *.file.”
Nope! Wildcards are expanded by the shell – it’s not a wc feature.
Pax is right. Unescaped wildcards are expanded by the shell; for this reason, find with the -name or -path option must have its wildcards quoted or escaped.
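The point about quoting, sketched with a made-up layout:

```shell
# One .py file at the top level, one in a subdirectory
mkdir -p sub
printf 'x\n' > top.py
printf 'y\n' > sub/deep.py

# Quoted: find receives the literal pattern and matches recursively
find . -name '*.py'

# Unquoted (find . -name *.py) would let the shell expand *.py against the
# current directory first, so find would never see the pattern itself
```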
“Pipes are good for simple hacks, like passing around simple text streams,
but not for building robust software.”
When it comes to pipes, it’s actually PowerShell that does it better. Since what’s being pushed through the pipes there is not text but objects, you don’t have to pay attention to (or specify) what format a particular part of the chain should output its data in. Unfortunately, in the case of *nix there is no standardized object model (like COM or .NET on Windows) that would allow that.