Even though it’s not a part of the Zen of Python, there is a widely accepted principle in the Python community that reads:
It’s better to ask for forgiveness rather than permission.
For code, it generally means trying to do something and seeing whether it succeeded, rather than checking if it should succeed and then doing it. The first approach produces something like this:
and it’s preferred over the more cautious alternative:
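As a minimal sketch of the two styles, assume a plain dictionary lookup with a fallback value. The function names `lookup_eafp` and `lookup_lbyl` are made up for this illustration.

```python
def lookup_eafp(mapping, key, default=None):
    """Ask for forgiveness: just try the lookup and handle the failure."""
    try:
        return mapping[key]
    except KeyError:
        return default


def lookup_lbyl(mapping, key, default=None):
    """Ask for permission: check the precondition first, then act."""
    if key in mapping:
        return mapping[key]
    return default


settings = {"debug": True}
print(lookup_eafp(settings, "debug"))           # key present
print(lookup_lbyl(settings, "verbose", False))  # key missing, falls back
```

Both functions return the same results; they differ only in whether the check happens before the operation or as a consequence of it.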
What I realized recently is that there is a much more general level of software development where the exact same principle can be applied. It’s not just about individual pieces of code, or catching exceptions versus checking preconditions. It extends to functions, modules, classes and packages – but more importantly, to the whole process of adding features, extending functionality or even conceiving totally new projects.
Taking a somewhat simplistic view, this process can be thought of as having two extremes. On one end, there is the concept of pure “waterfall”. Every bit of work has to fit into a predefined set of phases, and no later phase can start before the previous one has completely finished. In this setting, design must be done upfront and all knowledge about the future product has to be gathered before coding starts.
Even from this short description, it’s hard to stop thinking about all the hilarious ways the waterfall approach can end badly. Curiously, though, there are still companies that not only follow it but flaunt the fact publicly.
The other end is sometimes called “cowboy coding”, an almost derogatory term for churning out code without regard to pretty much any methodology whatsoever. As in the case of waterfall, it sometimes (very rarely) works, but more by accident than by design. And like pure waterfall, it doesn’t really exist in nature, as long as more than one programmer is involved.
If we frame the spectrum like this, with the two completely opposite points at the edges, the natural tendency is to search for the middle ground. But if we subscribe to the titular principle of asking for forgiveness rather than permission, the right answer will clearly be shifted towards the “lean” side.
What is permission here? That’s a green light from design, architecture, requirements, UX and similar standpoints. You wouldn’t mind having all these before you start coding, but it is not absolutely crucial. Not having it, you can still do things: code, prototype, explore, and see how things pan out.
Even if you end up ripping it all apart, or maybe even fail to produce anything notable, you will still accrue useful knowledge. Say you have to throw away most of the week’s work because you’ve learned you need to do it differently. So what? With the insight you have gained along the way, it may very well take just a few hours now.
And so thou shalt be forgiven.
As a fact of life, in bigger projects you often cannot just delete something – be it a function, method, class or module. Replacing all its usages with whatever is the new recommendation – if any! – is typically outside of your influence, capabilities or priorities. By no means should it be treated as a lost cause, though; any codebase would be quickly overwhelmed by kludges if there were no way to jettison them.
To reconcile those two opposing needs – compatibility and cleanliness – the typical approach involves a transition period. During that time, the particular piece of API is marked as deprecated, which is a slightly theatrical term for ‘obsolete’ and ‘not intended for new code’. How effective this is depends strongly on the target audience – for publicly available APIs, someone will always wake up and start screaming when the transition period ends.
For in-project interfaces, however, the blow may be effectively cushioned by using certain features of the language, IDE, source control, continuous integration, and so on. As an example, Java has the @Deprecated annotation that can be applied to functions or classes:
If the symbol is then used somewhere else, it produces a compiler warning (and a visual cue in most IDEs). These can be suppressed, of course, but it’s something you need to do explicitly through a complementary language construct.
So I had this idea to try and add a similar mechanism to Python. One part of it is already present in the standard library: we have the warnings module and a built-in category of DeprecationWarnings. These can be ignored, suppressed, caught or even made into errors.
They are also pretty powerful, as they allow deprecating certain code paths and not just symbols, which can be useful when introducing new meanings for function parameters, among other things. At the same time, it means using them is irritatingly imperative and adds clutter:
And in this particular case, it also doesn’t work as intended, for reasons that will become apparent later on.
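As a rough illustration of this imperative style, here is a minimal sketch in which only one code path of a function is deprecated. The `connect` function, its parameters and the default port are hypothetical; only `warnings.warn` and `DeprecationWarning` come from the standard library.

```python
import warnings


def connect(host, port=None):
    """Hypothetical function: omitting `port` is the deprecated code path."""
    if port is None:
        # The deprecation logic sits right in the middle of the function body.
        warnings.warn(
            "calling connect() without a port is deprecated",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        port = 80
    return (host, port)
```

Note that Python's default warning filters hide most `DeprecationWarning`s, so a filter such as `warnings.simplefilter("always")` or the `-W` command-line option is typically needed to actually see them.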
What we’d like instead is something similar to the annotation approach that is available in Java:
Given that the @-things in Python (decorators, that is) are significantly more powerful than the Java counterparts, it shouldn’t be a tough call to achieve this…
Surprisingly, though, it turns out to be very tricky and quite arcane. The problems lie mostly in the subtle issues of what exactly constitutes “usage” of a symbol in Python, and how to actually detect it. If you try to come up with a few solutions, you’ll soon realize that the one which may eventually require walking through the interpreter call stack turns out to be the least insane.
But hey, we didn’t go to the Moon because it was easy, right? ;) So let’s see how at least we can get started.
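A naive starting point could look like the sketch below. It assumes that “usage” simply means “calling the function”, which sidesteps the harder cases mentioned above (imports, attribute access, subclassing). The decorator name `deprecated` and the example function `old_function` are assumptions made for this illustration.

```python
import functools
import warnings


def deprecated(func):
    """Emit a DeprecationWarning every time the decorated function is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            "%s is deprecated" % func.__name__,
            category=DeprecationWarning,
            stacklevel=2,  # point at the caller, not at this wrapper
        )
        return func(*args, **kwargs)
    return wrapper


@deprecated
def old_function(x):
    return x * 2
```

This only detects calls. Merely importing or referencing `old_function` triggers nothing, and that is where the tricky part begins.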
Last weekend I attended the PyGrunn conference, back in the good ol’ Netherlands. It was a very enjoyable and instructive event, featuring not only a few local speakers but also some prominent figures from the Python community – like Kenneth Reitz or Armin Ronacher. Overall, it was a weekend of some great pythonic fun.
…Except for one small detail. As I’ve learned there, Python will apparently get enums in version 3.4 of the language. To say that this was baffling to me would be a severe understatement. Not only do I see very little need for such a feature, but I also have serious doubts whether it fits into the general spirit of the Python language.
Why so? It’s mostly because of the main purpose of enumeration types, derived from their usage in many languages that already have them as a feature. That purpose is to turn arbitrary data – mostly integers and strings – into well-known, reliable entities that can be safely manipulated inside our programs. Enums act as border guardians, filtering out unexpected data and converting expected data into its safe representation.
What’s safety in this context? It’s type safety, of course. Thanks to enumeration types, we can be certain that a particular value belongs to a specific and constrained set of elements. They clearly define all the variants that our code should handle, because everything else was already culled at the conversion stage.
Problem is, in Python there is nothing that would guarantee those safety promises are actually fulfilled. True, the basic property of enum types is retained: given an enum object, we know it must belong to a preordained set of entities. But there is nothing that ensures we are dealing with an enum object at all – short of us actually checking that ourselves:
How is this different from a straightforward membership check:
except for the latter looking cleaner and more explicit?…
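To make the comparison concrete, here is a minimal sketch using the `enum` module as it eventually shipped in Python 3.4. The `Color` enum, the `ALLOWED_COLORS` set and both `paint_*` functions are hypothetical names chosen for this illustration.

```python
from enum import Enum


class Color(Enum):
    RED = "red"
    GREEN = "green"


def paint_with_enum(color):
    # Nothing enforces the argument type, so we have to check it ourselves.
    if not isinstance(color, Color):
        raise TypeError("expected a Color, got %r" % (color,))
    return "painting in %s" % color.value


ALLOWED_COLORS = {"red", "green"}


def paint_with_membership(color):
    # The plain alternative: accept raw data and check membership directly.
    if color not in ALLOWED_COLORS:
        raise ValueError("unknown color: %r" % (color,))
    return "painting in %s" % color
```

In both variants the guarantee comes from an explicit runtime check written by hand, not from the type itself.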
That said, I do not claim enums are completely out of place in Python. That would be untrue, if only because they are easily implemented through a rather simple metaclass. In fact, this is exactly how the proposed enum.Enum base is supposed to work.
At the same time, it is also possible to provide some of the aforementioned type safety. Just look into various libraries that enhance Python with support for contracts: a form of type safety which is even more powerful than what you’ll find in many statically typed languages. You are free to use them, and you definitely should, if your project would benefit from such functionality.
Incidentally, they fit right in with the new and upcoming enumeration types from Python 3.4. It remains to be seen what this means exactly for the overall direction of the language design. But with enums in place, the style of “checked” typing suddenly becomes a lot more pythonic than it was before.
And I can’t say I like it very much.
You probably know very well that internationalization is hard. The mere act of translating the UI texts is actually one of the easiest parts, even though it’s not a pushover either. As one example: if your messages include quantities, you need to have some logic in place to choose different forms of nouns to go with your numbers. Fortunately, most frameworks already have that, as it’s a standard i18n feature.
Not every string message in your code is something to localize, of course. Log messages that are not visible to the user can be left alone in English – they should be, in fact. Coincidentally, though, those messages are also very likely to contain many numbers, often used as numerical quantities: things to do, things done, error count, and so on:
What if that number is 1?…
Oh well. That’s hardly the end of the world, is it? Anyway, let’s just make the message slightly more universal:
There, problem solved!
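For illustration, the two variants might look like the sketch below. The message wording and the function names are made up; only the `logging` calls are real standard library API.

```python
import logging

logging.basicConfig(level=logging.DEBUG)


def naive_message(count):
    # Reads fine for most counts, but not for exactly one.
    return "Processed %d items" % count


def universal_message(count):
    # The "slightly more universal" variant that hedges with "(s)".
    return "Processed %d item(s)" % count


logging.debug(naive_message(1))      # logs "Processed 1 items"
logging.debug(universal_message(1))  # logs "Processed 1 item(s)"
```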
No worries, I haven’t gone insane. I know that no real-world software would put such gold plating on something as irrelevant as the grammar of its log messages. But it’s spring break, and we can be silly, so let’s have some fun with the idea.
Here I pose the question:
How hard would it be to construct the plural form of an English noun from the singular one?
Consulting the largest repository of human knowledge (well, second largest) reveals that the rules of building English plurals are not exactly trivial – but not very complex either. There are exceptions to almost every rule, though, and a large body of exceptions in general. Still, you could expect to achieve at least some success by just disregarding them completely, and following the simple rules to the letter.
How high would that success ratio be, though?
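As a starting point for such an experiment, here is a minimal sketch that follows only a few of the simple rules and deliberately ignores every exception. The function name `pluralize` and the exact selection of rules are assumptions made for this illustration.

```python
VOWELS = "aeiou"


def pluralize(noun):
    """Build an English plural using only the most basic spelling rules."""
    lower = noun.lower()
    # Sibilant endings take "-es": bus -> buses, box -> boxes, dish -> dishes.
    if lower.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # Consonant + "y" becomes "-ies": entry -> entries (but day -> days).
    if len(lower) > 1 and lower.endswith("y") and lower[-2] not in VOWELS:
        return noun[:-1] + "ies"
    # Everything else just gets an "-s".
    return noun + "s"


for word in ("item", "box", "entry", "day", "child"):
    print(word, "->", pluralize(word))
```

Irregular nouns such as “child” come out wrong (“childs”), which is exactly the kind of failure the success ratio would have to account for.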
There is this quite well known book, titled Clean Code. It is a very insightful work which I wholeheartedly recommend reading for any serious (or semi-serious) programmer.
The premise revolves around the concept of “cleanness” of code, which the author defines in various ways. Mostly it boils down to high signal-to-noise ratio and clear structure, where everything is neatly subdivided into smaller parts.
The idea is very appealing. For some it may even sound like a grand revelation; I know it was almost like that for me. But there is a bigger kind of meta-lesson to be learned here: the one of scope where such great ideas apply – and where they don’t.
You see, the Clean Code ideal works very well for a certain kind of language. The original examples in the book are laid down in Java and this is no coincidence. Java – as well as C, Go and probably a few others – is not a very expressive language: lots of detailed busy work is often needed, even for conceptually simple tasks. And if several such tasks are lined up one after another, the reader is likely to drown in small details instead of seeing the bigger picture.
Those details are therefore one of the prime reasons why you may call some code “unclean”. To tidy up, they need to be properly encapsulated, away from a higher level overview. Hence it’s pretty common in Java to see functions like this one:
and think of them as good code, even if such a function is only used once. The goodness comes from the fact that they are relieving their callers from irrelevant loop minutiae, so that it’s easier to see why we need foos to be valid in the first place.
Yet, at the same time, if you saw a completely equivalent Python construct:
you would firstly curse at incompetence and lack of knowledge of whoever wrote it:
and then you would get rid of the function altogether:
Why? Because the language is expressive enough for the implementation itself to be almost as readable as the function call. Sure, throw the result of all(...) into a variable for even more self-documenting sweetness, but don’t put it in some far away place behind a standalone function. Such code may or may not be cleaner; it will definitely raise eyebrows, though.
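The progression described above might look roughly like this in Python. The name `foos` comes from the text, while `is_valid`, its validity rule and the function names are hypothetical.

```python
def is_valid(foo):
    # Hypothetical validity rule, only here to make the sketch runnable.
    return foo is not None and foo >= 0


# Step 1: a literal transcription of the Java-style helper.
def are_foos_valid_verbose(foos):
    for foo in foos:
        if not is_valid(foo):
            return False
    return True


# Step 2: the same helper once the loop is replaced with all().
def are_foos_valid(foos):
    return all(is_valid(foo) for foo in foos)


# Step 3: no helper at all, just a well-named variable at the call site.
foos = [1, 2, 3]
foos_valid = all(is_valid(foo) for foo in foos)
```

All three variants compute the same thing; the last one simply keeps the one-liner where it is used.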
And lack of astonishment is probably the most important metric.
Flask is one of the countless web frameworks available for Python. It’s probably my favorite, because it’s rather minimal, simple and easy to use. All the expected features are there, too, although they might not be as powerful as in some more advanced tools.
As an example, here’s how you define some simple request handler, bound to a parametrized URL pattern:
This handler responds to requests that go to /post/42 and similar paths. The syntax for those URL patterns is not very advanced: parameters can only be captured as path segments rather than arbitrary groups within a regular expression. (You can still use query string arguments, of course.)
On the flip side, reversing the URL – building it from handler name and parameters – is always possible. There is a url_for function which does just that. It can be used both from Python code and, perhaps more usefully, from HTML (Jinja) templates:
Parameters can have types, too. We’ve seen, for example, that post_id was defined as int in the URL pattern for the blogpost handler. These types are checked during the actual routing of HTTP requests, but also by the url_for function:
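A minimal sketch of the pieces discussed so far, assuming Flask is installed. The handler name `blogpost`, the `post_id` parameter and the `/post/42` path come from the text; the response body and the test value `"not-a-number"` are made up for illustration.

```python
from flask import Flask, url_for

app = Flask(__name__)


@app.route("/post/<int:post_id>")
def blogpost(post_id):
    # Only reached when the path segment parses as an integer.
    return "Post number %d" % post_id


# url_for needs an application/request context outside of a real request.
with app.test_request_context():
    good_url = url_for("blogpost", post_id=42)
    try:
        # A non-integer value is expected to be refused by the int converter;
        # the exact exception type is deliberately not assumed here.
        url_for("blogpost", post_id="not-a-number")
    except Exception as error:
        print("url_for refused the value:", repr(error))

print(good_url)
```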
Most of the time, this little bit of “static typing” is a nice feature. However, there are some cases where this behavior of url_for is a bit too strict. Anytime we don’t intend to invoke the resulting URL directly, we might want a little more flexibility.
The biggest case in point is client-side templates, used by JavaScript code to update small pieces of HTML without reloading the whole page. If you, for example, wanted to rewrite the template above to use Underscore templates, you would still want url_for to format the blogpost URL pattern:
Assuming you don’t feel dizzy from seeing two templating languages at once, you will obviously notice that '<%= post.id %>' is not a valid int value. But it’s a correct value for the post_id parameter, because the resulting URL (/post/<%= post.id %>) would not be used immediately. Instead, it would just be sent to the browser, where some JS code would pick it up and replace the Underscore placeholder with an actual ID.
Unfortunately, bypassing the default strictness of url_for is not exactly easy.
Often I advocate using Python for various automation tasks. It’s easy and powerful, especially when you consider how many great libraries – both standard and third party – are available at your fingertips. If asked, I could definitely share a few anecdotes on how some .py script saved me a lot of hassle.
So I was a bit surprised to encounter a non-trivial problem where using Python seemed like overkill. What I needed to do was to parse some text documents; extract specific bits of information from them; download several files through HTTP based on that; unzip them and place their content in a designated directory.
Nothing too fancy. Rather simple stuff.
But then I realized that doing all this in Python would result in something like a screen and a half of terse code, full of tedious minutiae.
The parsing part alone would be a triply nested loop, with the first two layers taken by os.walk boilerplate. Next, there would be the joys of urllib2; heaven forbid it turns out I need some headers, cookies or authentication. Finally, I would have to wrap my head around the zipfile module. Oh cool, seems like some StringIO glue might be needed, too!
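To give a feel for it, here is a rough Python 3 sketch of such a script, where `urllib.request` takes the place of `urllib2` and `io.BytesIO` plays the role of the `StringIO` glue. What exactly gets extracted from the documents is not specified above, so the `URL_PATTERN` regex, the `.txt` filter and all function names are assumptions made purely for illustration.

```python
import io
import os
import re
import urllib.request
import zipfile

# Hypothetical: pretend the "specific bits of information" are links to ZIP files.
URL_PATTERN = re.compile(r"https?://\S+\.zip")


def find_zip_urls(root):
    """Walk `root` and yield every matching URL found in *.txt files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".txt"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8") as f:
                for line in f:
                    yield from URL_PATTERN.findall(line)


def download(url):
    """Fetch the whole response body into memory (no headers, cookies or auth)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def unzip_bytes(data, target_dir):
    """Extract an in-memory ZIP archive into `target_dir`."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(target_dir)


def main(root, target_dir):
    for url in find_zip_urls(root):
        unzip_bytes(download(url), target_dir)
```

Even in this stripped-down form, with no error handling at all, the triply nested loop and the in-memory glue are already there.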
Granted, I would probably use glob2 for walking the file system, and definitely employ requests for HTTP work. And thus my little script would have external dependencies; isn’t that making it a full-blown program?…
Hey, I didn’t sign up for this! It was supposed to be simple. Why do I need to reimplement grep and curl, anyway? Can’t I just…
…oh wait.