Let me introduce you to the following two important functions:
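In deliberately heavyweight notation – reconstructed in spirit, as the exact formulations aren’t reproduced here – they might read:

$$\min(x_1, \ldots, x_n) = x_k \quad \text{such that} \quad \forall i \in \{1, \ldots, n\}:\; x_k \le x_i$$

$$\max(x_1, \ldots, x_n) = x_k \quad \text{such that} \quad \forall i \in \{1, \ldots, n\}:\; x_k \ge x_i$$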
I don’t believe, of course, that this is the first time you’ve encountered them – nor that they are even half as complicated as the definitions above would suggest.
But although those formulations appear rather awkward, I wanted them to highlight one particular way of interpreting the `min` and `max` functions as they are used in code. You may think it’s easy enough to read them quite literally (“get me the smallest/largest value”), and in many cases you are absolutely correct.
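For instance, with a couple of made-up values:

```python
temperatures = [18, 21, 19, 24, 17]

coldest = min(temperatures)   # "get me the smallest value" -> 17
warmest = max(temperatures)   # "get me the largest value"  -> 24
```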
Often, though, we don’t want the extreme value of a set, list, vector, or other collection of unspecified size. Instead, we call the `min`/`max` functions on a few arguments that are known and “hard-coded” upfront. In many cases, this is done mostly to spare us a verbose `if` statement or a more cryptic ternary operator (`?:`).
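A minimal illustration of that substitution (the names here are mine):

```python
# with an if statement:
if value > limit:
    value = limit

# with a ternary operator:
value = value if value <= limit else limit

# with min called on two known arguments:
value = min(value, limit)
```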
Or is it? I was recently surprised when discussing a very simple programming exercise on one of the IRC channels I frequent. Someone pointed out how unusual my proposed solution was, with its (over)use of the `max` function. The task goes along these lines:
You are presented with a device that has n counters (c₀, …, cₙ₋₁) and n+1 buttons (b₀, …, bₙ). Every counter cᵢ is incremented whenever its corresponding button bᵢ is pressed. Additionally, pressing the extra button bₙ sets all the counters to the maximum value displayed on any of them (e.g. [1, 4, 3] → [4, 4, 4]).
Find a way to compute the final state of all counters given a sequence of k button presses.
Just by reading the above description and implementing it in the most straightforward way, we can trivially arrive at an algorithm which solves the problem in O(nk) time. And since we need to go through the sequence of button presses at least once, any solution needs at least Ω(k) time.
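For reference, the straightforward simulation might be sketched like this (presses are given as button indices, with n denoting the extra button); every press of bₙ costs O(n):

```python
def final_counters_naive(n, presses):
    """Simulate the device literally: O(n*k) in the worst case."""
    counters = [0] * n
    for b in presses:
        if b == n:                    # the extra button
            top = max(counters)       # O(n) scan...
            counters = [top] * n      # ...and O(n) rewrite
        else:
            counters[b] += 1
    return counters
```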
My version was a simple improvement over the obvious one, scoring O(n+k) at the cost of maintaining a negligible amount of extra data.
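It isn’t preserved verbatim here, but the idea can be sketched as follows: track the running maximum, plus the value everything was lifted to at the last press of the extra button, and defer the mass update to the very end:

```python
def final_counters(n, presses):
    """Compute the final state of the counters in O(n + k)."""
    counters = [0] * n
    max_state = 0   # the highest value any counter has reached so far
    base = 0        # value all counters were set to at the last "reset to max"
    for b in presses:
        if b == n:
            base = max_state          # O(1) instead of rewriting all counters
        else:
            counters[b] = max(base, counters[b]) + 1
            max_state = max(max_state, counters[b])
    return [max(base, c) for c in counters]
```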
The noteworthy application of the `max` function was responsible for updating one of those values.
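In the sketch above, it is this line:

```python
max_state = max(max_state, counters[b])
```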
If you look closely, you will notice that the first argument serves as a reference point, while only the second one is the actual ‘input’. What this invocation is essentially saying is: “make `max_state` at least as big as it was before – and possibly bigger”.
Hardly a groundbreaking insight, eh? This approach, however, allows you to rapidly parse even complicated applications of `min` and `max`. If you encounter, for example, an inlined version of a “clamping” function written in the following manner:
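```python
x = min(max(x, a), b)   # a plausible reconstruction of such an expression
```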
I’m pretty sure it will take you at least a few moments to grok what it’s doing. The `max`–`min` compound is puzzling, and the meaningful arguments (`a` and `b`) are disconnected from the function names. Had there been more nesting, you might even have needed to *gasp* count the parentheses.
But we know it’s all about trimming a number to the interval [a, b]. Why not express it in a way that readily highlights this fact?
- `max(a, ...)` means “I want the result to be at least a”.
- `min(b, ...)` means “I want the result to be at most b”.
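Following those two rules, the clamp rearranges into:

```python
x = max(a, min(b, x))
```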
Making both thresholds the first arguments of their respective functions is just one of those nice small things that, very subtly, almost invisibly, make your code easier to work with.
Today I’d like to present something that I consider rather obvious. Normally I don’t do that, but I recently helped an aspiring Pythonist with the trick below, and he marveled at the apparent cleverness of the solution. So I thought it might be useful for someone else, too.
Here’s the deal. In Python, functions can be invoked with keyword arguments, so that the argument name appears in the function call. Many good APIs use that feature extensively; database libraries known as ORMs are one typical example.
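For instance, a lookup in the style of SQLAlchemy (the `User` model, the `session`, and the address are of course made up):

```python
user = session.query(User).filter_by(email='john@example.com').first()
```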
In this call to `filter_by()` we pass the `email` argument as a keyword. Its value is then used to construct an SQL query that contains a filter on the `email` column in the `WHERE` clause. By adding more arguments, we can introduce more filters, linked together with the `AND` operator.
Suppose, though, that we don’t know the column name beforehand. We just have it stored in some variable, maybe because the query is part of an authentication procedure and we support several means of identification: e-mail, Facebook user ID, Twitter handle, etc.
However, the keyword arguments in a function call must always be written as literal Python identifiers. This means that we would need to “eval” them somehow, i.e. compute them dynamically.
How? Probably the best way is to construct an ad-hoc dictionary and unpack it with the `**` operator.
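Continuing the SQLAlchemy-style sketch from before (the helper name is hypothetical):

```python
column = get_login_column()   # hypothetical: returns e.g. 'email' or 'twitter_handle'
user = session.query(User).filter_by(**{column: login}).first()
```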
That’s it. It may not be obvious at first, because normally we only unpack dictionaries that were carefully crafted as local variables, or received as `kwargs` parameters.
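That is, the familiar cases look more like this (a self-contained toy example):

```python
def describe(**kwargs):
    # a dictionary received as the ``kwargs`` parameter
    return ', '.join('%s=%r' % kv for kv in sorted(kwargs.items()))

options = {'host': 'localhost', 'port': 5432}   # a carefully crafted local dict
print(describe(**options))                      # host='localhost', port=5432
```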
But `**` works on any dictionary. We are thus perfectly allowed to create one and then unpack it immediately. It doesn’t make much sense in most cases, but this is one of the two situations where it does.
The other situation arises when we know the argument name while writing the code, but we cannot use it directly. Python reserves many short, common words with a plethora of meanings (computer-scientific or otherwise), so this is not exactly a rare occurrence. You may encounter it when building URLs in Flask:
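```python
from flask import url_for

# hypothetical endpoint; url_for() turns unknown keyword arguments into
# query-string parameters, and 'from' is a reserved word in Python, so
# url_for('search', from='2013-01-01') would be a SyntaxError:
url = url_for('search', **{'from': '2013-01-01', 'to': '2013-02-01'})
```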
or parsing HTML with BeautifulSoup:
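```python
from bs4 import BeautifulSoup

html = '<a class="external" href="http://example.com/">Example</a>'
soup = BeautifulSoup(html, 'html.parser')

# soup.find_all('a', class='external') would be a SyntaxError:
links = soup.find_all('a', **{'class': 'external'})
```

(BeautifulSoup also accepts the `class_` alias precisely because of this problem, but the unpacking trick works for any reserved name.)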
Strictly speaking, this technique allows you to use completely arbitrary argument names, which need not even be words. Special handling would be required on both ends of the function call, though.
It came as a real surprise to me today. Up until now, I thought that long gone were the times when I stared at plain old syntax errors in confused bewilderment – at least in languages I have some experience with, like Python. So when it happened today, I was really caught off-guard.
The crux of the issue can be demonstrated in the following artificial example (reconstructed here with made-up `user` and `queries` objects; note that only Python 2 and Python 3.4 or older reject it – later releases accept the construct):
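```python
from lxml.builder import E

def build_request(user, queries):
    return E.request(
        E.user(
            name=user.name,
            city=user.address.city,
        ),
        *[E.query(q) for q in queries],
    )
```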
The goal is to build a simple XML tree using the most convenient interface, i.e. the `lxml.builder.E` manipulator from the `lxml` library. The real code is somewhat longer and more complicated, but this snippet captures the issue pretty neatly.
And strange as it may seem, this little piece produces a `SyntaxError` at the final closing parenthesis:
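```
  File "xml_builder.py", line 10
    )
    ^
SyntaxError: invalid syntax
```

(The exact message assumes the reconstruction above, saved as xml_builder.py and run under Python 2.)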
In such a case, the first obvious thing anyone would do is, of course, to look for an unmatched opening brace. With the aid of modern editors (or even not-so-modern ones ;>), this is a trivial task. Before too long, we would therefore find out that… all the braces are fine. Double-checking, just to be sure, yields the same result. Everything appears to be in order.
But, of course, we still have the syntax error. What the hell?!
As it turns out, the offending line is just above the seemingly erroneous parenthesis. In the reconstruction above, it’s the line with the argument unpacking – or, to be more specific, it is its very last character that the interpreter has problems with:
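```python
        *[E.query(q) for q in queries],   # <- this trailing comma
```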
See, Python really doesn’t like this trailing comma. Which, admittedly, is more than surprising, given how lenient it is in pretty much every other setting. You may recall that it’s perfectly OK to include an additional comma after the final element of a list, tuple, or dictionary, and it is quite useful to do so in practice. Not only that – the same is also possible in argument lists of function calls. Indeed, this very fragment has one instance of such a trailing comma, appearing after a keyword argument (`city=user.address.city,`).
But apparently this doesn’t work for all kinds of arguments. If we unpack some positional ones (using the `*` operator), we cannot put a comma afterwards. The relevant part of the Python grammar specification states this, of course:
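Here is the `arglist` rule, quoted from CPython 2.7’s Grammar/Grammar file:

```
arglist: (argument ',')* (argument [',']
                         |'*' test (',' argument)* [',' '**' test]
                         |'**' test)
```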
but I wouldn’t call it very explicit. And it turns out that you actually can have a comma after `*foo` – but only if another argument follows. If my intuition of formal grammars is correct, the reason this rule prohibits `foo(*args,)` (or `foo(**kwargs,)`, for that matter) is strictly related to the fact that Python’s grammar is LL(1). And this, by the way, is here to stay. Quoting PEP 3099:
Simple is better than complex. This idea extends to the parser. Restricting Python’s grammar to an LL(1) parser is a blessing, not a curse. It puts us in handcuffs that prevent us from going overboard and ending up with funky grammar rules like some other dynamic languages that will go unnamed, such as Perl.
I, for one, deem this attitude completely reasonable – even if it results in 20 minutes of utter confusion once in a blue moon.
Footnote: The title is of course a not-so-obvious reference to The Infernal Semicolon.
Features dubbed ‘syntactic sugar’ often foster novel approaches to programming problems. Python’s decorators are no exception here, and they are a topic I have touched upon before. Today I’d like to discuss a few quirks which, unfortunately, add to their complexity in a way that often doesn’t feel necessary.
Let’s start with something easy. Pretend that we have a simple decorator named `@trace`, which logs every call of the function it is applied to:
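```python
@trace
def calculate_total(items):          # just a placeholder function
    return sum(item.price for item in items)
```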
An implementation of such a decorator is relatively trivial, as it wraps the decorated function directly. One of the possible variants can be seen below:
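```python
import functools
import logging

def trace(func):
    """Log every call of the decorated function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # a sketch that reports through the standard logging module
        logging.debug("Calling %s with args=%r, kwargs=%r",
                      func.__name__, args, kwargs)
        return func(*args, **kwargs)
    return wrapper
```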
That’s pretty cool for starters, but let’s say we want some calls to stand out in the logging output. Perhaps there are functions we are more interested in than the rest. In other words, we’d like to adjust the priority of log messages generated by `@trace`:
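```python
@trace(level=logging.INFO)    # these calls should stand out in the log
def calculate_total(items):
    return sum(item.price for item in items)
```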
This seemingly small change actually mandates a massive conceptual leap in what our decorator really does. It becomes apparent when we de-sugar the `@decorator` syntax and look at the plumbing underneath:
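```python
def calculate_total(items):
    return sum(item.price for item in items)

# what the @trace(level=logging.INFO) line above de-sugars to:
calculate_total = trace(level=logging.INFO)(calculate_total)
```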
Introducing parameters requires adding a new level of indirection, because it’s the return value of `trace(level=logging.INFO)` that does the actual decorating (i.e. transforming a given function into another). This might not be obvious at first glance, and admittedly, the notion of a function that returns a function which takes some other function in order to produce a final function might be – ahem – slightly confusing ;-)
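Concretely, the parameterized `trace` might be sketched with one more nested function:

```python
def trace(level=logging.DEBUG):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            logging.log(level, "Calling %s with args=%r, kwargs=%r",
                        func.__name__, args, kwargs)
            return func(*args, **kwargs)
        return wrapper
    return decorator
```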
But wait! There is just one more thing… When we added the `level` argument, we didn’t necessarily want to lose the ability to invoke `@trace` without it. Yes, it is still possible – but the syntax is rather awkward:
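```python
@trace()    # the parentheses can no longer be omitted
def calculate_total(items):
    return sum(item.price for item in items)
```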
That’s expected – `trace` only returns the actual decorator now – but it is at least slightly annoying. Can we get our pretty syntax back while maintaining the added flexibility of specifying custom arguments? Better yet: can we make `@trace`, `@trace()` and `@trace(level)` all work at the same time?…
Looks like a tough call, but fortunately the answer is positive. Before we delve into details, though, let’s step back and try to somewhat improve the way we write our decorators.
A common stereotype suggests that programmers love to argue about the merits and flaws of the solutions they use: languages, frameworks, libraries, or even tools such as editors. There is probably a sizable grain of truth in that. And there is, of course, nothing wrong with a substantive discussion that relies on rational, well-founded arguments. With those, however, things are not always so rosy.
One of the frequently used, general-purpose “arguments” that fits almost anywhere is invoking how natural a given solution is – or isn’t. I personally have great difficulty drawing any useful boundaries for this notion. More than that: it is so underspecified that one could assign it two exactly opposite meanings with almost equal success. Both would admittedly be precise and would let us say unambiguously whether something is in fact natural or not. The problem is that the answer would be the same for everything under consideration – and that is useless. A predicate that always returns `true` or always returns `false` carries no information whatsoever.
So what do these two opposite renditions look like?… On the one hand, it is hard to react with anything but laughter to attempts at finding primordial Nature in phenomena such as programming languages. I wish good luck to anyone who tries to find parallels between an alternative like “a `for` loop or a `map` function” and the decisions our ancestors had to make on the African savanna some 200 thousand years ago. Naturalness understood this way obviously excludes all those things it is so pleasant to argue about. Let us at least take comfort in the fact that failing to use one of them does not mean ending up as a lion’s meal.
On the other hand, there is no apparent reason why any human creation should not be considered fully natural, if we do not deny that label to anthills, birds’ nests, or spiderwebs. This again gives us a well-defined, unambiguous predicate… which, however, always returns `true`, reducing its usefulness to zero.
All of this is suspicious, of course. If the notion of ‘natural’ does not convey even a single bit of useful information, it should not be used at all. Yet it is – which suggests that its purpose lies elsewhere. That purpose may be, for example, to hide the lack of a real argument and to smuggle in a subjective point of view under the cover of an objective-sounding label. For ‘natural’ sounds better than even ‘intuitive’: it seems much more precise (intuitions are foggy, after all) and gives the impression of appealing to seemingly universal criteria (as opposed to subjective intuitions).
Usually, though, this is just an illusion that conceals the absence of good justification for one’s claims. So don’t tell me what is natural. Magic incantations are no substitute for solid arguments.