Archive for Programming

Python is the Worst Part of C++

2013-07-15 22:15

I’ve had a peculiar kind of awful realization after listening to a C++ talk earlier today. The speaker (Bjarne Stroustrup, actually) went over a few defining components of the language, before he took a much longer stop at templates. In C++14, templates are back in the spotlight because of the new constraints feature, intended to make working with (and especially debugging) templates much more pleasant.

Everyone using C++ templates now is probably well accustomed to the cryptic and excessively long error messages that the compiler spits out whenever you make even the slightest of mistakes. Because of the duck typing semantics of template arguments, those messages always expose the internals of a particular template’s implementation. If you have, for example, worked with the STL implementation from Visual C++, you would recognize the internal __rb_tree symbol; it appeared very often if you misused the map or set containers. Its appearance was at best only remotely helpful in locating the issue – especially if it was inside a multi-line behemoth of an error message.

Before constraints (or “concepts lite”, as they are dubbed) improve the situation, this is arguably the worst part of the C++ language. But alas, C++ is not the only language offering such a poor user experience. As a matter of fact, there is a whole class of languages which are exactly like that – and they won’t change anytime soon.

Yes, I’m talking about the so-called scripting languages in general, and Python in particular. The analogies are striking, too, once you see past the superficial differences.

Take the aforementioned duck typing as an example. In Python, it is one of the core semantic tenets, a cornerstone of the language’s approach to polymorphism. In current C++, this is precisely the cause of page-long, undecipherable compiler errors. You just don’t know whether it’s a duck before you tell it to quack, which usually happens somewhere deep inside the template code.

But wait! Python et al. also have those “compiler errors”. We just call them stacktraces and have interpreters format them in a much nicer, more readable way.
Of course, unlike template-related errors in C++, stacktraces tend to be actually helpful. I posit, however, that it’s mostly because we learned to expect them. Studying Python or any other scripting language, we’re inevitably exposed to them at a very early stage, with a steady learning curve that corresponds to the growing complexity of our code.
This is totally different from having the compiler literally throw its innards at you when you try to sort a list of integers.
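To make the analogy concrete, here is a small Python sketch (all names are made up for illustration) in which a wrong argument is only discovered a few calls deep, much like a bad template argument is only discovered somewhere inside the template code:

```python
import traceback


class Duck:
    def quack(self):
        return "Quack!"


def make_noise(animal):
    # Nothing checks what `animal` is until we ask it to quack.
    return animal.quack()


def serenade(animals):
    return [make_noise(animal) for animal in animals]


def run(animals):
    """Return the noises, or the formatted stacktrace if something fails to quack."""
    try:
        return serenade(animals)
    except AttributeError:
        return traceback.format_exc()


if __name__ == "__main__":
    print(run([Duck(), Duck()]))
    print(run([Duck(), 42]))  # the int only fails inside make_noise()
```

The stacktrace for the second call names make_noise(), an implementation detail the caller never touched directly, which is the same kind of leak as seeing a container’s internals in a template error.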

What I find the most interesting in this whole intellectual exercise is to examine what solutions are offered by both sides of the comparison.
Dynamic languages propose coping mechanisms, at best. You are advised to liberally blanket your code with automated tests so that failing to quack is immediately registered before the duck (er, code) goes live. While some rudimentary static analysis and linting are typically provided, you generally cannot have a reasonable idea whether your code fails at the most basic level before you actually run it.

Now, have you ever unit-tested the template instantiation process that the C++ compiler performs for any of your own templates? Yeah, I thought so. Except for the biggest marvels of template metaprogramming, this may not be something that even crosses your mind. Instead, the established industry practice is simply “don’t do template metaprogramming”.

But obviously, we want to use dynamic languages, and some of us probably want to do template metaprogramming. (Maybe? Just a few? Anyone?…) Since they clearly appear to be similar problems, it’s not very surprising that remedies start to look somewhat alike. C++ is getting concepts in order to impose some rigidity on the currently free-form template arguments. Python is not fully on that path yet but the signs are evident, with the recent adoption of enums (that I’ve fretted about) as the most prominent example.

If I’m right here, it would be curious to see what lies at the end of this road. In any case, it will probably have been already invented fifty years ago in Lisp.

Author: Xion, posted under Programming

Testing Jinja Templates

2013-07-06 21:43

If you use a powerful HTML templating engine – like Jinja – inevitably you will notice a slow creep of more and more complicated logic entering your templates. Contrary to what many may tell you, it’s not inherently bad. Views can be complex, and keeping that complexity contained within templates is often better than letting it seep into controller code.

But logic, if not trivial, requires testing. Exempting it by saying “That’s just a template!” doesn’t really cut it. It’s a pretty crappy excuse, at least in Flask/Jinja, where you can easily import your template macros into Python code:

    {# hello.html #}
    {% macro hello_world() %}
        Hello world!
    {% endmacro %}

    # Python code; requires an active Flask application context
    import flask
    hello_world = flask.get_template_attribute('hello.html', 'hello_world')
    print(hello_world())

When writing a fully featured test suite, though, you would probably want some more leverage. Importing those macros by hand in every test can get stale rather quickly and leave behind a lot of boilerplate code.

Fortunately, this is Python. We have world class tools to combat repetition and verbosity, second only to Lisp macros. There is no reason we couldn’t write tests for our Jinja templates in a clean and concise manner:

    class Hello(JinjaTestCase):
        __template_imports__ = {
            'hello_world': 'hello.html',
        }

        def test_hello_world(self):
            result = self.hello_world()
            self.assert_in("hello", result.lower())
            self.assert_in("world", result.lower())

The JinjaTestCase base, implemented in this gist, provides evidence that a little __metaclass__ can go a long way :)
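The gist itself is not reproduced here, so the following is only a guess at the general shape of such a base class, not the author’s implementation. It is written in Python 3 syntax (metaclass= rather than __metaclass__), uses the standard assertIn instead of assert_in, and swaps flask.get_template_attribute for a hypothetical stand-in called load_macro so that it runs without Flask or Jinja installed:

```python
import unittest


def load_macro(template_name, macro_name):
    """Hypothetical stand-in for flask.get_template_attribute().

    Serves canned macros so that the sketch runs without Flask or Jinja.
    """
    canned = {("hello.html", "hello_world"): lambda: "Hello world!"}
    return canned[(template_name, macro_name)]


class JinjaTestCaseMeta(type):
    """Turn each __template_imports__ entry into a callable class attribute."""

    def __new__(mcs, name, bases, namespace):
        imports = namespace.get("__template_imports__", {})
        for macro_name, template_name in imports.items():
            macro = load_macro(template_name, macro_name)
            # staticmethod keeps `self` from being passed to the macro.
            namespace[macro_name] = staticmethod(macro)
        return super().__new__(mcs, name, bases, namespace)


class JinjaTestCase(unittest.TestCase, metaclass=JinjaTestCaseMeta):
    """Base class; subclasses declare __template_imports__."""


class Hello(JinjaTestCase):
    __template_imports__ = {"hello_world": "hello.html"}

    def test_hello_world(self):
        result = self.hello_world()
        self.assertIn("hello", result.lower())
        self.assertIn("world", result.lower())
```

With a real Flask application, load_macro would presumably delegate to flask.get_template_attribute, and the lookup would likely have to be deferred until an application context is available rather than happen at class creation time.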

Author: Xion, posted under Programming

Fluent Chaining

2013-06-30 14:24

Look at the following piece of jQuery code:

    $('<span>')
        .attr('data-type', 'info')
        .addClass('message')
        .css('width', '100%')
        .text("Hello world!")
        .appendTo($('body'))
    ;

Of the two patterns it demonstrates, one is almost decisively bad: you shouldn’t build up DOM nodes this way. To get more concise and maintainable code, it’s better to use one of the client-side templating engines.

The second pattern, however, is hugely interesting. Most often called method chaining, it also goes by the more glamorous name of fluent interface. As you can see by a careful look at the code sample above, the idea is pretty simple:

Whenever a method is mostly mutating an object’s state, it should return the object itself.

Prime examples of methods that do that are setters: simple functions whose pretty much only purpose is to alter the value of some property stored as a field inside the object. When augmented with support for chaining, they start to work very pleasantly with a few other common patterns, such as builders in Java.
Here, for example, is a piece of code constructing a Protocol Buffer message that doesn’t use its Builder’s fluent interface:

    Person.Builder builder = Person.newBuilder();
    builder.setId(42);
    builder.setFirstName("John");
    builder.setLastName("Doe");
    builder.setEmail("johndoe@example.com");
    Person person = builder.build();

And here’s the equivalent that takes advantage of method chaining:

    Person person = Person.newBuilder()
        .setId(42)
        .setFirstName("John")
        .setLastName("Doe")
        .setEmail("johndoe@example.com")
        .build();

It may not be shorter by pure line count, but it’s definitely easier on the eyes without all these repetitions of the (completely unnecessary) builder variable. We could even say that the whole Builder pattern is almost completely hidden thanks to method chaining. And undoubtedly, this is a very good thing, as that pattern is just a compensation for the deficiencies of the Java programming language.

By now you’ve probably figured out how to implement method chaining. In derivatives of the C language, that amounts to having a return this; statement at the end of the method’s body:

    jQuery.fn.extend({
        addClass: function( value ) {
            // ... lots of jQuery code ...
            return this;
        },
        // ... other methods ...
    });

and possibly changing the return type from void to the class itself, a pointer to it, or reference:

    public Builder setFirstName(String value) {
        firstName_ = value;
        return this;
    }

It’s true that it may slightly obscure the implementation of a fluent class for people unfamiliar with the pattern. But this cost comes with the great benefit of making the usage clearer – which is almost always much more important.

Plus, if you are lucky to program in Python instead, you may just roll out a decorator ;-)
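Such a decorator could look roughly like the sketch below. The name fluent and the PersonBuilder class are made up for illustration; the idea is simply to return self whenever the wrapped method returns nothing:

```python
import functools


def fluent(method):
    """Make a mutating method return `self` so that calls can be chained."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        # Only substitute `self` when the method returned nothing of its own.
        return self if result is None else result

    return wrapper


class PersonBuilder:
    """Hypothetical builder, loosely mirroring the Java example above."""

    def __init__(self):
        self.fields = {}

    @fluent
    def set_id(self, value):
        self.fields["id"] = value

    @fluent
    def set_first_name(self, value):
        self.fields["first_name"] = value

    def build(self):
        return dict(self.fields)


person = PersonBuilder().set_id(42).set_first_name("John").build()
```

The setters stay as plain as they would be without chaining, and the return self boilerplate lives in exactly one place.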

Author: Xion, posted under Programming

Ask for Forgiveness, Not Permission

2013-06-16 6:08

Even though it’s not a part of the Zen of Python, there is a widely accepted principle in the Python community that reads:

It’s better to ask for forgiveness rather than permission.

For code, it generally means trying to do something and seeing whether it succeeded, rather than checking if it should succeed and then doing it. The first approach produces something like this:

    try:
        print(dictionary[key])
    except KeyError:
        print("Key not found")

and it’s preferred over the more cautious alternative:

    if key in dictionary:
        print(dictionary[key])
    else:
        print("Key not found")

What I realized recently is that there is a much more general level of software development where the exact same principle can be applied. It’s not just about individual pieces of code, or catching exceptions versus checking preconditions. It extends to functions, modules, classes and packages – but more importantly, to the whole process of adding features, extending functionality or even conceiving totally new projects.

Taking a somewhat simplistic view, this process can be thought of as having two extremes. On one end, there is the concept of pure “waterfall”. Every bit of work has to fit into a predefined set of phases and no later phase can start before the previous one has completely finished. In this setting, design must be done upfront and all knowledge about the future product has to be gathered before coding starts.
By even this short description, it’s hard to stop thinking about all the hilarious ways the waterfall approach can end badly. Curiously, though, there are still companies that not only follow it but flaunt the fact publicly.

The other end is sometimes called “cowboy coding”, an almost derogatory term for churning out code without regard to pretty much any methodology whatsoever. Like in the case of waterfall, it sometimes (very rarely) works, but more by accident than by design. And like pure waterfall, it also doesn’t really exist in nature, as long as more than one programmer is involved.

If we frame the spectrum like this, with the two completely opposite points at the edges, the natural tendency is to search for the middle ground. But if we subscribe to the titular principle of asking for forgiveness rather than permission, the right answer will clearly be shifted towards the “lean” side.
What is permission here? That’s a green light from design, architecture, requirements, UX and similar standpoints. You wouldn’t mind having all these before you start coding, but it is not absolutely crucial. Not having it, you can still do things: code, prototype, explore, and see how things pan out.
Even if you end up ripping it all apart, or maybe even fail to produce anything notable, you will still accrue useful knowledge. Say you have to throw away most of the week’s work because you’ve learned you need to do it differently. So what? With the insight you have gained along the way, it may very well take just a few hours now.

And so thou shalt be forgiven.

Author: Xion, posted under Programming

Implication as an Operator

2013-06-02 18:28

A vast majority of code deals with logical conditions by using three dedicated operators: not (negation), and (conjunction), or (disjunction). These are often written as !, && and ||, respectively, in C-like programming languages.

In principle, this is sufficient. Coupled with true and false, it’s enough to encode any boolean function of zero, one or two arguments. But language designers didn’t seem to be concerned with minimalism here, because it’s possible to replace those three operators with just one of the following binary functions:

    nand(x, y) = not (x and y)
    nor(x, y) = not (x or y)

If you can’t immediately see how, start with deriving negation first.
So we already have some redundancy for the sake of readability. While it’s surely a bad idea to try and incorporate all 22 aforementioned functions, aren’t there at least a few others that would make sense as operators on their own?
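For the impatient, here is one way to carry out the NAND derivation, as a Python sketch that checks itself against the built-in operators (the helper names are arbitrary). It also recounts the boolean functions of zero, one and two arguments:

```python
from itertools import product


def nand(x, y):
    return not (x and y)


def not_(x):
    # Negation first: feeding the same value twice leaves only the "not".
    return nand(x, x)


def and_(x, y):
    return not_(nand(x, y))


def or_(x, y):
    # De Morgan: x or y == not (not x and not y)
    return nand(not_(x), not_(y))


def check():
    """Compare the NAND-based definitions against the built-in operators."""
    for x, y in product([False, True], repeat=2):
        assert not_(x) == (not x)
        assert and_(x, y) == (x and y)
        assert or_(x, y) == (x or y)
    return True


# A function of n boolean arguments has 2 ** n input rows,
# hence 2 ** (2 ** n) possible truth tables.
FUNCTION_COUNT = sum(2 ** (2 ** n) for n in range(3))
```

FUNCTION_COUNT adds up 2 functions of no arguments, 4 of one argument and 16 of two, which is where the figure of 22 comes from. The NOR variant works the same way with the roles of and and or swapped.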

I can probably think of one: the material implication (p → q). It may seem like a weird choice at first, but there are certain common scenarios where such an operator would make things more explicit.
Imagine for a second that some mainstream language (like Java) has been enhanced with an operator => that acts as implication. Here’s one example of its straightforward usage:

    public class NameMatcher {
        @Nullable private String firstName_;
        @Nullable private String lastName_;

        public NameMatcher(@Nullable String firstName,
                           @Nullable String lastName) {
            firstName_ = firstName;
            lastName_ = lastName;
        }

        public boolean matches(String firstName, String lastName) {
            return firstNameMatches(firstName) && lastNameMatches(lastName);
        }

        private boolean firstNameMatches(String firstName) {
            return matchFirstName() => firstName_.equals(firstName);
        }
        private boolean matchFirstName() {
            return firstName_ != null;
        }

        private boolean lastNameMatches(String lastName) {
            return matchLastName() => lastName_.equals(lastName);
        }
        private boolean matchLastName() {
            return lastName_ != null;
        }
    }

Many situations involving “optionals” could take advantage of logical implication as an operator. Also note how in this case, the alternatives do not look very appealing at all. One could use an if statement to produce an equivalent construct:

    private boolean firstNameMatches(String firstName) {
        if (matchFirstName()) {
            return firstName_.equals(firstName);
        }
        return true;
    }

but this makes a trivial one-liner suddenly look quite involved. We could also expand the implication using the equivalence law p → q ≡ ¬p ∨ q:

    private boolean firstNameMatches(String firstName) {
        return !matchFirstName() || firstName_.equals(firstName);
    }

The reader would then have to perform the opposite transformation anyway, in order to restore the real meaning hidden behind the non-obvious ! and || operators. Finally, we could be a little more creative:

    private boolean firstNameMatches(String firstName) {
        return matchFirstName() ? firstName_.equals(firstName) : true;
    }

and capture the intent almost perfectly… At least until along comes someone clever and “simplifies” the expression into the not-a-or-b form presented above.

Does any language actually have the implication operator? Not surprisingly, the answer is yes – but it’s most likely a language you wouldn’t want to code in. Older and scripting versions of Visual Basic had the Imp operator, intended to evaluate the logical connective p → q.
Besides provoking a few chuckles with its hilarious name, its usefulness was limited by the fact that it wasn’t short-circuiting. Both arguments were always evaluated, even if the first one turned out false. You may notice that in our NameMatcher example, such behavior would produce a NullPointerException when one of the names is null. This is also the reason why implication implemented as a function:

    bool implies(bool p, bool q) {
        return !p || q;
    }

would not work in most languages, for the arguments are all evaluated before executing the function’s code.
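One common workaround, where functions are cheap to create, is to pass the consequent as a callable so that it is only evaluated on demand. The Python sketch below illustrates this; the snake_case NameMatcher is a cut-down, hypothetical port of the Java example, and the comparison is made case-insensitive only so that a missing name would actually blow up if it were evaluated eagerly:

```python
def implies(p, q):
    """Material implication where `q` is a zero-argument callable.

    `q()` is only evaluated when `p` is true, which mimics short-circuiting.
    """
    return (not p) or q()


class NameMatcher:
    """Cut-down, hypothetical Python port of the Java example."""

    def __init__(self, first_name=None):
        self.first_name = first_name

    def first_name_matches(self, first_name):
        return implies(
            self.first_name is not None,
            # Never called when self.first_name is None.
            lambda: self.first_name.lower() == first_name.lower(),
        )
```

The price is the lambda noise at every call site, which is arguably no prettier than writing the not-a-or-b form by hand.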

Author: Xion, posted under Math, Programming

Deprecate This

2013-05-26 14:02

As a fact of life, in bigger projects you often cannot just delete something – be it a function, method, class or module. Replacing all its usages with whatever is the new recommendation – if any! – is typically outside of your influence, capabilities or priorities. By no means should it be treated as a lost cause, though; any codebase would be quickly overwhelmed by kludges if there were no way to jettison them.

To reconcile those two opposing needs – compatibility and cleanliness – the typical approach involves a transition period. During that time, the particular piece of API shall be marked as deprecated, which is a slightly theatrical term for ‘obsolete’ and ‘not intended for new code’. How effective this is depends strongly on the target audience – for publicly available APIs, someone will always wake up and start screaming when the transition period ends.

For in-project interfaces, however, the blow may be effectively cushioned by using certain features of the language, IDE, source control, continuous integration, and so on. As an example, Java has the @Deprecated annotation that can be applied to functions or classes:

    public class Foo {
        /**
         * @deprecated Use FooFactory instead
         */
        @Deprecated
        public static Foo create() {
            return new Foo();
        }
    }

If the symbol is then used somewhere else, it produces a compiler warning (and visual cue in most IDEs). These can be suppressed, of course, but it’s something you need to do explicitly through a complementary language construct.

So I had this idea to try and add a similar mechanism to Python. One part of it is already present in its standard library: we have the warnings module and a built-in category of DeprecationWarnings. These can be ignored, suppressed, caught or even made into errors.
They are also pretty powerful, as they allow deprecating certain code paths and not just symbols, which can be useful when introducing new meanings for function parameters, among other things. At the same time, it means using them is irritatingly imperative and adds clutter:

    import warnings

    class Foo(object):
        def __init__(self):
            warnings.warn("Foo is deprecated", DeprecationWarning)
            # ... rest of Foo constructor ...

And in this particular case, it also doesn’t work as intended, for reasons that will become apparent later on.
What we’d like instead is something similar to the annotation approach that is available in Java:

    @deprecated
    class Foo(object):
        pass  # ... body of Foo ...

Given that the @-things in Python (decorators, that is) are significantly more powerful than the Java counterparts, it shouldn’t be a tough call to achieve this…

Surprisingly, though, it turns out to be very tricky and quite arcane. The problems lie mostly in the subtle issues of what exactly constitutes “usage” of a symbol in Python, and how to actually detect it. If you try to come up with a few solutions, you’ll soon realize how the one that may eventually require walking through the interpreter call stack turns out to be the least insane one.

But hey, we didn’t go to the Moon because it was easy, right? ;) So let’s see how at least we can get started.
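As a modest starting point, and emphatically not the full solution hinted at above, here is a sketch of a deprecated decorator that handles only the easy case: plain functions, where “usage” simply means a call. The names create_foo and call_and_collect are made up for the demonstration:

```python
import functools
import warnings


def deprecated(func):
    """Emit a DeprecationWarning whenever the decorated function is called."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            "%s is deprecated" % func.__name__,
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not to wrapper()
        )
        return func(*args, **kwargs)

    return wrapper


@deprecated
def create_foo():
    return "foo"


def call_and_collect():
    """Call create_foo() and return its result along with the recorded warnings."""
    with warnings.catch_warnings(record=True) as caught:
        # DeprecationWarning is frequently filtered out by default.
        warnings.simplefilter("always")
        result = create_foo()
    return result, caught
```

This says nothing about classes, about symbols that are imported but never called, or about subclassing a deprecated base, which is where the subtle questions about “usage” begin.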

Author: Xion, posted under Programming

The Subshell Gotcha

2013-05-20 13:13

Many are the quirks of shell scripting. Most are related to confusing syntax, but some come from certain surprising semantics of Bash as a language, as well as the way scripts are executed.
Consider, for example, that you’d like to list files that are within a certain size range. This is something you cannot do with ls alone. And while there’s certainly some awk incantation that makes it trivial, let’s assume you’re a rare kind of scripter who actually likes their hacks readable:

    #!/bin/sh

    min=$1
    max=$2

    ls | while read filename; do
        # stat -f %z is the BSD/macOS way of printing the file size
        size=$(stat -f %z "$filename")
        if [ "$size" -gt "$min" ] && [ "$size" -lt "$max" ]; then
            echo "$filename"
        fi
    done

So you use an explicit while loop, obtain the file size using stat and compare it to given bounds using a straightforward if statement. Pretty simple code that shouldn’t cause any trouble later on… right?

But as your needs grow, you find that you also want to count how many files fall within your range, and how many do not. Given that you have an explicit if, it appears like a simple addition (in a quite literal sense):

    matches=0
    misses=0
    ls | while read filename; do
        size=$(stat -f %z "$filename")
        if [ "$size" -gt "$min" ] && [ "$size" -lt "$max" ]; then
            echo "$filename"
            ((matches++))
        else
            ((misses++))
        fi
    done

    echo >&2 "$matches matches"
    echo >&2 "$misses misses"

Why doesn’t it work, then? Because clearly this is not the output we’re looking for (ls_between is our script here):

    $ ls -al
    total 25296
    drwxrwxr-x  19 xion  staff       646 15 Apr 18:44 .
    drwxrwxr-x  15 xion  staff       510 20 May 11:15 ..
    -rw-rw-r--   1 xion  staff        16 10 May  2012 hello.py
    -rw-rw-r--   1 xion  staff      4005 28 May  2012 keyword_stats.py
    -rw-rw-r--   1 xion  staff       218  5 Aug  2012 magical.py
    -rw-rw-r--   1 xion  staff     19901 11 May  2012 space_invaders.py
    $ ls_between 1024 10241024
    keyword_stats.py
    space_invaders.py
    0 matches
    0 misses

It seems that neither matches nor misses are counted properly, even though it’s clear from the printed list that everything is fine with our if statement and loop. Wherein lies the problem?

Author: Xion, posted under Applications, Programming
 


© 2023 Karol Kuczmarski "Xion".