Posts tagged ‘Python’

Literal __imports__

2013-09-28 22:56

As part of a language, Python obviously has import statements. They allow us to divide the code into different modules and packages:

  1. import collections
  2. import os
  3.  
  4. import thirdpartylib
  5. import anotherlib
  6.  
  7. from myapp.util import do_stuff

What is lesser known fact is that it also has an __import__ function. This function retains all functionality of the import statement, but has some additional features and slightly different use cases. With it, for example, you can import a module whose name you only know at “runtime”:

  1. def import_plugin(name):
  2.     plugins_package = __import__('myapp.plugins', fromlist=[name])
  3.     return getattr(plugins_package, name)

This comes handy in various types of general dispatchers, factory functions, plugin systems, and so forth. Returned from __import__ function is always a module object (even in cases when fromlist argument is used), so often a getattr is needed to extract a specific symbol from it.

Quite surprisingly, I have discovered that __import__ function may very well be useful also when you do know the desired module name. Reason is that import statement is sometimes unwieldy. It has similar problem as global variables (i.e. global statements) and inner function definitions (as opposed to lambdas): it makes the code stretch unnecessarily in the vertical dimension.

Just for one thing

This can be considered a waste if you only need to access one specific thing from one specific module. Using __import__ function, you can golf the import and the usage into a single statement. Here’s an example, coming straight from my own project recursely:

  1. __import__('recursely').install()

Incidentally, the other uses of literal __import__ described can be conveniently replaced thanks to that small library :)

Will it lint?

Another issue with import statement is that it introduces symbols into the global (or local) namespace. Most of the time, this is precisely what we want. Occasionally, though, a sole fact of loading the module is enough.

A canonical example of the latter case is web application with request handlers scattered between different Python files, or even packages. All those files have to be imported if the handlers are to be added to framework’s routing table; but beyond that, we have no business with them.

As a result, the import statement(s):

  1. import handlers

introduces an unused symbol – here, it is handlers. Many linting tools will be eager to point this fact out, which is not really that helpful. There is sometimes an option to disable the warning on per line basis, but some checkers (e.g. pep8.py) don’t offer this functionality.
Universal solution? Use __import__ function, of course:

  1. __import__('handlers')

The module is still loaded just fine, but since we’re ignoring the return value, no stray variables are created. As a added bonus, the __import__ call also looks very different, signifying its special purpose.

Staying in place

Actually, there is one more benefit of this trick, also coming from fooling-the-tools department. Many Python IDEs, like Eclipse/Pydev, are able to automatically insert necessary imports and organize them in groups, effectively providing a neat, Java-like experience. What is not so neat is that they often insist on putting every import statement somewhere near the beginning of the file. preceding any other definition, variable, class or function.

In a scenario described above, this behavior may actually cause problems. When the handlers’ module gets imported, it may need to refer back to the application object; this is exactly the case in the Flask framework, for example. If that object happens to be defined in the module importing handlers, we’ll have a circular import error because the application object has not yet been defined. It would have been defined, however, if the statement:

  1. import handlers

hasn’t been touched by the IDE when it wanted to be helpful and organize our imports. All imports, as it turns out.

Fortunately, mechanisms like that tend to be easy to fool. Per answers to StackOverflow question I’ve once asked, it is a matter breaking the textual pattern that the algorithm searches our code for. As you may have guessed by now, one of the ways of achieving that goal is to shed the import statement in favor of __import__ function.

Tags: , ,
Author: Xion, posted under Programming » Comments Off on Literal __imports__

Lists vs Tuples

2013-09-04 18:42

Among the many things that Python does right, there are also a few that could have been better thought out. No language is perfect, of course, but since Python is increasingly adopted as the language of choice for complete beginners in programming, the net effect will be many new coders being heavily influenced by ideas they encounter here.

And unfortunately, some of those ideas are dead wrong. A chief example is the relation between lists and tuples, specifically the mistaken notion of their mutual similarity. In reality – that is, in every other language – they are completely different concepts, and rightfully so.

What is list?…

The term ‘list’ in programming is sometimes meant to refer to ‘linked list’ – one of the simple data structures you’d typically encounter in the introductory course to algorithms. It consists of a sequence of separately allocated elements, stitched together using pointers. Depending on the density and direction of those pointers, the type is further subdivided into singly and doubly linked lists, cyclic lists, and so on.

But the broader meaning, which is also more common now, describes a general collection of elements that can be linearly traversed, regardless of the details of its underlying representation. The List interface in Java or .NET is intended to capture the idea, and it does so pretty accurately. Concrete implementations (like ArrayList or LinkedList) can still be decided between if those details turn out to be important, but most of the code can deal with lists in quite abstract, storage-agnostic manner.

…in Python

In Python, lists generally conform to the above description. Here, you cannot really decide how they are stored under the hood (array module notwithstanding), but the interface is what you would expect:

  1. files = get_file_list('/home/xion')
  2. if files[0] == '.':
  3.     files.pop(0)
  4. if files[0] == '..':
  5.     files.pop(0)
  6. print "Found ", len(files), " files."
  7. for file in files:
  8.     print "- ", file

You can insert, append and remove elements, as well as iterate over the list and access elements by indexes. There is also a handy bracket notation for list literals in the code:

  1. a_list = [1, 2, 3]

About the only weird thing is the fact that you can freely put objects of different types into the same list:

  1. diverse_list = [42, "foo", False]

In a way, though, this appears to be in sync with the general free-form nature of Python. For analogy, think about List<Object> in Java that can do exactly the same (or, in fact, any List prior to introduction of generics).

On the other hand, this is a tuple:

  1. a_tuple = (1, 2, 3)

Besides brackets giving way to parentheses, there is no difference in literal notation. Similarly, you can iterate over the tuple just as well as you can over the list:

  1. for x in a_tuple:
  2.    print x

in addition to accessing elements by index. What you cannot do, though, is modifying a tuple once it is created: be it by trying to add more elements, or changing existing ones:

  1. >>> a_tuple[0] = 42
  2. Traceback (most recent call last):
  3.   File "<stdin>", line 1, in <module>
  4. TypeError: 'tuple' object does not support item assignment

But just like with lists, there is no limitation for the types of elements you put therein:

  1. diverse_tuple = (42, "foo", False)

All in all, a tuple in Python behaves just like an immutable (unchangeable) list and can be treated as such for all intents and purposes.

Tuplication

Now, let’s look at tuples in some other programming languages. Those that support them one way or another include C++, D, Haskell, Rust, Scala, and possibly few more exotic ones. There is also a nascent support for tuple-like constructs in Go, but it’s limited to returning multiple results from functions.

In any of these, tuple means several objects grouped together for a particular purpose. Unlike a list, it’s not a collection. That’s because:

  • objects retain their individual identity
  • their number is not arbitrary; in fact, tuples of different count have separate names: pairs, triples, etc.
  • objects have positions assigned: when handling a tuple, it is expected that certain thing X will be at position N

Statically typed languages expand especially on the last point, making it possible to specify what type we expect to see at every position. Together, they constitute the tuple type:

  1. typedef tuple<int, string> numbered_line;
  2.  
  3. // Example usage
  4. vector<numbered_line> read_numbered_lines(string filename, int start=1) {
  5.     vector<numbered_line> result;
  6.     ifstream infile(filename);
  7.     string line;
  8.     for (int i = start; getline(filename, line); ++i) {
  9.         result.push_back(make_tuple(i, line));
  10.     }
  11.     return result;
  12. }

In other words, you don’t operate on “tuples” in general. You use instances of specific tuple types – like a pair of int and string – to bind several distinct values together.
Then, as your code evolves, you may need to have more of them, or to manipulate them in more sophisticated ways. To accommodate, you may either expand the tuple type, or increase readability at the (slight) cost of verbosity and transform it into a structure:

  1. struct numbered_line {
  2.     string filename;
  3.     int number;
  4.     string text;
  5. };

Once here, you can stay with functions operating on those structures, or refactor those into methods, or use some combination of those two approaches.

Missing link

But in Python, that link between tuples and structures is almost completely lost. There is one built-in tuple type, which is simply tuple, but even that isn’t really paid attention to. What happens instead is that tuples are lumped together with lists and other collections into the umbrella term of “iterable”, i.e. something which can iterated over (typically via for loop).

As a result, a tuple can be converted to list and list can be converted to tuple. Both operations make no sense whatsoever. But they are possible, and that leads to tuples and lists being used interchangeably, making it harder to identify crucial pieces of data that drive your logic. And so they will grow uncontrollably, rather than having been harnessed at appropriate time into an object, or a namedtuple at the very least.

Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

Linus Torvalds

Tags: , , , ,
Author: Xion, posted under Programming » Comments Off on Lists vs Tuples

Hashbang Hacks: Parameters for Python

2013-08-18 14:00

This:

  1. #!/bin/sh

is an example of hashbang. It’s a very neat Unix concept: when placed at the beginning of a script, the line starting with # (hash) and ! (bang) indicates an interpreter that should be chosen when running the script as an executable. Often used for shells (#!/bin/bash, #!/bin/zsh…), it also works for many regular programming languages, like Ruby, Python or Perl. Some of them may not even use # as comment character but still allow for hashbangs, simply by ignoring such a first line. Funnily enough, this is just enough to fully “support” them, as the choice of interpreter is done at the system level.

Sadly, though, the only portable way to write a hashbang is to follow it with absolute path to an executable, which makes it problematic for pretty much anything other than /bin/*sh.
Take Python as an example. On many Linuxes it will be under /usr/bin/python, but that’s hardly a standard. What about /usr/local/bin/python? ~/bin/python?… Heck, one Python I use is under /usr/local/Cellar/python/2.7.3/bin – that’s installed by Homebrew on OS X, a perfectly valid Unix! And I haven’t even mentioned virtualenv

This madness is typically solved by a standard tool called env, located under /usr/bin on anything at least somewhat *nixy:

  1. #!/usr/bin/env python

env looks up the correct executable for its argument, relying on the PATH environmental variable (hence its name). Thanks to env, we can solve all of the problems signaled above, and any similar woes for many other languages. That’s because by the very definition, running Python file with the above hashbang is equivalent to passing it directly to the interpreter:

  1. $ cat >hello.py «EOF
  2. #!/usr/bin/env python
  3. print "Hello, world"
  4. EOF
  5. $ chmod a+x hello.py
  6. $ ./hello.py
  7. Hello, world
  8. $ python hello.py
  9. Hello, world

Now, what if you wanted to also include some flags in interpreter invocation? For python, for example, you can add -O to turn on some basic optimizations. The seemingly obvious solution is to include them in hashbang:

  1. #!/usr/bin/env python -O

Although this may very well work, it puts us again into “not really portable” land. Thankfully, there is a very ingenious (but, sadly, quite Python-specific) trick that lets us add arguments and be confident that our program will run pretty much anywhere.

Here’s how it looks like:

  1. #!/bin/sh
  2. """"exec python -O "$0" "$@";" """

Understandably, it may not be immediately obvious how does it work. Let’s dismantle the pieces one by one, so we can see how do they all fit together – down not just to every quotation sign, but also to every space.

Simulating nonlocal Keyword in Python 2.x

2013-08-04 15:30

The overall direction where Python 3 is going might be a bit worrying, but it’s undeniable that the 3.0 line has some really nice features and quality-of-life improvements. What’s not to love about Unicode string literals enabled by default? Or the print function? What about range, map and filter all being generators? Neato!

There are also few lesser known points. One is the new nonlocal keyword. It shares the syntax with the global keyword, which would make it instantaneously fishy just by this connotation. However, nonlocal looks genuinely useful: it allows to modify variables captured inside function’s closure:

  1. def count_parity(numbers):
  2.     even_count = odd_count = 0
  3.  
  4.     def examine(number):
  5.         if number % 2 == 0:
  6.             nonlocal even_count
  7.             even_count += 1
  8.         else:
  9.             nonlocal odd_count
  10.             odd_count += 1
  11.  
  12.     list(map(examine, numbers))
  13.     return even_count, odd_count

It’s something you would do quite a lot in certain other languages (*cough* JavaScript), so it’s good that Python has got around to support the notion as well. Syntax here is a bit clunky, true, but that’s simply a trade off, stemming directly from the lack of variable declarations in Python.

What about Python 2.x, though – are we out of luck? Well, not completely. There are a few ways to emulate nonlocal through other pythonic capabilities, sometimes even to better effect than the nonlocal keyword would yield.

Use generator instead

Speaking of yielding… As you have probably noticed right away, the example above is quite overblown and just plain silly. You don’t need to play functional just to count some values – you would use a loop instead:

  1. def count_parity(numbers):
  2.     even_count = odd_count = 0
  3.     for number in numbers:
  4.         if number % 2 == 0:
  5.             even_count += 1
  6.         else:
  7.             odd_count += 1
  8.      return even_count, odd_count

Really, the previous version is just a mindless application of the classic Visitor pattern, which is another reason why you shouldn’t do that: pattern overuse is bad. This saying, Visitor obviously has its place: it’s irreplaceable when traversing more complicated structures in more bureaucratic languages. A simple list of numbers in Python is the direct opposite of both of these characteristics.

Complex data structures exist in any language, however. How would we run some Python code for every node in a tree, or maybe graph? Unrolling the DFS or BFS or whatever traversal algorithm we use certainly doesn’t sound like an elegant and reusable approach.
But even then, there is still no need for functions and closures. We can easily get away with the simple for loop, if we just find a suitable iterable to loop over:

  1. def bst_count_parity(tree):
  2.     """Count the number of even and odd numbers in binary search tree."""
  3.     even_count = odd_count = 0
  4.     for node in bst_nodes(tree):
  5.         if node.value % 2 == 0:
  6.             even_count += 1
  7.         else:
  8.             odd_count += 1
  9.     return even_count, odd_count

The bst_nodes function above is not black magic by any stretch. It’s just a simple example of generator function, taking advantage of the powerful yield statement:

  1. def bst_nodes(tree):
  2.     """Yields nodes of binary tree in breadth-first order."""
  3.     queue = [tree]
  4.     while queue:
  5.         node = queue.popleft()
  6.         yield node
  7.         if node.left:
  8.             queue.append(node.left)
  9.         if node.right:
  10.             queue.append(node.right)

This works because both bst_count_parity and bst_nodes functions are executed “simultaneously”. That has the same practical effect as calling the visitor function to process a node, only the “function” is concealed as the body of for loop.

Language geeks (and Lisp fans) would say that we’ve exchanged a closure for continuation. There is probably a monad here somewhere, too.

Create reference where there is none

Generators can of course solve a lot of problems that we may want to address with nonlocal, but it’s true you cannot write them all off just by clever use of yield statement. For the those rare occasions – when you really, positively, truly need a mutable closure – there are still some options on the board.

The crucial observation is that while the closure in Python 2.x is indeed immutable – you cannot add new variables to it – the objects inside need not be. If you are normally able to change their state, you can do so through captured variables as well. After all, you are still just “reading” those variables; they do not change, even if the objects they point to do.

Hence the solution (or workaround, more accurately) is simple. You need to wrap your value inside a mutable object, and access it – both outside and inside the inner function – through that object only. There are few choices of suitable objects to use here, with lists and dictionaries being the simplest, built-in options:

  1. def incr(redis, key):
  2.     """Increments value of Redis key, as if Redis didn't have INCR command.
  3.    :return: New value for the key
  4.    """
  5.     res = []
  6.  
  7.     def txn(pipe):
  8.         res[0] = int(pipe.get(key)) + 1
  9.         pipe.multi()
  10.         pipe.set(key, res[0])
  11.  
  12.     redis.transaction(txn, key)
  13.     return res[0]

If you become fond of this technique, you may want to be more explicit and roll out your own wrapper. It might be something like a Var class with get and set methods, or just a value attribute.

Classy solution

Finally, there is a variant of the above approach that involves a class rather than function. It is strangely similar to “functor” objects from the old C++, back when it didn’t support proper lambdas and closures. Here it is:

  1. def incr(redis, key):
  2.     """Increments value of Redis key, as if Redis didn't have INCR command.
  3.    :return: New value for the key
  4.    """
  5.     class IncrTransaction(object):
  6.         def __call__(self, pipe):
  7.             self.result = int(pipe.get(key)) + 1
  8.             pipe.multi()
  9.             pipe.set(key, self.result)
  10.  
  11.     txn = IncrTransaction()
  12.     redis.transaction(txn, key)
  13.     return txn.result

Its main advantage (besides making it a bit clearer what’s going on) is the potential for extracting the class outside of the function – and thus reusing it. In the above example, you would just need to add the __init__(self, key) method to make the class independent from the enclosing function.

Ironically, though, that would also defeat the whole point: you don’t need a mutable closure if you don’t need a closure at all. Problem solved? ;-)

Tags: , , , ,
Author: Xion, posted under Programming » Comments Off on Simulating nonlocal Keyword in Python 2.x

Python is the Worst Part of C++

2013-07-15 22:15

I’ve had a peculiar kind of awful realization after listening to a C++ talk earlier today. The speaker (Bjarne Stroustrup, actually) went over a few defining components of the language, before he took a much longer stop at templates. In C++14, templates are back in the spotlight because of the new constraints feature, intended to make working with (and especially debugging) templates much more pleasant.

Everyone using C++ templates now is probably well accustomed to the cryptic and excessively long error messages that the compiler spits out whenever you make even the slightest of mistakes. Because of the duck typing semantics of template arguments, those messages are always exposing the internals of a particular template’s implementation. If you, for example, worked with STL implementation from Visual C++, you would recognize the internal __rb_tree symbol; it appeared very often if you misused the map or set containers. Its appearance was at best only remotely helpful at locating the issue – especially if it was inside a multi-line behemoth of an error message.

Before constraints (or “concepts lite”, as they are dubbed) improve the situation, this is arguably the worst part of the C++ language. But alas, C++ is not the only language offering such a poor user experience. As a matter of fact, there is a whole class of languages which are exactly like that – and they won’t change anytime soon.

Yes, I’m talking about the so-called scripting languages in general, and Python in particular. The analogies are striking, too, once you see past the superfluous differences.

Take the before mentioned duck typing as an example. In Python, it is one of the core semantical tenets, a cornerstone of language’s approach to polymorphism. In current C++, this is precisely the cause of page-long, undecipherable compiler errors. You just don’t know whether it’s a duck before you tell it to quack, which usually happens somewhere deep inside the template code.

But wait! Python et al. also have those “compiler errors”. We just call them stacktraces and have interpreters format them in much a nicer, more readable way.
Of course unlike template-related errors in C++, stacktraces tend to be actually helpful. I pose, however, that it’s mostly because we learned to expect them. Studying Python or any other scripting language, we’re inevitably exposed to them at the very early stage, with a steady learning curve that corresponds to the growing complexity of our code.
This is totally different than having the compiler literally throw its innards at you when you try to sort a list of integers.

What I find the most interesting in this whole intellectual exercise is to examine what solutions are offered by both sides of the comparison.
Dynamic languages propose coping mechanisms, at best. You are advised to liberally blanket your code with automated tests so that failing to quack is immediately registered before the duck (er, code) goes live. While some rudimentary static analysis and linting is typically provided, you generally cannot have a reasonable idea whether your code doesn’t fail at the most basic level before you actually run it.

Now, have you ever unit-tested the template specification process that the C++ compiler performs for any of your own templates? Yeah, I thought so. Except for the biggest marvels of template metaprogramming, this may not be something that even crosses your mind. Instead, the established industry practice is simply “don’t do template metaprogramming”.

But obviously, we want to use dynamic languages, and some of us probably want to do template metaprogramming. (Maybe? Just a few? Anyone?…) Since they clearly appear to be similar problems, it’s not very surprising that remedies start to look somewhat alike. C++ is getting concepts in order to impose some rigidity on the currently free-form template arguments. Python is not fully on that path yet but the signs are evident, with the recent adoptions of enums (that I’ve fretted about) as the most prominent example.

If I’m right here, it would be curious to see what lies at the end of this road. In any case, it will probably have been already invented fifty years ago in Lisp.

Tags: , , , , ,
Author: Xion, posted under Programming » Comments Off on Python is the Worst Part of C++

Testing Jinja Templates

2013-07-06 21:43

If you use a powerful HTML templating engine – like Jinja – inevitably you will notice a slow creep of more and more complicated logic entering your templates. Contrary to what many may tell you, it’s not inherently bad. Views can be complex, and keeping that complexity contained within templates is often better than letting it sip into controller code.

But logic, if not trivial, requires testing. Exempting it by saying “That’s just a template!” doesn’t really cut it. It’s pretty crappy excuse, at least in Flask/Jinja, where you can easily import your template macros into Python code:

  1. {% macro hello_world() %}
  2.     Hello world!
  3. {% endmacro %}
  1. import flask
  2. hello_world = flask.get_template_attribute('hello.html', 'hello_world')
  3. print hello_world()

When writing a fully featured test suite, though, you would probably want some more leverage Importing those macros by hand in every test can get stale rather quickly and leave behind a lot of boilerplate code.

Fortunately, this is Python. We have world class tools to combat repetition and verbosity, second only to Lisp macros. There is no reason we couldn’t write tests for our Jinja templates in clean and concise manner:

  1. class Hello(JinjaTestCase):
  2.     __template_imports__ = {
  3.         'hello_world': 'hello.html',
  4.     }
  5.  
  6.     def test_hello_world(self):
  7.         result = self.hello_world()
  8.         self.assert_in("hello", result.lower())
  9.         self.assert_in("world", result.lower())

The JinjaTestCase base, implemented in this gist, provides evidence that a little __metaclass__ can go a long way :)

Tags: , , , ,
Author: Xion, posted under Programming » Comments Off on Testing Jinja Templates

Fluent Chaining

2013-06-30 14:24

Look at the following piece of jQuery code:

  1. $('span')
  2.     .attr('data-type', 'info')
  3.     .addClass('message')
  4.     .css('width', '100%')
  5.     .text("Hello world!")
  6.     .appendTo($('body'))
  7. ;

Of the two patterns it demonstrates, one is almost decisively bad: you shouldn’t build up DOM nodes this way. To get more concise and maintainable code, it’s better to use one of the client-side templating engines.

The second pattern, however, is hugely interesting. Most often called method chaining, it also goes by a more glamorous name of fluent interface. As you can see by a careful look at the code sample above, the idea is pretty simple:

Whenever a method is mostly mutating object’s state, it should return the object itself.

Prime example of methods that do that are setters: simple function whose pretty much only purpose is to alter the value of some property stored as a field inside the object. When augmented with support for chaining, they start to work very pleasantly with few other common patterns, such as builders in Java.
Here’s, for example, a piece of code constructing a Protocol Buffer message that doesn’t use its Builder‘s fluent interface:

  1. Person.Builder builder = Person.newBuilder();
  2. builder.setId(42);
  3. builder.setFirstName("John");
  4. builder.setLastName("Doe");
  5. builder.setEmail("johndoe@example.com")
  6. Person person = builder.build();

And here’s the equivalent that takes advantage of method chaining:

  1. Person person = Person.newBuilder()
  2.     .setId(42)
  3.     .setFirstName("John")
  4.     .setLastName("Doe")
  5.     .setEmail("johndoe@example.com")
  6.     .build();

It may not be shorter by pure line count, but it’s definitely easier on the eyes without all these repetitions of (completely unnecessary) builder variable. We could even say that the whole Builder pattern is almost completely hidden thanks to method chaining. And undoubtedly, this a very good thing, as that pattern is just a compensation for the deficiencies of Java programming language.

By now you’ve probably figured out how to implement method chaining. In derivatives of C language, that amounts to having a return this; statement at the end of method’s body:

  1. jQuery.fn.extend({
  2.     addClass: function( value ) {
  3.         // ... lots of jQuery code ...
  4.         return this;
  5.     },
  6.     // ... other methods ...
  7. });

and possibly changing the return type from void to the class itself, a pointer to it, or reference:

  1. public Builder setFirstName(String value) {
  2.     firstName_ = value;
  3.     return this;
  4. }

It’s true that it may slightly obscure the implementation of fluent class for people unfamiliar with the pattern. But this cost comes with a great benefit of making the usage clearer – which is almost always much more important.

Plus, if you are lucky to program in Python instead, you may just roll out a decorator ;-)

Tags: , , , , , ,
Author: Xion, posted under Programming » Comments Off on Fluent Chaining
 


© 2023 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.