Monthly archive for June, 2012

Dictionary with Missing Values

2012-06-29 22:36

When working with dictionaries in Python, or any equivalent data structure in some other language, it is quite important to remember the difference between a key which is not present and a key that maps to None (null) value. We often tend to blur the distinction by using dict.get:

  1. thingie = some_dict.get('key')
  2. if thingie:
  3.     do_stuff_with(thingie)

We do that because more often that not, None and other falsy values (such as empty strings) are not interesting on their own, so we may as well lump them together with the “no value at all” case.

There are some situations, however, where these variants shall be treated separately. One of them is building a dictionary of keyword arguments that are subsequently ‘unpacked’ through the **kwargs construct. Consider, for example, this code:

  1. import re
  2. from bs4 import BeautifulSoup
  3.  
  4. def find_images_by_text(html, text, in_alt=True, in_title=False):
  5.     """Search for <img> tags 'matching' given text in HTML document."""
  6.     regex = re.compile(".*%s.*" % re.escape(text), re.IGNORE_CASE)
  7.     attrs = {}
  8.     if in_alt:
  9.         attrs['alt'] = regex
  10.     if in_title:
  11.         attrs['title'] = regex
  12.     return BeautifulSoup(html).find_all('img', **attrs)

With a key mapping to None, we’re calling the function with argument explicitly set to None. Without the key present, we’re not passing the argument at all, allowing it to assume its default value.

But adding or not adding a key to dictionary is somewhat more cumbersome than mapping it to some value or None. The latter can be done with conditional expression (x if cond else None), together with many other keys and value at once. The former requires an if statement, as shown above.
Would it be convenient if we had a special “missing” value that could be used like None, but caused the key to not be added to dictionary at all? If we had it, we could (for example) rewrite parts of the previous function that currently contain if branches:

  1. attrs = {'alt': regex if in_alt else missing,
  2.          'title': regex if in_title else missing}

It shouldn’t be surprising that we could totally introduce such a value and extend dict to support this functionality – after all, it’s Python we’re talking about :) Patching the dict class itself is of course impossible, but we can inherit it and come up with something like the following piece:

  1. class MissingDict(dict):
  2.     def __init__(self, **kwargs):
  3.         kwargs = dict((k, v) for (k, v) in kwargs.iteritems()
  4.                       if v is not missing)
  5.         dict.__init__(self, **kwargs)
  6.  
  7. missing = object()

The magical missing object is only a marker here, used to filter out keys that we want to ignore.

With this class at hand, some dictionary manipulations become a bit shorter:

  1. def find_images_by_text(html, text, in_alt=True, in_title=False):
  2.     regex = re.compile(".*%s.*" % re.escape(text), re.IGNORE_CASE)
  3.     return BeautifulSoup(html).find_all('img', **MissingDict(
  4.         alt=regex if in_alt else missing,
  5.         title=regex if in_title else missing))

We could take this idea further and add support for missing not only initialization, but also other dictionary operations – most notably the __setitem__ assignments. This gist shows how it could be done.

Tags: , ,
Author: Xion, posted under Computer Science & IT » Comments Off on Dictionary with Missing Values

The Principle of Two Hacks

2012-06-18 20:19

At work I’ve got a colleague who displays unusual aptitude in coming up with amusing terminology for everyday (coding) things. Among those, the Cake Pattern is always certain to provoke few laughs.

Recently, though, I heard him mention a great common sense rule for any development decision, from the grand architecture down to single line of code. It can be phrased like this:

One hack is fine. But if you need another one on top of the first, it’s probably high time to really consider what you are doing.

For good measure, I’ll call it a Principle of Two Hacks, and I’m pretty convinced that it’s a very beneficial rule to apply in programming – especially when creating any non-trivial, not throw-away programs.

At first it may sound rather vague, however. It’s concept of a “hack” is not easily explicable, or put into indisputable definitions. But that’s what makes it powerful: we don’t need to invoke elaborate (and often controversial) notions of design patterns or abstraction to be able to discuss them.

At best, the ideas of accidental complexity or technical debt might be somewhat close to what developers typically deem as a hack. In practice, this is mostly an opaque intuition that stems from experience or skill, and it’s usually very hard to express in words. Yet, it’s always apparent whenever we encounter it, even though the exact sensation may vary considerably: from a dim feeling that something is maybe a bit off, up to severe intellectual nausea caused by looking at really bad code.

I also like how extremely pragmatic this principle is. Quick-and-dirty fixes making it into production code are just a fact of life, and we are not prohibited from letting them slip. What we are strongly advised here is to maintain integrity of the software we’re writing, trying not to stack one hack upon another.

But even that is not absolute, nonnegotiable gospel; there might still be valid reasons to loosen up its structure. The important part, however, is to notice when we’re doing something fishy and consciously decide whether or not it is a good idea. It is much better than just plunging forward with total disregard of sanity of future maintainers.

Ultimately, this principle is just subtly telling us to think, and that is never a bad advice.

Tags: , ,
Author: Xion, posted under Programming » 1 comment

Interesting Problem with MySQL & SQLAlchemy

2012-06-12 14:31

While at work a few days ago, I had an interesting albeit weird problem which started with the following cryptic error message:

ERROR 1005: can’t create table `qtn_formdisplay_product` (errno: 150)

It was produced by a local MySQL server running on my development machine when I tried to rebuild test database to accommodate for some model changes happening in the codebase. As you might have noticed, it’s not terribly informative, with the errno number as the only useful tidbit. A cursory glance at top search result for this message said that the most probable cause was a malformed FOREIGN KEY constraint inside the offending CREATE TABLE query.

Upon reading this, I blinked several times; something here was definitely off. The query wasn’t of course written by hand – if it was, we could at least consider an actual mistake to be a problem here. But no, it came from ORM – and not just an ORM, but the best ORM known to mankind. While obviously nothing is perfect, I would think it’s extremely unlikely that I found a serious bug in a widely used library just by doing something as innocent as creating a table with foreign key. I’m not that good, after all ;)

Well, except that it could totally be such a bug. The before mentioned search results also pointed to MySQL issue tracker where it was suggested that the error might happen after trying to create foreign key constraint with duplicate name. Supposedly, this could “corrupt” the parent table and no new FOREIGN KEYs could reference it anymore, yielding the errno 150 if we attempted to create one. While it could not explain the behavior I observed (the parent table was freshly created), it raised some doubts whether MySQL itself may be to blame here.

These were exacerbated when one of my colleagues tried out the same procedure, and it worked for him just fine. He turned out to use newer version of MySQL, though: 5.5 versus 5.1. This appeared to support the hypothesis about a possible bug in MySQL but it didn’t seem to help one bit to get the thing running on the older version.

However, it was an important clue that something relevant changed in between, that had an influence on the whole issue. It was not really any particular bugfix or new feature: it was a change of defaults.

Tags: , , , ,
Author: Xion, posted under Computer Science & IT » 1 comment

The Javascript Functions Which Are Not

2012-06-03 15:25

It would be quite far-fetched to call JavaScript a functional language, for it lacks many more sophisticated features from the FP paradigm – like tail recursion or automatic currying. This puts it on par with many similar languages which incorporate just enough of FP to make it useful but not as much as to blur their fundamental, imperative nature (and confuse programmers in the process). C++, Python or Ruby are a few examples, and on the surface JavaScript seems to place itself in the same region as well.

Except that it doesn’t. The numerous different purposes that JavaScript code uses functions makes it very distinct, even though the functions themselves are of very typical sort, found in almost all imperative languages. Learning to recognize those different roles and the real meaning of function keyword is essential to becoming an effective JS coder.

So, let’s look into them one by one and see what the function might really mean.

A scope

If you’ve seen few good JavaScript libraries, you have surely stumbled upon the following idiom:

  1. /* WonderfulThing.js
  2.  * A real-time, HTML5-enabled, MVC-compatible boilerplate
  3.  * for appifying robust prototypes... or something
  4.  */
  5.  
  6. (function() {
  7.     // actual code goes here
  8. })();

Any and all code is enclosed within an anonymous function. It’s not even stored in a variable; it’s just called immediately so its content is just executed, now.

This round-trip may easily be thought as if doing absolutely nothing but there is an important reason for keeping it that way. The point is that JavaScript has just one global object (window in case of web browsers) which is a fragile namespace, easily polluted by defining things directly at the script level.

We can prevent that by using “bracketing” technique presented above, and putting everything inside this big, anonymous function. It works because JavaScript has function scope and it’s the only type of non-global scope available to the programmer.

A module

So in the example above, the function is used to confine script’s code and all the symbols it defines. But sometimes we obviously want to let some things through, while restricting access to some others – a concept known as encapsulation and exposing an interface.

Perhaps unsurprisingly, in JavaScript this is also done with the help of a function:

  1. var counter = (function() {
  2.     var value = 0;
  3.     return {
  4.         increment: function(by) {
  5.             value += by || 1;
  6.         },
  7.         getValue: function() {
  8.             return value;
  9.         },
  10.     };
  11. })();

What we get here is normal JS object but it should be thought of more like a module. It offers some public interface in the form of increment and getValue functions. But underneath, it also has some internal data stored within a closure: the value variable. If you know few things about C or C++, you can easily see parallels with header files (.h, .hpp, …) which store declarations that are only implemented in the code files (.c, .cpp).

Or, alternatively, you may draw analogies to C# or Java with their public and private (or protected) members of a class. Incidentally, this leads us to another point…

Object factories (constructors)

Let’s assume that the counter object from the example above is practical enough to be useful in more than one place (a tall order, I know). The DRY principle of course prohibits blatant duplication of code such as this, so we’d like to make the piece more reusable.

Here’s how we typically tackle this problem – still using only vanilla functions:

  1. var createCounter = function(initial) {
  2.     var value = initial || 0;
  3.     return {
  4.         increment: function(by) {
  5.             value += by || 1;
  6.         },
  7.         getValue: function() {
  8.             return value;
  9.         },
  10.     };
  11. };
  12. var counter = createCounter();
  13. var counterFrom1000 = createCounter(1000);

Pretty straightforward, right? Instead of calling the function on a spot, we keep it around and use to create multiple objects. Hence the function becomes a constructor for them, while the whole mechanism is nothing else but a foundation for object-oriented programming.

\displaystyle functions + closures = OOP

We have now covered most (if not all) roles that functions play when it comes to structuring JavaScript code. What remains is to recognize how they interplay with each other to control the execution path of a program. Given the highly asynchronous nature of JavaScript (on both client and server side), it’s totally expected that we will see a lot of functions in any typical JS code.

Tags: , , , , ,
Author: Xion, posted under Programming » Comments Off on The Javascript Functions Which Are Not
 


© 2017 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.