Inheritance as an API

2013-09-15 22:46

Saying “API” nowadays is thought first and foremost to refer to a collection of HTTP request handlers that expose data from some Web service in a machine-readable format (usually JSON). This is not the meaning of API I have in mind right here, though. The classic one talks about any conglomerate of programming language constructs – functions, classes, packages – that are available for us to use.

You can interact with an API in numerous different ways, but one of them is somewhat less common. Occasionally, besides having functions to call and objects to create, you are also presented with base classes to inherit. This is symptomatic to more complex libraries, written mostly in statically typed languages. Among them, the one that takes the most advantage of this technique is probably Java.

Note that what I’m talking about here is substantially different from implementing an interface. That is a relatively common occurrence, required when working with listeners and callbacks:

  1. private OnClickListener mHelloButtonClickListener = new OnClickListener() {
  2.     @Override
  3.     public void onClick(View v) {
  4.         Toast.makeText(getContext(), "Hello world!", Toast.LENGTH_SHORT).show();
  5.     }
  6. };

as well as in many other situations and patterns. There is nothing terribly special about it, given that even non-object oriented languages have equivalent mechanisms of accomplishing the same objective: separating the “how” from “what” in code, possibly to exchange or expand the latter.

Inheriting from a base class is something else entirely. The often criticized, ideologically skewed interpretation of OOP would claim that inheritance is meant to introduce more specialized kinds of existing types of objects. In practice, that’s almost completely missing the real point.

What’s important in creating a derived class is that:

  • like with interface, it needs to override certain methods and fill them with code
  • but also, it may do so using a unique, additional API

Combined, these two qualities allow to introduce much more sophisticated ways of communication between the API and its client code. It’s a powerful tool and should be used sparingly, but certain types of libraries and (especially) frameworks can benefit greatly from it.

As an example, look at the Guice framework for dependency injection. If you use it in your project, you need to configure it by specifying bindings: mapping between the interfaces used by your code and their implementations. This is necessary for the framework can wire them together, possibly in more then one way (different in production code and in test code, for example).
Bindings are quite sophisticated constructs. For non-trivial applications – and these are the ones you generally want to apply dependency injection to – they cannot really be pinned down to a single function call. Other, similar frameworks would therefore use completely external configuration files (usually in XML), which has a lot of downsides.

Guice, however, has a sufficiently smart API that allows to realize all configuration in actual, compilable code. To achieve this, it uses inheritance, exactly as described above. Here’s a short sample:

  1. public class BillingModule extends AbstractModule {
  2.     @Override
  3.     protected void configure() {
  4.       bind(TransactionLog.class).to(DatabaseTransactionLog.class);
  5.       bind(CreditCardProcessor.class).to(PaypalCreditCardProcessor.class);
  6.     }
  7. }

Overridden methods in the above class (well, one method) serve as “sections” in the “configuration file”. Inherited methods, on the other hand, provide an internal, limited namespace that can be used to compose configuration “entries”. Since everything is real code, we can have the compiler check everything for some basic sanity as well.

Tags: , , ,
Author: Xion, posted under Programming » 4 comments

Lists vs Tuples

2013-09-04 18:42

Among the many things that Python does right, there are also a few that could have been better thought out. No language is perfect, of course, but since Python is increasingly adopted as the language of choice for complete beginners in programming, the net effect will be many new coders being heavily influenced by ideas they encounter here.

And unfortunately, some of those ideas are dead wrong. A chief example is the relation between lists and tuples, specifically the mistaken notion of their mutual similarity. In reality – that is, in every other language – they are completely different concepts, and rightfully so.

What is list?…

The term ‘list’ in programming is sometimes meant to refer to ‘linked list’ – one of the simple data structures you’d typically encounter in the introductory course to algorithms. It consists of a sequence of separately allocated elements, stitched together using pointers. Depending on the density and direction of those pointers, the type is further subdivided into singly and doubly linked lists, cyclic lists, and so on.

But the broader meaning, which is also more common now, describes a general collection of elements that can be linearly traversed, regardless of the details of its underlying representation. The List interface in Java or .NET is intended to capture the idea, and it does so pretty accurately. Concrete implementations (like ArrayList or LinkedList) can still be decided between if those details turn out to be important, but most of the code can deal with lists in quite abstract, storage-agnostic manner.

…in Python

In Python, lists generally conform to the above description. Here, you cannot really decide how they are stored under the hood (array module notwithstanding), but the interface is what you would expect:

  1. files = get_file_list('/home/xion')
  2. if files[0] == '.':
  3.     files.pop(0)
  4. if files[0] == '..':
  5.     files.pop(0)
  6. print "Found ", len(files), " files."
  7. for file in files:
  8.     print "- ", file

You can insert, append and remove elements, as well as iterate over the list and access elements by indexes. There is also a handy bracket notation for list literals in the code:

  1. a_list = [1, 2, 3]

About the only weird thing is the fact that you can freely put objects of different types into the same list:

  1. diverse_list = [42, "foo", False]

In a way, though, this appears to be in sync with the general free-form nature of Python. For analogy, think about List<Object> in Java that can do exactly the same (or, in fact, any List prior to introduction of generics).

On the other hand, this is a tuple:

  1. a_tuple = (1, 2, 3)

Besides brackets giving way to parentheses, there is no difference in literal notation. Similarly, you can iterate over the tuple just as well as you can over the list:

  1. for x in a_tuple:
  2.    print x

in addition to accessing elements by index. What you cannot do, though, is modifying a tuple once it is created: be it by trying to add more elements, or changing existing ones:

  1. >>> a_tuple[0] = 42
  2. Traceback (most recent call last):
  3.   File "<stdin>", line 1, in <module>
  4. TypeError: 'tuple' object does not support item assignment

But just like with lists, there is no limitation for the types of elements you put therein:

  1. diverse_tuple = (42, "foo", False)

All in all, a tuple in Python behaves just like an immutable (unchangeable) list and can be treated as such for all intents and purposes.

Tuplication

Now, let’s look at tuples in some other programming languages. Those that support them one way or another include C++, D, Haskell, Rust, Scala, and possibly few more exotic ones. There is also a nascent support for tuple-like constructs in Go, but it’s limited to returning multiple results from functions.

In any of these, tuple means several objects grouped together for a particular purpose. Unlike a list, it’s not a collection. That’s because:

  • objects retain their individual identity
  • their number is not arbitrary; in fact, tuples of different count have separate names: pairs, triples, etc.
  • objects have positions assigned: when handling a tuple, it is expected that certain thing X will be at position N

Statically typed languages expand especially on the last point, making it possible to specify what type we expect to see at every position. Together, they constitute the tuple type:

  1. typedef tuple<int, string> numbered_line;
  2.  
  3. // Example usage
  4. vector<numbered_line> read_numbered_lines(string filename, int start=1) {
  5.     vector<numbered_line> result;
  6.     ifstream infile(filename);
  7.     string line;
  8.     for (int i = start; getline(filename, line); ++i) {
  9.         result.push_back(make_tuple(i, line));
  10.     }
  11.     return result;
  12. }

In other words, you don’t operate on “tuples” in general. You use instances of specific tuple types – like a pair of int and string – to bind several distinct values together.
Then, as your code evolves, you may need to have more of them, or to manipulate them in more sophisticated ways. To accommodate, you may either expand the tuple type, or increase readability at the (slight) cost of verbosity and transform it into a structure:

  1. struct numbered_line {
  2.     string filename;
  3.     int number;
  4.     string text;
  5. };

Once here, you can stay with functions operating on those structures, or refactor those into methods, or use some combination of those two approaches.

Missing link

But in Python, that link between tuples and structures is almost completely lost. There is one built-in tuple type, which is simply tuple, but even that isn’t really paid attention to. What happens instead is that tuples are lumped together with lists and other collections into the umbrella term of “iterable”, i.e. something which can iterated over (typically via for loop).

As a result, a tuple can be converted to list and list can be converted to tuple. Both operations make no sense whatsoever. But they are possible, and that leads to tuples and lists being used interchangeably, making it harder to identify crucial pieces of data that drive your logic. And so they will grow uncontrollably, rather than having been harnessed at appropriate time into an object, or a namedtuple at the very least.

Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

Linus Torvalds

Tags: , , , ,
Author: Xion, posted under Programming » Comments Off on Lists vs Tuples

A Whole Lot of OSes at Once

2013-08-29 23:11

I realized I haven’t talked yet about my workspace setup, including the choice of computer(s) and operating system(s) which I prefer. A topic like that is a perfect opportunity to provide fuel for countless flamewars raging through the Internet forums, so it seems like a no-brainer if you want to round up some passionate readers ;)
So, what’s the best these days? Is it Windows or Linux? Or maybe OS X? Should you get a Mac or a(n other kind of) PC? Make it a desktop or just a laptop? Or perhaps subscribe to the post-PC doctrine and grab a tablet?…

Well, I don’t know. I won’t really advise anything here. Myself, I take more… holistic approach. I just use all of these things :)

Use ALL the systems!

Desktop split in two

As I see it, desktop PC is the place where stuff gets done. Be it “work” or play, you cannot easily forfeit the conveniences of big monitor(s), full keyboard layout, snappy mouse that fits your palm, and a comfortable chair.

Thing is, those two activities require two completely different operating systems. When it comes to games, Windows is still reigning supreme, with only some small glimpses of what might shake up this status quo. Sadly, I have little hope much will change here in the foreseeable future.
On the other hand, it’s pretty much given that the best system for most kinds of development work is some flavor of Linux, preferably sporting such powerful a package manager as apt-get. There are no (good) alternatives for certain vendor lock-ins (*cough* iOS) but if you are not forced to comply with them, anything else is almost certainly an inferior choice.

How to reconcile those requirements? There is dual booting, of course, but the need of frequent resets would get old very quickly. So instead of that, and after conducting a bit of a research, I opted for virtualization, choosing a simple solution of running Linux as guest OS through VirtualBox.
This turned out to be super easy to set up; the only “trick” was to flip a switch in BIOS in order to enable CPU virtualization features, required to run a 64-bit system as guest. With these on, performance is a non-issue: it feels like running directly on the hardware, for all intents and purposes – including running Android emulator inside the VM :)

Laptop as versatile substitute

Now, if only all that goodness could be made available on the go… Although some may pretend it is, most (sane) people cater to their mobility needs by getting a laptop. This is immediately a compromise because of the form factor alone, but even more so because of inevitably inferior hardware specs one could cram into it.

So no, I didn’t try to replicate the VM-based setup described above :) While possible, it seemed way more sensible to try and kill two birds with one stone by getting a computer that:

  • adequately (though not perfectly) substitutes Windows for gaming and Linux for coding
  • runs the third major operating system, in case that ever comes handy e.g. for testing purposes

As you’ve likely guessed, that description accurately fits a MacBook. Of its virtues and woes I have already written elsewhere, so let’s just say it fulfills its purpose pretty well.

And even when it doesn’t, there is always Chrome Remote Desktop to bridge the gap :)

Preview (re)Structured Text in a Browser

2013-08-24 22:33

Those cute little text formats are all the rage now, especially Markdown and reStructuredText. You can write them pretty easily, without lots of markup boilerplate that HTML entails, so they increasingly rule the READMEs of various open source projects. Both GitHub and Bitbucket support them perfectly well for this purpose.

They are obviously not WYSIWYG, though, so you may want to check the formatting first, before showing your prose to the rest of the world. There are some standard HTML generators for each of those text formats, but it would be more convenient to have one tool to rule them all…
And so there is Pandoc. That nifty utility is capable of converting from many input text formats into a vast variety of output formats, including HTML and PDF. It can be installed very easily on most systems:

  1. $ sudo apt-get install pandoc

and downloadable installers are available for OS X and Windows.

pandoc is a well behaved command line program, so it can accept both files and standard input, enabling it to be used in shell pipelines. Unfortunately, browser executables are not so cooperative – they generally want files, not just streams of data. We can circumvent that with the use of /tmp directory and a little bit of shell scripting:

  1. #!/bin/sh
  2.  
  3. # Preview structed text files in a browser
  4. # Usage: $ preview FILE [BROWSER]
  5.  
  6. FILE="$1"
  7. BROWSER="${2:-firefox}"
  8.  
  9. out=$(mktemp)
  10. pandoc "$FILE" >"$out"
  11. $BROWSER "$out"

This should work firefox and chrome, and likely with any other browser.

Next step? Maybe deploy a file system watcher, hook it up with some browser instrumentation, and make the page reload whenever a change in the source is detected. Non-trivial, but definitely doable :)

Hashbang Hacks: Parameters for Python

2013-08-18 14:00

This:

  1. #!/bin/sh

is an example of hashbang. It’s a very neat Unix concept: when placed at the beginning of a script, the line starting with # (hash) and ! (bang) indicates an interpreter that should be chosen when running the script as an executable. Often used for shells (#!/bin/bash, #!/bin/zsh…), it also works for many regular programming languages, like Ruby, Python or Perl. Some of them may not even use # as comment character but still allow for hashbangs, simply by ignoring such a first line. Funnily enough, this is just enough to fully “support” them, as the choice of interpreter is done at the system level.

Sadly, though, the only portable way to write a hashbang is to follow it with absolute path to an executable, which makes it problematic for pretty much anything other than /bin/*sh.
Take Python as an example. On many Linuxes it will be under /usr/bin/python, but that’s hardly a standard. What about /usr/local/bin/python? ~/bin/python?… Heck, one Python I use is under /usr/local/Cellar/python/2.7.3/bin – that’s installed by Homebrew on OS X, a perfectly valid Unix! And I haven’t even mentioned virtualenv

This madness is typically solved by a standard tool called env, located under /usr/bin on anything at least somewhat *nixy:

  1. #!/usr/bin/env python

env looks up the correct executable for its argument, relying on the PATH environmental variable (hence its name). Thanks to env, we can solve all of the problems signaled above, and any similar woes for many other languages. That’s because by the very definition, running Python file with the above hashbang is equivalent to passing it directly to the interpreter:

  1. $ cat >hello.py «EOF
  2. #!/usr/bin/env python
  3. print "Hello, world"
  4. EOF
  5. $ chmod a+x hello.py
  6. $ ./hello.py
  7. Hello, world
  8. $ python hello.py
  9. Hello, world

Now, what if you wanted to also include some flags in interpreter invocation? For python, for example, you can add -O to turn on some basic optimizations. The seemingly obvious solution is to include them in hashbang:

  1. #!/usr/bin/env python -O

Although this may very well work, it puts us again into “not really portable” land. Thankfully, there is a very ingenious (but, sadly, quite Python-specific) trick that lets us add arguments and be confident that our program will run pretty much anywhere.

Here’s how it looks like:

  1. #!/bin/sh
  2. """"exec python -O "$0" "$@";" """

Understandably, it may not be immediately obvious how does it work. Let’s dismantle the pieces one by one, so we can see how do they all fit together – down not just to every quotation sign, but also to every space.

Simulating nonlocal Keyword in Python 2.x

2013-08-04 15:30

The overall direction where Python 3 is going might be a bit worrying, but it’s undeniable that the 3.0 line has some really nice features and quality-of-life improvements. What’s not to love about Unicode string literals enabled by default? Or the print function? What about range, map and filter all being generators? Neato!

There are also few lesser known points. One is the new nonlocal keyword. It shares the syntax with the global keyword, which would make it instantaneously fishy just by this connotation. However, nonlocal looks genuinely useful: it allows to modify variables captured inside function’s closure:

  1. def count_parity(numbers):
  2.     even_count = odd_count = 0
  3.  
  4.     def examine(number):
  5.         if number % 2 == 0:
  6.             nonlocal even_count
  7.             even_count += 1
  8.         else:
  9.             nonlocal odd_count
  10.             odd_count += 1
  11.  
  12.     list(map(examine, numbers))
  13.     return even_count, odd_count

It’s something you would do quite a lot in certain other languages (*cough* JavaScript), so it’s good that Python has got around to support the notion as well. Syntax here is a bit clunky, true, but that’s simply a trade off, stemming directly from the lack of variable declarations in Python.

What about Python 2.x, though – are we out of luck? Well, not completely. There are a few ways to emulate nonlocal through other pythonic capabilities, sometimes even to better effect than the nonlocal keyword would yield.

Use generator instead

Speaking of yielding… As you have probably noticed right away, the example above is quite overblown and just plain silly. You don’t need to play functional just to count some values – you would use a loop instead:

  1. def count_parity(numbers):
  2.     even_count = odd_count = 0
  3.     for number in numbers:
  4.         if number % 2 == 0:
  5.             even_count += 1
  6.         else:
  7.             odd_count += 1
  8.      return even_count, odd_count

Really, the previous version is just a mindless application of the classic Visitor pattern, which is another reason why you shouldn’t do that: pattern overuse is bad. This saying, Visitor obviously has its place: it’s irreplaceable when traversing more complicated structures in more bureaucratic languages. A simple list of numbers in Python is the direct opposite of both of these characteristics.

Complex data structures exist in any language, however. How would we run some Python code for every node in a tree, or maybe graph? Unrolling the DFS or BFS or whatever traversal algorithm we use certainly doesn’t sound like an elegant and reusable approach.
But even then, there is still no need for functions and closures. We can easily get away with the simple for loop, if we just find a suitable iterable to loop over:

  1. def bst_count_parity(tree):
  2.     """Count the number of even and odd numbers in binary search tree."""
  3.     even_count = odd_count = 0
  4.     for node in bst_nodes(tree):
  5.         if node.value % 2 == 0:
  6.             even_count += 1
  7.         else:
  8.             odd_count += 1
  9.     return even_count, odd_count

The bst_nodes function above is not black magic by any stretch. It’s just a simple example of generator function, taking advantage of the powerful yield statement:

  1. def bst_nodes(tree):
  2.     """Yields nodes of binary tree in breadth-first order."""
  3.     queue = [tree]
  4.     while queue:
  5.         node = queue.popleft()
  6.         yield node
  7.         if node.left:
  8.             queue.append(node.left)
  9.         if node.right:
  10.             queue.append(node.right)

This works because both bst_count_parity and bst_nodes functions are executed “simultaneously”. That has the same practical effect as calling the visitor function to process a node, only the “function” is concealed as the body of for loop.

Language geeks (and Lisp fans) would say that we’ve exchanged a closure for continuation. There is probably a monad here somewhere, too.

Create reference where there is none

Generators can of course solve a lot of problems that we may want to address with nonlocal, but it’s true you cannot write them all off just by clever use of yield statement. For the those rare occasions – when you really, positively, truly need a mutable closure – there are still some options on the board.

The crucial observation is that while the closure in Python 2.x is indeed immutable – you cannot add new variables to it – the objects inside need not be. If you are normally able to change their state, you can do so through captured variables as well. After all, you are still just “reading” those variables; they do not change, even if the objects they point to do.

Hence the solution (or workaround, more accurately) is simple. You need to wrap your value inside a mutable object, and access it – both outside and inside the inner function – through that object only. There are few choices of suitable objects to use here, with lists and dictionaries being the simplest, built-in options:

  1. def incr(redis, key):
  2.     """Increments value of Redis key, as if Redis didn't have INCR command.
  3.    :return: New value for the key
  4.    """
  5.     res = []
  6.  
  7.     def txn(pipe):
  8.         res[0] = int(pipe.get(key)) + 1
  9.         pipe.multi()
  10.         pipe.set(key, res[0])
  11.  
  12.     redis.transaction(txn, key)
  13.     return res[0]

If you become fond of this technique, you may want to be more explicit and roll out your own wrapper. It might be something like a Var class with get and set methods, or just a value attribute.

Classy solution

Finally, there is a variant of the above approach that involves a class rather than function. It is strangely similar to “functor” objects from the old C++, back when it didn’t support proper lambdas and closures. Here it is:

  1. def incr(redis, key):
  2.     """Increments value of Redis key, as if Redis didn't have INCR command.
  3.    :return: New value for the key
  4.    """
  5.     class IncrTransaction(object):
  6.         def __call__(self, pipe):
  7.             self.result = int(pipe.get(key)) + 1
  8.             pipe.multi()
  9.             pipe.set(key, self.result)
  10.  
  11.     txn = IncrTransaction()
  12.     redis.transaction(txn, key)
  13.     return txn.result

Its main advantage (besides making it a bit clearer what’s going on) is the potential for extracting the class outside of the function – and thus reusing it. In the above example, you would just need to add the __init__(self, key) method to make the class independent from the enclosing function.

Ironically, though, that would also defeat the whole point: you don’t need a mutable closure if you don’t need a closure at all. Problem solved? ;-)

Tags: , , , ,
Author: Xion, posted under Programming » Comments Off on Simulating nonlocal Keyword in Python 2.x

Set the Returns Free

2013-07-24 19:02

Few pastimes are more pointless and unproductive than arguing about coding style. As long as the basic requirement of consistency is present, there is really next to nothing that wouldn’t fly. You may hear stories of certain… unusual approaches, like using 3-space indentation to “offend all equally”, but it still doesn’t give you the license to criticize – or worse yet, disobey. If you chose to contribute to a particular codebase, you follow the standard, period.

With this out of the way, I will go ahead to pick on my personal pet peeve by pointing out that the rule of single return from function is just dumb!… To my excuse, however, it is not really a stylistic rule, which is also one of the reasons I consider it inane. Since it changes the algorithmic structure of a function, it’s actually about semantics and should be treated as architectural guideline.

Even then, what’s wrong with avoiding more than one return statement? A couple of things come to mind.

It’s unnecessarily verbose

There is a particular “idiom” that you may encounter from time to time, especially in beginner’s code, or just code written by someone who had a particularly bad day. Here’s how it goes:

  1. bool isFooed() {
  2.     if (some != condition || thatMay(involve(foo))) {
  3.         return true;
  4.     } else {
  5.         return false;
  6.     }
  7. }

While it looks completely silly, its only felony is carrying the result through one more place than it’s absolutely necessary. But we would never write such a thing with clear mind. We would point it out immediately in code review. We would fix it whenever encountered.

And yet, it’s essentially the same pattern as in the code below:

  1. bool performSomeObscureAction() {
  2.     bool success;
  3.     if (startsAlignProperly()) {
  4.         // .. do stuff, followed by more stuff ...
  5.         success = true;
  6.     } else {
  7.         // ... report error, maybe do some cleanup ...
  8.         success = false;
  9.     }
  10.     return success;
  11. }

It has just one redundant layer of complexity: the success variable. We could remove it and replace the assignments to it by direct return true; and return false;. But that, of course, would violate the rule of single return point, and therefore code meant to follow it is stuck with flaws equivalent to having no idea how basic control statements and boolean logic works.

It reeks of Pascal stench

If we are happily condemning the use of more than one return statement, we could as well stay completely true to our beliefs. Why not prohibit all of such statements, and in fact use a language that just plain doesn’t support them?…

  1. function PerformAction : Boolean;
  2. begin
  3.     if canDo; then
  4.         { ... do stuff ... }
  5.         PerformAction := True;
  6.     else
  7.         PerformAction := False;
  8. end;

I’m totally baffled by any claims purporting that this technique makes code more readable. Not only the return value is now semi-hidden, disguised as one of the ordinary variable assignments, but also the return point itself is completely lost. An execution path may end inside an if inside while inside a switch, somewhere deep in the function innards, and you wouldn’t even notice without tracing all the closing braces end;s, or following the changes in indentation levels.
I’m all for disincentivizing long functions but really, there are much better ways to do that.

It gives the wrong impression

Although by far the most braindead, the rule of single return is unfortunately not the only one of its kind. There is a whole class of misguided principles that apply to certain flow control instructions: return, break and continue. What they share in common is the great potential of obfuscating the real meaning of some parts of the algorithms, simply because of forcing the code to be structured in a specific, rigid way.

Why it’s bad? Because one size does not fit all. Abusing early returns is hardly commendable:

  1. @Override
  2. public void onReceive(Context context, Intent intent) {
  3.     if (intent.getBooleanExtra(BatteryManager.EXTRA_PLUGGED, false)) {
  4.         resumeBackgroundTasks();
  5.         return;
  6.     }
  7.     stopBackgroundTasks();
  8.     // XXX: why "charger plugged" is more special state than "unplugged"?...
  9. }

but avoiding them at all costs won’t make the code better by any stretch:

  1. void read_config_file(const std::string& filename) {
  2.     std::ifstream ifs(filename):
  3.     if (ifs) {
  4.         // read the file ...
  5.         // ... parse the file ...
  6.         // ... remember config values ...
  7.         // ... and probably some other things too!
  8.     } else {  // (200 lines later)
  9.         log("cannot read config file: %s", filename);
  10.     }
  11. }

Since every function is different, trying to mold them into some over-specified template will just end awry.

Tags: ,
Author: Xion, posted under Computer Science & IT » Comments Off on Set the Returns Free
 


© 2023 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.