Monthly archive for May, 2013

Deprecate This

2013-05-26 14:02

It is a fact of life that in bigger projects you often cannot just delete something – be it a function, method, class or module. Replacing all its usages with whatever the new recommendation is – if any! – is typically outside of your influence, capabilities or priorities. By no means should it be treated as a lost cause, though; any codebase would quickly be overwhelmed by kludges if there were no way to jettison them.

To reconcile those two opposing needs – compatibility and cleanliness – the typical approach involves a transition period. During that time, the particular piece of API is marked as deprecated, which is a slightly theatrical term for ‘obsolete’ and ‘not intended for new code’. How effective this is depends strongly on the target audience – for publicly available APIs, someone will always wake up and start screaming when the transition period ends.

For in-project interfaces, however, the blow may be effectively cushioned by using certain features of the language, IDE, source control, continuous integration, and so on. As an example, Java has the @Deprecated annotation that can be applied to methods or classes:

    public class Foo {
        /**
         * @deprecated Use FooFactory instead
         */
        @Deprecated
        public static Foo create() {
            return new Foo();
        }
    }

If the symbol is then used somewhere else, it produces a compiler warning (and a visual cue in most IDEs). These can be suppressed, of course, but it’s something you need to do explicitly through a complementary language construct, the @SuppressWarnings annotation.

So I had this idea to try and add a similar mechanism to Python. One part of it is already present in its standard library: we have the warnings module and a built-in category of DeprecationWarnings. These can be ignored, suppressed, caught or even made into errors.
They are also pretty powerful, as they allow you to deprecate certain code paths and not just symbols, which can be useful when introducing new meanings for function parameters, among other things. At the same time, it means using them is irritatingly imperative and adds clutter:

    import warnings

    class Foo(object):
        def __init__(self):
            warnings.warn("Foo is deprecated", DeprecationWarning)
            # ... rest of Foo constructor ...

And in this particular case, it also doesn’t work as intended, for reasons that will become apparent later on.
What we’d like instead is something similar to the annotation approach that is available in Java:

    @deprecated
    class Foo(object):
        # ...

Given that the @-things in Python (decorators, that is) are significantly more powerful than their Java counterparts, it shouldn’t be a tough call to achieve this…

Surprisingly, though, it turns out to be very tricky and quite arcane. The problems lie mostly in the subtle issues of what exactly constitutes “usage” of a symbol in Python, and how to actually detect it. If you try to come up with a few solutions, you’ll soon realize that the one which may eventually require walking through the interpreter call stack turns out to be the least insane of them.

But hey, we didn’t go to the Moon because it was easy, right? ;) So let’s see how we can at least get started.
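As a starting point, here is a minimal sketch of the easy half of the problem: a decorator that handles plain functions only. The name deprecated is taken from the snippet above, but the implementation, the create_foo function and the message format are assumptions made for illustration. It deliberately sidesteps the hard question of what counts as “using” a class.

```python
import functools
import warnings


def deprecated(func):
    """Mark a plain function as deprecated (illustrative helper, functions only)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # stacklevel=2 attributes the warning to the caller of the
        # deprecated function rather than to this wrapper.
        warnings.warn(
            "%s is deprecated" % func.__name__,
            DeprecationWarning,
            stacklevel=2,
        )
        return func(*args, **kwargs)
    return wrapper


@deprecated
def create_foo():
    # Hypothetical function standing in for any old API.
    return "foo"


# DeprecationWarning is often hidden by the default filters, so record
# everything explicitly in order to see it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = create_foo()

print(result, [str(w.message) for w in caught])

# The same warning can be escalated into an exception instead.
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        create_foo()
        escalated = False
    except DeprecationWarning:
        escalated = True

print("escalated to an error:", escalated)
```

The two with blocks show the warnings machinery mentioned earlier: the first records the warning, the second turns it into an error. None of this changes how a decorated class would have to be handled.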

Author: Xion, posted under Programming » Comments Off on Deprecate This

The Subshell Gotcha

2013-05-20 13:13

Many are the quirks of shell scripting. Most are related to confusing syntax, but some come from certain surprising semantics of Bash as a language, as well as the way scripts are executed.
Consider, for example, that you’d like to list files that are within a certain size range. This is something you cannot do with ls alone. And while there’s certainly some awk incantation that makes it trivial, let’s assume you’re a rare kind of scripter who actually likes their hacks readable:

    #!/bin/sh

    min=$1
    max=$2

    # stat -f %z is the BSD/macOS form; GNU stat uses -c %s instead
    ls | while read -r filename; do
        size=$(stat -f %z "$filename")
        if [ "$size" -gt "$min" ] && [ "$size" -lt "$max" ]; then
            echo "$filename"
        fi
    done

So you use an explicit while loop, obtain the file size using stat and compare it to the given bounds using a straightforward if statement. Pretty simple code that shouldn’t cause any trouble later on… right?

But as your needs grow, you find that you also want to count how many files fall within your range, and how many do not. Given that you have an explicit if, it appears to be a simple addition (in a quite literal sense):

    matches=0
    misses=0
    ls | while read filename; do
        size=$(stat -f %z $filename)
        if [ $size -gt $min ] && [ $size -lt $max ]; then
            echo $filename
            ((matches++))
        else
            ((misses++))
        fi
    done

    echo >&2 "$matches matches"
    echo >&2 "$misses misses"

Why doesn’t it work, then? Because clearly this is not the output we’re looking for (ls_between is our script here):

    $ ls -al
    total 25296
    drwxrwxr-x  19 xion  staff       646 15 Apr 18:44 .
    drwxrwxr-x  15 xion  staff       510 20 May 11:15 ..
    -rw-rw-r--   1 xion  staff        16 10 May  2012 hello.py
    -rw-rw-r--   1 xion  staff      4005 28 May  2012 keyword_stats.py
    -rw-rw-r--   1 xion  staff       218  5 Aug  2012 magical.py
    -rw-rw-r--   1 xion  staff     19901 11 May  2012 space_invaders.py
    $ ls_between 1024 10241024
    keyword_stats.py
    space_invaders.py
    0 matches
    0 misses

It seems that neither matches nor misses are counted properly, even though it’s clear from the printed list that everything is fine with our if statement and loop. Wherein lies the problem?
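The title hints at the cause: by default, Bash runs each command of a pipeline in a subshell, so the loop increments its own copies of matches and misses, which vanish when the pipeline ends. For comparison only, here is a rough Python sketch of the same task, where the loop and the counters live in one process. The function name list_between and its return shape are assumptions made for illustration, not the fix for the script itself.

```python
import os


def list_between(directory, min_size, max_size):
    """Return (matching names, match count, miss count) for a directory.

    Illustrative counterpart of the shell loop; not part of the script.
    """
    matches = []
    misses = 0
    for name in sorted(os.listdir(directory)):
        # Plain ls skips dotfiles, so mirror that behaviour here.
        if name.startswith("."):
            continue
        size = os.path.getsize(os.path.join(directory, name))
        # Strict bounds, mirroring -gt and -lt in the shell version.
        if min_size < size < max_size:
            matches.append(name)
        else:
            misses += 1
    return matches, len(matches), misses
```

Calling list_between(".", 1024, 10241024) in the directory listed above would be expected to report the same two file names, this time with counts that survive the loop.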

Author: Xion, posted under Applications, Programming » 2 comments

We Don’t Need No Enums (in Python)

2013-05-13 21:48

Last weekend I attended the PyGrunn conference, back in the good ol’ Netherlands. It was a very enjoyable and instructive event, featuring not only a few local speakers but also some prominent figures from the Python community – like Kenneth Reitz or Armin Ronacher. Overall, it was a weekend of some great pythonic fun.

…Except for one small detail. As I’ve learned there, Python will apparently get enums in the 3.4 version of the language. To say that this was baffling to me would be a severe understatement. Not only do I see very little need for such a feature, but I also have serious doubts whether it fits into the general spirit of the Python language.

Why so? It’s mostly because of the main purpose of enumeration types, derived from their usage in many languages that already have them as a feature. That purpose is to turn arbitrary data – mostly integers and strings – into well-known, reliable entities that can be safely manipulated inside our programs. Enums act as border guardians, filtering out unexpected data and converting expected data into its safe representation.

What’s safety in this context? It’s type safety, of course. Thanks to enumeration types, we can be certain that a particular value belongs to a specific and constrained set of elements. They clearly define all the variants that our code should handle, because everything else was already culled at the conversion stage.

Problem is, in Python there is nothing that would guarantee those safety promises are actually fulfilled. True, the basic property of enum types is retained: given an enum object, we know it must belong to a preordained set of entities. But there is nothing that ensures we are dealing with an enum object at all – short of us actually checking that ourselves:

    if isinstance(state, State):

How is this different from a straightforward membership check:

    if state in STATES:

except for the latter looking cleaner and more explicit?…
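To make the comparison concrete, here is a small sketch that puts both checks side by side, using the standard enum module (Python 3.4 and later). State and STATES come from the snippets above, but their members, the state names and the two helper functions are made up for illustration.

```python
from enum import Enum


class State(Enum):
    # Hypothetical members; the snippets above do not define any.
    RUNNING = "running"
    STOPPED = "stopped"


# The plain alternative: a constant holding the allowed raw values.
STATES = ("running", "stopped")


def parse_state(raw):
    """Convert raw input into a State; raises ValueError if it is unknown."""
    return State(raw)


def is_known_state(raw):
    """The straightforward membership check on the raw value."""
    return raw in STATES


print(parse_state("running"))
print(isinstance("running", State))
print(is_known_state("paused"))
```

The enum only guards the border if every caller actually goes through parse_state; a raw string passed further down is caught by neither approach unless someone checks it.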

That said, I do not claim enums are completely out of place in Python. That would be untrue, if only because they are easily implemented through a rather simple metaclass. In fact, this is exactly how the proposed enum.Enum base is supposed to work.
At the same time, it is also possible to provide some of the aforementioned type safety. Just look into various libraries that enhance Python with support for contracts: a form of type safety which is even more powerful than what you’ll find in many statically typed languages. You are free to use them, and you definitely should, if your project would benefit from such functionality.

Incidentally, they fit right in with the new and upcoming enumeration types from Python 3.4. It remains to be seen what exactly this means for the overall direction of the language design. But with enums in place, the style of “checked” typing suddenly became a lot more pythonic than it was before.

And I can’t say I like it very much.

Author: Xion, posted under Events, Programming » 3 comments

One Thing that Git Does Better: Rebranching

2013-05-04 0:06

Much can be said about the similarities between two popular distributed version control systems: Git and Mercurial. For the most part, choosing between them can be a matter of taste. And just because Git seems to have considerably more proponents in the Cool Kids camp, it doesn’t necessarily follow that it’s the better option.

But I have found at least one specific and common scenario where Git clearly outshines Hg. Suppose you have coded yourself into a dead end: the feature you planned doesn’t pan out the way you wanted; or you have some compatibility issues you cannot easily resolve; or you just need to escape the refactoring bathtub.

[Figure: Rebranching in Git (start)]

In any case, you just want to step back a few commits and pretend nothing happened, for now. The mishap might be useful later, though, so it’d be nice if we left it marked for the future.

In Git, this is easily done. You would start by creating a new branch that points to your dead end:

    $ git checkout my_feature
    $ git branch __my_feature__dead_end__

Afterwards, both my_feature and __my_feature__dead_end__ refer to the same head commit. We would then move the former back a little, sticking it to one of the earlier hashes. Let’s find a suitable target:

    $ git log
    commit 130fe41a3a0587a48c1ef8797030ae2e682c6fb4
    Author: John Doe <john.doe@example.com>
    Date:   Fri May 3 21:39:32 2013 +0200

        I don't like this adventure. Not one bit.

    commit f834a10d199f5c23fa14ed0ebfcf89226d9c21a1
    Author: John Doe <john.doe@example.com>
    Date:   Fri May 3 21:25:46 2013 +0200

        Let's try this, maybe...

    commit 9d877d32f3a08f416615f9a051ad3985e6e4d2ad
    Author: John Doe <john.doe@example.com>
    Date:   Fri May 3 21:18:57 2013 +0200

        Shiny!
    $ git checkout 9d877
    Note: checking out '9d877'.
    # trimmed for brevity

If it looks right, we can switch back to my_feature (the checkout above left us on a detached HEAD) and reset the branch so it points to this specific commit:

    $ git checkout my_feature
    $ git reset --hard 9d877

Our final situation would then look like this:

[Figure: Rebranching in Git (end)]

which is exactly what we wanted. Note how any further commits starting from the referent of my_feature would fork at that point, diverging from the development line which has led us into the dead end.

Why is the same thing not so easily done in Mercurial?… In general, this is mostly because of one fundamental design decision: every commit belongs to one branch, forever and for always. Branch designation is actually part of the changeset’s metadata, just like the commit message or diff. Moving things around – like we did above – is therefore equivalent to changing history and requires tools that are capable of doing so, such as hg strip.

Author: Xion, posted under Programming » 5 comments
 


© 2017 Karol Kuczmarski "Xion".