Archive for Programming

License It, Please

2013-01-31 6:45

In a blog post relayed to my news reader today via Slashdot, I found this bit about providing licenses for the open source code you publish. Or, more specifically, about not providing them:

If some ‘no license’ sharing is a quiet rejection of the permission culture, the lawyer’s solution (make everyone use a license, for their own good!) starts to look bad. This is because once an author has used a standard license, their immediate interests are protected – but the political content of not choosing a license is lost. [emphasis mine]

I admire how the author goes all post-modernistic by bringing up fuzzy terms like “permission culture”. It’s a serious skill, to muddy such a clear-cut issue by making so many onerous assumptions per square inch of text. The alleged existence of some “political content” involved in not choosing any license – as opposed to, say, negligence or lack of knowledge – is easily my favorite here.

On a more serious note: what a load of manure. I won’t even grace the premise with speculation on how likely it is to have anything to do with reality – that is, how big a percentage of the ‘no license’ projects are that way by the conscious choice of their authors. No, I will be insanely generous and assume that it actually holds water. Doesn’t matter; the claim that this practice should be encouraged, and that something valuable is lost if a software project has a license, is still sheer lunacy.

If you don’t explicitly renounce some rights to your code – by providing a license, as the most common way – they are all reserved to you. Regardless of what political or cultural weight you may want to associate with this fact, the practical one is implied by law. And it’s very simple: no one can safely do anything with that code of yours, for there is always a risk you will exercise your rights through prosecution.

Of course I know you wouldn’t do anything so clearly evil. But that’s no guarantee for many parties that treat the issues of copyright and liability very seriously: from solo freelancers to the biggest of companies, and from lone hackers to the most influential software foundations. If you care about your code being widely used and solving problems for as many people as possible, this is also something you should pay attention to.
Otherwise it may happen that for someone, your work is just the perfect piece of the puzzle – but they cannot use it safely. They might ask you to fix your oversight, of course, but they might also go somewhere else.

And that is typically the “immediate interest” that you protect by licensing: letting others actually use your code. If you ask me, that sounds like something totally worthy of protection.

Author: Xion, posted under Internet, Programming » 3 comments

Go is Like Better C, Mostly

2013-01-09 22:55

The Go programming language has been on my (long) list of things to look into for quite some time now. Recently, at last, I had the opportunity to go through most of the comprehensive tour of Go from the official website, as well as write a few bits of Go code by myself.

Today I’d like to recap some of my impressions. You can treat it as an “unboxing” of the Go language, much like when people post videos of their first hands-on experiences with new devices. Except, it will be just text – I’m not cool enough to do videos yet ;)

Some trivia

We all like to put stuff into our various mental buckets, so let’s do that with Go too.

Go is a compiled, statically typed programming language that runs directly on the hardware, without any underlying virtual machine or other bytecode-based runtime. That sounds good from the speed viewpoint and indeed, Go comes close to C in raw performance of equivalent programs.

Syntax of Go is C-like, at least in the fact that it’s using curly braces to delimit blocks of code. Some visual clutter is intentionally omitted, though. Semicolons at line ends are optional, for example, and idiomatic Go code omits them.
But more surprisingly, parentheses around if and for conditions are dropped as well: they are not part of the statement syntax, and gofmt removes any you add. In exchange, it’s mandatory to use curly braces even for blocks that span just one line:

    if obj == nil {
        return
    }

If you’re familiar with the reasoning that recommends doing this in other C-like languages, you shouldn’t have much trouble adapting to this requirement.

No-fuss static typing

Go is type-safe and requires all variables to be declared prior to use. For that it provides very nice sugar in the form of the := operator, coupled with automatic type inference:

    s := "world"
    fmt.Printf("Hello %s!\n", s)

But of course, function arguments and return values have to be explicitly typed. If you come from a C/C++/Java background, those type declarations might look weird at first, for they place the type after the name:

    func Greet(whom string) string {
        return fmt.Sprintf("Hello, %s! How are you?", whom)
    }

As you can see, this also results in putting the return type at the end of function declarations – something that C++ (since C++11) also permits.

But shorthand variable declarations are not the only way Go improves upon traditional idioms of static typing. Its interfaces are one of the better known features here. They essentially bring duck typing (known from Python, among others) to a compiled language.
The trick is that types do not specify which interfaces they implement: it’s just apparent from their methods. We can, however, state what interfaces we require for our parameters and variables, and those constraints will be enforced by the compiler. Essentially, this allows for accepting arbitrary values, as long as they “quack like a duck”, while retaining the overall type safety.

As an example, we can have a function that accepts a very general io.Writer:

    func SendGreetings(w io.Writer, name string) {
        fmt.Fprintf(w, "Hello, %s!", name)
    }

and use it with anything that looks like something you could write into: file objects, networked streams, gzipped HTTP responses, and so on. Those objects won’t have to declare or even know about io.Writer; it’s sufficient that they implement a proper Write method.
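
For comparison, here is a rough Python counterpart of the same idea, written against Python 3. It is only a sketch: send_greetings and ListWriter are names made up for this illustration. The point is that Python accepts anything with a write method too, but a wrong argument is only caught when the call actually runs, whereas Go rejects it at compile time.

```python
import io


def send_greetings(w, name):
    """Write a greeting to any object that has a write() method."""
    w.write("Hello, %s!" % name)


class ListWriter(object):
    """Hypothetical writer that never mentions any interface or base class."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


# A standard in-memory text stream works...
buffer = io.StringIO()
send_greetings(buffer, "world")

# ...and so does our own class, simply because it has write().
collector = ListWriter()
send_greetings(collector, "gopher")

# An int cannot be written to, but nothing complains until the call runs.
try:
    send_greetings(42, "nobody")
    failed = False
except AttributeError:
    failed = True
```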

Pointers on steroids

Talking about objects and interfaces sounds a bit abstract, but we shall not forget that Go is not a very high level language. You still have pointers here like in C, with the distinction between passing an object by address and copying it by value like in C++. Those two things are greatly simplified and made less error prone, however.

First, you don’t need to remember all the time whether you interact with an object directly or through a pointer. There’s no -> (“arrow”) operator in Go, so you just use dot (.) for either. This makes it much easier to change the type of a variable (add or remove *) if the need arises.

Second, the most common uses for pointers in C (especially pointer arithmetic) are handled by dedicated language mechanisms. Strings, for example, are a distinct type with syntactic support: neither just arrays of chars, nor a standard library class like in C++. Dynamic arrays (called slices) are also well supported, including automatic reallocation based on capacity, with the option of reserving the exact amount of memory beforehand.

Finally, many of the common pitfalls of C pointers don’t really exist in Go. Prohibiting pointer arithmetic outright means that the compiler is able to track how far each object may travel through the program. As a side effect, it can also prevent some segmentation faults, caused by things like pointers to local variables outliving their function:

    func Leak() *int {
        i := 42
        return &i
    }

The compiler notices that i outlives the function call and allocates it on the heap rather than on the stack, so the pointer stays valid after the function returns.

Packages!

If you ever coded a bit in some of the newer languages, then coming to C or C++ you will definitely notice (and complain about) one thing: lack of proper package management. This is an indirect result of the header/implementation division and the reliance on #include’ing header files as a means of specifying dependencies. Actually, #includes are not even that: they matter only to the compiler and not the linker, and are in some sense abused when working with precompiled headers.

What about Go?… Turns out it does the right thing. There are no separate header and implementation units, only source files (.go). Unless you are using the GCC frontend or interfacing with C code, the compiler itself is also unified.

But most importantly, there are packages and normal import statements. You can have qualified and unqualified imports, and you can alias things you’re importing into different names. Packages themselves are based on directory structure rooted in $GOPATH (the standard library lives under $GOROOT), much like e.g. Python ones are found via $PYTHONPATH.

The only thing you might still want at this point is the equivalent of virtualenv. Note that it’s not as critical as in interpreted languages: standalone compiled binaries do not have dependency problems, after all. But it’s still a nice thing to have for development. So far, people seem to be using their own solutions here.

Author: Xion, posted under Programming » Comments Off on Go is Like Better C, Mostly

At Least Python Got Equality Right

2013-01-03 23:13

I’m still flabbergasted after going through the analysis of the PHP == operator, posted by my infosec friend Gynvael Coldwind. Up until recently, I knew two things about PHP: (1) it tries to be weakly typed and (2) it is not a stellar example of language design. Now I can confidently assert a third one…

It’s completely insane.

However, pondering the specific case of equality checks, I realized it’s not actually uncommon for programming languages to confuse the heck out of developers with their single, double or even triple “equals”. Among the popular ones, it seems to be a rule rather than exception.

Just consider that:

  • JavaScript has both == and ===, exactly like PHP does. And the former is just slightly less crazy than its PHP counterpart. For both languages, it just seems like a weak typing failure.
  • In C and C++, you may easily type = (assignment) where you meant == (equality), because the former is perfectly allowed inside the conditions of if, while or for statements.
  • Java is famously counterintuitive when it comes to comparing strings, requiring you to use the String.equals method rather than == (which works for primitive types). Many, many programmers have been bitten by that. (The fact that == happens to give the right answer under certain conditions, e.g. for interned string literals, doesn’t exactly help either.)
  • C# complicates stuff even more by allowing you to override Equals and overload the == operator. It also introduces ReferenceEquals which usually works like ==, except when the latter is overloaded. Oh, and it also has two different kinds of types (value and reference types) which by default compare in two different ways… Joy!

The list could likely go on and include most of the mainstream languages but one of them would be curiously absent: Python.

You see, Python got the == operator right:

  • It tests for equality only, not identity (also known as “reference equality”). For that there is a separate is operator.
  • All basic objects – not only strings, but also lists or dictionaries (hash tables) – compare by value. Hence e.g. two lists are equal if they contain equal elements in the same order, whether or not they are the same objects.
  • Implicit conversions are applied judiciously. Different types of numbers (int, long, float) compare to each other just fine, but there is clear distinction between 42 (number) and "42" (string).
  • You can overload == but there are no magical tricks that instantly turn your class into wannabe fundamental type (like in C#). If you really want value semantics, you need to write that yourself.
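
The four points above can be checked in a few lines. This is a minimal sketch written against Python 3; Point and Plain are hypothetical classes invented for the illustration.

```python
a = [1, 2, 3]
b = [1, 2, 3]

same_value = (a == b)        # equality: same elements in the same order
same_object = (a is b)       # identity: these are two distinct list objects

numbers_mix = (42 == 42.0)   # int and float compare by numeric value
no_coercion = (42 == "42")   # a number is never equal to a string


class Point(object):
    """Hypothetical class that opts into value semantics explicitly."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


class Plain(object):
    """No __eq__ here, so instances are equal only to themselves."""


points_equal = (Point(1, 2) == Point(1, 2))
plains_equal = (Plain() == Plain())
```

Here same_value, numbers_mix and points_equal come out True, while same_object, no_coercion and plains_equal come out False. Note that Plain gets no value semantics for free; that has to be written by hand, as Point does.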

In retrospect, all of this looks like basic sanity. Getting it right two decades ago, however… That’s a work of genius, a streak of luck – or likely both.

Author: Xion, posted under Programming » 1 comment

Alternative @property Syntax

2012-12-19 22:04

As you probably know very well, in Python you can add properties to your classes. They behave like instance fields syntactically, but under the hood they call accessor functions whenever you want to get or set the property value:

    import os

    class Directory(object):
        """Simple class representing a directory in the file system."""
        def __init__(self, path):
            self.path = path

        @property
        def parent(self):
            """Parent directory."""
            return Directory(os.path.join(self.path, os.pardir))

    # usage: no () after .parent
    home_dir = Directory('/home/xion')
    root_dir = home_dir.parent.parent

Often – like in the example above – properties are read-only, providing only the getter method. It’s very easy to define them, too: just stick a @property decorator above the method definition and you’re good to go.

iGet, iSet

Occasionally though, you will want to define a read-write property. (Or read-delete, but those are very rare.) One function won’t cut it, since you need a setter in addition to the getter. The canonical way Python docs recommend in such a case (at least since 2.6) is to use the setter decorator of the property itself, here @x.setter:

    class TracedObject(object):
        """Object that tracks changes to its properties."""
        def __init__(self):
            self.changed = False

        @property
        def x(self):
            return self._x

        @x.setter
        def x(self, value):
            self._x = value
            self.changed = True

Besides the fact that I find it ugly to split a single property between two methods, this approach will annoy many static code analyzers (including the PEP8 checker) due to the redefinition of x. Warnings like that are very useful in general, so we certainly don’t want to turn them off completely just to define a property or two.

So if our analyzer doesn’t support line-based warning suppression (like, again, pep8), we may want to look for a different solution.
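
The excerpt stops here, so what follows is not necessarily the solution the full post presents. One option that relies only on the built-in property() function is to give the accessors distinct names and bind them to x in a single assignment. The _get_x and _set_x names are assumptions made for this sketch.

```python
class TracedObject(object):
    """Object that tracks changes to its properties."""

    def __init__(self):
        self._x = None
        self.changed = False

    def _get_x(self):
        return self._x

    def _set_x(self, value):
        self._x = value
        self.changed = True

    # property() is the built-in behind the decorator; calling it directly
    # binds both accessors to the name x once, so nothing gets redefined.
    x = property(_get_x, _set_x, doc="Value whose changes are tracked.")


obj = TracedObject()
obj.x = 42   # goes through _set_x and flips the changed flag
```

The price is two extra names in the class namespace, which is why they carry a leading underscore here.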

Author: Xion, posted under Programming » 2 comments

Command Line Parsing in Python: Tips & Tricks

2012-12-13 21:49

Reading a program’s command line and doing something with the arguments is the main purpose of most small (or bigger) utilities. Those are often written in Python – because of how easy and fast this is – so there should be a way to parse the command line in Python, too.
And in fact there are quite a few of them, all in the standard library. But the argparse module is most likely the best of them all, both for its flexibility and power and for the sole fact of not being deprecated yet ;-)

For that matter, I have already used it several times, not only in Python. Today I want to present a summary of a few useful techniques and solutions that I learned along the way, mostly by braving the not-so-friendly documentation of argparse. Given I’m not likely to do unusual stuff here, they should also address quite common, albeit less trivial use cases.

Boolean flags

Following the convention of every operating system imaginable, argparse has positional arguments and flags. Flags are denoted by one or two dashes preceding the name or its one-letter abbreviation:

    $ git commit -m "Fix stuff"
    $ hg bisect --bad 42
    $ ln -s ~/node_modules/foobar-0.0.1/bin/foobar ~/bin/foobar

Normally in argparse, flags take arguments that are later stored in the result object. This would be helpful for parsing something like the -m (message) flag in the git commit example above.
Not every flag needs to behave like that, though. In the last ln example, the -s does not take any arguments. Instead, it alters the program behavior by its mere presence: with it, ln creates a symbolic link instead of a “hard” link. So in a sense, the flag is boolean. We would like to handle it as such.

In argparse, this is possible by setting the appropriate action= in the add_argument method:

    parser.add_argument("--symbolic", "-s", action='store_true', default=False)

Depending on what’s more logical for your program, you can reverse the logic to 'store_false' and default=True, of course.
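
Here is a small runnable sketch of both variants. The prog name and the --no-backup flag (with its backup destination) are made up for the illustration; they are not options of the real ln.

```python
import argparse

parser = argparse.ArgumentParser(prog="ln")  # prog name is illustrative only
parser.add_argument("--symbolic", "-s", action="store_true", default=False)
# Reversed logic: the hypothetical --no-backup flag switches a default-on
# behavior off, and the result is stored under the positive name "backup".
parser.add_argument("--no-backup", dest="backup",
                    action="store_false", default=True)

with_flag = parser.parse_args(["-s"])
without_flag = parser.parse_args([])
reversed_flag = parser.parse_args(["--no-backup"])
```

With these definitions, with_flag.symbolic is True, without_flag.symbolic is False, and backup stays True unless --no-backup is given.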

Multiple positional arguments

If your program takes one entity as an argument and does something specific with it, users will often expect it to work with multiple entities too. You can observe it first hand with pip:

    $ pip install Flask
    $ pip install Flask WTForms SQLAlchemy celery pytz

or any version control application:

    $ git add README
    $ git add foo.h foo.c Makefile

There is no reason to ignore this expectation and it’s pretty easy to satisfy in argparse. Again, there is an action= for that:

    parser.add_argument("--foo", action='append')

and it’s sufficient for flags. Here the object returned by parse_args will get a foo attribute with the list of arguments from all occurrences of --foo.

For positionals, it’s a little bit trickier because by default, they are meant to appear exactly once. This can be changed using nargs=:

    parser.add_argument("files", nargs='+')

The value of '+' is probably the most useful here, as it requires the argument to be present at least once. Just like for flags, the result will be a list of all its occurrences, so you can iterate or map over it easily.
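
Both techniques can be combined in one parser. The sketch below uses a made-up prog name and sample arguments; it also shows what happens when the required positional is missing.

```python
import argparse

parser = argparse.ArgumentParser(prog="tool")  # hypothetical program name
parser.add_argument("--foo", action="append")  # repeatable flag
parser.add_argument("files", nargs="+")        # one or more positionals

args = parser.parse_args(["--foo", "a", "--foo", "b", "README", "Makefile"])
# args.foo   -> ['a', 'b']
# args.files -> ['README', 'Makefile']

only_files = parser.parse_args(["README"])
# only_files.foo -> None, because --foo never appeared

# With no files at all, argparse prints a usage message and exits.
try:
    parser.parse_args([])
    exited = False
except SystemExit:
    exited = True
```

One detail worth remembering: a flag with action='append' that is never used yields None rather than an empty list, unless you pass a default.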

Optional positional arguments

Less typically, you may want to have a positional argument which can be supplied or not (an optional one). Although it is possible with the API outlined above (nargs='*'), I wouldn’t recommend it: you will have to deal with an unnecessary 0-or-1-element list and you won’t get proper error checking at the argparse level.

The correct solution involves nargs=, too, but with a dedicated '?' value:

    parser.add_argument("cache_dir", nargs='?', default='/tmp')

As you may guess, default= allows you to specify the value in the parse_args result should the argument be omitted.
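
A quick sketch of the three possible outcomes, with a made-up prog name and sample paths:

```python
import argparse

parser = argparse.ArgumentParser(prog="tool")  # hypothetical program name
parser.add_argument("cache_dir", nargs="?", default="/tmp")

omitted = parser.parse_args([])             # falls back to the default
given = parser.parse_args(["/var/cache"])   # a plain string, not a list

# Supplying two values is an error that argparse catches for us.
try:
    parser.parse_args(["/var/cache", "/other"])
    rejected = False
except SystemExit:
    rejected = True
```

Unlike with nargs='*', the value is a single string in both successful cases, and surplus arguments are rejected by the parser itself.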

Testing

Once you set up your ArgumentParser, you will (hopefully) want to test it. Lucky for you, this can be done easily without ever touching the actual command line. Simply pass your arguments (as a list) to parse_args and it will use them instead of sys.argv[1:]:

    >>> parser.parse_args(['--foo', 'bar'])
    Namespace(foo='bar')

With this you can easily write some nice unit tests for your parser – which you should do, obviously. What you should not do, however, is abuse this feature to call your program’s code from itself:

    def main(argv=None):
        # parse_args(None) falls back to sys.argv[1:]
        args = parser.parse_args(argv)
        # ...

    # later, somewhere deep inside...
    main(['other_function', 'one_argument', '--key', 'value'])

Just don’t.
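
The legitimate use, unit testing the parser, could look roughly like the sketch below. The build_parser factory and the particular arguments are assumptions made for the example; the only argparse behavior relied upon is that parse_args accepts a list and reports errors by raising SystemExit.

```python
import argparse
import unittest


def build_parser():
    """Hypothetical factory, so that each test gets a fresh parser."""
    parser = argparse.ArgumentParser(prog="tool")
    parser.add_argument("--foo")
    parser.add_argument("files", nargs="*")
    return parser


class ParserTest(unittest.TestCase):
    def setUp(self):
        self.parser = build_parser()

    def test_flag_value(self):
        args = self.parser.parse_args(["--foo", "bar"])
        self.assertEqual(args.foo, "bar")

    def test_defaults(self):
        args = self.parser.parse_args([])
        self.assertIsNone(args.foo)
        self.assertEqual(args.files, [])

    def test_unknown_flag_exits(self):
        # argparse prints a usage message and raises SystemExit on errors
        with self.assertRaises(SystemExit):
            self.parser.parse_args(["--nope"])
```

Saved in a test module, this runs with python -m unittest like any other test case.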

Read more

There are, of course, many other interesting features and applications of argparse that you will find useful. I can especially recommend that you get to know about:

  • subparsers, a way to divide your complex tool into several internal commands (like git or pip)
  • argument groups for organizing your arguments into functional groups for better --help output, or for mutual exclusion (e.g. --verbose and --quiet options)
  • help text formatting, handy for more elaborate descriptions that need their whitespace preserved
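
To give a taste of these three features, here is a compact sketch. The tool, its install and remove commands and their arguments are all invented for the illustration; the argparse calls themselves (add_subparsers, add_mutually_exclusive_group, RawDescriptionHelpFormatter) are part of the standard module.

```python
import argparse

parser = argparse.ArgumentParser(
    prog="tool",  # hypothetical program
    description="Manage packages.\n\n  Indented text stays as written.",
    # keeps the description's line breaks and indentation in --help output
    formatter_class=argparse.RawDescriptionHelpFormatter,
)

# --verbose and --quiet cannot be combined
volume = parser.add_mutually_exclusive_group()
volume.add_argument("--verbose", action="store_true")
volume.add_argument("--quiet", action="store_true")

# each internal command gets its own set of arguments
commands = parser.add_subparsers(dest="command")
install = commands.add_parser("install", help="install packages")
install.add_argument("packages", nargs="+")
remove = commands.add_parser("remove", help="remove a package")
remove.add_argument("package")

args = parser.parse_args(["--verbose", "install", "Flask", "pytz"])
# args.command -> 'install', args.packages -> ['Flask', 'pytz']

# Using both exclusive flags makes argparse print an error and exit.
try:
    parser.parse_args(["--verbose", "--quiet", "remove", "Flask"])
    conflict_rejected = False
except SystemExit:
    conflict_rejected = True
```

The dest="command" argument is what makes the chosen subcommand name available on the result, so the program can dispatch on it.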

Equipped with this knowledge, you should be able to write beautiful and easy to use command line tools. Please do so :)

Author: Xion, posted under Programming » Comments Off on Command Line Parsing in Python: Tips & Tricks

Ultimate Tea Solution

2012-12-05 22:15

Programmers are known for using various, ahem, cognitive enhancers (all legal, of course), with coffee as probably the most popular. Well, I’m an avid tea drinker instead, and I’m always on the lookout for new flavors, brewing techniques and equipment.

Today I’d like to present a perfect example from the last category. I’ve found it purely by accident while on one of the many trips to IKEA that I’ve undertaken in the last few days. It’s an ingenious teapot that makes it super easy to brew tea, pour it and – finally – get rid of used-up leaves.
In the past I used several different types of pots with built-in strainers, as well as standalone infusers, and it was always the cleanup part that turned out to be the most cumbersome. Soaked tea leaves don’t come off easily from the infusers’ metallic lattice, requiring you to flush the remnants out with a direct water stream and risk clogging up the sink (eventually).

Overall, it’s just messy, not very clever and hardly user-friendly.


They say it helps with coffee too,
but I find this fact irrelevant.

Fortunately, the teapot I have found has solved it in a much smarter way. There is no separate insert where the leaves should go. Instead, you are supposed to put them directly inside the glass container and pour water straight into it.

This, obviously, seems like an extremely old-fashioned way of brewing tea, but it is also one of the best ones. Leaves are given plenty of space here to spread the flavor throughout the whole pot, rather than being crumpled and confined to the small volume of typical infusers. As a result you may often shorten the brewing time while still getting a richer taste in the end.

Problems arise when you’d like to pour some tea into your cup or glass and you don’t fancy getting some of those pesky leaves along with it. This is also where the teapot in question shows its ingenuity – or more precisely, it’s the cap of it that does.

Designers have equipped it with a piston made of fine-grained lattice that goes up and down the pot’s cylindrical body. The idea is just bizarrely simple: once your tea has extracted enough goodness from the leaves floating within, you can just press the piston all the way down. This collects all stray leaves and keeps them conveniently at the bottom of the pot, so that nothing gets through when you try to fill your cup.

Cleaning is also very easy: you simply run some tap water through the piston and into the glass, flushing the former while keeping all the leaves inside the pot. Afterwards, you just flush everything down the toilet and wash the teapot normally (e.g. in dishwasher). It’s effective, clean and simple.

And with a steady supply of tea, your code will likely be so too! :)

Author: Xion, posted under Life, Programming » 6 comments

git outgoing

2012-11-20 12:05

Now that I don’t use Mercurial at work anymore, I’ve found that despite its shortcomings (hg status taking 10+ seconds?!) it has a few nice features. One of those is hg outgoing, which shows you which changesets you are going to send to the remote repo in your next push. A quick glance at this list will typically ensure that everything is in order, or allow you to amend some commits before making them public.

In Git you can do something similar by applying a filter to git log:

    $ git log origin/master..

But while origin is most often the remote you want to compare against, the master branch is typically not the one where most of development takes place. So if we want to create a git outgoing command, we would rather check what the current branch is and compare it with its remotely tracked equivalent:

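    #!/bin/sh
    # Name of the currently checked-out branch ("HEAD" if detached).
    BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null)
    # Commits that exist locally but not on the same-named branch of origin.
    git log "origin/$BRANCH.."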

Simply naming this script git-outgoing and making it executable somewhere within your $PATH (e.g. /usr/bin) will make the git outgoing command available:

    $ git outgoing
    commit 8c96c21c420dd10a34441cbd7d4c6904a6077716
    Author: Karol Kuczmarski <karol.kuczmarski@gmail.com>
    Date:   Tue Nov 20 11:51:44 2012 +0100

        Add .gitignore

    commit 8a51a4f39b383c9dff64532403ab3922bc2ae13c
    Author: Karol Kuczmarski <karol.kuczmarski@gmail.com>
    Date:   Tue Nov 20 11:50:01 2012 +0100

        Comments in install script

There are a few unstated assumptions here, like the fact that branch names must match on both local and remote repo, and that the remote is called origin. If you find yourself breaking those, then you’re probably better off just using git log directly.

Author: Xion, posted under Programming » 3 comments
 


© 2017 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.