Posts tagged ‘C/C++’

Lists vs Tuples

2013-09-04 18:42

Among the many things that Python does right, there are also a few that could have been better thought out. No language is perfect, of course, but since Python is increasingly adopted as the language of choice for complete beginners in programming, the net effect will be many new coders being heavily influenced by ideas they encounter here.

And unfortunately, some of those ideas are dead wrong. A chief example is the relation between lists and tuples, specifically the mistaken notion of their mutual similarity. In reality – that is, in every other language – they are completely different concepts, and rightfully so.

What is list?…

The term ‘list’ in programming is sometimes meant to refer to ‘linked list’ – one of the simple data structures you’d typically encounter in the introductory course to algorithms. It consists of a sequence of separately allocated elements, stitched together using pointers. Depending on the density and direction of those pointers, the type is further subdivided into singly and doubly linked lists, cyclic lists, and so on.

But the broader meaning, which is also more common now, describes a general collection of elements that can be linearly traversed, regardless of the details of its underlying representation. The List interface in Java or .NET is intended to capture the idea, and it does so pretty accurately. Concrete implementations (like ArrayList or LinkedList) can still be decided between if those details turn out to be important, but most of the code can deal with lists in quite abstract, storage-agnostic manner.

…in Python

In Python, lists generally conform to the above description. Here, you cannot really decide how they are stored under the hood (array module notwithstanding), but the interface is what you would expect:

  1. files = get_file_list('/home/xion')
  2. if files[0] == '.':
  3.     files.pop(0)
  4. if files[0] == '..':
  5.     files.pop(0)
  6. print "Found ", len(files), " files."
  7. for file in files:
  8.     print "- ", file

You can insert, append and remove elements, as well as iterate over the list and access elements by indexes. There is also a handy bracket notation for list literals in the code:

  1. a_list = [1, 2, 3]

About the only weird thing is the fact that you can freely put objects of different types into the same list:

  1. diverse_list = [42, "foo", False]

In a way, though, this appears to be in sync with the general free-form nature of Python. For analogy, think about List<Object> in Java that can do exactly the same (or, in fact, any List prior to introduction of generics).

On the other hand, this is a tuple:

  1. a_tuple = (1, 2, 3)

Besides brackets giving way to parentheses, there is no difference in literal notation. Similarly, you can iterate over the tuple just as well as you can over the list:

  1. for x in a_tuple:
  2.    print x

in addition to accessing elements by index. What you cannot do, though, is modifying a tuple once it is created: be it by trying to add more elements, or changing existing ones:

  1. >>> a_tuple[0] = 42
  2. Traceback (most recent call last):
  3.   File "<stdin>", line 1, in <module>
  4. TypeError: 'tuple' object does not support item assignment

But just like with lists, there is no limitation for the types of elements you put therein:

  1. diverse_tuple = (42, "foo", False)

All in all, a tuple in Python behaves just like an immutable (unchangeable) list and can be treated as such for all intents and purposes.

Tuplication

Now, let’s look at tuples in some other programming languages. Those that support them one way or another include C++, D, Haskell, Rust, Scala, and possibly few more exotic ones. There is also a nascent support for tuple-like constructs in Go, but it’s limited to returning multiple results from functions.

In any of these, tuple means several objects grouped together for a particular purpose. Unlike a list, it’s not a collection. That’s because:

  • objects retain their individual identity
  • their number is not arbitrary; in fact, tuples of different count have separate names: pairs, triples, etc.
  • objects have positions assigned: when handling a tuple, it is expected that certain thing X will be at position N

Statically typed languages expand especially on the last point, making it possible to specify what type we expect to see at every position. Together, they constitute the tuple type:

  1. typedef tuple<int, string> numbered_line;
  2.  
  3. // Example usage
  4. vector<numbered_line> read_numbered_lines(string filename, int start=1) {
  5.     vector<numbered_line> result;
  6.     ifstream infile(filename);
  7.     string line;
  8.     for (int i = start; getline(filename, line); ++i) {
  9.         result.push_back(make_tuple(i, line));
  10.     }
  11.     return result;
  12. }

In other words, you don’t operate on “tuples” in general. You use instances of specific tuple types – like a pair of int and string – to bind several distinct values together.
Then, as your code evolves, you may need to have more of them, or to manipulate them in more sophisticated ways. To accommodate, you may either expand the tuple type, or increase readability at the (slight) cost of verbosity and transform it into a structure:

  1. struct numbered_line {
  2.     string filename;
  3.     int number;
  4.     string text;
  5. };

Once here, you can stay with functions operating on those structures, or refactor those into methods, or use some combination of those two approaches.

Missing link

But in Python, that link between tuples and structures is almost completely lost. There is one built-in tuple type, which is simply tuple, but even that isn’t really paid attention to. What happens instead is that tuples are lumped together with lists and other collections into the umbrella term of “iterable”, i.e. something which can iterated over (typically via for loop).

As a result, a tuple can be converted to list and list can be converted to tuple. Both operations make no sense whatsoever. But they are possible, and that leads to tuples and lists being used interchangeably, making it harder to identify crucial pieces of data that drive your logic. And so they will grow uncontrollably, rather than having been harnessed at appropriate time into an object, or a namedtuple at the very least.

Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

Linus Torvalds

Tags: , , , ,
Author: Xion, posted under Programming » Comments Off on Lists vs Tuples

Python is the Worst Part of C++

2013-07-15 22:15

I’ve had a peculiar kind of awful realization after listening to a C++ talk earlier today. The speaker (Bjarne Stroustrup, actually) went over a few defining components of the language, before he took a much longer stop at templates. In C++14, templates are back in the spotlight because of the new constraints feature, intended to make working with (and especially debugging) templates much more pleasant.

Everyone using C++ templates now is probably well accustomed to the cryptic and excessively long error messages that the compiler spits out whenever you make even the slightest of mistakes. Because of the duck typing semantics of template arguments, those messages are always exposing the internals of a particular template’s implementation. If you, for example, worked with STL implementation from Visual C++, you would recognize the internal __rb_tree symbol; it appeared very often if you misused the map or set containers. Its appearance was at best only remotely helpful at locating the issue – especially if it was inside a multi-line behemoth of an error message.

Before constraints (or “concepts lite”, as they are dubbed) improve the situation, this is arguably the worst part of the C++ language. But alas, C++ is not the only language offering such a poor user experience. As a matter of fact, there is a whole class of languages which are exactly like that – and they won’t change anytime soon.

Yes, I’m talking about the so-called scripting languages in general, and Python in particular. The analogies are striking, too, once you see past the superfluous differences.

Take the before mentioned duck typing as an example. In Python, it is one of the core semantical tenets, a cornerstone of language’s approach to polymorphism. In current C++, this is precisely the cause of page-long, undecipherable compiler errors. You just don’t know whether it’s a duck before you tell it to quack, which usually happens somewhere deep inside the template code.

But wait! Python et al. also have those “compiler errors”. We just call them stacktraces and have interpreters format them in much a nicer, more readable way.
Of course unlike template-related errors in C++, stacktraces tend to be actually helpful. I pose, however, that it’s mostly because we learned to expect them. Studying Python or any other scripting language, we’re inevitably exposed to them at the very early stage, with a steady learning curve that corresponds to the growing complexity of our code.
This is totally different than having the compiler literally throw its innards at you when you try to sort a list of integers.

What I find the most interesting in this whole intellectual exercise is to examine what solutions are offered by both sides of the comparison.
Dynamic languages propose coping mechanisms, at best. You are advised to liberally blanket your code with automated tests so that failing to quack is immediately registered before the duck (er, code) goes live. While some rudimentary static analysis and linting is typically provided, you generally cannot have a reasonable idea whether your code doesn’t fail at the most basic level before you actually run it.

Now, have you ever unit-tested the template specification process that the C++ compiler performs for any of your own templates? Yeah, I thought so. Except for the biggest marvels of template metaprogramming, this may not be something that even crosses your mind. Instead, the established industry practice is simply “don’t do template metaprogramming”.

But obviously, we want to use dynamic languages, and some of us probably want to do template metaprogramming. (Maybe? Just a few? Anyone?…) Since they clearly appear to be similar problems, it’s not very surprising that remedies start to look somewhat alike. C++ is getting concepts in order to impose some rigidity on the currently free-form template arguments. Python is not fully on that path yet but the signs are evident, with the recent adoptions of enums (that I’ve fretted about) as the most prominent example.

If I’m right here, it would be curious to see what lies at the end of this road. In any case, it will probably have been already invented fifty years ago in Lisp.

Tags: , , , , ,
Author: Xion, posted under Programming » Comments Off on Python is the Worst Part of C++

Decorated Functions in C++… Almost

2013-04-28 21:56

Many languages now include the concept of annotations that can be applied to definitions of functions, classes, or even variables and fields. The term ‘annotation’ comes from Java, while other languages use different names: attributes (C#), decorators (Python), tags (Go), etc.
Besides naming disparities, those features also tend to differ in terms of offered power and flexibility. The Python ones, for example, allow for almost arbitrary transformations, while Go tags are constrained to struct fields and can only consist of text labels. The archetypal annotations from Java lie somewhere in between: they are mostly for storing metadata, but they can also be (pre)processed during compilation or runtime.

Now, what about C++? We know the language has a long history of lacking several critical features (*cough* delegates *cough*), but the recent advent of C++11 fixed quite a few of them all at once.
And at first sight, the lack of annotation support seems to be among them. New standard introduces something called attributes, which appears to fall in into the same conceptual bucket:

  1. [[dllexport]] void SomeFunction(int x);

That’s misleading, though. Attributes are nothing else than unified syntax for compiler extensions. Until C++11, the job of attributes were done by custom keywords, such as __attribute__ (GCC) or __declspec (Visual C++). Now they should be replaced by the new [[squareBracket]] syntax above.
So there is nothing really new about those attributes. At best, you could compare them to W3C deciding on common syntax for border-radius that forces both -webkit-border-radius and -moz-border-radius to adapt. Most importantly, there is no standard way to define your custom attributes and introspect them later.

That’s a shame. So I thought I could try to fix that, because the case didn’t look completely lost. In fact, there is a precedent for a mainstream language where some people implemented something-kinda-almost-like annotations. That language is JavaScript, where annotations can be realized as functions arranged into a pipeline. Here’s an example from Express framework:

  1. app.get('/home', [loginRequired, cached({minutes:30})], function(req, res){
  2.     res.render('home');
  3. });

Both loginRequired and cached(...) are functions, tied together by app.get with the actual request handler at the end. They “decorate” that handler, wrapping it inside a code which offers additional functionality.

But that’s JavaScript, with its dynamic typing and callback bonanza. Can we even try to translate the above code into C++?…
Well yes, we actually can! Two new features from C++11 allow us to attempt this: lambda functions and initializer lists. With those two – and a healthy dose of functional programming – we may achieve at least something comparable.

Go is Like Better C, Mostly

2013-01-09 22:55

The Go programming language is was on my (long) list of things to look into for quite some time now. Recently, at last, I had the opportunity to go through the most part of a comprehensive tour of Go from the official website, as well as write few bits of some Go code by myself.

Go-pherToday I’d like to recap on some of my impressions. You can treat it as “unboxing” of the Go language, much like when people post movies of their first hands-on experiences with new devices. Except, it will be just text – I’m not cool enough to do videos yet ;)

Some trivia

We all like to put stuff into our various mental buckets, so let’s do that with Go too.

Go is a compiled, statically typed programming language that runs directly on the hardware, without any underlying virtual machine or other bytecode-based runtime. That sounds good from the speed viewpoint and indeed, Go comes close to C in raw performance of equivalent programs.

Syntax of Go is C-like, at least in the fact that it’s using curly braces to delimit blocks of code. Some visual clutter is intentionally omitted, though. Semicolons are optional, for example, and idiomatic Go code omits them at all times.
But more surprisingly, parentheses around if and for conditions are straight out forbidden. As a result, it’s mandatory to use curly braces even for blocks that span just one line:

  1. if obj == nil {
  2.     return
  3. }

If you’re familiar with reasoning that suggests doing that in other C-like languages, you shouldn’t have much problems adapting to this requirement.

No-fuss static typing

Go is type-safe and requires all variables to be declared prior to use. For that it provides very nice sugar in the form of := operator, coupled with automatic type inference:

  1. s := "world"
  2. fmt.Printf("Hello %s!\n", s)

But of course, function arguments and return values have to be explicitly typed. Coming from C/C++/Java/etc. background, those type declarations might look weird at first, for they place the type after the name:

  1. func Greet(whom string) string {
  2.     return fmt.Sprintf("Hello, %s! How are you?", whom)
  3. }

As you can see, this also results in putting return type at the end of function declarations – something that e.g. C++ also started to permit.

But shorthand variable declarations are not the only way Go improves upon traditional idioms of static typing. Its interfaces are one of the better known features here. They essentially offer the support for duck typing (known from Python, among others) in a compiled language.
The trick is that objects do not specify which interfaces they implement: it’s just apparent by their methods. We can, however, state what interfaces we require for our parameters and variables, and those constraints will be enforced by the compiler. Essentially, this allows for accepting arbitrary values, as long as they “quack like a duck”, while retaining the overall type safety.

As an example, we can have a function that accepts a very general io.Writer:

  1. func SendGreetings(w io.Writer, name string) {
  2.     fmt.Fprintf(w, "Hello, %s!", name)
  3. }

and use it with anything that looks like something you could write into: file objects, networked streams, gzipped HTTP responses, and so on. Those objects won’t have to declare or even know about io.Writer; it’s sufficient that they implement a proper Write method.

Pointers on steroids

Talking about objects and interfaces sounds a bit abstract, but we shall not forget that Go is not a very high level language. You still have pointers here like in C, with the distinction between passing an object by address and copying it by value like in C++. Those two things are greatly simplified and made less error prone, however.

First, you don’t need to remember all the time whether you interact with object directly or through a pointer. There’s no -> (“arrow”) operator in Go, so you just use dot (.) for either. This makes it much easier to change the type of variable (add or remove *) if there’s need.

Second, most common uses for pointers from C (especially pointer arithmetic) are handled by dedicated language mechanism. Strings, for example, are distinct type with syntactic support and not just arrays of chars, neither a standard library class like in C++. Arrays (called slices) are also well supported, including automatic reallocation based on capacity, with the option of reserving the exact amount of memory beforehand.

Finally, the common problems with pointer aliasing don’t really exist in Go. Constraints on pointer arithmetic (i.e. prohibiting it outright) mean that compiler is able to track how each and every object may be used throughout the program. As a side effect, it can also prevent some segmentation faults, caused by things like local pointers going out of scope:

  1. func Leak() *int {
  2.     i := 42
  3.     return &i
  4. }

The i variable here (or more likely: the whole stack frame) will have been preserved on heap when function ends, so the pointer does not become immediately invalid.

Packages!

If you ever coded a bit in some of the newer languages, then coming to C or C++ you will definitely notice (and complain about) one thing: lack of proper package management. This is an indirect result of the header/implementation division and the reliance on #include‘ing header files as means of specifying dependencies. Actually, #includes are not even that: they work only for compiler and not linker, and are in some sense abused when working with precompiled headers.

What about Go?… Turns out it does the right thing. There are no separate header and implementation units, only modules (.go files). Unless you are using GCC frontend or interfacing with C code, the compiler itself is also unified.

But most importantly, there are packages and normal import statements. You can have qualified and unqualified imports, and you can alias things you’re importing into different names. Packages themselves are based on directory structure rooted in $GOROOT, much like e.g. Python ones are stored under $PYTHONPATH.

The only thing you can want at this point is the equivalent of virtualenv. Note that it’s not as critical as in interpreted languages: standalone compiled binaries do not have dependency problems, after all. But it’s still a nice thing to have for development. So far, people seem to be using their own solutions here.

Tags: , , , , , ,
Author: Xion, posted under Programming » Comments Off on Go is Like Better C, Mostly

Python Optimization 101

2012-07-14 21:17

It’s pretty much assumed that if you’re writing Python, you are not really concerned with the performance and speed of your code, provided it gets the job done in sufficiently timely manner. The benefits of using such a high level language usually outweigh the cons, so we’re at ease with sacrificing some of the speed in exchange for other qualities. The feasibility of this trade is always relative, though, and depends entirely on the tasks at hand. Sometimes the ‘sufficiently fast’ bar might be hung quite high up.

But while some attitudes are clearly beyond Python’s reach – like real-time software on embedded systems – it doesn’t mean it’s impossible to write efficient code. More importantly, it’s almost always possible to write more efficient code than we currently have; the nimble domain of optimization has its subdivision dedicated specifically to Python. And quite surprisingly, performance-tuning at this high level of abstraction often proves to be even more challenging than squeezing nanoseconds out of bare metal.

So today, we’re going to look at some basic principles of optimization and good practices targeted at writing efficient Python code.

Tools of the trade

Before jumping into specific advice, it’s essential to briefly mention few standard modules that are indispensable when doing any kind of optimization work.

The first one is timeit, a simple utility for measuring execution time of snippets of Python code. Using timeit is often one of the easiest way to confirm (or refute) our suspicions about insufficient performance of particular piece of code. timeit helps us in straightforward way: by executing the statement in questions many times and showing average, as well as cumulative, time it has taken.

As for more detailed analysis, the profile and cProfile modules can be used to gain insight on CPU time consumed by different parts of our code. Profiling a statement will yield us some vital data about number of times that any particular function was called, how much time a single call takes on average and how big is the function’s impact on overall execution time. These are the essential information for identifying bottlenecks and therefore ensuring that our optimizations are correctly targeted.

Then there is the dis module: Python’s disassembler. This nifty tool allows us to inspect instructions that the interpreter is executing, with handy names translated from actual binary bytecode. Compared to actual assemblers or even code executed on Java VM, Python’s bytecode is pretty trivial to analyze:

  1. >>> dis.dis(lambda x: x + 1)
  2.   1           0 LOAD_FAST                0 (x)
  3.               3 LOAD_CONST               1 (1)
  4.               6 BINARY_ADD          
  5.               7 RETURN_VALUE

Getting familiar with it proves to be very useful, though, as eliminating slow instructions in favor of more efficient ones is a fundamental (and effective) optimization technique.

Tags: , , ,
Author: Xion, posted under Computer Science & IT » 2 comments

Of Languages and Keywords

2012-05-28 18:40

I recently had a discussion with a co-worker about feasibility of using anonymous functions in Python. We happen to overuse them quite a bit and this is not something I’m particularly fond of. For me lambdas in Python are looking pretty weird and thus I prefer to use them sparingly. I wasn’t entirely sure why is it so – given that I’m quite a fan of functional programming paradigm – until I noticed a seemingly insignificant fact.

Namely: the lambda keyword is long. With six letters, it is among the longer keywords in Python 2.x, tied with return, import and global, and beaten only by continue and finally. Quite likely, this is what causes lambdas in Python to stand out and require additional mental resources to process (assuming we’re comfortable enough with the very idea of anonymous functions). The long lambda keyword seems slightly out of place because, in general, Python keywords are short.

Or are they?… Thinking about this, I’ve got an idea of comparing the average length of keywords from different programming languages. I didn’t really anticipate what kind of information would be exposed by applying such a metric but it seemed like a fun exercise. And it surely was; also, the results might provoke a thought or two.

Here they are:

Language Keyword Total chars Chars / keyword
Python 2.7 31 133 4.29
C++03 74 426 5.76
C++11 84 516 6.14
Java 1.7 50 289 5.78
C 32 166 5.19
C# 4.0 77 423 5.49
Go 25 129 5.16

The newest incarnation of C++ seems to be losing badly in this competition, followed by C#. On the other side of the spectrum, Go and Python seem to be deliberately designed to avoid keyword bloat as much as possible. Java is somewhere in between when it comes to sheer numbers of keywords but their average length is definitely on the long side. This could very well be one of the reasons for the perceived verbosity of the language.

For those interested, the exact data and code I used to obtain these statistics are in this gist.

Tags: , , , , ,
Author: Xion, posted under Computer Science & IT » 8 comments

Checking Whether IP is Within a Subnet

2012-02-04 22:11

Recently I had to solve a simple but very practical coding puzzle. The task was to check whether an IPv4 address (given in traditional dot notation) – is within specified network, described as a CIDR block.
You may notice that this is almost as ubiquitous as a programming problem can get. Implementations of its simplified version are executed literally millions (if not billions) of times per second, for every IP packet at every junction of this immensely vast Internet. Yet, when it comes to doing such thing in Python, one is actually left without much help from the standard library and must simply Do It Yourself.

It’s not particularly hard, of course. But before jumping to a solution, let’s look how we expect our function to behave:

  1. >>> ip_in_network("10.0.0.0", "10.0.0.0/8")
  2. True
  3. >> ip_in_network("127.0.56.34", "127.0.0.0/8")
  4. True
  5. >> ip_in_network("192.168.0.0", "192.168.224.0/24")
  6. False

Pretty obvious, isn’t it? Now, if you recall how packet routing works under the hood, you might remember that there are some bitwise operations involved. They are necessary to determine whether specific IP address can be found within given network. However low-level this may sound, there is really no escape from doing it this way. Yes, even in Python :P

It goes deeper, though. Most of the interface of Python’s socket module is actually carbon-copied from POSIX socket.h and related headers, down to exact names of particular functions. As a result, solving our task in C isn’t very different from doing it in Python. I’d even argue that the C version is clearer and easier to follow. This is how it could look like:
#include
#include
#include
#include

bool ip_in_network(const char* addr, const char* net) {
struct in_addr ip_addr;
if (!inet_aton(addr, &ip_addr)) return false;

char network[32];
strncpy(network, net, strlen(net));

char* slash = strstr(network, “/”);
if (!slash) return false;
int mask_len = atoi(slash + 1);

*slash = ‘\0’;
struct in_addr net_addr;
if (!inet_aton(network, &net_addr)) return false;

unsigned ip_bits = ip_addr.s_addr;
unsigned net_bits = net_addr.s_addr;
unsigned netmask = net_bits & ((1 << mask_len) - 1); return (ip_bits & netmask) == net_bits; }[/c] A similar thing done in Python obviously requires less scaffolding, but it also introduces its own intricacies via struct module (for unpacking bytes). All in all, it seems like there is not much to be gained here from Python’s high level of abstraction.

And that’s perfectly OK: no language is a silver bullet. Sometimes we need to do things the quirky way.

Tags: , , ,
Author: Xion, posted under Computer Science & IT » 3 comments
 


© 2017 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.