Posts tagged ‘open source’

Anatomy of a Python Package

2014-01-27 23:00

Over the months and years I’ve been coding in Python, I’ve created quite a few Python packages, both open source and for private projects. Even though their most important part was always the code, there are numerous additional files that are necessary for a package to correctly serve its purpose. Rather than being part of the Python language, they are more closely related to the Python platform.

But if you look for any definitive, systematic information about them, you will at best find scattered pieces of knowledge in various unrelated places. At worst, the only guidance comes in the form of a multitude of existing Python package sources, available on GitHub and similar sites. Parroting them is certainly an option, although I believe it’s much more advantageous to acquire a firm understanding of how those different cogs fit together. Without it, following modern Python’s best development practices – which are all hugely beneficial – is largely impossible.

So, I want to fill this void by outlining the structure of a Python package, as completely as possible. You can follow it as a step-by-step guide when creating your next project. Or just skim through it to see what you’re missing, and whether it’d be worthwhile to address such gaps. Any additional element or file will usually provide some tangible benefit, but of course not every project requires all the bells and whistles.

Without further ado, let’s see what’s necessary for a complete Python software bundle.

Ego-Driven Technologies

2013-02-17 19:36

Last week – while still on the other side of the pond – I attended a meet-up organized by the local Google Developers Group. The meeting included a presentation about Go, aimed mostly at newcomers, which covered the language from the ground up but at a very fast pace. This spurred a lot of survey questions, as people evidently wanted to assess the language’s viability in general and its fitness for particular application domains.
One of them was about the web frameworks that are available in Go. The answer mentioned a few simple, existing ones, but also how people coming from other languages are working to (re)build their favorite ones in Go. The point was, of course, that even though the language does not have its own Django or Rails just yet, it’s bound to happen quite soon.

And that’s when it dawned on me.

See, I have wondered for a while now why people are eager to subject themselves to a huge productivity drop (among other hardships) when they switch from one technology that they are proficient in to a different but curiously similar one.
Mind you, I’m not talking about exploratory ventures intended to evaluate language X or framework Y by doing one or two non-trivial projects; heck, I do it very often (and you should too). No, I’m talking about all-out switching to a new shiny toy, especially when decided consciously and not through a gradual slanting, in a kind of “best tool for the job” fashion.

Whenever I looked for justification, usually I’d just find a straightforward litany of perks and benefits of $targetTechnology, often having a not insignificant intersection with the analogous list for the old one. Add the other necessary part of the risk-benefit calculation – drawbacks – and it just doesn’t balance out. Not by a long shot.

So, I notice that I am confused. There must be some other factor in play, but I couldn’t come up with any candidates – until that day.
As I speculate now, there is actually a big incentive to jump ship whenever a new one appears. And it seems to be one of the dirty secrets of the hacker community, because it directly questions the esteemed notion of meritocracy that we are so eager to flaunt.

Author: Xion, posted under Computer Science & IT, Events, Thoughts » Comments Off on Ego-Driven Technologies

License It, Please

2013-01-31 6:45

In a blog relayed to my news reader today via Slashdot, I found this bit about providing licenses to the open source code you publish. Or, more specifically, about not providing them:

If some ‘no license’ sharing is a quiet rejection of the permission culture, the lawyer’s solution (make everyone use a license, for their own good!) starts to look bad. This is because once an author has used a standard license, their immediate interests are protected – but the political content of not choosing a license is lost. [emphasis mine]

I admire how the author goes all post-modernistic by bringing up fuzzy terms like “permission culture”. It’s a serious skill, to muddy such a clear-cut issue by making so many onerous assumptions per square inch of text. The alleged existence of some “political content” involved in not choosing any license – as opposed to, say, negligence or lack of knowledge – is easily my favorite here.

On a more serious note: what a load of manure. I won’t even grace the premise with speculation on how likely it is to have anything to do with reality – that is, how big a percentage of the ‘no license’ projects is made so by the conscious choice of their authors. No, I will be insanely generous and assume that it actually holds water. Doesn’t matter; the claim that this practice should be encouraged and that something valuable is lost if a software project has a license is still sheer lunacy.

If you don’t explicitly renounce some rights to your code – by providing a license, as the most common way – they are all reserved to you. Regardless of what political or cultural weight you may want to associate with this fact, the practical one is implied by law. And it’s very simple: no one can safely do anything with that code of yours, for there is always a risk you will exercise your rights through prosecution.

Of course I know you wouldn’t do anything so clearly evil. But that’s no guarantee for many parties that treat the issues of copyright and liability very seriously: from solo freelancers to the biggest of companies, and from lone hackers to the most influential software foundations. If you care about your code being widely used and solving problems for as many people as possible, this is also something you should pay attention to.
Otherwise it may happen that for someone, your work is just the perfect piece of the puzzle – but they cannot use it safely. They might ask you to fix your oversight, of course, but they might also go somewhere else.

And that is typically the “immediate interest” that you protect by licensing: letting others actually use your code. If you ask me, that sounds like something totally worthy of protection.

Author: Xion, posted under Internet, Programming » 3 comments

Open Source and the Prisoner’s Dilemma

2012-11-04 15:05

Some time ago, on one of the forums, I got into a discussion about the merits and motivations of releasing projects and code as open source. Turns out that many people cannot exactly wrap their heads around the concept of giving away your code for free. Even leaving the exact meaning of ‘free’ aside (it applies to both of them), I believe we can observe a kind of cultural gap here. Strangely enough, it’s not even the case of Nerds vs. Rest of the World: the geek community is in itself somewhat divided with respect to this issue.

And that’s OK, in a way. I know it may not be obvious how value can be preserved if we just volunteer our time and skills for open source projects and don’t receive any direct compensation in return. Honestly, I’m still kinda amazed how it all works out, but I have my small theory. It’s somewhat tangential to the typical gift culture explanation, but could also shed some light on companies’ motivations for contributing to OSS.

Long story short, I think the relationship dynamics between open source contributors and beneficiaries can be described in terms of the prisoner’s dilemma: a rather classic example from game theory. Before pursuing the analogy further, let’s have a brief look at this curious puzzle.

Of prisoners and stealing

Like the name suggests, the prisoner’s dilemma can be formulated in terms of jail, prisoners, cooperation and defection. I prefer an alternate setting, though, as it seems to better illustrate the concept and may be easier to understand. You can check the original formulation in Wikipedia, among other sources.

There is a small game-theoretical difference between those two scenarios, but it’s largely irrelevant to our discussion.

Consider a two-player game with a potential reward of $100. The money can be taken by one of the players or split evenly among them. There is also a possibility of both players getting nothing. It all depends on how the players themselves decide what to do with the money.

They make independent decisions by choosing one of two options. They can decide to either split the money evenly between both of them, or steal (figuratively) the whole sum for themselves.
What happens next is the result of both decisions, revealed and applied at once:

  • If both players choose to split, the reward is indeed split evenly. Each player walks away with $50.
  • If both players choose to steal, they walk away with nothing.
  • If one player chooses to steal while the other decides to split, the stealer gets the whole $100. The other player walks away with nothing.
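The three rules above can be written down as a small payoff function. This is only an illustrative sketch: the function name `payoffs` and the string labels "split" and "steal" are made up here, not taken from any library.

```python
# Payoff rules of the split-or-steal game described above.
# The name `payoffs` and the string labels are illustrative assumptions.

REWARD = 100  # total prize in dollars


def payoffs(choice_a, choice_b):
    """Return (payout_a, payout_b) for one pair of simultaneous choices."""
    valid = {"split", "steal"}
    if choice_a not in valid or choice_b not in valid:
        raise ValueError("each choice must be 'split' or 'steal'")
    if choice_a == "split" and choice_b == "split":
        return (REWARD // 2, REWARD // 2)  # both split: $50 each
    if choice_a == "steal" and choice_b == "steal":
        return (0, 0)                      # both steal: nobody gets anything
    if choice_a == "steal":
        return (REWARD, 0)                 # A steals from a splitting B
    return (0, REWARD)                     # B steals from a splitting A


if __name__ == "__main__":
    # Print the full 2x2 outcome table.
    for a in ("split", "steal"):
        for b in ("split", "steal"):
            print(a, b, payoffs(a, b))
```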

Sounds contrived? It has actually been tested in real life (as much as television can be called that), sometimes with dramatic results.

If you’re splitting and you know it…

What is the optimal strategy in a setting like this? If both opponents are known to be rational agents, they should arrive at the same conclusions. Because they cannot know each other’s thoughts, they both may only speculate what p_{steal} – the probability that the opponent steals – may be.

We know their decisions are independent, though, so p_{steal} shall remain the same regardless of what the player chooses in the end. Hence the expected values of both decisions seem to clearly indicate which one is better:

E[split] = 50 \cdot (1 - p_{steal})

E[steal] = 100 \cdot (1 - p_{steal})

Looks like we should simply steal and be done with it.
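The comparison can be checked numerically for a few values of p_{steal}. A minimal sketch, with function names invented for this illustration:

```python
# Expected winnings for each choice, given the probability that the
# opponent steals. Function names are illustrative assumptions.

def expected_split(p_steal):
    # We get $50 only if the opponent also splits.
    return 50 * (1 - p_steal)


def expected_steal(p_steal):
    # We get $100 only if the opponent splits.
    return 100 * (1 - p_steal)


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(p, expected_split(p), expected_steal(p))
```

For any fixed p_{steal}, stealing is never worse than splitting, and the two are equal only when p_{steal} is 1.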

But wait! Since both players are rational agents, they will both arrive at the same conclusion. Moreover, each will know the other thought the same. So they both know that p_{steal} is actually 1. Unfortunately, in this case

E[split] = E[steal] = 0

and they will both get nothing if they follow this logic.

Pity. Or maybe it’s better to split, then? If they both do just that, each will at least get $50 instead of nothing… Yes, this looks like a much better alternative, especially since we can count on both players to apply this reasoning; they are rational agents, after all. So by this logic, they will both split and everyone will get something in the end…

Well, except that now p_{steal} is 0, so it’s actually better to steal instead. Oh, and both players know that, obviously, and are not happy about it… again. Hence they will rather split, which makes stealing a more attractive option, which in turn compels splitting – and so on.

Complexity to the rescue

This example of circular meta-reasoning is unresolvable in the two-player case, because a single choice will make or break the system. Fortunately, reality is much more complex, with multiple agents making countless decisions all the time.

Decisions such as, for example, whether to open source this new project, or maybe contribute to some existing one.

Just like in the situation described above, it looks like the optimal choice is to “steal”: to draw liberally from the vast expanse of existing open source software while contributing nothing in return. No single such behavior would cause the whole ecosystem to crumble, so the incentive for exploitation is very tangible. Heck, it’s not even obvious what the steal is here, because the value lost by “splitters” is not easily recognizable.

Yet it’s obvious why not every party can apply this strategy: the result would be very suboptimal for everyone. Somewhere between those two extremes (everyone “splits” vs everyone “steals”) there might be a point of equilibrium, where stealers can derive maximum utility without irrevocably harming the game’s dynamics. We don’t really know where that point lies, though, so we just choose to play along and split.

Coded4: Time-based Statistics for Git/Hg Repos

2011-12-24 16:23

As the ubiquitous spirit of laziness (pardon, relaxation) permeates the holiday season, the will to do any serious and productive coding seems appallingly low. Instead of fighting it, I went along and tried to write something just for pure fun.

And guess what: it worked pretty well. Well enough, in fact, that the result seems to be worthy of public exposure – which I’m hereby doing. Behold the coded4 project:

    $ git clone git@github.com:jquery/jquery
    $ coded4 jquery

    name                   commits time
    -------------------------------------------
    John Resig             1211    5d 09:11:59
    jeresig                503     2d 07:50:02
    Jörn Zaefferer         309     1d 09:05:53
    Brandon Aaron          247     1d 01:34:29
    Dave Methvin           235     22:17:27
    jaubourg               221     22:22:31
    timmywil               221     23:00:56
    Ariel Flesler          200     22:10:46

    ...

What exactly is this thing? It’s a “time sheet” for the marvelous jQuery project, reconstructed from commit timestamps in its Git repository. coded4 created this result by analyzing the repo’s history, grouping changesets by contributor, and running some heuristics to approximate the timespans of their coding sessions.

And of course, this can be done for any Git (or Hg) repository. Pretty neat for a mere *pops up shell and types coded4 .* 3 hours of casual coding, eh?
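To give a rough idea of what a session heuristic of this kind might look like, here is a minimal sketch. It is not coded4’s actual algorithm: the 30-minute gap threshold, the 10-minute padding and the function name are all assumptions picked for illustration. Commits that are close together in time are treated as one coding session, and the gaps within a session are summed up.

```python
from datetime import datetime, timedelta

# Both values are assumptions made up for this sketch.
MAX_GAP = timedelta(minutes=30)          # a longer pause starts a new session
SESSION_PADDING = timedelta(minutes=10)  # work assumed before a session's first commit


def estimate_coding_time(timestamps):
    """Approximate the total coding time from one contributor's commit datetimes."""
    total = timedelta()
    previous = None
    for current in sorted(timestamps):
        if previous is not None and current - previous <= MAX_GAP:
            # Same session: count the time elapsed since the previous commit.
            total += current - previous
        else:
            # New session: we cannot know when it started, so add fixed padding.
            total += SESSION_PADDING
        previous = current
    return total


if __name__ == "__main__":
    commits = [
        datetime(2011, 12, 24, 10, 0),
        datetime(2011, 12, 24, 10, 20),
        datetime(2011, 12, 24, 10, 45),
        datetime(2011, 12, 24, 15, 0),
    ]
    print(estimate_coding_time(commits))
```

Grouping the commits by author first and applying the function to each group would produce a per-contributor table similar in shape to the one above.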

An Archaeological Discovery

2011-11-20 21:18

A small (~2 KLOC) university project dates back almost four years; I stumbled upon it a few days ago in my vast archives and decided to make it public. It is an implementation of a simple (yet fully functional) FTP server, written in pure C for POSIX systems. I certainly don’t expect it to find real use as a piece of software. It is, however, quite interesting as a piece of code.

Many are probably familiar with the “next half-year syndrome”. It works like this: when we look at code we wrote half a year earlier (give or take a few months), we see it as if someone else entirely had written it. Usually we find it hard to even make sense of it, and we quickly conclude that we would write it much better now. So we treat our creation as worthl… ahem… legacy code (;]), and consider this the natural order of things.

My recent find, however, turned out to be quite a surprise in this respect. What is unusual is how well it has stood the test of time. Based on it, I must arrive at the slightly shocking conclusion that Xion2007 was, heaven forbid, capable of writing good code. Admittedly, he did it cautiously and rather hesitantly (as evidenced by the excessive number of comments), but in the end he managed quite well. If I were sending him a message from the future, I might mention the advantages of splitting code into files shorter than 800 lines, but beyond that there is little I could fault. It is perfectly acceptable, readable and clear C code.

Madness!

Author: Xion, posted under Programming, Studies » Comments Off on An Archaeological Discovery

pyduck – a library for interfaces in Python

2011-09-26 22:16

Time to show off my new creation. It is neither very impressive nor particularly large. I do hope, however, that it will be useful to the (narrow) audience it is aimed at.

I mean a small Python library that aims to improve the usability of one of the language’s main, ideological traits: so-called duck typing. The origin of the term is of course highly intriguing, but the premise is simple. Instead of making promises and explicitly declaring the interfaces they implement, objects in Python “just are”, and code typically tries to use them for the intended purpose right away. If they turn out to be incompatible (e.g. they lack a required method), an exception is of course raised. The Pythonic practice is thus to get to the point without unnecessary ceremony and to handle errors if they occur.

This obviously has its advantages and its drawbacks, and it can sometimes cause problems if an error caused by an object’s incompatibility surfaces too late. On the other hand, not having to explicitly specify implemented interfaces is a considerable advantage. It would therefore be best to somehow combine these two worlds and make it possible to check an object’s capabilities earlier…

As you can probably guess, this is exactly what my framework, going by the charming name pyduck, tries to enable. It adds to Python an interface mechanism very similar to the one found in the Go language. Its most important feature is precisely that concrete types implement interfaces more or less automatically: it is enough that they have the appropriate methods. Checking whether an object implements a given interface consists of actually looking into its list of methods, rather than verifying any explicit declarations.

In other words, we make no promises about the object here, yet we still have a way to check whether our requirements are met. This is of course best illustrated by a concrete example:

    from pyduck import Interface, expects

    # An interface is declared as a class listing the required methods.
    class Drawable(Interface):
        def get_bounds(self): pass
        def draw(self, canvas): pass

    class Canvas(object):
        # obj must provide everything that Drawable declares.
        @expects(Drawable)
        def draw_object(self, obj):
            bounds = obj.get_bounds()
            self.set_clipping_bounds(bounds)
            obj.draw(self)

In it, we state that the method Canvas.draw_object expects an object conforming to the Drawable interface, which is defined above as having the methods get_bounds and draw. The check of whether the actual function argument meets our requirements will be performed by the @expects decorator. It will verify the presence and signatures of the aforementioned methods.
Thanks to this, we can be sure that we are dealing with an object that really knows how to draw itself. Its concrete class, on the other hand, does not need to know anything about the Drawable interface or explicitly declare that it uses it.
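To make the idea of checking by “looking into the list of methods” more tangible, here is a minimal sketch using only the standard library. It is not pyduck’s implementation: the names `implements` and `expects_interface` are invented for this example, and unlike the description above, the sketch only checks that the methods are present and does not compare their signatures.

```python
import functools
import inspect


def implements(obj, interface):
    """Return True if obj has a callable for every public method of interface."""
    for name, member in vars(interface).items():
        if name.startswith("_") or not inspect.isfunction(member):
            continue  # skip private names and non-method attributes
        if not callable(getattr(obj, name, None)):
            return False
    return True


def expects_interface(interface):
    """Decorator (hypothetical): check the first argument after self before the call."""
    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, obj, *args, **kwargs):
            if not implements(obj, interface):
                raise TypeError(
                    "%r does not implement %s" % (obj, interface.__name__))
            return method(self, obj, *args, **kwargs)
        return wrapper
    return decorator


class Drawable(object):
    def get_bounds(self): pass
    def draw(self, canvas): pass


class Circle(object):  # never mentions Drawable anywhere
    def get_bounds(self):
        return (0, 0, 10, 10)

    def draw(self, canvas):
        canvas.drawn.append(self)


class Canvas(object):
    def __init__(self):
        self.drawn = []

    @expects_interface(Drawable)
    def draw_object(self, obj):
        obj.draw(self)
```

With these definitions, `Canvas().draw_object(Circle())` goes through, while passing an object without `get_bounds` and `draw` fails with a `TypeError` before the method body runs.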

For more information, please visit the project’s page on GitHub. Alternatively, you can install the package right away, e.g. via easy_install:

    $ sudo easy_install pyduck

And since everything open source is always a development version, I probably don’t need to mention that I gladly welcome pull requests with improvements and fixes :>

Author: Xion, posted under Programming » 1 comment
 


© 2017 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.