Posts tagged ‘imports’

Making App Engine More Pythonic

2013-10-18 22:27

Lately, I was re-evaluating Google App Engine – the cloud computing platform – to see how feasible it would be for one pet project I’ve had in mind. It was pleasantly surprising overall, as the platform improved quite a lot while I wasn’t looking, since about a year and a half ago. Mostly interested in the Python part, I noticed that version 2.7 is now standard, lots of libraries are available out of the box, and it’s possible to use to pretty much any web framework you’d like to, such as Flask or Django.

Still, there are some quirks. App Engine SDK, for example, is a self-contained bundle with a bunch of Python packages that make it possible to run the development server with your app on your local machine. You don’t really “install” it into your Python interpreter, though.
Same goes for any additional, third party libraries your app may need. They must all be deployed along with it, as there is no setup.py or requirements.txt to specify your dependencies in. If you’re used to how e.g. Heroku handles dependencies, GAE’s way will undoubtedly be quite a letdown.

Good news are: you can still make it work sanely. By that I mean using virtualenv for development rather than your global, system-level interpreter, and keeping the code of any third party libraries out of your project’s repository. You may not get quite the same experience of pip install and pip freeze > requirements.txt but well… it’s close enough :)

Dependencies as Git submodules

So you have an application that requires some external libraries. Few of them are provided by App Engine itself, and you will be able to access them after you specify your requirement in app.yaml. Many times, however, you will want to tap into broader open source ecosystem, just like you’d like with any other Python app.

There is a way, fortunately, to include external libraries to go with your application without them cluttering your repository. Since the de facto standard for publishing code on the ‘net is to push it to a public Git repository, we can use Git submodules to “symlink” to those repositories. Our own Git repo won’t store any of their actual content, but only a list of URLs; the .gitmodules file.

If you held your breath at the mere mention of Git submodules, don’t panic. They get a lot of flak, that’s true, and many of their claimed shortcomings are quite genuine. All of them apply to the scenario where a main repo uses submodules to reuse shared subproject that is modified in conjunction with the main one.
As you have probably noticed, this is totally different than the setting we’re discussing here. When including an external dependency, the fact that Git submodule points to specific commit in the other repo is a feature, not bug. It’s the exact same reason why we should always put version numbers in requirements.txt: upgrading a third party library must never be accidental, or you risk breaking your code through unexpected API or behavior changes.

So, how to do it – use Git submodules, that is? You substitute pip install with git submodule add:

  1. $ git submodule add git://github.com/mitsuhiko/flask lib/flask

This will establish reference between the repo under given URL and a directory path inside your project, fetching the repo’s content in the process. But as you will quickly notice in $ git status, that content won’t become part of the working directory.
After all this talk about being explicit with your libraries’ version, you probably also want to checkout a correct release:

  1. $ cd lib/flask
  2. $ git checkout 0.10.1

Otherwise, you will work off whatever the current HEAD happened to be, exactly how pip install flask would give you whatever is the newest release in PyPI.

Working alone from a single machine, this would set you up for the time being. For starting somewhere else, though, you need equivalent of pip install -r requirements.txt, i.e. a way to fetch all your libraries at once. Here’s where git submodule update comes handy:

  1. $ git submodule update --init

It will both setup your freshly cloned repo to use submodules specified in .gitmodules files, as well as pull the submodules’ content.

There’s much more to Git submodules, of course, so if you want to gain much more thorough insight into them than this short overview, I recommend having a look at the Git book. And as with most things, $ man git submodule is always helpful.

virtualenv for it all

With dependencies seemingly in place, you might be quite disappointed trying to, you know, use them:

  1. $ workon myawesomeproject
  2. (myawesomeproject)$ python
  3. Python 2.7.4 (default, Sep 26 2013, 03:20:26)
  4. [GCC 4.7.3] on linux2
  5. Type "help", "copyright", "credits" or "license" for more information.
  6. >>> import flask
  7. Traceback (most recent call last):
  8.   File "<stdin>", line 1, in <module>
  9. ImportError: No module named flask

The reason for that is simple, though: the libraries are physically there on your disk, but they are not in your virtualenv’s $PYTHONPATH, so Python has no idea where to import them from. There are ways to solve this problem that I could ramble for a while about, but I will just go ahead and demonstrate a ready-made shell script which handles it all :)
You might need to tweak it, e.g. if your GAE SDK installation path is different than /opt/google_appengine, but otherwise it should be pretty straightforward. One caveat, though: the script should be re-run after adding a brand new library, as described in previous section:

  1. $ git submodule add ...
  2. $ ./setup_virtualenv.sh

As an added bonus, you will get dev_appserver and appcfg binaries inside your virtualenv’s ./bin, so you may remove App Engine’s SDK directory from your regular $PATH.

Deployment shenanigans

Setup of a local development environment generally ends here – you should be now ready to run your app through dev_appserver. What’s still missing is making your bundled libraries work with remote Python on actual App Engine instance. Sadly, there is no virtualenv in the cloud.

Instead, we need to revert to the glorified sys.path hacks. Before importing anything, we extend the actual PYTHONPATH so that it covers our third party libraries. If their directory layout is just like shown in the first section (lib/ root with subdirs for different libraries), the following shim will suffice to correctly bootstrap the import mechanics:

  1. # run.py
  2. import os
  3. import sys
  4.  
  5.  
  6. lib_dir = os.path.join(os.path.abspath('.'), 'lib')
  7. sys.path[1:1] = [os.path.join(lib_dir, name)
  8.                  for name in os.listdir(lib_dir)]
  9.  
  10. app = __import__('myapppackage').app

Place this in the root of your project’s source tree (outside the main Python package) and point the app.yaml to it:

  1. handlers:
  2. # ... other handlers
  3. - url: .*
  4.   script: run.app

With this, you may now deploy your app and see whether it works correctly. If you encounter problems, I recommend taking a look at Flask on App Engine Project Template. Even if you intend to use different web framework, the example code should be largely applicable.

Tags: , , , ,
Author: Xion, posted under Programming » Comments Off on Making App Engine More Pythonic

Literal __imports__

2013-09-28 22:56

As part of a language, Python obviously has import statements. They allow us to divide the code into different modules and packages:

  1. import collections
  2. import os
  3.  
  4. import thirdpartylib
  5. import anotherlib
  6.  
  7. from myapp.util import do_stuff

What is lesser known fact is that it also has an __import__ function. This function retains all functionality of the import statement, but has some additional features and slightly different use cases. With it, for example, you can import a module whose name you only know at “runtime”:

  1. def import_plugin(name):
  2.     plugins_package = __import__('myapp.plugins', fromlist=[name])
  3.     return getattr(plugins_package, name)

This comes handy in various types of general dispatchers, factory functions, plugin systems, and so forth. Returned from __import__ function is always a module object (even in cases when fromlist argument is used), so often a getattr is needed to extract a specific symbol from it.

Quite surprisingly, I have discovered that __import__ function may very well be useful also when you do know the desired module name. Reason is that import statement is sometimes unwieldy. It has similar problem as global variables (i.e. global statements) and inner function definitions (as opposed to lambdas): it makes the code stretch unnecessarily in the vertical dimension.

Just for one thing

This can be considered a waste if you only need to access one specific thing from one specific module. Using __import__ function, you can golf the import and the usage into a single statement. Here’s an example, coming straight from my own project recursely:

  1. __import__('recursely').install()

Incidentally, the other uses of literal __import__ described can be conveniently replaced thanks to that small library :)

Will it lint?

Another issue with import statement is that it introduces symbols into the global (or local) namespace. Most of the time, this is precisely what we want. Occasionally, though, a sole fact of loading the module is enough.

A canonical example of the latter case is web application with request handlers scattered between different Python files, or even packages. All those files have to be imported if the handlers are to be added to framework’s routing table; but beyond that, we have no business with them.

As a result, the import statement(s):

  1. import handlers

introduces an unused symbol – here, it is handlers. Many linting tools will be eager to point this fact out, which is not really that helpful. There is sometimes an option to disable the warning on per line basis, but some checkers (e.g. pep8.py) don’t offer this functionality.
Universal solution? Use __import__ function, of course:

  1. __import__('handlers')

The module is still loaded just fine, but since we’re ignoring the return value, no stray variables are created. As a added bonus, the __import__ call also looks very different, signifying its special purpose.

Staying in place

Actually, there is one more benefit of this trick, also coming from fooling-the-tools department. Many Python IDEs, like Eclipse/Pydev, are able to automatically insert necessary imports and organize them in groups, effectively providing a neat, Java-like experience. What is not so neat is that they often insist on putting every import statement somewhere near the beginning of the file. preceding any other definition, variable, class or function.

In a scenario described above, this behavior may actually cause problems. When the handlers’ module gets imported, it may need to refer back to the application object; this is exactly the case in the Flask framework, for example. If that object happens to be defined in the module importing handlers, we’ll have a circular import error because the application object has not yet been defined. It would have been defined, however, if the statement:

  1. import handlers

hasn’t been touched by the IDE when it wanted to be helpful and organize our imports. All imports, as it turns out.

Fortunately, mechanisms like that tend to be easy to fool. Per answers to StackOverflow question I’ve once asked, it is a matter breaking the textual pattern that the algorithm searches our code for. As you may have guessed by now, one of the ways of achieving that goal is to shed the import statement in favor of __import__ function.

Tags: , ,
Author: Xion, posted under Programming » Comments Off on Literal __imports__

Importy niezupełnie z zagranicy

2011-03-17 23:55

Przeglądając plik źródłowy programu w dowolnym niemal języku, gdzieś bardzo blisko początku znajdziemy zawsze region z importami. Niekoniecznie będą one oznaczone słowem kluczowym import – czasem to będzie using, być może do spółki z #include – ale zawsze będą robiły zasadniczo to samo. Chodzi o poinformowanie kompilatora lub interpretera, że w tym pliku z kodem używamy takich-a-takich funkcji/klas/itp. z takich-a-takich modułów/pakietów. Dzięki temu “obce” nazwy użyte w dalszej części będą mogły być połączone z symbolami zdefiniowanymi gdzie indziej.

Każdy import w jakiś sposób rozszerza więc przestrzeń nazw danego modułu i zazwyczaj wszystko jest w porządku, dopóki dokładnie wiemy, jak to robi. Dlatego też powszechnie niezalecane są “dzikie” importy (wild imports), które nie wyliczają jawnie wszystkich dodawanych nazw, zwykle ukrywając je za gwiazdką (*). Ale nawet jeśli ich nie używamy, to nie oznacza to, że żadne problemy z importowanymi nazwami nas nie spotkają. Oto kilka innych potencjalnych źródeł kłopotów:

  • Importowanie nazwy, która jest identyczna z jednym z symboli zdefiniowanych w tym samym pakiecie. Import będzie wtedy ją przesłaniał – ale oczywiście tylko wtedy, jeśli go rzeczywiście dodamy. A to nie jest takie pewne zwłaszcza w języku interpretowanym, gdzie nawet symbol o zupełnie innej semantyce może łatwo przejść niezauważony aż do momentu uruchomienia kodu z jego błędnym wystąpieniem. Możemy więc nieświadomie używać nazwy lokalnej tam, gdzie należałoby raczej zaimportować zewnętrzną – lub odwrotnie.
  • Zaimportowane nazwy mogą być kwalifikowane lub nie, zależnie od języka i/lub sposobu importowania. I tak chociażby fraza import foo.bar.baz; wprowadza do przestrzeni modułu nazwę baz (czyli niekwalifikowaną) w przypadku Javy. W przypadku Pythona ten sam efekt wymaga z kolei instrukcji from foo.bar import baz, a zwykła instrukcja import da nam jedynie kwalifikowaną nazwę foo.bar.baz – która z kolei w Javie i C# jest dostępna bez żadnych importów, a w C++ po dodaniu dyrektywy #include… Całkiem intuicyjne, czyż nie? ;-) Skoro tak, to dodajmy do tego jeszcze fakt, iż…
  • Nazwy importowane można w większości języków aliasować. Oznacza to, że wynikowa nazwa wprowadzona do przestrzeni może być zupełnie inna niż ta, która jest zdefiniowana w źródłowym module. Aliasowania używa się najczęściej do skracania prefiksów nazw kwalifikowanych i choć ułatwiają one ich wpisywanie, to w rezultacie ogólna czytelność kodu może się pogorszyć.
  • W wielu językach importy mają zasięg. Przekłada się on potem na zasięg zaimportowanych nazw, którego ograniczenie jest często dobrym pomysłem. Sam import występujący poza początkiem pliku może jednak mieć niepożądane efekty – jak choćby załadowanie do pamięci importowanego modułu dopiero w momencie jego pierwszego użycia, co może wiązać się z potencjalnie dużym, chwilowym spadkiem wydajności.
  • Jeśli nie jesteśmy uważni, możemy rozpropagować zaimportowane nazwy do kolejnych modułów, które bynajmniej wcale z nich nie korzystają. Chyba najbardziej znanym przykładem jest wstawianie deklaracji using w plikach nagłówkowych C++, ale to nie jedyny przypadek i nie jedyny język, w którym importy z jednego modułu mogą zaśmiecić przestrzeń nazw innego.

Podsumowując, importy – chociaż często zarządzane prawie całkowicie przez IDE – to w sumie dość poważna sprawa i warto zwrócić na nie uwagę przynajmniej od czasu do czasu.

Tags: , , , , , , ,
Author: Xion, posted under Programming » Comments Off on Importy niezupełnie z zagranicy
 


© 2017 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.