Lately, I was re-evaluating Google App Engine – the cloud computing platform – to see how feasible it would be for one pet project I’ve had in mind. It was pleasantly surprising overall, as the platform improved quite a lot while I wasn’t looking, since about a year and a half ago. Mostly interested in the Python part, I noticed that version 2.7 is now standard, lots of libraries are available out of the box, and it’s possible to use to pretty much any web framework you’d like to, such as Flask or Django.
Still, there are some quirks. App Engine SDK, for example, is a self-contained bundle with a bunch of Python packages that make it possible to run the development server with your app on your local machine. You don’t really “install” it into your Python interpreter, though.
Same goes for any additional, third party libraries your app may need. They must all be deployed along with it, as there is no setup.py or requirements.txt to specify your dependencies in. If you’re used to how e.g. Heroku handles dependencies, GAE’s way will undoubtedly be quite a letdown.
Good news are: you can still make it work sanely. By that I mean using virtualenv for development rather than your global, system-level interpreter, and keeping the code of any third party libraries out of your project’s repository. You may not get quite the same experience of pip install
and pip freeze > requirements.txt
but well… it’s close enough :)
So you have an application that requires some external libraries. Few of them are provided by App Engine itself, and you will be able to access them after you specify your requirement in app.yaml. Many times, however, you will want to tap into broader open source ecosystem, just like you’d like with any other Python app.
There is a way, fortunately, to include external libraries to go with your application without them cluttering your repository. Since the de facto standard for publishing code on the ‘net is to push it to a public Git repository, we can use Git submodules to “symlink” to those repositories. Our own Git repo won’t store any of their actual content, but only a list of URLs; the .gitmodules file.
If you held your breath at the mere mention of Git submodules, don’t panic. They get a lot of flak, that’s true, and many of their claimed shortcomings are quite genuine. All of them apply to the scenario where a main repo uses submodules to reuse shared subproject that is modified in conjunction with the main one.
As you have probably noticed, this is totally different than the setting we’re discussing here. When including an external dependency, the fact that Git submodule points to specific commit in the other repo is a feature, not bug. It’s the exact same reason why we should always put version numbers in requirements.txt: upgrading a third party library must never be accidental, or you risk breaking your code through unexpected API or behavior changes.
So, how to do it – use Git submodules, that is? You substitute pip install
with git submodule add
:
This will establish reference between the repo under given URL and a directory path inside your project, fetching the repo’s content in the process. But as you will quickly notice in $ git status
, that content won’t become part of the working directory.
After all this talk about being explicit with your libraries’ version, you probably also want to checkout a correct release:
Otherwise, you will work off whatever the current HEAD happened to be, exactly how pip install flask
would give you whatever is the newest release in PyPI.
Working alone from a single machine, this would set you up for the time being. For starting somewhere else, though, you need equivalent of pip install -r requirements.txt
, i.e. a way to fetch all your libraries at once. Here’s where git submodule update
comes handy:
It will both setup your freshly cloned repo to use submodules specified in .gitmodules files, as well as pull the submodules’ content.
There’s much more to Git submodules, of course, so if you want to gain much more thorough insight into them than this short overview, I recommend having a look at the Git book. And as with most things, $ man git submodule
is always helpful.
With dependencies seemingly in place, you might be quite disappointed trying to, you know, use them:
The reason for that is simple, though: the libraries are physically there on your disk, but they are not in your virtualenv’s $PYTHONPATH
, so Python has no idea where to import them from. There are ways to solve this problem that I could ramble for a while about, but I will just go ahead and demonstrate a ready-made shell script which handles it all :)
You might need to tweak it, e.g. if your GAE SDK installation path is different than /opt/google_appengine, but otherwise it should be pretty straightforward. One caveat, though: the script should be re-run after adding a brand new library, as described in previous section:
As an added bonus, you will get dev_appserver
and appcfg
binaries inside your virtualenv’s ./bin
, so you may remove App Engine’s SDK directory from your regular $PATH
.
Setup of a local development environment generally ends here – you should be now ready to run your app through dev_appserver
. What’s still missing is making your bundled libraries work with remote Python on actual App Engine instance. Sadly, there is no virtualenv in the cloud.
Instead, we need to revert to the glorified sys.path
hacks. Before importing anything, we extend the actual PYTHONPATH so that it covers our third party libraries. If their directory layout is just like shown in the first section (lib/ root with subdirs for different libraries), the following shim will suffice to correctly bootstrap the import mechanics:
Place this in the root of your project’s source tree (outside the main Python package) and point the app.yaml to it:
With this, you may now deploy your app and see whether it works correctly. If you encounter problems, I recommend taking a look at Flask on App Engine Project Template. Even if you intend to use different web framework, the example code should be largely applicable.
If you haven’t heard about it, DreamPie is an awesome GUI application layered on top of standard Python shell. I use it for elaborate prototyping where its multi-line input box is a significant advance over raw, terminal UX of IPython.
However, up until recently I didn’t know how to make DreamPie cooperate with virtualenv. Because it’s a GUI program, I scoured its menu and all the preference windows, searching for any trace of option that would allow me to set the Python executable. Having failed, I was convinced that authors didn’t think about including it – which was rather surprising, though.
But hey, DreamPie is open source! So I went to look around its code to see whether I can easily enhance it with an ability to specify Python binary. It wasn’t too long before I stumbled into this vital fragment:
The conclusions we could draw from this anecdote are thereby as follows:
With this newfound knowledge about dreampie arguments, it wasn’t very hard to make it use current virtualenv:
And after doing some more research, I ended up adding the following line to my ~/.bash_aliases:
Now I can simply type dp to get a DreamPie instance operating within current virtualenv but independent from terminal session. Very useful!
One of technological marvels behind modern languages is the easiness of installing new libraries, packages and modules. Thanks to having a central repository (PyPI, RubyGems, Hackage, …) and a suitable installer (pip/easy_install, gem, cabal, …), any library is usually just one command away. For one, this makes it very easy to bootstrap development of a new project – or alternatively, to abandon the idea of doing so because there is already something that does what you need :)
But being generous with external libraries also means adding a lot of dependencies. After a short while, they become practically untraceable, unless we keep an up-to-date list. In Python, for example, it would be the content of requirements.txt file, or a value for requires
parameter of the setuptools.setup
/distutils.setup
function call inside setup.py
module. Other languages have their own means of specifying dependencies but the principles are generally the same.
How to ensure this list is correct, though?… The best way is to create a dedicated virtual environment specifically for our project. An environment is simply a sandboxed interpreter/compiler, along with all the packages that it can use for executing/compiling) programs.
Normally, there is just one, global environment for a system as a whole: all external libraries or packages for a particular language are being installed there. This makes it easy to accidentally introduce extraneous dependencies to our project. More importantly, with this setting we are sharing our required libraries with other applications installed or developed on the system. This spells trouble if we’re relying on particular version of a library: some other program could update it and suddenly break our application this way.
If we use a virtual environment instead, our program is isolated from the rest and is using its own, dedicated set of libraries and packages. Besides preventing conflicts, this also has an added benefit of keeping our dependency list up to date. If we use an API which isn’t present in our virtual environment, the program will simply blow up – hopefully with a helpful error :) Should this happen, we need to make proper amends to the list, and use it to update the environment by reinstalling our project into it. As a bonus – though in practice that’s the main treat – deploying our program to another machine is as trivial as repeating this last step, preferably also in a dedicated virtual environment created there.
So, how to use all this goodness? It heavily depends on what programming language are we actually using. The idea of virtual environments (or at least this very term) comes from Python, where it coalesced into the virtualenv package. For Ruby, there is a pretty much exact equivalent in the form of Ruby Version Manager (rvm). Haskell has somewhat less developed cabal-dev utility, which should nevertheless suffice for most purposes.
More exotic languages might have their own tools for that. In that case, searching for “language virtualenv” is almost certain way to find them.