Posts tagged ‘Google App Engine’

Making App Engine More Pythonic

2013-10-18 22:27

Lately, I was re-evaluating Google App Engine – the cloud computing platform – to see how feasible it would be for one pet project I’ve had in mind. It was pleasantly surprising overall, as the platform improved quite a lot while I wasn’t looking, since about a year and a half ago. Mostly interested in the Python part, I noticed that version 2.7 is now standard, lots of libraries are available out of the box, and it’s possible to use to pretty much any web framework you’d like to, such as Flask or Django.

Still, there are some quirks. App Engine SDK, for example, is a self-contained bundle with a bunch of Python packages that make it possible to run the development server with your app on your local machine. You don’t really “install” it into your Python interpreter, though.
Same goes for any additional, third party libraries your app may need. They must all be deployed along with it, as there is no setup.py or requirements.txt to specify your dependencies in. If you’re used to how e.g. Heroku handles dependencies, GAE’s way will undoubtedly be quite a letdown.

Good news are: you can still make it work sanely. By that I mean using virtualenv for development rather than your global, system-level interpreter, and keeping the code of any third party libraries out of your project’s repository. You may not get quite the same experience of pip install and pip freeze > requirements.txt but well… it’s close enough :)

Dependencies as Git submodules

So you have an application that requires some external libraries. Few of them are provided by App Engine itself, and you will be able to access them after you specify your requirement in app.yaml. Many times, however, you will want to tap into broader open source ecosystem, just like you’d like with any other Python app.

There is a way, fortunately, to include external libraries to go with your application without them cluttering your repository. Since the de facto standard for publishing code on the ‘net is to push it to a public Git repository, we can use Git submodules to “symlink” to those repositories. Our own Git repo won’t store any of their actual content, but only a list of URLs; the .gitmodules file.

If you held your breath at the mere mention of Git submodules, don’t panic. They get a lot of flak, that’s true, and many of their claimed shortcomings are quite genuine. All of them apply to the scenario where a main repo uses submodules to reuse shared subproject that is modified in conjunction with the main one.
As you have probably noticed, this is totally different than the setting we’re discussing here. When including an external dependency, the fact that Git submodule points to specific commit in the other repo is a feature, not bug. It’s the exact same reason why we should always put version numbers in requirements.txt: upgrading a third party library must never be accidental, or you risk breaking your code through unexpected API or behavior changes.

So, how to do it – use Git submodules, that is? You substitute pip install with git submodule add:

  1. $ git submodule add git://github.com/mitsuhiko/flask lib/flask

This will establish reference between the repo under given URL and a directory path inside your project, fetching the repo’s content in the process. But as you will quickly notice in $ git status, that content won’t become part of the working directory.
After all this talk about being explicit with your libraries’ version, you probably also want to checkout a correct release:

  1. $ cd lib/flask
  2. $ git checkout 0.10.1

Otherwise, you will work off whatever the current HEAD happened to be, exactly how pip install flask would give you whatever is the newest release in PyPI.

Working alone from a single machine, this would set you up for the time being. For starting somewhere else, though, you need equivalent of pip install -r requirements.txt, i.e. a way to fetch all your libraries at once. Here’s where git submodule update comes handy:

  1. $ git submodule update --init

It will both setup your freshly cloned repo to use submodules specified in .gitmodules files, as well as pull the submodules’ content.

There’s much more to Git submodules, of course, so if you want to gain much more thorough insight into them than this short overview, I recommend having a look at the Git book. And as with most things, $ man git submodule is always helpful.

virtualenv for it all

With dependencies seemingly in place, you might be quite disappointed trying to, you know, use them:

  1. $ workon myawesomeproject
  2. (myawesomeproject)$ python
  3. Python 2.7.4 (default, Sep 26 2013, 03:20:26)
  4. [GCC 4.7.3] on linux2
  5. Type "help", "copyright", "credits" or "license" for more information.
  6. >>> import flask
  7. Traceback (most recent call last):
  8.   File "<stdin>", line 1, in <module>
  9. ImportError: No module named flask

The reason for that is simple, though: the libraries are physically there on your disk, but they are not in your virtualenv’s $PYTHONPATH, so Python has no idea where to import them from. There are ways to solve this problem that I could ramble for a while about, but I will just go ahead and demonstrate a ready-made shell script which handles it all :)
You might need to tweak it, e.g. if your GAE SDK installation path is different than /opt/google_appengine, but otherwise it should be pretty straightforward. One caveat, though: the script should be re-run after adding a brand new library, as described in previous section:

  1. $ git submodule add ...
  2. $ ./setup_virtualenv.sh

As an added bonus, you will get dev_appserver and appcfg binaries inside your virtualenv’s ./bin, so you may remove App Engine’s SDK directory from your regular $PATH.

Deployment shenanigans

Setup of a local development environment generally ends here – you should be now ready to run your app through dev_appserver. What’s still missing is making your bundled libraries work with remote Python on actual App Engine instance. Sadly, there is no virtualenv in the cloud.

Instead, we need to revert to the glorified sys.path hacks. Before importing anything, we extend the actual PYTHONPATH so that it covers our third party libraries. If their directory layout is just like shown in the first section (lib/ root with subdirs for different libraries), the following shim will suffice to correctly bootstrap the import mechanics:

  1. # run.py
  2. import os
  3. import sys
  4.  
  5.  
  6. lib_dir = os.path.join(os.path.abspath('.'), 'lib')
  7. sys.path[1:1] = [os.path.join(lib_dir, name)
  8.                  for name in os.listdir(lib_dir)]
  9.  
  10. app = __import__('myapppackage').app

Place this in the root of your project’s source tree (outside the main Python package) and point the app.yaml to it:

  1. handlers:
  2. # ... other handlers
  3. - url: .*
  4.   script: run.app

With this, you may now deploy your app and see whether it works correctly. If you encounter problems, I recommend taking a look at Flask on App Engine Project Template. Even if you intend to use different web framework, the example code should be largely applicable.

Tags: , , , ,
Author: Xion, posted under Programming » Comments Off on Making App Engine More Pythonic

On Clever Usage of Python ‘with’ Clauses

2012-01-21 20:21

As the Python docs state, with statement is used to encapsulate common patterns of tryexceptfinally constructs. It is most often associated with managing external resources in order to ensure that they are properly freed, released, closed, disconnected from, or otherwise cleaned up. While at times it is sufficient to just write the finally block directly, repeated occurrences ask for using this language goodness more consciously, including writing our own context managers for specialized needs.

Those managers – basically a with-enabled, helper objects – are strikingly similar to small local objects involved in the RAII technique from C++. The acronym expands to Resource Acquisition Is Initialization, further emphasizing the resource management part of this pattern. But let us not be constrained by that notion: the usage space of with is much wider.

Synchronization Through Memcache

2011-12-10 18:11

In web and server-side development, a memcache is a remote service that keeps transient data in temporary, in-memory store and allows for fast access. It is usually based on keys which identify values being stored and retrieved. Due to storing technique – RAM rather than actual disks – it offers great speed (often less than few milliseconds per operation) while introducing a possibility for values to be evicted (deleted) if memcache runs out of space. Should that happen, oldest values are usually disposed of first.

This functionality makes memcache a good secondary storage that complements the usual persistent database, increasing the efficiency of the system as a whole. The usual scenario is to poll memcache first, and resort to querying the database only if the result cannot be found in cache. Once the query is made, its results are memcached for some reasonable amount time (usually called TTL: time to live), depending on allowed trade-offs between speed and being up-to-date with our results.

While making applications more responsive is the primary use case for memcache, it turns out that we can also utilize it for something completely different. That’s because memcache implementations offer something more besides the obvious get/set commands. They also have operations which make it possible to use memcache as synchronization facility.

Wąskie szyjki od butelek

2011-07-21 21:51

Kilka dni temu zajmowałem się optymalizacją szybkości aplikacji, która działa na platformie Google App Engine. Żeby dobrze podejść do tego zadania, zacząłem oczywiście od poszerzenia swojej wiedzy na temat zagadnienia, co obejmowało chociażby obejrzenie odpowiedniej prezentacji z zeszłorocznego Google I/O. Przedstawiono tam odpowiednie narzędzie służące do profilowania (Appstats), czyli niezbędnego etapu przygotowawczego (o czym mam nadzieję wszyscy pamiętają ;]). Dodatkowo dowiedziałem się też o najważniejszych źródłach opóźnień w tego rodzaju aplikacjach – i to okazało się dość zaskakujące.

Ogólny wniosek, jaki zdołałem stamtąd wyciągnąć, dotyczy właśnie prawidłowego wykrywania wąskich gardeł (zwanych bottlenecks – tytułowe szyjki od butelek). W przypadku wspomnianej aplikacji okazało się na przykład, że nawet stosunkowo skomplikowana logika (przetwarzanie sporych porcji danych w JSON-ie, renderowanie długich szablonów HTML, itd.) napisana całkowicie w wysokiego poziomu języku interpretowanym (Python) relatywnie zajmuje dosłownie chwilę. Nieporównywalnie dłuższy czas związany jest z dowolnym wywołaniem API platformy, bo każde z nich wymaga sieciowej wycieczki do osobnego serwera, który może je obsłużyć. To zaś sprawia, że wszelkie próby optymalizacji samych algorytmów (nawet potworków rzędu O(n^3) czy O(n^4)) nie mają sensu, jeśli nie wiąże się to ze zmniejszeniem ilości wspomnianych wywołań, czyli np. dostępu do magazynu danych.

Podobne pozornie zaskakujące wnioski można wyciągnąć, gdy przyjrzymy się wewnętrznej strukturze innych platform. Bardzo dobrym przykładem jest chociażby diagram opóźnień I/O w komputerach PC. Łatwo na nim zauważyć, w których miejscach należy jak najbardziej ograniczyć ruch danych, aby uzyskać odpowiedni wzrost wydajności. Przy takim spojrzeniu na aplikacje, podejście znane jako DOD (Data Oriented Design) nabiera też nieco więcej sensu ;-)

Tags: , , ,
Author: Xion, posted under Programming, Thoughts » 1 comment

Prezentacja nt. Google App Engine

2011-07-06 20:18

W zeszły piątek wziąłem udział w kolejnym evencie organizowanym przez warszawski oddział polskiej Grupy Użytkowników Technologii (w skrócie GTUG). Nie było to ponadto uczestnictwo pasywne. Wspólnie z kolegą z pracy wygłosiłem tam bowiem półtoragodzinną prezentację na temat technologii Google App Engine, czyli platformy do tworzenia aplikacji sieciowych, działającej w oparciu o ideę cloud computing.
Nosiła ona jakże pogodny tytuł Rozchmurz się! i przedstawiała całą platformę właściwie od podstaw, ale jednocześnie w jak najszerszym przekroju. Dodatkowo nie obyło się oczywiście bez kilku odniesień do osobistych doświadczeń z jej użytkowania. Ogólnie wydaje mi się, że mimo wyraźnie wieczorowej pory udało nam się nie uśpić publiczności całkowicie ;-)

Zamieszczam poniżej slajdy z prezentacji. Na stronie GTUG można natomiast obejrzeć galerię zdjęć z całego wydarzenia. Na dniach powinien również pojawić się filmik.

File: Rozchmurz sie! -- Cloud computing z Google App Engine [PL]  Rozchmurz sie! -- Cloud computing z Google App Engine [PL] (805.8 KiB, 2,254 downloads)

 


© 2023 Karol Kuczmarski "Xion". Layout by Urszulka. Powered by WordPress with QuickLaTeX.com.