On-demand loading of Flickr photo metadata

I've been working on converting an ancient laptop ("vivi", a 166 MHz Pentium) into a digital picture frame. I got the hardware working to my satisfaction a couple of weeks ago, so now it's time to work on the software. There are a lot of slideshow programs out there, but I couldn't find any that offered both of these features:

  1) the ability to display full-screen photos from my Flickr account.
  2) the ability to be remotely controlled from another machine in our home network.

So I wrote my own slideshow program, using Pygame for graphics and Twisted for networking. Along the way, I picked up a neat design pattern. By using Python's function decorators, my Photo objects can load their metadata (description, tags, URLs) on-demand without much additional code.

I have over 2000 photos in my Flickr library. Information on a single photo can be queried with the flickr.photos.getInfo() function of the Flickr API. This function returns an XML string that gives the description of the photo, a list of tags that have been added to the photo, a list of URLs where the photo is shown, and so on. In the slideshow program, I'm going to have a list of 2000+ Photo objects. I could load the metadata for all 2000 photos when the slideshow starts up, but that's going to cause a heck of a lot of network traffic. The slideshow will be very slow to start up, and Flickr may get angry with me for pounding their servers. Instead, I'd like the metadata for a Photo to be loaded on-demand: only when the slideshow app actually wants the info.

First, here's the naive approach: when a Photo object is created, we'll call the Flickr API to immediately load the metadata. We'll store the XML data structure as self.info, and add some methods that will return the data when it's asked for.

```python
class Photo(object):
    def __init__(self, flickr, id):
        # Eagerly fetch the metadata as soon as the object is created.
        self.info = flickr.photos_getInfo(photo_id=id).photo[0]

    def getDescription(self):
        return self.info.description[0].elementText

    def getTags(self):
        result = []
        for tag in self.info.tags[0].tag:
            result.append(tag.elementText)
        return result

    def getURLs(self):
        result = []
        for url in self.info.urls[0].url:
            result.append(url.elementText)
        return result
```

We can then use the code as follows:

```python
>>> p = Photo(...)
>>> p.getDescription()
"Sword fighting in Yoyogi Park."
>>> p.getTags()
['japan', '2005', 'tokyo', 'yoyogi', 'park', 'september', 'sword', 'fighting', 'kendo']
>>> p.getURLs()
['http://www.flickr.com/photos/colinmcmillen/491114090/']
```

That works, but if we create 2000 Photo objects when the slideshow starts up, we'll send 2000 API calls to the Flickr website. That's not very nice of us. It'd be better if we set self.description = None in the constructor; then each of our getter methods can call a loadInfo() function only if self.description is still set to None. loadInfo() makes the Flickr API call and sets all the metadata. This solution looks like this:

```python
class OnDemandPhoto(object):
    def __init__(self, flickr, id):
        self.flickr = flickr
        self.id = id
        self.description = None  # None means "not loaded yet"
        self.tags = []
        self.urls = []

    def loadInfo(self):
        info = self.flickr.photos_getInfo(photo_id=self.id).photo[0]
        self.description = info.description[0].elementText
        for tag in info.tags[0].tag:
            self.tags.append(tag.elementText)
        for url in info.urls[0].url:
            self.urls.append(url.elementText)

    def getDescription(self):
        if self.description is None:
            self.loadInfo()
        return self.description

    def getTags(self):
        if self.description is None:
            self.loadInfo()
        return self.tags

    def getURLs(self):
        if self.description is None:
            self.loadInfo()
        return self.urls
```
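To make the cost difference concrete, here is a small sketch that replaces the real Flickr client with a hypothetical stub (`CountingFlickr`, which is not part of any library). The stub only counts `photos_getInfo` calls and returns objects shaped like the ones the code above indexes into (`.photo[0]`, `.description[0].elementText`). Condensed versions of the eager `Photo` and the lazy `OnDemandPhoto` are then compared on 2000 photos.

```python
from types import SimpleNamespace


def node(text):
    # Mimics an XML node that exposes its text as .elementText.
    return SimpleNamespace(elementText=text)


class CountingFlickr(object):
    """Hypothetical stand-in for a Flickr client; it only counts calls."""

    def __init__(self):
        self.calls = 0

    def photos_getInfo(self, photo_id):
        self.calls += 1
        photo = SimpleNamespace(description=[node("Photo %s" % photo_id)])
        return SimpleNamespace(photo=[photo])


class Photo(object):
    def __init__(self, flickr, id):
        # Eager: one API call per object, at construction time.
        self.info = flickr.photos_getInfo(photo_id=id).photo[0]


class OnDemandPhoto(object):
    def __init__(self, flickr, id):
        self.flickr = flickr
        self.id = id
        self.description = None

    def loadInfo(self):
        info = self.flickr.photos_getInfo(photo_id=self.id).photo[0]
        self.description = info.description[0].elementText

    def getDescription(self):
        # Lazy: the API call happens on first access only.
        if self.description is None:
            self.loadInfo()
        return self.description


eager = CountingFlickr()
eager_photos = [Photo(eager, i) for i in range(2000)]

lazy = CountingFlickr()
lazy_photos = [OnDemandPhoto(lazy, i) for i in range(2000)]
lazy_photos[0].getDescription()
lazy_photos[0].getDescription()

print(eager.calls)  # 2000
print(lazy.calls)   # 1
```

Only the photo the slideshow actually looked at costs a request, and asking for it twice still costs only one.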

This code loads the Photo metadata on demand, but it isn't as clear as it could be. Each getter method repeats the same check in its first two lines, and it's not immediately obvious what that check is for. Also, I've been glossing over some of the complexities of the Flickr API: some metadata (such as the available picture sizes, geotagging data, etc.) requires calls to additional Flickr API methods. So we will end up with a bunch of on-demand loaders, like loadGeoTags() and loadSizes(), and a bunch of getters, like getGeoTags(), get800x600(), get640x480(), etc. We need to make sure that the right loader is called before each getter computes a value, and putting in all the checks to guarantee that might start to get hairy.

The general pattern here is that each getter requires an appropriate loader to be called first, but each loader should only ever be called once. So we want to be able to specify two things explicitly:

  1) a way to say that the data needed for the getFoo() function is loaded by the loadBar() function.
  2) a way to ensure that each of the loadBar() functions is only called once.

To do this, we'll use function decorators. For now, think of a decorator as some "magic", written on the line before a function definition, that modifies the definition of that function. Specifically, a decorator called @loadedBy will specify that the appropriate loader function gets called before each getter function. A decorator called @callOnce will ensure that each loader function is only executed once per Photo instance. Here's what our example looks like with decorators:

```python
class DecoratedPhoto(object):
    def __init__(self, flickr, id):
        self.flickr = flickr
        self.id = id
        self.description = None
        self.tags = []
        self.urls = []

    @callOnce
    def loadInfo(self):
        info = self.flickr.photos_getInfo(photo_id=self.id).photo[0]
        self.description = info.description[0].elementText
        for tag in info.tags[0].tag:
            self.tags.append(tag.elementText)
        for url in info.urls[0].url:
            self.urls.append(url.elementText)

    @loadedBy(loadInfo)
    def getDescription(self):
        return self.description

    @loadedBy(loadInfo)
    def getTags(self):
        return self.tags

    @loadedBy(loadInfo)
    def getURLs(self):
        return self.urls
```

This code is only marginally shorter than the last example, and even that is deceiving, because I haven't yet shown the implementation of the decorators. More importantly, the code is well-factored: if I want to add support for another API call (say, flickr.photos.getSizes()), I get to reuse my decorators:

```python
# Two more methods for DecoratedPhoto; its __init__ also needs self.sizes = [].
# Size is a small (width, height, url) record type defined elsewhere.

@callOnce
def loadSizes(self):
    sizes = self.flickr.photos_getSizes(photo_id=self.id).sizes[0].size
    for size in sizes:
        width = int(size.attrib['width'])
        height = int(size.attrib['height'])
        url = size.attrib['source']
        self.sizes.append(Size(width, height, url))

@loadedBy(loadSizes)
def getSizes(self):
    return self.sizes
```
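Convenience getters like get800x600() could be built on top of getSizes(). One hypothetical approach is to pick the largest available size that still fits inside a target box. The `Size` namedtuple and the `pickSize` helper below are illustrative assumptions. They are not part of the Flickr API or the original program.

```python
from collections import namedtuple

# Hypothetical record type matching the Size(width, height, url) calls above.
Size = namedtuple("Size", ["width", "height", "url"])


def pickSize(sizes, maxWidth, maxHeight):
    """Return the largest Size that fits within maxWidth x maxHeight, or None."""
    fitting = [s for s in sizes if s.width <= maxWidth and s.height <= maxHeight]
    if not fitting:
        return None
    # "Largest" here means the most pixels.
    return max(fitting, key=lambda s: s.width * s.height)


sizes = [
    Size(240, 180, "small.jpg"),
    Size(500, 375, "medium.jpg"),
    Size(800, 600, "800.jpg"),
    Size(1024, 768, "large.jpg"),
]
print(pickSize(sizes, 800, 600).url)  # 800.jpg
print(pickSize(sizes, 640, 480).url)  # medium.jpg
```

A getter such as get800x600() could then be decorated with @loadedBy(loadSizes) and return pickSize(self.sizes, 800, 600).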

Now, how are the decorators implemented? A decorator is a function that takes a function (and possibly some arguments) and returns a new function to be used in place of the original. Usually, the decorator wraps the original function and adds some functionality, much as the Decorator design pattern wraps an object. The @ syntax itself is just syntactic sugar:

```python
@decorator
def foo():
    doStuff()
```

is equivalent to:

```python
def foo():
    doStuff()
foo = decorator(foo)
```
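To check the equivalence yourself, here's a tiny sketch with a made-up `shout` decorator that upper-cases a function's return value. Applying it with @ and applying it by hand give the same behavior.

```python
def shout(fn):
    # A decorator: takes a function and returns a new one that wraps it.
    def result(*args, **kwargs):
        return fn(*args, **kwargs).upper()
    return result


@shout
def greet(name):
    return "hello, " + name


def greet2(name):
    return "hello, " + name
greet2 = shout(greet2)  # what the @shout line does for greet

print(greet("vivi"))   # HELLO, VIVI
print(greet2("vivi"))  # HELLO, VIVI
```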

The callOnce() decorator takes a function fn and ensures that fn is called only once per object. The calledBy dict keeps track of the objects that fn has already been called on. We create a new function result(self) that checks whether self is in calledBy. If it is, we return immediately; otherwise we add self to calledBy, then call fn(self):

```python
def callOnce(fn):
    calledBy = {}  # objects that fn has already run on
    def result(self):
        if self in calledBy:
            return  # already loaded for this object; do nothing
        calledBy[self] = True
        fn(self)
    return result
```
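The calledBy dict has one side effect: it holds a reference to every object the loader has run on, so those objects stay alive as long as the decorated function does. For a fixed set of long-lived photos that's harmless. If it matters, the flag can live on the instance itself instead. The sketch below shows that variant. It is not the code the slideshow uses, and the attribute name `_called_<function name>` is an arbitrary choice. Like the original, it sets the flag before calling fn, so a loader that raises is not retried.

```python
import functools


def callOnce(fn):
    # Per-instance flag instead of a shared dict, so the decorator
    # holds no references to the objects it has seen.
    flag = "_called_" + fn.__name__

    @functools.wraps(fn)
    def result(self):
        if getattr(self, flag, False):
            return
        setattr(self, flag, True)
        fn(self)
    return result


class Loader(object):
    def __init__(self):
        self.loads = 0

    @callOnce
    def load(self):
        self.loads += 1


a, b = Loader(), Loader()
a.load()
a.load()
b.load()
print(a.loads, b.loads)  # 1 1
```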

The loadedBy function is a little more complicated because it takes an argument, loader. That means loadedBy returns a decorator, which in turn takes a function and returns a new function. The important part is the function result, which first calls the loader, then calls the decorated function (such as getDescription()):

```python
def loadedBy(loader):
    def decorator(fn):
        def result(self, *args, **kwargs):
            loader(self)  # run the loader first (a no-op after the first call)
            return fn(self, *args, **kwargs)
        return result
    return decorator
```
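Putting the pieces together, here's a self-contained sketch of DecoratedPhoto running against a stub client. The stub, `StubFlickr`, is hypothetical and counts API calls. The two decorators are the ones above with one addition: `functools.wraps`, which keeps each wrapper's `__name__` and docstring.

```python
import functools
from types import SimpleNamespace


def callOnce(fn):
    calledBy = {}  # objects that fn has already run on

    @functools.wraps(fn)
    def result(self):
        if self in calledBy:
            return
        calledBy[self] = True
        fn(self)
    return result


def loadedBy(loader):
    def decorator(fn):
        @functools.wraps(fn)
        def result(self, *args, **kwargs):
            loader(self)  # does real work only on the first call
            return fn(self, *args, **kwargs)
        return result
    return decorator


def _text(s):
    # Mimics an XML node that exposes its text as .elementText.
    return SimpleNamespace(elementText=s)


class StubFlickr(object):
    """Hypothetical stand-in for the Flickr client; counts getInfo calls."""

    def __init__(self):
        self.calls = 0

    def photos_getInfo(self, photo_id):
        self.calls += 1
        photo = SimpleNamespace(
            description=[_text("Sword fighting in Yoyogi Park.")],
            tags=[SimpleNamespace(tag=[_text("tokyo"), _text("kendo")])],
            urls=[SimpleNamespace(url=[_text("http://www.flickr.com/photos/colinmcmillen/491114090/")])],
        )
        return SimpleNamespace(photo=[photo])


class DecoratedPhoto(object):
    def __init__(self, flickr, id):
        self.flickr = flickr
        self.id = id
        self.description = None
        self.tags = []
        self.urls = []

    @callOnce
    def loadInfo(self):
        info = self.flickr.photos_getInfo(photo_id=self.id).photo[0]
        self.description = info.description[0].elementText
        for tag in info.tags[0].tag:
            self.tags.append(tag.elementText)
        for url in info.urls[0].url:
            self.urls.append(url.elementText)

    @loadedBy(loadInfo)
    def getDescription(self):
        return self.description

    @loadedBy(loadInfo)
    def getTags(self):
        return self.tags

    @loadedBy(loadInfo)
    def getURLs(self):
        return self.urls


flickr = StubFlickr()
p = DecoratedPhoto(flickr, 491114090)
print(flickr.calls)        # 0: nothing loaded yet
print(p.getDescription())  # Sword fighting in Yoyogi Park.
print(p.getTags())         # ['tokyo', 'kendo']
print(flickr.calls)        # 1: both getters shared one load
```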

So that's how I've used decorators in my slideshow program. Maybe in a few days I'll explain how I used Twisted to allow the slideshow to be remotely controlled. Also, let me know if you want the slideshow code. I'll probably do an official release in a couple of weeks.

If you liked this post, you might also want to read about creating robot behaviors with Python generators.

Use the decorator module

You can use the decorator module to simplify things a bit:

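```python
from decorator import decorator

def loadedBy(loader):
    # With the decorator module, the caller receives the decorated
    # function as its first argument; decorator(result) turns it
    # into a signature-preserving decorator.
    def result(fn, self, *args, **kwargs):
        loader(self)
        return fn(self, *args, **kwargs)
    return decorator(result)
```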

Re: Use the decorator module

Thanks for the tip! Unfortunately, the decorator module isn't part of the standard library. However, if anyone's interested, take a look at the decorator module documentation.
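If you'd rather stay within the standard library, `functools.wraps` covers part of what the decorator module offers. It copies the wrapped function's `__name__`, `__doc__` and other metadata onto the wrapper and sets `__wrapped__`, which `inspect.signature` follows. Unlike the decorator module, it does not generate a wrapper with the same real signature: the wrapper still accepts `*args, **kwargs`. A sketch of loadedBy using it, with a hypothetical `noopLoader` for the demo:

```python
import functools
import inspect


def loadedBy(loader):
    def decorator(fn):
        @functools.wraps(fn)  # copy __name__, __doc__, etc. and set __wrapped__
        def result(self, *args, **kwargs):
            loader(self)
            return fn(self, *args, **kwargs)
        return result
    return decorator


def noopLoader(self):
    # Hypothetical loader, for this demo only.
    pass


@loadedBy(noopLoader)
def getDescription(self):
    """Return the photo's description."""
    return "desc"


print(getDescription.__name__)            # getDescription
print(getDescription.__doc__)             # Return the photo's description.
print(inspect.signature(getDescription))  # (self)
```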

And to make the code feel less like Java...

... start using property() instead of those ugly getter methods.

Indeed...

Indeed, my actual code looks like this:

```python
@loadedBy(loadInfo)
def getDescription(self):
    return self._description
description = property(getDescription)
```

... but I didn't mention that in the writeup because it doesn't have anything to do with decorators. :)
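The same thing can also be written by stacking @property on top of @loadedBy. Decorators apply bottom-up, so loadedBy wraps the getter first and property then turns the wrapped function into an attribute. A minimal sketch, with a hypothetical loader that stands in for the Flickr call:

```python
def callOnce(fn):
    calledBy = {}
    def result(self):
        if self in calledBy:
            return
        calledBy[self] = True
        fn(self)
    return result


def loadedBy(loader):
    def decorator(fn):
        def result(self, *args, **kwargs):
            loader(self)
            return fn(self, *args, **kwargs)
        return result
    return decorator


class LazyPhoto(object):
    def __init__(self):
        self._description = None
        self.loads = 0

    @callOnce
    def loadInfo(self):
        # Hypothetical loader standing in for the Flickr API call.
        self.loads += 1
        self._description = "Sword fighting in Yoyogi Park."

    @property
    @loadedBy(loadInfo)
    def description(self):
        return self._description


t = LazyPhoto()
print(t.description)  # Sword fighting in Yoyogi Park.
print(t.description)  # loadInfo does not run again
print(t.loads)        # 1
```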

OnDemandPhoto can be written

OnDemandPhoto can be written differently by overriding __getattr__.

I did this recently to cache an RPC call that returned many different fields.

It basically did something like this:

```python
def _get_remote_info(self):
    # Assumes _remote_info is initialized (e.g. to None) elsewhere; otherwise
    # looking it up here would itself fall through to __getattr__.
    if self._remote_info:
        return self._remote_info
    self._remote_info = do_rpc_call_stuff(self.id)
    return self._remote_info
remote_info = property(_get_remote_info)

def __getattr__(self, name):
    if name in ('foo', 'bar', 'baz', 'lots', 'of', 'fields'):
        return self.remote_info[name]
    ...
    # alternatively: if name in self.remote_info:
    #     return self.remote_info[name]
```

This way, if the remote method was changed to return more values, I only had to change one line, or zero if I used the alternate method. The only downside I see to this is that the autogenerated docs for the class won't include any mention of these fields.
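Here's a runnable version of that idea. A hypothetical `fetchInfo()` stands in for the RPC call, and `RemotePhoto` is an illustrative name. Two details matter. First, give the cache attribute a default (here, a class attribute) so that looking it up never falls through to `__getattr__` itself. Second, raise `AttributeError` for unknown names so that `getattr()` and `hasattr()` behave normally.

```python
class RemotePhoto(object):
    _remote_info = None  # class-level default: lookup never reaches __getattr__
    fetches = 0

    def __init__(self, id):
        self.id = id

    def fetchInfo(self):
        # Hypothetical stand-in for the RPC / Flickr call.
        self.fetches += 1
        return {"description": "Photo %s" % self.id, "tags": ["tokyo"]}

    @property
    def remote_info(self):
        if self._remote_info is None:
            self._remote_info = self.fetchInfo()
        return self._remote_info

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name in self.remote_info:
            return self.remote_info[name]
        raise AttributeError(name)


p = RemotePhoto(7)
print(p.fetches)      # 0
print(p.description)  # Photo 7
print(p.tags)         # ['tokyo']
print(p.fetches)      # 1
```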

Nice

I like that solution too. Sometimes it is tricky to document a class with __getattr__ well, but a good class-level docstring is probably sufficient.
