
Logilab.org - en

News from Logilab and our Free Software projects, as well as on topics dear to our hearts (Python, Debian, Linux, the semantic web, scientific computing...)

  • Munin Plugins for Zope

    2008/07/01 by Arthur Lutz
    http://munin-monitoring.org/site/munin.png

    Here at Logilab we find Munin pretty useful. We monitor a lot of machines and a lot of services with it, and it usually gives us useful indicators over time that guide us toward optimizations.

    One of the reasons we adopted this technology is its modular approach with the plugin architecture. And when we realized we could write plugins in Python, we knew we'd like it. After years of using it, we're now actually writing plugins for it. Optimizing Zope and ZEO servers is not an easy task, so we're developing plugins to be able to see the difference between before and after changing things.

    You can check out the project here, and download it from the FTP site.
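
    As an illustration of what such a plugin looks like, here is a minimal sketch in Python (the graph and field names are made up; a real plugin would query Zope or ZEO for actual figures). Munin runs the script with the argument "config" to get the graph description, and with no argument to fetch current values:

    #!/usr/bin/env python
    """skeleton of a Munin plugin -- graph and field names are illustrative"""
    import sys

    def config():
        # describe the graph and its data series to Munin
        print "graph_title Zope cache hits"
        print "graph_vlabel hits per second"
        print "hits.label cache hits"

    def fetch():
        # print the current value; a real plugin would query Zope here
        print "hits.value %s" % get_zope_cache_hits()

    def get_zope_cache_hits():
        return 0  # placeholder for the actual measurement

    if __name__ == "__main__":
        if len(sys.argv) > 1 and sys.argv[1] == "config":
            config()
        else:
            fetch()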


  • apycot 0.12.1 released

    2008/06/24 by Arthur Lutz

    After one month of internship at Logilab, I'm pleased to announce the 0.12.1 release of apycot.

    For more information, read the apycot 0.12.1 release note.

    You can also check the new sample configuration.

    Pierre-Yves David


  • Instrumentation of Google AppEngine's datastore

    2008/06/23 by Sylvain Thenault

    Here is a piece of code I've written which I thought may be useful to some other people...

    You'll find here a simple Python module to use with the Google AppEngine SDK that monkey-patches the datastore API in order to give you an idea of the calls performed by your application.

    To instrument the datastore, put the following at the top of your handler file:

    import instrdatastore
    

    Note that it's important to have this import before any other import of your application or of the google package, otherwise some modules may end up using the unpatched versions of the datastore functions (and calls to those functions wouldn't be counted).

    Then add the following at the end of your handler function:

    instrdatastore.print_info()
    

    The handler file should look like this:

    """my handler file with datastore instrumenting activated"""
    import instrdatastore
    
    # ... other initialization code (defines the config and vreg used below)
    
    # main function so this handler module is cached
    def main():
      from wsgiref.handlers import CGIHandler
      from ginco.wsgi.handler import ErudiWSGIApplication
      application = ErudiWSGIApplication(config, vreg=vreg)
      CGIHandler().run(application)
      instrdatastore.print_info()
    
    if __name__ == "__main__":
      main()
    

    Now you should see in your logs the number of Get/Put/Delete/Query calls which have been performed during request processing:

    2008-06-23 06:59:12 - (root) WARNING: datastore access information
    2008-06-23 06:59:12 - (root) WARNING: nb Get: 2
    2008-06-23 06:59:12 - (root) WARNING: arguments (args, kwargs):
    ((datastore_types.Key.from_path('EGroup', u'key_users', _app=u'winecellar'),), {})
    ((datastore_types.Key.from_path('EUser', u'key_test@example.com', _app=u'winecellar'),), {})
    2008-06-23 06:59:12 - (root) WARNING: nb Query: 1
    2008-06-23 06:59:12 - (root) WARNING: arguments (args, kwargs):
    (({'for_user =': None}, 'EProperty'), {})
    2008-06-23 06:59:58 - (root) WARNING: nb Put: 1
    2008-06-23 06:59:58 - (root) WARNING: arguments (args, kwargs):
    (({u'login': None, u'last_usage_time': 1214204398.2022741, u'data': ""},), {})
    

    I'll probably extend this as time goes by. Also notice that you may encounter problems with the automatic reloading feature of the dev app server when instrumentation is activated, in which case you should simply restart the web server.
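
    The actual module is the one linked above; as an illustration of the monkey-patching technique, a minimal sketch could look like this (only the module-level Get/Put/Delete functions of google.appengine.api.datastore are wrapped here, counting calls and recording their arguments):

    """illustrative sketch of instrdatastore's monkey-patching idea"""
    import logging
    from google.appengine.api import datastore

    _calls = {}  # function name -> list of (args, kwargs) tuples

    def _wrap(name):
        original = getattr(datastore, name)
        def wrapper(*args, **kwargs):
            _calls.setdefault(name, []).append((args, kwargs))
            return original(*args, **kwargs)
        setattr(datastore, name, wrapper)

    # patch the functions; this must run before the rest of the
    # application imports them (hence the import order caveat above)
    for _name in ('Get', 'Put', 'Delete'):
        _wrap(_name)

    def print_info():
        logging.warning('datastore access information')
        for name, calls in _calls.items():
            logging.warning('nb %s: %s', name, len(calls))
            logging.warning('arguments (args, kwargs):')
            for args, kwargs in calls:
                logging.warning('%s', (args, kwargs))
            del calls[:]  # reset counters for the next request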


  • First version of LAX Book

    2008/06/16 by Arthur Lutz

    The previous documentation was merged into a LAX Book, now featuring step-by-step screenshots to get up and running faster.

    http://lax.logilab.org/lax-book

    Don't we all like screenshots...

    http://lax.logilab.org/images/lax-book.08-schema.en.png

    Update: LAX is now included in the CubicWeb semantic web framework.


  • Implementing scalable applications with AppEngine

    2008/06/11 by Nicolas Chauvat
    http://code.google.com/events/images/io_logo_lg.png

    At Google IO, a large part of the Tools track was dedicated to AppEngine. Brett Slatkin gave a talk titled Building scalable Web Applications with Google AppEngine which focused on optimizing the server part of web apps. As other presenters, like Steve Souders in his talk Even Faster Websites, demonstrated, optimizing the browser part of web apps is not to be neglected either.

    Web-scale applications require man-made optimisation

    First of all, I must confess I am used to repeating that "early optimisation is the root of all evil" and "delay commitment until the last responsible moment". But reading about AppEngine and listening to the Google IO talks, it appears that the tools we have today require human intervention to reach web-scale performance, even when "we" stands for "Google".

    In order for web-scale applications to handle the kind of load they are facing, they must be designed and implemented carefully. As carefully as applications had to be before the exponential growth of PC computing power let us move away from low-level implementation details and made some inefficiencies acceptable, as long as development time was short enough.

    It all depends on the parameters of your cost function, but for web-scale applications, it seems we do not have enough computer time and cannot trade it for human time.

    Writes are more expensive than reads

    To get a better idea of the constraints at work, one should know that a disk seek takes about 10ms, which means a maximum of 100 disk accesses per second. On the other hand, if we need consistent data as opposed to transactional data (the latter implying that data is fetched each time it is asked for), data can be read from disk once and then cached. Subsequent reads are done from memory at a rate of about 4GB/sec, which means 4000 accesses per second if entities are around 1MB in size. The result of this back-of-the-envelope approximation is that one write costs about as much as 40 reads.

    It follows that, although the actual time depends on the size and shape of data, writes are very expensive compared to reads and both are better done in batches to optimise disk access.
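
    A quick check of these figures (the numbers are the rough estimates quoted above, not measurements):

    writes_per_sec = 100    # one 10ms disk seek per write: 1s / 0.010s
    reads_per_sec = 4000    # 4GB/s memory bandwidth / 1MB entities

    print "one write costs about %d reads" % (reads_per_sec / writes_per_sec)
    # prints: one write costs about 40 reads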

    Entity groups in AppEngine

    http://code.google.com/appengine/images/noassembly.gif

    The AppEngine Datastore was designed with these constraints in mind. Entities are sets of property name/value pairs. Each entity may have a parent. An entity without a parent is the root of a hierarchy called an entity group.

    Entities of the same group are stored on disk close to each other, but two distinct entity groups may be stored on different computers. Read access to entities of the same group is thus faster than read access to entities of different groups.

    Write access is serialized per entity group. As opposed to a traditional RDBMS that provides row locking, the datastore only provides entity group locking. Writes to a single entity group will always happen in sequence, even though the changes concern different entities.

    There is no limit to the number of entity groups or to the number of entities per group, but because of the locking strategy, large entity groups will cause high contention and a lot of failed transactions. Since writes are expensive, not thinking about write throughput is a very bad idea when designing an AppEngine application if one wants it to scale.

    On the other hand, the parallel nature of the datastore makes it scale wide: there is no limit to the number of entity groups that can be written to in parallel, nor to the number of reads that can be done in parallel.

    To understand this design in detail, you will have to read about GFS, BigTable and other technologies developed by Google to implement large-scale clustering.
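
    As a minimal sketch of the grouping mechanism, assuming the SDK's Django-like db API (the models are made up for the example):

    from google.appengine.ext import db

    class Account(db.Model):
        login = db.StringProperty()

    class Comment(db.Model):
        content = db.TextProperty()

    account = Account(login='alice')
    account.put()

    # passing parent= places the comment in the account's entity group:
    # it is stored close to the account and shares its write lock
    comment = Comment(parent=account, content='first comment')
    comment.put()

    # without a parent, this comment is the root of its own entity group
    note = Comment(content='an independent entity group')
    note.put()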

    Example of counters

    http://code.google.com/apis/gears/resources/database.gif

    Counters are a good example to address when discussing write throughput, because the datastore locking strategy makes writing to global data very expensive.

    Let us assume that we want to display on the main page of a wiki application the total number of comments posted.

    A global counter would serialize all its updates. If 100 users were to add comments at the same time, some of them would have to wait several seconds for their action to complete: one write for the comment, one write for the counter, at most 100 writes per second for the counter, and a lot of time lost due to failed transactions that need to be restarted.

    The solution to make the counter scale is to partition it among all entity groups then sum these partial counters when the global value is needed.

    Since chances are low that a given user will write more than one comment at a time, comment entities for a user can be grouped together and a partial counter can be added to the same entity group. Creating a new comment and increasing the partial counter will be done in the same batch.

    When a new request for the main page is received, the counter total is looked up in the cache. If it is not found, all partial counters are fetched and summed up, then the cache is refreshed with a short timeout, for example one minute.

    During the next minute, the counter will be "consistent" (read: not too far off) and served extremely fast from the cache.
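
    A minimal sketch of such a partitioned counter, assuming the SDK's db and memcache APIs (the model, cache key and timeout are made up for the example):

    from google.appengine.ext import db
    from google.appengine.api import memcache

    class CommentCounter(db.Model):
        # one partial counter per user, living in that user's entity group
        count = db.IntegerProperty(default=0)

    def increment_counter(user):
        def txn():
            counter = CommentCounter.get_by_key_name('counter', parent=user)
            if counter is None:
                counter = CommentCounter(key_name='counter', parent=user)
            counter.count += 1
            counter.put()
        # serialized only within this user's entity group
        db.run_in_transaction(txn)

    def total_comments():
        total = memcache.get('comment_total')
        if total is None:
            # cache miss: sum the partial counters, cache for one minute
            total = sum(c.count for c in CommentCounter.all())
            memcache.add('comment_total', total, 60)
        return total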

    Prevent repeated or unneeded work

    http://code.google.com/apis/gears/resources/localserver.gif

    To sum things up, when implementing applications on top of AppEngine with web-scale usage as a goal, everything that can be done to save time should be considered, including the following:

    • importing Python modules as late as possible will minimize the Python runtime overhead
    • retrieving data that is not going to be used is a waste
    • repeated queries and queries returning large result sets must be avoided
    • when Get() is sufficient, do not spend time on Query() (see the sketch after this list)
    • landing pages are traffic intensive and would better use the same query for everyone
    • entity groups have to be designed to match the load and aim at low write contention
    • caching must be used aggressively (it is no surprise that memcache was the first improvement that followed within a month of the AppEngine public release)
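
    As an illustration of the Get() versus Query() point, here is a sketch assuming the SDK's db API (the Comment model and its property are made up): fetching by key is a single datastore access, while a query has to walk an index first.

    from google.appengine.ext import db

    class Comment(db.Model):
        author = db.StringProperty()

    def fetch_by_key(key):
        # a direct Get: one datastore access, no index scan
        return db.get(key)

    def fetch_by_author(author):
        # a Query: the datastore consults an index before returning entities
        return Comment.all().filter('author =', author).get()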

    Conclusion

    As a conclusion, the interface AppEngine exhibits today requires optimizing early, but I would bet that in the years to come, new languages and domain-specific compilers or database engines will take part of that burden off the developers' hands.

    Did not Yahoo and Google start developing PigLatin and Sawzall to make it easier to write parallel data-processing programs? The same could happen here: describe a data model in a high-level language and get a tool that optimizes it for write contention and web-scale applications.

    See Also

    http://www.logilab.fr/images/lax.png

    LAX (Logilab Appengine eXtension) is a full-featured web application framework developed by Logilab that runs on Google AppEngine.


  • Google App Engine future directions

    2008/06/09 by Nicolas Chauvat
    http://code.google.com/appengine/images/appengine_lowres.jpg

    Several of us went to San Francisco last week to attend Google IO. As usual with conferences, meeting people was more interesting than listening to most talks. The AppEngine Fireside Chat was a Q&A session that lasted about an hour. Here is what I learned from this session and various chats with AppEngineers.

    1. Google has decided to provide its scalable datastore architecture as a service. At this point, the datastore is the product and the goal is to make it as widely accessible as possible.
    2. The google.appengine.api.datastore API alone would not have made for a very sexy launch. In order to attract more people and lower the bar that beginners would have to jump over, they looked for a higher-level programming interface.
    3. Since some people working at Google have been using Django and know it, they reimplemented part of its interface for defining data models. Late in the project, they added GQL because Django-like queries were a bit too difficult. In both cases, the goal was to make it easier for external developers to get started.
    4. But Google is not in the business of providing web application frameworks and AppEngineers made explicit that they would not be officially supporting a specific framework or a specific version of a given framework (not even Django 0.96, although there is a django-appengine-helper project on code.google.com). They expect frameworks to be provided by communities of developers.

    My conclusion is twofold:

    • They will be focusing on supporting other languages in AppEngine (I would bet on Java being the next one available) rather than extending Python frameworks support.
    • Anyone is free to join with his own framework and provide support for it, the One True Interface being the one defined by google.appengine.api.datastore, not the one defined by db.model and GQL.

    This is why Logilab published its own framework running on App Engine as free software and is providing support for it: Logilab Appengine eXtension.


  • LAX - Logilab Appengine eXtension is a full-featured web application framework running on AppEngine

    2008/06/09 by Arthur Lutz
    http://code.google.com/appengine/images/appengine_lowres.jpg

    LAX version 0.3.0 was released today, see http://lax.logilab.org/

    Get a new application running in ten minutes with the install guide and the tutorial.

    Enjoy!

    Update: LAX is now included in the CubicWeb semantic web framework.


  • Browsers strangeness ...

    2008/06/07 by Adrien Di Mascio

    ... or when inverting two lines of code in your HTML's HEAD can speed up your web page rendering!

    If you have the following HTML page:

    <html>
      <head>
        <link rel="stylesheet" type="text/css" href="http://yourdomain.com/css1.css" />
        <script type="text/javascript">
          var somearray = [1, 2, 3];
        </script>
        <link rel="stylesheet" type="text/css" href="http://yourdomain.com/css2.css" />
      </head>
      <body>
        <h1>Hello</h1>
      </body>
    </html>
    

    Firefox 3 [1] will download the CSS files sequentially, so if each CSS file takes 250ms to download, this page will appear in approximately half a second.

    Now, if you just move the inline script before the two CSS declarations:

    <html>
      <head>
        <script type="text/javascript">
          var somearray = [1, 2, 3];
        </script>
        <link rel="stylesheet" type="text/css" href="http://yourdomain.com/css1.css" />
        <link rel="stylesheet" type="text/css" href="http://yourdomain.com/css2.css" />
      </head>
      <body>
        <h1>Hello</h1>
      </body>
    </html>
    

    The two CSS files are now downloaded in parallel, and your page now takes about half the time to render!

    One of the lessons here is that optimizing your website's backend is great and necessary, but it is quite a long-term and hard job. On the other hand, optimizing the frontend is often easier and pays off immediately (well, so to speak...). Don't forget that in complex and rich web sites, most of the time can be spent on the client side.

    [1] It seems that Firefox 2 doesn't even try to download CSS in parallel.

    Going further

    http://developer.yahoo.com/yslow/help/images/OverallGrade_Size.png

    Of course, this is quite browser-dependent! It would be simpler if all browsers behaved the same way, but fortunately there is a very nice tool named Cuzillion, developed by Steve Souders at Google (formerly Chief Performance at Yahoo and developer of YSlow, a Firebug extension able to point out performance problems of your site). This tool lets you create web pages online by inserting inline scripts, CSS, images, etc., and then test how long the page takes to render in your browser. You can control the order of the inserted elements as well as customize their properties (how long each should take to download, which domain to download from, whether a script is defined with a script tag, an XHR, an iframe, etc.).


  • New apycot release

    2008/06/02 by Arthur Lutz
    http://www.logilab.org/image/4878?vid=download&small=true

    After almost 2 years of inactivity, here is a new release of apycot the "Automated Pythonic Code Tester". We use it everyday to maintain our software quality, and we hope this tool can help you as well.

    Admittedly it's not trivial to set up, but once it's running you'll be able to count on it. We're working on getting it to work "out-of-the-box"...

    Here's what's in the ChangeLog:

    2008-05-19 -- 0.11.0
    • updated documentation
    • new pylintrc option for the python_lint checker
    • added code to disable checkers with a missing required option, with the proper ERROR status
    • removed the catalog option of the xml_valid checker; this feature can now be handled with the XML_CATALOG_FILE environment variable (see the libxml2 documentation for details)
    • moved xml tool from python-xml to lxml
    • new 'hourly' mode for running tests
    • new 'test_activity_report' report
    • the pylint checker supports new disable_msg and show_categories options (show_categories defaults to the Error and Fatal categories to avoid report pollution)
    • the activity option "days" has been renamed to "time" and corresponds to a number of days in daily mode but a number of hours in hourly mode
    • fixed debian_lint and debian_piuparts to actually do something...
    • fixed docutils checker for recent docutils versions
    • dropped python 2.2/2.3 compat (to run apycot itself)
    • added output redirectors to the debian preprocessor to avoid parsing errors
    • can use regular expressions in <pp>_match_* options

  • Flying to Google I/O

    2008/05/27 by Arthur Lutz
    http://code.google.com/images/io_logo_sm.gif http://code.google.com/appengine/images/appengine_lowres.jpg

    Three of us from Logilab are going to San Francisco to listen, share and discuss at Google I/O.

    It's a two-day developer gathering in San Francisco, with various talks about Google technologies: http://code.google.com/events/io/

    We're hoping to show and talk about LAX (http://lax.logilab.org), which uses Google AppEngine.

