The latest release of logilab-astng (0.23), the underlying source code representation library used by PyLint, provides a new API that may change pylint users' life in the near future...

It aims to allow registration of functions that will be called after a module has been parsed. While this sounds dumb, it gives a chance to fix/enhance the understanding PyLint has about your code.

I see this as a major step towards greatly enhanced code analysis, improving the situation where PyLint users know that when running it against code using their favorite framework (who said CubicWeb? :p ), they should expect a bunch of false positives because of black magic in the ORM or in decorators or whatever else. There are also places in the Python standard library where dynamic code can cause false positives in PyLint.

The problem

Let's take a simple example, and see how we can improve things using the new API. The following code:

import hashlib

def hexmd5(value):
    """"return md5 checksum hexadecimal digest of the given value"""
    return hashlib.md5(value).hexdigest()

def hexsha1(value):
    """"return sha1 checksum hexadecimal digest of the given value"""
    return hashlib.sha1(value).hexdigest()

gives the following output when analyzed through pylint:

[syt@somewhere ~]$ pylint -E example.py
No config file found, using default configuration
************* Module smarter_astng
E:  5,11:hexmd5: Module 'hashlib' has no 'md5' member
E:  9,11:hexsha1: Module 'hashlib' has no 'sha1' member

However:

[syt@somewhere ~]$ python
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import smarter_astng
>>> smarter_astng.hexmd5('hop')
'5f67b2845b51a17a7751f0d7fd460e70'
>>> smarter_astng.hexsha1('hop')
'cffb6b20e0eef296772f6c1457cdde0049bdfb56'

The code runs fine... Why does pylint bother me then? If we take a look at the hashlib module, we see that there are no sha1 or md5 defined in there. They are defined dynamically according to Openssl library availability in order to use the fastest available implementation, using code like:

for __func_name in __always_supported:
    # try them all, some may not work due to the OpenSSL
    # version not supporting that algorithm.
    try:
        globals()[__func_name] = __get_hash(__func_name)
    except ValueError:
        import logging
        logging.exception('code for hash %s was not found.', __func_name)

Honestly I don't blame PyLint for not understanding this kind of magic. The situation on this particular case could be improved, but that's some tedious work, and there will always be "similar but different" case that won't be understood.

The solution

The good news is that thanks to the new astng callback, I can help it be smarter! See the code below:

from logilab.astng import MANAGER, scoped_nodes

def hashlib_transform(module):
    if module.name == 'hashlib':
        for hashfunc in ('sha1', 'md5'):
            module.locals[hashfunc] = [scoped_nodes.Class(hashfunc, None)]

def register(linter):
    """called when loaded by pylint --load-plugins, register our tranformation
    function here
    """
    MANAGER.register_transformer(hashlib_transform)

What's in there?

  • A function that will be called with each astng module built during a pylint execution, i.e. not only the one that you analyses, but also those accessed for type inference.
  • This transformation function is fairly simple: if the module is the 'hashlib' module, it will insert into its locals dictionary a fake class node for each desired name.
  • It is registered using the register_transformer method of astng's MANAGER (the central access point to built syntax tree). This is done in the pylint plugin API register callback function (called when module is imported using 'pylint --load-plugins'.

Now let's try it! Suppose I stored the above code in a 'astng_hashlib.py' module in my PYTHONPATH, I can now run pylint with the plugin activated:

[syt@somewhere ~]$ pylint -E --load-plugins astng_hashlib example.py
No config file found, using default configuration
************* Module smarter_astng
E:  5,11:hexmd5: Instance of 'md5' has no 'hexdigest' member
E:  9,11:hexsha1: Instance of 'sha1' has no 'hexdigest' member

Huum. We have now a different error :( Pylint grasp there are some md5 and sha1 classes but it complains they don't have a hexdigest method. Indeed, we didn't give a clue about that.

We could continue on and on to give it a full representation of hashlib public API using the astng nodes API. But that would be painful, trust me. Or we could do something clever using some higher level astng API:

from logilab.astng import MANAGER
from logilab.astng.builder import ASTNGBuilder

def hashlib_transform(module):
    if module.name == 'hashlib':
    fake = ASTNGBuilder(MANAGER).string_build('''

class md5(object):
  def __init__(self, value): pass
  def hexdigest(self):
    return u''

class sha1(object):
  def __init__(self, value): pass
  def hexdigest(self):
    return u''

''')
    for hashfunc in ('sha1', 'md5'):
        module.locals[hashfunc] = fake.locals[hashfunc]

def register(linter):
    """called when loaded by pylint --load-plugins, register our tranformation
    function here
    """
    MANAGER.register_transformer(hashlib_transform)

The idea is to write a fake python implementation only documenting the prototype of the desired class, and to get an astng from it, using the string_build method of the astng builder. This method will return a Module node containing the astng for the given string. It's then easy to replace or insert additional information into the original module, as you can see in the above example.

Now if I run pylint using the updated plugin:

[syt@somewhere ~]$ pylint -E --load-plugins astng_hashlib example.py
No config file found, using default configuration

No error anymore, great!

What's next?

This fairly simple change could quickly provide great enhancements. We should probably improve the astng manipulation API now that it's exposed like that. But we can also easily imagine a code base of such pylint plugins maintained by each community around a python library or framework. One could then use a plugins stack matching stuff used by its software, and have a greatly enhanced experience of using pylint.

For a start, it would be great if pylint could be shipped with a plugin that explains all the magic found in the standard library, wouldn't it? Left as an exercice to the reader!

blog entry of

Logilab.org - en