From: Francisco Borges <f.borges at rug dot nl>
Subject: [Python-projects] "Char coding" check bug (E0501 - PEP 263)
Date: 2005/06/23 12:07

Hello,
So I have all these beautiful files full of beautiful chars like ±, á or
€. Python seems to be fine with them, but Pylint is not.
This is the example header; in the IPython session that follows, line
holds its second (coding) line:
#! /usr/bin/env python
# -*- coding:iso-8859-15 -*-
In [62]:import re
In [63]:EMACS_ENCODING_RGX = re.compile('[^#]*[#\s]*-\*-\s*coding: ([^\s]*)\s*-\*-\s*')
In [64]:print EMACS_ENCODING_RGX.match(line)
None
In [65]:print EMACS_ENCODING_RGX.search(line)
None
## now with the actual regexp __defined__ in PEP 263:
In [66]:PEP263 = re.compile("coding[:=]\s*([-\w.]+)")
In [67]:print PEP263.search(line)
<_sre.SRE_Match object at 0x40761b60>
Was there any reason to use (EMACS|VI)_ENCODING_RGX rather than the one
actually used by Python?
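For anyone who wants to reproduce this outside IPython, here is a minimal Python 3 sketch of the same comparison. It assumes line is the second line of the example header, and the extra spaced variant is made up for illustration. The Emacs-style pattern requires a literal space after "coding:", which "coding:iso-8859-15" does not have, so only the PEP 263 pattern matches.

```python
import re

# The Emacs-style pattern from the session above, written as a raw string.
EMACS_ENCODING_RGX = re.compile(r'[^#]*[#\s]*-\*-\s*coding: ([^\s]*)\s*-\*-\s*')
# The pattern given in PEP 263.
PEP263 = re.compile(r'coding[:=]\s*([-\w.]+)')

# Assumption: line is the second line of the example header.
line = '# -*- coding:iso-8859-15 -*-'
# Hypothetical variant with a space after the colon, for comparison.
spaced = '# -*- coding: iso-8859-15 -*-'

# No space after "coding:", so the Emacs-style pattern finds nothing...
print(EMACS_ENCODING_RGX.search(line))             # None
# ...while the PEP 263 pattern extracts the encoding name.
print(PEP263.search(line).group(1))                # iso-8859-15
# With the space added, the Emacs-style pattern matches too.
print(EMACS_ENCODING_RGX.search(spaced).group(1))  # iso-8859-15
```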
#--------------------------
PEP 263 says:
"...a magic comment must be placed...
More precisely, the first or second line must match the regular
expression "coding[:=]\s*([-\w.]+)". The first group of this
expression is then interpreted as encoding name."
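The quoted rule can be sketched roughly as follows. This only illustrates the PEP wording; it is not CPython's or Pylint's actual code, and the helper name declared_encoding is hypothetical. Only the first two lines are searched, and the first group of the match is taken as the encoding name.

```python
import re

# Regex quoted from PEP 263.
PEP263 = re.compile(r'coding[:=]\s*([-\w.]+)')

def declared_encoding(lines):
    """Return the encoding named on line 1 or 2, else None (hypothetical helper)."""
    for line in lines[:2]:          # PEP 263: first or second line only
        match = PEP263.search(line)
        if match:
            return match.group(1)   # first group = encoding name
    return None

header = ['#! /usr/bin/env python', '# -*- coding:iso-8859-15 -*-']
print(declared_encoding(header))                       # iso-8859-15
# A declaration on the third line is ignored.
print(declared_encoding(['', '', '# coding: utf-8']))  # None
```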
It's funny because they give an exact regexp, but they only mention in an
earlier paragraph that the line must be a comment (and not a string).
CPython (Parser/tokenizer.c:get_encoding_spec) does require the line to
be a comment, while "coding[:=]\s*([-\w.]+)" does not.
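A small illustration of that difference, using a made-up source line in which "coding:" appears inside a string literal rather than a comment: the bare PEP 263 pattern still reports an encoding name, even though the line is not a comment at all.

```python
import re

PEP263 = re.compile(r'coding[:=]\s*([-\w.]+)')

# Hypothetical first line of a module: an assignment, not a comment.
code_line = 'msg = "coding: latin-1"'

# The bare PEP 263 pattern does not care that there is no '#'.
match = PEP263.search(code_line)
print(match.group(1))  # latin-1
```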
I would suggest using "#.*coding[:=]\s*([-\w.]+)", which seems to be the
regexp that best matches what CPython is doing.
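A minimal sketch of that suggested pattern, checked against the example header, a vim-style modeline and the string-literal case; the sample lines are made up for illustration. Since "fileencoding=" contains "coding=", one pattern handles both the Emacs and the vim conventions, and requiring a '#' rejects the line where "coding:" only appears in code.

```python
import re

# Pattern suggested above: a '#' somewhere before the PEP 263 part.
SUGGESTED = re.compile(r'#.*coding[:=]\s*([-\w.]+)')

samples = {
    'emacs': '# -*- coding:iso-8859-15 -*-',
    'vim': '# vim: set fileencoding=utf-8 :',
    'string': 'msg = "coding: latin-1"',
}

for name, line in samples.items():
    match = SUGGESTED.search(line)
    print(name, match.group(1) if match else None)
# emacs iso-8859-15
# vim utf-8
# string None
```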
Cheers,
Francisco
_______________________________________________
Python-Projects mailing list
Python-Projects@lists.logilab.org
http://lists.logilab.org/mailman/listinfo/python-projects
