fix file encoding detection with python2.x (closes #70494)

the file encoding declaration must be in a magic comment; a bare "coding=" or
"coding:" appearing in code is not a declaration.

from PEP 263:

More precisely, the first or second line must match the regular
expression "coding[:=]\s*([-\w.]+)". The first group of this
expression is then interpreted as encoding name. If the encoding
is unknown to Python, an error is raised during compilation. There
must not be any Python statement on the line that contains the
encoding declaration.
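
The rule quoted above can be sketched in a few lines of Python. This is only an illustration: `declared_encoding` and `PEP263_RGX` are hypothetical names, not the function patched by this changeset, and the check is simplified (it only requires that the line start with a comment, and it ignores the byte-order mark).

```python
import re

# Pattern quoted in the PEP 263 excerpt above.
PEP263_RGX = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(source):
    """Return the encoding named in a comment on line 1 or 2, else None.

    Hypothetical helper for illustration only.
    """
    for line in source.splitlines()[:2]:
        stripped = line.lstrip()
        # The declaration must live in a comment, so skip lines of code.
        if not stripped.startswith("#"):
            continue
        match = PEP263_RGX.search(stripped)
        if match:
            return match.group(1)
    return None
```

Under these assumptions, `declared_encoding("# -*- coding: utf-8 -*-\nx = 1")` gives `"utf-8"`, while `declared_encoding("coding:UTF-8")` gives `None` because the line is not a comment.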
author: alain lefroy
changeset: 8c2df89b15dc
branch: default
phase: public
hidden: no
parent revision: #5bc62366ccda cleanups
child revision: #28f65e556047 test and fix relative import inference pb, detected with python 3
files modified by this revision:
ChangeLog
builder.py
test/unittest_builder.py
# HG changeset patch
# User alain lefroy
# Date 1310141053 -7200
# Fri Jul 08 18:04:13 2011 +0200
# Node ID 8c2df89b15dc7d59c6fdb3b8fd83273d2fe61d2b
# Parent 5bc62366ccda104ac27c8311ea5c32c7227c9ede

diff --git a/ChangeLog b/ChangeLog
@@ -4,10 +4,11 @@
 --
     * added column offset information on nodes (patch by fawce)
     * #70497: Crash on AttributeError: 'NoneType' object has no attribute '_infer_name'
     * #70381: IndendationError in import causes crash
     * #70565: absolute imports treated as relative (patch by Jacek Konieczny)
+    * #70494: fix file encoding detection with python2.x

 2011-01-11  --  0.21.1
     * python3: handle file encoding; fix a lot of tests

     * fix #52006: "True" and "False" can be assigned as variable in Python2x
diff --git a/builder.py b/builder.py
@@ -58,11 +58,11 @@
         return stream, encoding, data

 else:
     import re

-    _ENCODING_RGX = re.compile("[^#]*#*.*coding[:=]\s*([^\s]+)")
+    _ENCODING_RGX = re.compile("\s*#+.*coding[:=]\s*([-\w.]+)")

     def _guess_encoding(string):
         """get encoding from a python file as string or return None if not found
         """
         # check for UTF-8 byte-order mark
diff --git a/test/unittest_builder.py b/test/unittest_builder.py
@@ -715,10 +715,22 @@

             ### vim:fileencoding= ISO-8859-1
             ''')
             self.failUnlessEqual(e, None)

+        def test_wrong_coding(self):
+            # setting "coding" variable
+            e = guess_encoding("coding = UTF-8")
+            self.failUnlessEqual(e, None)
+            # setting a dictionary entry
+            e = guess_encoding("coding:UTF-8")
+            self.failUnlessEqual(e, None)
+            # setting an argument
+            e = guess_encoding("def do_something(a_word_with_coding=None):")
+            self.failUnlessEqual(e, None)
+
+
         def testUTF8(self):
             e = guess_encoding('\xef\xbb\xbf any UTF-8 data')
             self.failUnlessEqual(e, 'UTF-8')
             e = guess_encoding(' any UTF-8 data \xef\xbb\xbf')
             self.failUnlessEqual(e, None)
45              self.failUnlessEqual(e, None)