Following the recent introduction of Python type annotations (aka "type hints") in Mercurial (see, e.g. this changeset by Augie Fackler), I've been playing a bit with this and pytype.
pytype is a static type analyzer for Python code. It compares with the more popular mypy but I don't have enough perspective to make a meaningful comparison at the moment. In this post, I'll illustrate how I worked with pytype to gradually add type hints in a Mercurial module and while doing so, fix bugs!
The module I focused on is mercurial.mail, which contains mail utilities and that I know quite well. Other modules are also being worked on, this one is a good starting point because it has a limited number of "internal" dependencies, which both makes it faster to iterate with pytype and reduces side effects of other modules not being correctly typed already.
shell
$ pytype mercurial/mail.py
Computing dependencies
Analyzing 1 sources with 36 local dependencies
ninja: Entering directory `.pytype'
[19/19] check mercurial.mail
Success: no errors found
The good news is that the module apparently already type-checks. Let's go deeper and merge the type annotations generated by pytype:
$ merge-pyi -i mercurial/mail.py out/mercurial/mail.pyi
(In practice, we'd use --as-comments
option to write type hints as
comments, so that the module is still usable on Python 2.)
Now we have all declarations annotated with types. Typically, we'd get many things like:
sourceCode
def codec2iana(cs) -> Any:
cs = pycompat.sysbytes(email.charset.Charset(cs).input_charset.lower())
# "latin1" normalizes to "iso8859-1", standard calls for "iso-8859-1"
if cs.startswith(b"iso") and not cs.startswith(b"iso-"):
return b"iso-" + cs[3:]
return cs
The function signature has been annotated with Any
(omitted for
parameters, explicit for return value). This somehow means that type
inference failed to find the type of that function. As it's (quite)
obvious, let's change that into:
sourceCode
def codec2iana(cs: bytes) -> bytes:
...
And re-run pytype:
$ pytype mercurial/mail.py
Computing dependencies
Analyzing 1 sources with 36 local dependencies
ninja: Entering directory `.pytype'
[1/1] check mercurial.mail
FAILED: .pytype/pyi/mercurial/mail.pyi
pytype-single --imports_info .pytype/imports/mercurial.mail.imports --module-name mercurial.mail -V 3.7 -o .pytype/pyi/mercurial/mail.pyi --analyze-annotated --nofail --quick mercurial/mail.py
File "mercurial/mail.py", line 253, in codec2iana: Function Charset.__init__ was called with the wrong arguments [wrong-arg-types]
Expected: (self, input_charset: str = ...)
Actually passed: (self, input_charset: bytes)
For more details, see https://google.github.io/pytype/errors.html#wrong-arg-types.
ninja: build stopped: subcommand failed.
Interesting! email.charset.Charset is apparently instantiated with the wrong argument type. While it's not exactly a bug, because Python will handle bytes instead of str well in general, we can again change the signature (and code) to:
sourceCode
def codec2iana(cs: str) -> str:
cs = email.charset.Charset(cs).input_charset.lower()
# "latin1" normalizes to "iso8859-1", standard calls for "iso-8859-1"
if cs.startswith("iso") and not cs.startswith("iso-"):
return "iso-" + cs[3:]
return cs
Obviously, this involves a larger refactoring in client code of this simple function, see respective changeset for details.
Another example is this function:
sourceCode
def _encode(ui, s, charsets) -> Any:
'''Returns (converted) string, charset tuple.
Finds out best charset by cycling through sendcharsets in descending
order. Tries both encoding and fallbackencoding for input. Only as
last resort send as is in fake ascii.
Caveat: Do not use for mail parts containing patches!'''
sendcharsets = charsets or _charsets(ui)
if not isinstance(s, bytes):
# We have unicode data, which we need to try and encode to
# some reasonable-ish encoding. Try the encodings the user
# wants, and fall back to garbage-in-ascii.
for ocs in sendcharsets:
try:
return s.encode(pycompat.sysstr(ocs)), ocs
except UnicodeEncodeError:
pass
except LookupError:
ui.warn(_(b'ignoring invalid sendcharset: %s\n') % ocs)
else:
# Everything failed, ascii-armor what we've got and send it.
return s.encode('ascii', 'backslashreplace')
# We have a bytes of unknown encoding. We'll try and guess a valid
# encoding, falling back to pretending we had ascii even though we
# know that's wrong.
try:
s.decode('ascii')
except UnicodeDecodeError:
for ics in (encoding.encoding, encoding.fallbackencoding):
ics = pycompat.sysstr(ics)
try:
u = s.decode(ics)
except UnicodeDecodeError:
continue
for ocs in sendcharsets:
try:
return u.encode(pycompat.sysstr(ocs)), ocs
except UnicodeEncodeError:
pass
except LookupError:
ui.warn(_(b'ignoring invalid sendcharset: %s\n') % ocs)
# if ascii, or all conversion attempts fail, send (broken) ascii
return s, b'us-ascii'
It quite clear from the return value (last line) that we can change its type signature to:
sourceCode
def _encode(ui, s:Union[bytes, str], charsets: List[bytes]) -> Tuple[bytes, bytes]
...
And re-run pytype:
$ pytype mercurial/mail.py
Computing dependencies
Analyzing 1 sources with 36 local dependencies
ninja: Entering directory `.pytype'
[1/1] check mercurial.mail
FAILED: .pytype/pyi/mercurial/mail.pyi
pytype-single --imports_info .pytype/imports/mercurial.mail.imports --module-name mercurial.mail -V 3.7 -o .pytype/pyi/mercurial/mail.pyi --analyze-annotated --nofail --quick mercurial/mail.py
File "mercurial/mail.py", line 342, in _encode: bad option in return type [bad-return-type]
Expected: Tuple[bytes, bytes]
Actually returned: bytes
[...]
For more details, see https://google.github.io/pytype/errors.html.
ninja: build stopped: subcommand failed.
That's a real bug. Line 371 contains
return s.encode('ascii', 'backslashreplace')
in the middle of the
function. We indeed return a bytes
value instead of a
Tuple[bytes, bytes]
as elsewhere in the function. The following
changes fixes the bug:
sourceCode
diff --git a/mercurial/mail.py b/mercurial/mail.py
--- a/mercurial/mail.py
+++ b/mercurial/mail.py
@@ -339,7 +339,7 @@ def _encode(ui, s, charsets) -> Any:
ui.warn(_(b'ignoring invalid sendcharset: %s\n') % ocs)
else:
# Everything failed, ascii-armor what we've got and send it.
- return s.encode('ascii', 'backslashreplace')
+ return s.encode('ascii', 'backslashreplace'), b'us-ascii'
# We have a bytes of unknown encoding. We'll try and guess a valid
# encoding, falling back to pretending we had ascii even though we
# know that's wrong.
Steps by steps, by replacing Any
with real types, adjusting "wrong"
types like in the first example and fixing bugs, we finally get the
whole module (almost) completely
annotated.
- hgview 1.2.0 released
- FOSDEM 2013
- hgview 1.1.0 released
- Text mode makes it into hgview 1.4.0
- hgview http://www.hgview.org/ qt4 and text based Mercurial log navigator