Thursday, August 14, 2014

Annotations and type-declarations

Today I finished reviewing and retesting book, mainwindow and editdata. Just editview to go, and I'm on the casual schedule I laid out Tuesday.

Also today I learned about Guido's proposal for more detailed function annotations based on mypy. In the Reddit comments for that I learned about the Obiwan package for function annotation by Will Edwards (corrected link). And from both, I learned that Python 3 already has support for type-annotating at least function arguments and return values.

All really interesting in an academic way. I like the idea of type-annotating functions and return values, but not if it is merely a formal type of commentary—which, it appears, it is and would remain under all these proposals.

My antique coding style, evolved from years of writing code professionally in both strictly-typed languages like C and Pascal and in completely un-typed assembly languages, is to make the types of inputs and outputs of every function, as well as the type of every variable, crystal clear, at least in my own mind. I don't fudge things and I never overload the meaning of a function. It would be nice to have a way of documenting this in Python syntax if that would get me some additional function from the language, such as compile-time error checking or code optimization. So I did a quick trial of the Python 3 annotations.

def check(fname:str):
  enc = 'UTF-8'
  if '-l.' in fname or '-ltn.' in fname or fname.endswith('.ltn'):
    enc = 'LATIN-1'
  return enc

check('asd')
'UTF-8'
check('x.ltn')
'LATIN-1'
check(5)
Traceback (most recent call last):
  File "/Applications/WingPersonal.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 1, in <module>
    # Used internally for debug sandbox under external interpreter
  File "/Applications/WingPersonal.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 3, in check
    pass
builtins.TypeError: argument of type 'int' is not iterable

No error checking. The clearly incorrect call check(5) slides right by the compile phase and fails exactly where it would without an annotation.
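
To see where the annotation actually goes, here is a minimal sketch that repeats the trial above and then inspects the function object. Python 3 stores the annotation in the function's `__annotations__` attribute, but nothing consults it at call time.

```python
def check(fname: str):
    enc = 'UTF-8'
    if '-l.' in fname or '-ltn.' in fname or fname.endswith('.ltn'):
        enc = 'LATIN-1'
    return enc

# The annotation is recorded on the function object...
print(check.__annotations__)   # {'fname': <class 'str'>}

# ...but it is not checked when the function is called. The bad argument
# is only noticed when the body tries to use it as a string.
try:
    check(5)
except TypeError as err:
    print('failed inside the body:', err)
```

So the annotation is available as data to any tool that cares to read it, and that is all.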

So what's the point of annotating? To the human reader/maintainer, it is blindingly obvious (from comparisons to string literals and the use of .endswith()) that fname is expected to be a string and only ever a string. The human doesn't need the annotation (at least for this simple application) and the Python interpreter effectively ignores it. It's just a pointless decoration.

I can see that you could write a lint-like program that would use annotations to check for obvious type transgressions. Maybe pylint or some other tool does that now. But there are two major shortcomings to such a pre-processor. First, you have to incorporate it into your workflow somehow. Maybe it could be an automatic part of a test suite? But it will always be an extra step that complicates the programmer's workflow.

Second and more seriously, a static lint-like tool cannot catch the great majority of the type errors that arise. In fact, it can probably only catch errors where (1) the argument is a literal value, so that the type error is manifest in the code, and (2) the call is in the same source module as the annotated function.

If the annotated function is in an imported module (from utilities import check; check(5)) the lint has to examine the imported module to even be aware of the annotation. And if the function is defined in the present module but the argument comes from another module (import constants as C; check(C.INT_VALUE)) again the lint has to have executed the import to know anything about the referenced literal.
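
A rough sketch of the kind of lint-like checker described here, built on the standard `ast` module, might look like the following. It is an illustration only, not how pylint or any real tool works. The name `find_literal_type_errors` and the sample `SOURCE` are made up for this example, and it assumes a recent Python 3 (3.8 or later, for `ast.Constant`). It handles only simple-name annotations such as `str`, positional literal arguments, and functions defined in the same source text.

```python
import ast

# Sample module text to check; some_function() is deliberately undefined,
# since the source is only parsed, never executed.
SOURCE = '''
def check(fname: str):
    return fname.endswith('.ltn')

check('x.ltn')
check(5)
check(some_function())
'''

def find_literal_type_errors(source):
    """Return (line, function, parameter, expected, actual) tuples for
    literal arguments whose type name differs from the annotation."""
    tree = ast.parse(source)
    # Map function name -> list of (parameter name, annotation name or None)
    signatures = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            params = []
            for arg in node.args.args:
                ann = arg.annotation
                name = ann.id if isinstance(ann, ast.Name) else None
                params.append((arg.arg, name))
            signatures[node.name] = params
    problems = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in signatures):
            pairs = zip(signatures[node.func.id], node.args)
            for (pname, expected), value in pairs:
                # Un-annotated parameter or non-literal argument:
                # this sketch has nothing to go on, so it stays silent.
                if expected is None or not isinstance(value, ast.Constant):
                    continue
                actual = type(value.value).__name__
                if actual != expected:
                    problems.append(
                        (node.lineno, node.func.id, pname, expected, actual))
    return problems

for line, func, param, expected, actual in find_literal_type_errors(SOURCE):
    print(f'line {line}: {func}() parameter {param!r} '
          f'expects {expected}, got {actual}')
```

It flags check(5) and passes over check(some_function()) without comment, which is exactly the limitation at issue.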

And a static type-checker is helpless if the argument value is not a literal—a function return, or simply the result of an expression. What can it do with check(foo(bar(baz())))? Type errors in this case can only be caught by the Python interpreter at the time it is compiling the expression. Then the dynamic AST could in principle provide enough information to predict an impending type error.
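
For comparison, here is a minimal sketch of checking at call time, done in user code with a decorator rather than by the interpreter. The decorator name `enforce_annotations` and the helper `baz` are hypothetical, invented for this example. It checks only annotations that are plain classes such as `str`, and it makes no claim to be how Obiwan or any other package does it.

```python
import functools
import inspect

def enforce_annotations(func):
    """Check arguments against class annotations on every call."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            param = sig.parameters[name]
            expected = param.annotation
            # Skip un-annotated parameters and *args / **kwargs collectors.
            if expected is inspect.Parameter.empty:
                continue
            if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                continue
            # Only plain classes are checked in this sketch.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f'{func.__name__}(): {name} must be {expected.__name__}, '
                    f'not {type(value).__name__}')
        return func(*args, **kwargs)
    return wrapper

@enforce_annotations
def check(fname: str):
    enc = 'UTF-8'
    if '-l.' in fname or '-ltn.' in fname or fname.endswith('.ltn'):
        enc = 'LATIN-1'
    return enc

def baz():
    # Hypothetical helper standing in for any non-literal argument.
    return 5

print(check('x.ltn'))   # LATIN-1
try:
    check(baz())
except TypeError as err:
    print(err)          # check(): fname must be str, not int
```

This catches the non-literal case, but only when the call actually executes, and at the cost of an extra check on every call.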

So until the interpreter itself implements and checks type annotations, I don't see any value to using them. That interpreter doesn't have to be CPython, note; in fact I would think that the PyPy or Cython teams would see annotations as a major asset for optimization, with checking falling out of an optimized execution.
