Tuesday, June 23, 2015

Learning about Python scoping

Variable scoping is a big topic in some programming languages, or was. "Scoping" refers to the rules about how the system resolves a variable name to its value. Here's an example of the problem I ran into while writing the HTML translator.

def parent():
    par_var = True
    def child():
        if par_var:
            print('yes')
        else:
            print('no')
    child()

When you execute parent(), what happens? Specifically, how is the child function's reference to par_var resolved to a value? My naive expectation was that Python would look first in the scope of child(), and fail; then look in the scope of parent(), and succeed, finding a value of True and printing "yes". Which it does! Expectations confirmed! But make this small change:

def parent():
    par_var = True
    def child():
        if par_var:
            print('yes')
            par_var = False
        else:
            print('no')
            par_var = True
    child()
    child()
    child()

What do you think? If you execute parent(), will it perhaps print yes, no, and yes? Nope. It will not print anything at all! The very first call to child() terminates with an error, "builtins.UnboundLocalError: local variable 'par_var' referenced before assignment".

What!?!

In the first example, Python had no problem resolving the child's reference to the parent's variable. But with this small change (assignment of values to par_var) the scoping rule changed. Now Python treats par_var as a local variable of child(), finds that it has no value yet at the point of the if test, and stops. It doesn't look out in the enclosing scope.
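One way to see this for yourself: the decision is made when child() is compiled, not when it runs, and the function's code object records it. A small sketch (the names parent_reads and parent_assigns are mine, purely for illustration; __code__.co_freevars and co_varnames are standard CPython attributes):

```python
def parent_reads():
    par_var = True
    def child():
        return par_var          # only reads par_var
    return child

def parent_assigns():
    par_var = True
    def child():
        if par_var:             # raises UnboundLocalError when child() is called
            par_var = False     # this assignment makes par_var local to child
    return child

reads = parent_reads()
assigns = parent_assigns()

print(reads.__code__.co_freevars)    # ('par_var',)  taken from the enclosing scope
print(assigns.__code__.co_freevars)  # ()            nothing taken from outside
print(assigns.__code__.co_varnames)  # ('par_var',)  treated as a plain local
```

Nothing has to be called for the difference to show up, which is why the position of the assignment inside child() makes no difference.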

A little research turns up agreement that yes, this is the rule: if a variable is assigned a value anywhere in the body of a function, Python assumes (in fact, insists) that the variable is local to that function. The exception is when you specifically declare otherwise, with a global statement or, in Python 3, a nonlocal statement. So let's try global:

def parent():
    par_var = True
    def child():
        global par_var
        if par_var:
            print('yes')
            par_var = False
        else:
            print('no')
            par_var = True
    child()
    child()
    child()

Does it work now? Nunh-unh. "builtins.NameError: name 'par_var' is not defined". The global statement modifies the scoping rule, all right. But it does not simply say, "global to me"; it says "global in the sense of being at the top level of this namespace/module". Which it is not, in that example. The only way to make the above code work with global is to move the first assignment to par_var outside the body of the parent function.

So in Python, it is possible to have a variable that is "relatively global" (not local, but declared in some containing scope), but by default that variable is read-only from the inner function. As soon as the inner function attempts assignment, the variable is treated as purely local, unless it is declared global (module level) or, in Python 3, nonlocal (the enclosing function).
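Python 3 does offer one more option for exactly this situation, the nonlocal statement, which tells the compiler that a name belongs to the nearest enclosing function rather than to the module. A minimal sketch of the same example using it, assuming Python 3 (nonlocal does not exist in Python 2); the seen list is my addition, there only so the result can be inspected:

```python
def parent():
    par_var = True
    seen = []                   # record of what was printed (illustration only)

    def child():
        nonlocal par_var        # rebind parent's par_var instead of creating a local
        if par_var:
            seen.append('yes')
            par_var = False
        else:
            seen.append('no')
            par_var = True
        print(seen[-1])

    child()
    child()
    child()
    return seen

parent()    # prints yes, no, yes
```

Unlike the module-level approach, the state here lives only as long as the call to parent(), so every call starts fresh with "yes".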

This is kind of wacky. It made me have to revise a bunch of code I'd written, where a parent function declared a whole batch of little helper child functions, and shared the use of the parent's variables. All the variables had to move out to the module level and get ALLCAP names. Also, I have this little evil thought: what if the child function does not assign to par_var but instead passes it to another function, and that function assigns to it? par_var might have to be a list or other mutable collection to make that work, but... hmmm.
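As far as I can tell, the evil thought does work, because changing the contents of an object is not an assignment to the name, so the local-variable rule never kicks in. A rough sketch, where toggle() is a hypothetical helper and par_var is a one-element list standing in for the flag:

```python
def toggle(box):
    # Hypothetical helper: flips the flag stored in a one-element list.
    # It mutates the list in place; no name in the caller is rebound.
    box[0] = not box[0]

def parent():
    par_var = [True]            # mutable holder instead of a bare bool
    answers = []

    def child():
        # child only reads the name par_var, so it resolves to parent's list
        answers.append('yes' if par_var[0] else 'no')
        toggle(par_var)

    for _ in range(3):
        child()
    return answers

print(parent())     # ['yes', 'no', 'yes']
```

It works, but it hides the shared state behind an indexing trick, so it is probably better kept as a curiosity than used as a pattern.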

Whatever, that's done. HTML conversion works nicely and, I am happy to say, is really quick. It takes less time to translate a document and create a new document than it does to load the source document in the first place. Later this week I will be packaging PPQT for release.
