Thursday, September 11, 2014

Chardata Done, But TIL...

Today I started and finished the chardata module, the data store for the character table. This went very quickly because I had actually started writing this code as part of the worddata module, the second module I completed. I quickly realized then that character data deserved its own module, and I pulled the partly written chunks out into one. So today I had to tidy that up and finish it, and of course in the process I rewrote a good bit. Then I wrote a unit-test driver to force it through all its error conditions, and in the course of that, I found out something I never knew about Python!

One key function of the chardata module is get_tuple(j), which returns (to the character table code still to be written) a tuple (character, count) for the j'th item in the sorted sequence of characters. That lets the table view build itself by calling get_tuple(j) in a loop for j from 0 up to the size of the table, getting the characters in sorted order.

The magic behind that is a sorteddict from the blist module. I put all the characters as keys into a sorteddict, with their counts as values. Then I get a KeysView and a ValuesView object on the dict, and I can index them by an integer for fast access (O(log n) per the blist docs) to the j'th key or value in sorted order. Slick.
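For readers without blist installed, the idea can be sketched with the standard library alone. This is a stand-in, not how blist does it: the class name SortedCounts and its methods are made up for illustration, and it simply keeps a sorted key list with bisect so the j'th key can be fetched by integer index.

```python
import bisect

class SortedCounts:
    """Hypothetical stand-in for a sorteddict of character counts."""

    def __init__(self):
        self._keys = []      # characters, kept in sorted order
        self._counts = {}    # character -> count

    def add(self, char):
        # Insert a new character at its sorted position, then bump its count.
        if char not in self._counts:
            bisect.insort(self._keys, char)
            self._counts[char] = 0
        self._counts[char] += 1

    def __len__(self):
        return len(self._keys)

    def item_at(self, j):
        # j'th (character, count) pair in sorted order.
        key = self._keys[j]
        return (key, self._counts[key])

counts = SortedCounts()
for ch in "banana":
    counts.add(ch)

# Walk the table from 0 to its size, as the table view would.
rows = [counts.item_at(j) for j in range(len(counts))]
print(rows)  # [('a', 3), ('b', 1), ('n', 2)]
```

Because item_at() indexes an ordinary list, it accepts negative values of j just as any Python sequence does.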

    def get_tuple(self, j):
        try:
            return (self.k_view[j], self.v_view[j])
        except Exception:
            # Out-of-range or non-integer index: log it, return a placeholder.
            cd_logger.error('Invalid chardata index {0}'.format(j))
            return ('?', 0)

So I'm thinking like a QA person writing my unit test driver, and I have coded tests of the normal function of get_tuple(), and want to exercise all the things that can make it get into the except clause. So I code this:

    assert ('?', 0) == cd.get_tuple(3) # there are only 3 chars in the database
    check_log(etxt, logging.ERROR) # check for the error message in the log
    assert ('?', 0) == cd.get_tuple('x')
    check_log(etxt, logging.ERROR)
    assert ('?', 0) == cd.get_tuple(-1)
    check_log(etxt, logging.ERROR)

And the test fails with an AssertionError on assert ('?', 0) == cd.get_tuple(-1). What? A little experimenting shows that I can index the dictionary key and value views with -1 and -2, which return the third and second characters of the three in it.

So I innocently file an issue on blist, asking "is this expected behavior?". The reply comes in an hour. Daniel Stutzbach, owner of blist, writes "Yes. This is how all sequences in Python work, such as list() and tuple()."

Oops. The blist doc (linked above) clearly says that key and value views support all Python sequence operations, and the 3.3 docs say of indexing a sequence, "If i or j is negative, the index is relative to the end of the string: len(s) + i or len(s) + j is substituted."
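The rule in that quote is easy to check with the built-in sequence types. This small snippet uses a plain list of three items standing in for the three-character test database; nothing in it is specific to blist.

```python
chars = ['a', 'b', 'c']          # three items, like the test database

# A negative index i is treated as len(s) + i.
assert chars[-1] == chars[len(chars) - 1] == 'c'
assert chars[-2] == chars[len(chars) - 2] == 'b'

# The same holds for tuples and strings.
assert ('a', 'b', 'c')[-1] == 'c'
assert 'abc'[-3] == 'a'

# Only an index below -len(s) is out of range.
try:
    chars[-4]
except IndexError:
    out_of_range = True
else:
    out_of_range = False
print(out_of_range)  # True
```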

I had no idea! I understand and use negative indices in slices, but had always supposed that was special to slicing. I have never had occasion to use a negative index on a list, and cannot imagine a case where I would want to do so. But there it is.
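If get_tuple() is supposed to treat a negative index as an error, one option is an explicit range check before indexing. The sketch below is only a guess at how that might look, not the code in the module: the class name CharDataSketch and the logger name are invented for the example, and plain lists stand in for the blist views.

```python
import logging

cd_logger = logging.getLogger("chardata_sketch")  # hypothetical logger name

class CharDataSketch:
    """Illustrative only: plain lists stand in for the blist views."""

    def __init__(self, counts):
        self.k_view = sorted(counts)
        self.v_view = [counts[k] for k in self.k_view]

    def get_tuple(self, j):
        # Reject non-integers and anything outside 0..len-1, including
        # negative values that Python would otherwise accept.
        if not isinstance(j, int) or not 0 <= j < len(self.k_view):
            cd_logger.error('Invalid chardata index {0}'.format(j))
            return ('?', 0)
        return (self.k_view[j], self.v_view[j])

cd = CharDataSketch({'a': 3, 'b': 1, 'c': 2})
print(cd.get_tuple(0))    # ('a', 3)
print(cd.get_tuple(-1))   # ('?', 0)
```

With the check in front, the three error cases from the test driver (3, 'x' and -1) all take the same logged path, and no try/except is needed.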
