Sunday, June 7, 2015

Test Translator running

My coding time has been limited the past few days, and it will stay limited while we fly to Seattle to visit relatives for a few days, although I'll still get some in. I hope that by this time next week the complete Translator interface will be working.

What works now: document parsing, finding Translator modules, building the submenu, responding to the selection of a Translator from the menu, displaying a Translator's option query dialog, and calling the Translator's initialize(), translate() and finalize() entries with appropriate arguments. A special demo Translator named "Testing" exists and accepts calls at those entries. All it does is produce lines of output documenting what it is called with, including each document "event" it receives.

So far, the output of Testing (or any other Translator, should one exist, but none do) is only dumped with print statements. But a whole lot of machinery had to work in order to get this far.
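To make that call sequence concrete, here is a minimal sketch of a Testing-style Translator module. Only the entry-point names come from this post; the argument lists of initialize() and finalize() are assumptions, and run_translator() is a hypothetical stand-in for the driver in the translator support module. Output lines are collected in a list rather than printed directly, so they can be checked.

```python
# Hypothetical sketch of a "Testing"-style Translator module. Only the
# entry-point names come from the post; the argument lists are assumed.

output = []  # collects the lines the real Testing Translator prints


def initialize(options):
    # 'options': assumed dict of answers from the option query dialog
    output.append('initialize called with {}'.format(options))
    return True


def translate(event_iterator):
    # Each event is assumed to be a (code, text, stuff, lnum) tuple
    for (code, text, stuff, lnum) in event_iterator:
        output.append('event {} at line {}: {!r}'.format(code, lnum, text))
    return True


def finalize():
    output.append('finalize called')
    return True


def run_translator(options, events):
    # Hypothetical driver: call the entries in order, stopping at the
    # first one that reports failure by returning a false value
    return initialize(options) and translate(events) and finalize()


run_translator({'wrap': 72}, iter([('PARA', 'Hello.', None, 1)]))
for line in output:
    print(line)
```

Run as-is, this prints one line for initialize, one per event (here `event PARA at line 1: 'Hello.'`), and one for finalize.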

One bug I had to work through was a classic "lost reference" bug. In most of the Qt API, when you create a widget and hand it to another widget to manage, the manager becomes the child's "parent" and takes ownership of it, so the child continues to exist. That's not the case with menus: a QMenu does not take ownership of a QAction or a sub-QMenu. I forgot that. I modified the main window to call the translator support module to build the submenu of translators, but didn't save the returned reference; I just handed it to the File menu's addMenu() method.

The result was an interesting bug: the Translators... submenu would usually appear the first time you opened the File menu, but by the second or third time (sometimes even the first) it had vanished. It all depended on when the Python garbage collector got around to demolishing the submenu's QMenu object. It took half an hour of trying and thinking before I finally twigged, did a classic Doh! face-palm, and made a one-line change to have mainwindow save a reference to the submenu.
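Qt can't be run here, so the following is a pure-Python analogy rather than PyQt code. A hypothetical Menu class that keeps only weak references to its submenus models a QMenu that does not own what addMenu() hands it; build_translator_submenu() and MainWindow are likewise invented stand-ins. Because this toy has no reference cycles, CPython frees the unsaved submenu as soon as the window's constructor returns, so the analogy fails deterministically rather than at the whim of the collector.

```python
import gc
import weakref


class Menu:
    """Stand-in for a QMenu that, like QMenu.addMenu(), does not take
    ownership of the submenus it is given: it keeps weak references only."""

    def __init__(self, title):
        self.title = title
        self._submenus = []  # weak references, mimicking non-ownership

    def add_menu(self, submenu):
        self._submenus.append(weakref.ref(submenu))

    def visible_submenus(self):
        # A dead weak reference returns None: that submenu has vanished
        return [ref().title for ref in self._submenus if ref() is not None]


def build_translator_submenu():
    # Hypothetical stand-in for the translator support module's builder
    return Menu('Translators...')


class MainWindow:
    def __init__(self, keep_reference):
        self.file_menu = Menu('File')
        submenu = build_translator_submenu()
        self.file_menu.add_menu(submenu)
        if keep_reference:
            # The one-line fix: hold a strong reference for the window's lifetime
            self.translator_submenu = submenu
        # Otherwise the local name 'submenu' is the only strong reference,
        # and it goes away when __init__ returns.


buggy = MainWindow(keep_reference=False)
fixed = MainWindow(keep_reference=True)
gc.collect()
print(buggy.file_menu.visible_submenus())  # []
print(fixed.file_menu.visible_submenus())  # ['Translators...']
```

In the real program the equivalent of keep_reference=True is simply storing the QMenu returned by the builder as an attribute of the main window.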

Another problem was my lack of experience with Python generators. A generator function is one that contains a yield statement. The tricky part is that the thing your for-loop iterates over is not that function; it is the value the function returns when called, a generator object. So you call the function, with parentheses and arguments, but you never call the generator it returns. I almost had it right, but had to review the Python docs and recode at two different points.
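A minimal illustration of that distinction; count_up() is a made-up example, not code from the project:

```python
def count_up(limit):
    # Contains 'yield', so calling count_up() returns a generator object
    # instead of running the body immediately.
    n = 1
    while n <= limit:
        yield n
        n += 1


gen = count_up(3)          # call the function: parentheses and arguments
print(type(gen).__name__)  # generator
print(next(gen))           # 1: the generator answers next()
for n in gen:              # iterate the generator itself: no parentheses
    print(n)               # prints 2, then 3
```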

I used a generator to feed my "scanner", which overrides the built-in token scanner for the YAPPS parser. My scanner class takes an iterator (anything that responds to the next() built-in function) as an initialization parameter, and calls next(iterator) to get the next line of the document being parsed.

For unit-test purposes, it was initialized with a Python StringIO object loaded with a triple-quoted literal. A StringIO responds to next() with the next line of its text. But for the real thing, I needed to pass an iterator that yields successive lines of the real document.
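The scanner's input contract can be sketched in isolation. LineScanner below is a hypothetical stand-in: it does not reproduce the YAPPS scanner interface, only the part described above, pulling lines from whatever iterator it was given. Stripping the trailing newline is an assumed design choice so that StringIO lines look like QTextBlock.text() results, which carry no line terminator.

```python
import io


class LineScanner:
    """Hypothetical stand-in for the custom scanner: accepts any iterator
    over lines and pulls one line at a time with next()."""

    def __init__(self, line_iterator):
        self.lines = line_iterator
        self.line_number = 0

    def next_line(self):
        # Return the next line, or None when the source is exhausted
        try:
            line = next(self.lines)
        except StopIteration:
            return None
        self.line_number += 1
        return line.rstrip('\n')  # StringIO lines keep their newline


TEST_DOC = '''first line
second line
'''

scanner = LineScanner(io.StringIO(TEST_DOC))
print(scanner.next_line())  # first line
print(scanner.next_line())  # second line
print(scanner.next_line())  # None
```

Any other iterator over strings, such as iter(['a', 'b']) or a generator, works the same way, which is what makes it possible to swap in the real document.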

Months ago I coded such a generator into the editdata module. It's a simple method:

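    def all_lines(self):
        # Generator method: yields the text of each block in turn
        tb = self.begin()  # first QTextBlock in document
        while tb.isValid():  # an invalid block means we ran off the end
            yield tb.text()  # block text, without the line terminator
            tb = tb.next()  # advance to the following QTextBlock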

There's the magic yield statement. But when I passed editdata.all_lines to initialize the scanner, I got an error complaining that a method can't be iterated. What I had to pass to the scanner in place of a StringIO was not all_lines but all_lines(), the value returned by calling the generator function. That returned generator is the iterator that can respond to a next() call.
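The same mistake can be reproduced without Qt. FakeEditData is a hypothetical stand-in whose all_lines() has the same shape as the method above but walks a list of strings instead of QTextBlocks:

```python
class FakeEditData:
    """Hypothetical stand-in for the editdata document object,
    holding plain strings instead of QTextBlocks."""

    def __init__(self, lines):
        self._lines = list(lines)

    def all_lines(self):
        # Same shape as the real generator: yield each line's text in turn
        for text in self._lines:
            yield text


doc = FakeEditData(['line one', 'line two'])

try:
    next(doc.all_lines)   # wrong: a bound method is not an iterator
except TypeError as err:
    print('TypeError:', err)

lines = doc.all_lines()   # right: calling it returns a generator
print(next(lines))        # line one
print(next(lines))        # line two
```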

I made the exact inverse goof while quickly whipping up the Testing Translator. The translate() method of a Translator is passed an iterator that returns the "events" it is to process. I was correctly passing it the result of calling my event-generating function (the one with the yield statement). But in the Translator, where it consumed the iterator, I coded for (code, text, stuff, lnum) in event_iterator() and got a different error message, about "cannot call an iterator". All I had to do was remove the parens.
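A sketch of the correct pairing on both sides of the call. event_generator() and the 'LINE' event code are invented for illustration; only the (code, text, stuff, lnum) tuple shape comes from the Testing Translator.

```python
def event_generator(lines):
    # Hypothetical event source: one ('LINE', text, None, lnum) per line
    for lnum, text in enumerate(lines, start=1):
        yield ('LINE', text, None, lnum)


def translate(event_iterator):
    # Correct: iterate the object we were given; do not call it.
    # Writing "in event_iterator()" raises TypeError, because a
    # generator object is not callable.
    results = []
    for (code, text, stuff, lnum) in event_iterator:
        results.append('{}:{} {}'.format(lnum, code, text))
    return results


events = event_generator(['Chapter 1', 'It was a dark night.'])  # call here...
print(translate(events))  # ...and pass the resulting generator object
# ['1:LINE Chapter 1', '2:LINE It was a dark night.']
```

The rule of thumb: parentheses go on the generator function, exactly once, at the point where the iterator is created; everything downstream just iterates.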

I want to modestly point out that when a Translator raises a Python exception, it is caught and shown to the user in an error dialog with some helpful information. That code's working too.
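The error-catching idea can be sketched without the dialog. call_translator_entry() is a hypothetical name and the message format is an assumption; the Qt dialog that displays the text is left out.

```python
import traceback


def call_translator_entry(entry, *args):
    """Hypothetical wrapper: call one Translator entry point and, if it
    raises, return (None, message) instead of letting the error escape."""
    try:
        return entry(*args), None
    except Exception as err:
        # The innermost traceback frame usually points into the Translator
        frame = traceback.extract_tb(err.__traceback__)[-1]
        message = '{}: {}\n  in {}() at line {} of {}'.format(
            type(err).__name__, err, frame.name, frame.lineno, frame.filename)
        return None, message


def buggy_translate(event_iterator):
    # Reproduces the parenthesis goof: calling the generator object
    for event in event_iterator():
        pass


result, error_text = call_translator_entry(buggy_translate, (n for n in range(3)))
print(error_text)  # in the real program this text would go to an error dialog
```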

What's not working? Two tricky bits. I've promised in the Translator API doc that there will be a tokenize function that breaks a line of text up into mini-events so that the Translator doesn't need to duplicate the logic to extract such things as superscripts, subscripts, footnote anchors, and markups like italics. That will take a bit of thinking to make clean and bulletproof.
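One way such a tokenize function might look, as a hedged sketch only. The post does not specify the markup rules or the mini-event names, so this assumes, for illustration, <i>...</i> italics, ^x or ^{...} superscripts, _{...} subscripts, and [A] or [1] footnote anchors, and invents the event kinds. It is a generator yielding (kind, text) pairs.

```python
import re

# Hypothetical token rules; each alternative has exactly one named group,
# so Match.lastgroup identifies which rule matched.
TOKEN_SPECS = [
    ('ITALIC_ON', r'<i>'),
    ('ITALIC_OFF', r'</i>'),
    ('SUPERSCRIPT', r'\^\{[^}]*\}|\^\w'),
    ('SUBSCRIPT', r'_\{[^}]*\}'),
    ('FOOTNOTE', r'\[(?:[A-Z]|\d+)\]'),
]
TOKEN_RE = re.compile('|'.join('(?P<{}>{})'.format(kind, pattern)
                               for kind, pattern in TOKEN_SPECS))


def _payload(kind, raw):
    # Strip the markup characters, keeping only the content
    if kind == 'SUPERSCRIPT':
        return raw[2:-1] if raw.startswith('^{') else raw[1:]
    if kind == 'SUBSCRIPT':
        return raw[2:-1]
    if kind == 'FOOTNOTE':
        return raw[1:-1]
    return ''  # italic on/off markers carry no text


def tokenize(line):
    """Yield (kind, text) mini-events: plain 'TEXT' runs interleaved
    with markup tokens, in the order they occur in the line."""
    pos = 0
    for m in TOKEN_RE.finditer(line):
        if m.start() > pos:
            yield ('TEXT', line[pos:m.start()])
        yield (m.lastgroup, _payload(m.lastgroup, m.group()))
        pos = m.end()
    if pos < len(line):
        yield ('TEXT', line[pos:])


for event in tokenize('H_{2}O[1] is <i>water</i>'):
    print(event)
```

This deliberately ignores unbalanced or nested markup, which is exactly where the "clean and bulletproof" work would lie.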

The other is the final step of translation: taking the text that the Translator writes into a MemoryStream object, combining it with metadata extracted from the source Book object, and creating a new Book object that holds the translated text and at least some of the source metadata. That's going to take some fiddly code, probably involving JSON editing.

Then, back in the mainwindow code, the new Book has to be installed and made visible for editing. Hopefully that's just a special case of the File>New operation.
