Tuesday, June 30, 2015

All Built, Waiting to Announce

I ran the PyInstaller builds on all four platforms: Mac OS, Windows 7 32-bit, and Ubuntu 32- and 64-bit. The process in each case is:

  • Build the app and run it for a sanity check
  • Create the PPQT2 folder containing the app, the README, the COPYING.TXT, and the extras folder
  • Zip that folder with output to ~/Dropbox/Public/PPQT2
  • Do something else while Dropbox uploads the ~50MB file to the cloud
  • Suspend that development VM
  • Activate the test VM of the same platform (identical OS but no Python or Qt installed)
  • Do something else while Dropbox updates that VM's copy of /Public
  • Copy the zip file to the local desktop and unzip it
  • Take a deep breath and hold it
  • Run the app and exhale Yessssss! when it comes up and all features work.

Repeat for the next VM.

I was relieved to find that a 32-bit Win7 app runs just fine on a 64-bit Win7 installation. I was also relieved that the development version of PyInstaller for Python 3 worked perfectly on all three OSes. This is such a change from just a few months ago, when I went through several weeks of agony trying different ways to bundle an app, including cx_Freeze, pyqtdeploy, and Nuitka.

What changed everything was that Hartmut Goebel and the other key maintainers of PyInstaller suddenly returned to activity with a long (and continuing) flurry of updates and fixes. The long-idle Python 3 fork was merged into the mainline and got some key updates, and everything got good again. I feel a lot of gratitude toward those guys.

So now what for PPQT2? Right now I am waiting for the right moment to announce its availability on the DP forum. I think that will be Wednesday night; then I will be available to respond to forum comments promptly for four days straight. Meantime I am thinking hard about how to announce it, especially how to convey the importance of third-party contribution of Translators.

I have a couple of enhancements still to code. I promised I would write an ASCII Translator, just because I don't want my Knuth-Plass paragraph justification code to be lost. And of course there will be some bug reports to deal with. But with any luck this project will be finito by the end of July.

Saturday, June 27, 2015

Major oops! revealed in Windows test

So I had intended to release PPQT2 on Windows 7 64-bit only. But a query from an alpha user made me rethink that, and I decided it was better to build and release on Windows 7 32-bit, which I trust and hope will also run on a 64-bit system. (Unlike Linux, where I have to create versions for both widths.)

Anyway, that meant setting up a 32-bit Win7 system. A couple of weeks ago I went on eBay and bought a Win7 Professional 32-bit DVD, installed it in a VM, and provisioned it for development -- a process I meant to blog about today. But...

When I got to the point where I could run PPQT from source, I did so, and made a superficial check of various features. I immediately noticed that the Edit panel appeared to be using a non-monospaced font, probably Arial or whatever the Win7 default font is. I opened the preferences dialog and looked at the Edit Font choice popup menu. There were my expected Liberation Mono and Cousine font choices, along with several mono fonts from the local environment, like Consolas and Courier New. But selecting either Liberation Mono or Cousine from the menu and clicking Apply had no effect; the Edit panel went on showing Arial. Selecting a local font like Consolas did take effect immediately, so the whole mechanism of choosing and applying a font was working. The two preferred fonts were not.

Hmmmm.

They were working perfectly well on Mac OS and Linux.

Weren't they?

Well, they seem to work.

To review: these two fonts are good-looking mono fonts, and both have a very wide repertoire of Unicode glyphs. I want them available to all PPQT users. To make that happen, I carefully built a resource file naming them, along with some small icon PNGs, and compiled it with pyrcc5 into a Python file, resources.py. As a result, this code at the head of fonts.py:

from PyQt5.QtGui import QFontDatabase

_FONT_DB = QFontDatabase()
_FONT_DB.addApplicationFont(':/liberation_mono.ttf')
_FONT_DB.addApplicationFont(':/cousine.ttf')

...should prepare a local font database that includes all locally-available fonts plus those two, loaded from the Qt resources. They should then show up in the following code, which prepares the list of families from which the "choose an edit font" popup is built.

# Return a list of available monospaced families for use in the preferences
# dialog. Because for some stupid reason the font database refuses to
# acknowledge that liberation mono and cousine are in fact, monospaced,
# insert those names too.

def list_of_good_families():
    selection = _FONT_DB.families() # all known family names
    short_list = [family for family in selection if _FONT_DB.isFixedPitch(family) ]
    short_list.insert(0,'Liberation Mono')
    short_list.insert(0,'Cousine')
    return short_list

Read the comments... I think I now know what that "stupid reason" was. Because, although those two family names were in the menu, choosing them had no effect. The Qt font database is carefully designed always to return a font when you ask for one, falling back to some default if it can't supply what you asked for. Clearly, on Windows it couldn't provide the "Cousine" family and was falling back to the system default.

But not in Mac OS or Linux...

But you know, in both those dev systems, I believe I had installed both fonts locally...

Could it be that the font database had never been supplying those fonts from the resources module?

OK, so where did I import that module? I applied Wing IDE's "search in files" widget to look for "resources" in all project files.

I never did import it. I built the ****ing resources file and never imported it to the running app!

Added one line, import resources, to PPQT2.py and reran the app. Bingo! I can now set Liberation Mono or Cousine as Edit fonts!
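For the record, the pieces of that resource workflow look something like this (a sketch; the .qrc contents and file names are assumptions inferred from the code above):

# resources.qrc -- an XML list of the files to embed:
#   <RCC><qresource>
#       <file>liberation_mono.ttf</file>
#       <file>cousine.ttf</file>
#   </qresource></RCC>
#
# compiled to Python with:  pyrcc5 -o resources.py resources.qrc
#
# ...and the one line that was missing, at the top of PPQT2.py:
import resources   # registers the ':/...' paths with the Qt resource system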

It's a bit of a mystery how I managed to get along so far without encountering any problem from the missing resources. But I don't care. Just glad to have found it before shipping.

Thursday, June 25, 2015

A tough bug

So before I ship PPQT for real, I thought I better take one more look at an elusive problem. I had noticed that sometimes, keying ^f in the Edit panel did not bring the Find panel to the front as it is supposed to do. But the behavior seemed to be intermittent.

Recall that PPQT has on the left, a tabset of Edit panels, one per open document. And on the right, a tabset of various functional Panels like the Notes panel, Images panel, and Find panel. Each open document has its own set of function panels. So when you switch to a different document's Edit panel, the right half of the window is repopulated with the function panels for that document.

So what would happen is this. Say there are two documents open and document A is active (its Edit panel is visible). And on the right, the Images panel is displaying a scanned page from A. The user is typing and wants to find something, so keys ^f. What should, and usually does, happen is that the Find panel replaces the Images panel and the cursor focus moves to the Find text field.

What sometimes happened was that instead, the Images panel remained, but a narrow blue keyboard-focus rectangle appeared over it, outlining the position of the Find text field on the (invisible) Find panel.

It took a half-hour of repeated experiments to work out the failing conditions. There had to be at least two documents open. You had to switch between them in a certain order.

Internally, here is what I supposed was happening: the Edit panel traps the ^f keystroke in its key event handler and immediately emits a signal named editKeyEvent, which is connected to a slot in the Find panel code. Using a breakpoint I verified that control got to the Find panel, and that it would call into a main window function to get itself displayed. The main window code is simple:

    def make_tab_visible(self, tabwidg):
        ix = self.panel_tabset.indexOf(tabwidg)
        if ix >= 0 : # widget exists in this tabset
            self.panel_tabset.setCurrentIndex(ix)
            return
        mainwindow_logger.error('Request to show nonexistent widget')

It asks the active tabset if it knows this widget; if it does, please make it active. What was happening was that the Find panel making this call was not the Find panel in the active tabset. It was the Find panel widget of the other document.

Wut?

This sent me off on a long tail-chase on the signal/slot setup. Somehow, I thought, the Edit panel for one book must have connected its ^f keystroke event signal to the wrong Find panel. But that code was all solid.

In the course of investigating the hook-up of the Edit signal to the Find panel, I noticed that there was another way for that signal to be emitted. It might come from the keyPressEvent handler of the Edit panel. But it might also come from the Find... action of the Edit menu. Oho!

Months ago I implemented variable Edit menus. Each Edit panel has its own Edit menu, in which the menu Action for Find... (with a ^f keyboard accelerator) was hooked to call a little function that emitted that Edit panel's signal to that Edit panel's Find panel. The Words panel and Notes panel also have their own Edit menus. This was all to get around Qt problems that plagued version 1, where different panels feuded over the ownership of the single global Edit menu. Now the Words panel has its own Edit menu that does things that relate to Words panel facilities, etc.

When any of these panels gets a focus-in event, it immediately calls the main window, asking it to put up its own custom Edit menu. When I implemented this, I put in a little gimmick to save some time.

_LAST_KEY = None  # _EDIT_MENU, the one QMenu owned by this module, is built elsewhere
def set_up_edit_menu(key, action_list) :
    global _EDIT_MENU, _LAST_KEY
    if key != _LAST_KEY :
        _LAST_KEY = key
        _EDIT_MENU.clear()
        # code to populate the Edit menu from action_list
    _EDIT_MENU.setEnabled(True)

Each caller would pass a simple letter key, and the main window could avoid clearing and populating the menu when, as often happens, focus goes in and out of the same panel over and over.
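A caller-side sketch (the class name, attribute name, and imports are my assumptions, not the actual PPQT code):

from PyQt5.QtWidgets import QPlainTextEdit
import mainwindow  # hypothetical import of the main window module

class EditView(QPlainTextEdit):
    def focusInEvent(self, event):
        super().focusInEvent(event)
        # ask the main window to show this panel's own Edit menu
        mainwindow.set_up_edit_menu('E', self._edit_actions)  # 'E' for Edit panel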

The problem was, every Edit panel called with a key of 'E'. Ooops! That meant that when the focus moved from one document's Edit panel to another's—without a stop between in some other panel that had an Edit menu—the Edit menu would not be repopulated. It would still have the Actions defined by the first document. And that included a signal to the first document's Find panel when ^f was keyed or when Edit>Find was selected. So the signal would go to the wrong Find panel widget. It would see it wasn't visible, so it would call the main window; the main window would not find that widget in the current tabset; no Find panel would be displayed; but the Find panel would then call for keyboard focus to its Find text field, resulting in a focus rectangle over the top of the Images panel.

The fix was to make the key something unique to the caller, and Python supplies a unique id via the id() built-in function. As a bonus, the callers no longer have to provide the key as an argument.

_LAST_MENU = None
def set_up_edit_menu(action_list) :
    global _EDIT_MENU, _LAST_MENU
    if id(action_list) != _LAST_MENU :
        _LAST_MENU = id(action_list)
        _EDIT_MENU.clear()
        # populate the Edit menu from action_list
    _EDIT_MENU.setEnabled(True)
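As a bonus on the caller side, each panel's focus-in handler now passes just its action list, something like set_up_edit_menu(self._edit_actions), using the same assumed names as the sketch above.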

Wednesday, June 24, 2015

Learning about callable types

Another technique I used in the HTML translator is the Python equivalent of a "computed go-to". The Translator API passes a series of "events" and any translator has to deal with them something like this:

    for (code, text, stuff, lnum) in event_generator :
        # deal with this event

There are 34 possible event codes. A naive way of "dealing with this event" would be to write an if..elif..elif stack 34 items high. If you put the most frequent codes at the top, the performance would not be too bad, but it makes for rather unwieldy code to edit. A better way is to have a dict in which the keys are the 34 code values, and the values are the actions to be performed for a given key. I've provided a skeleton of a Translator module with code like this:

    actions = {
        XU.Events.LINE          : None,
        XU.Events.OPEN_PARA     : None,
        XU.Events.CLOSE_PARA    : None,
...
        XU.Events.PAGE_BREAK    : note_page_break ,
...
        XU.Events.CLOSE_TABLE   : "</table>" ,
        }

    for (code, text, stuff, lnum) in event_generator :
        action = actions[ code ]
        if action : # is not None or null string,
            if isinstance( action, str ) :
                BODY << action # write string literal
            else :
                action() # call the callable
        # else do nothing

In this version, items in the dict can be one of three things:

  • None or a null string, meaning either "do nothing" or "not implemented yet"
  • A string literal to be copied to the output file immediately
  • A reference to a callable which will do something more involved, perhaps formatting the text value in some way before writing it.

(The callable functions named in this action dict were the "flock of little helper functions" that I referred to in yesterday's post—the ones that needed access to variables initialized by their parent function, and which had to become globals due to Python's eccentric scoping rules.)

For one part of the HTML Translator, I had a similar action dict but in it I wanted to have four possible types of actions:

  • None, meaning "do nothing"
  • A string literal to be copied to the output file immediately
  • A reference to a callable that would do something more involved such as setting or clearing a status flag.
  • A lambda that yielded a string value based on the text value but didn't actually contain a file-output call.

No problem, thought I. The above loop logic can easily be stretched to deal with this along these lines:

    for (code, text, stuff, lnum) in event_generator :
        action = actions[ code ]
        if action : # is not None or null string,
            if isinstance( action, str ) :
                BODY << action # write string literal
            elif type(action) == types.LambdaType :
                BODY << action() # invoke lambda, write its value
            else : # type(action) == FunctionType
                action() # call the callable
        # else do nothing

Surprise! It didn't work. Why? It turns out that although the types module has the distinct names FunctionType and LambdaType, they are the same thing: LambdaType is just another name for FunctionType. You cannot distinguish a reference to a lambda from a reference to a named function based on type().
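A quick interpreter check confirms it:

>>> import types
>>> types.LambdaType is types.FunctionType
True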

That kinda makes sense, in that we are told repeatedly that a lambda is just shorthand for an anonymous function. But it would have been handy to tell the difference.

In the end, for this part of the code (it was not the main translate loop but one with fewer possible codes) I made each value of the action dict a tuple ('f', funct_name) or ('s','literal string'). That allowed me to distinguish between a lambda that generated a string, and a function that didn't. But the whole thing felt like a kludge and I believe I will go back and recode that particular loop as an if/elif stack instead.
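For concreteness, the tagged-tuple version looked something like this (a sketch only; fix_line and TEXT are illustrative assumptions, while note_page_break, XU, and BODY are from the skeleton above):

    actions = {
        XU.Events.CLOSE_TABLE : ('s', '</table>'),             # literal: copy to output
        XU.Events.LINE        : ('s', lambda: fix_line(TEXT)), # lambda yielding a string
        XU.Events.PAGE_BREAK  : ('f', note_page_break),        # function: side effects only
        }

    tag, action = actions[ code ]
    if tag == 's' :
        BODY << ( action() if callable(action) else action )  # write a string
    elif tag == 'f' :
        action()  # call for effect, write nothing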

Tuesday, June 23, 2015

Learning about Python scoping

Variable scoping is a big topic in some programming languages, or was. "Scoping" refers to the rules about how the system resolves a variable name to its value. Here's an example of the problem I ran into while writing the HTML translator.

def parent():
    par_var = True
    def child():
        if par_var :
            print('yes')
        else :
            print('no')
    child()

When you execute parent(), what happens? Specifically, how is the child function's reference to par_var resolved to a value? My naive expectation was that Python would look first in the scope of child(), and fail; then look in the scope of parent(), and succeed, finding a value of True and printing "yes". Which it does! Expectations confirmed! But make this small change:

def parent():
    par_var = True
    def child():
        if par_var :
            print('yes')
            par_var = False
        else :
            print('no')
            par_var = True
    child()
    child()
    child()

What do you think? If you execute parent(), will it perhaps print yes, no, and yes? Nope. It does not get that far: the first call to child() immediately terminates with an error, "builtins.UnboundLocalError: local variable 'par_var' referenced before assignment".

What!?!

In the first example, Python had no problem resolving the child's reference to the parent's variable. But with this small change—assignment of values to par_var—the scoping rule changed. Now Python searches in the scope of child(), fails, and stops looking. It doesn't look out in the enclosing scope.

A little research turns up agreement that yes, this is the rule: if a variable is assigned a value anywhere in a function, Python assumes (in fact, insists) that the variable is local to that function. The only way to override that is an explicit declaration: the global statement, or Python 3's nonlocal (more on that below). So let's try global:

def parent():
    par_var = True
    def child():
        global par_var
        if par_var :
            print('yes')
            par_var = False
        else :
            print('no')
            par_var = True
    child()
    child()
    child()

Does it work now? Nunh-unh. "builtins.NameError: name 'par_var' is not defined". The global statement modifies the scoping rule, all right. But it does not simply say "global to me"; it says "global in the sense of being at the top level of this namespace/module". Which it is not, in that example. One way to make the above code work is to move the first assignment to par_var outside the body of the parent function. The other is nonlocal, the declaration made for exactly this case.
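A minimal example with nonlocal, which binds the name to the nearest enclosing function scope:

def parent():
    par_var = True
    def child():
        nonlocal par_var   # bind to parent's par_var, not a new local
        if par_var :
            print('yes')
            par_var = False
        else :
            print('no')
            par_var = True
    child()
    child()
    child()

parent()   # prints yes, no, yes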

So in Python, it is possible to have a variable that is "relatively global"—not local, but declared in a containing scope—without any declaration at all, but only if that variable is read-only. As soon as the inner function attempts assignment, the variable must be declared one way or another: local by default, or explicitly global or nonlocal.

This is kind of wacky. It made me have to revise a bunch of code I'd written, where a parent function declared a whole batch of little helper child functions, and shared the use of the parent's variables. All the variables had to move out to the module level and get ALLCAP names. Also, I have this little evil thought: what if the child function does not assign to par_var but instead passes it to another function, and that function assigns to it? par_var might have to be a list or other mutable collection to make that work, but... hmmm.

Whatever, that's done. HTML conversion works nicely and I am happy to say, is really quick. It takes less time to translate a document and create a new document, than it does to load the source document in the first place. Later this week I will be packaging PPQT for release.

Friday, June 19, 2015

HTML Translator going together fast

I spent several hours coding up an HTML translator and have it 75% complete. Another coding session will finish it; then I'll have to test it a wee bit. Finished by Tuesday, I reckon.

I only found one awkward spot in the design of the Translator API. An Illustration markup like this:

[Illustration:fig_55.png|fig_55.jpg Fig. 55 The whang-dangle is..

produces a sequence of "events" as follows:

OPEN_ILLO with the two filenames
OPEN_PARA
LINE "The whang-doodle..."

So at the time I am generating the code for an image div, I have only the filenames. (Passing the filenames after the colon is a PPQT extension. If the user doesn't do that, the Translator has no way to get them.) The HTML Translator can use this to build a nice opening string of:

<div class='image'>
<a href='images/fig_55.jpg'>
<img src='images/fig_55.png' /></a>

However, it would be a useful service if it could also generate alt= and title= attributes from the starting text of the caption, for example alt="Fig. 55 The whang-dangle.... But that can't happen because the opening text of the caption will not arrive for two more events.

I thought about postponing actual output of the image markup until the first paragraph arrived, but that would make the code just horribly ugly. On every OPEN_PARA event you'd have to ask, is this the start of a caption?

I put a similar but lesser kludge into the translation of a Footnote. The OPEN_FNOTE event has the footnote key value, so it can generate the target anchor and the back-link to the footnote reference. Unfortunately all that goes inside the first paragraph of the note:

<div class='footnote'>
<p><a id='Footnote_X'></a><a href='#FNref_X'>[X]</a>

So here the <p> goes out ahead of the coming OPEN_PARA event. I had to create a switch, footnote_starting, and test it in the OPEN_PARA code: when it is on, generate nothing and turn it off.
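In sketch form (the helper names are my assumptions, BODY being the Translator's output stream):

footnote_starting = False

def do_open_fnote(key):
    global footnote_starting
    BODY << "<div class='footnote'>\n"
    BODY << "<p><a id='Footnote_{0}'></a><a href='#FNref_{0}'>[{0}]</a>".format(key)
    footnote_starting = True   # the first paragraph's <p> is already out

def do_open_para():
    global footnote_starting
    if footnote_starting :
        footnote_starting = False   # suppress the duplicate <p>
        return
    BODY << '<p>'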

But by and large, the HTML Translator just kind of fell together. You just consider each of the 34 event codes in sequence, write a little bit of code, and repeat.

Tuesday, June 16, 2015

Just about ready for official release

I committed all the translator material to github. Except (OMG!) it appears I forgot to add the two new modules, translators.py and xlate_utils.py. OK, done. So that is all wrapped up and documented very well if I do say so (and as a retired tech writer, I think I know when an API is properly documented). I'm very pleased at how I used a formal syntax to define and verify the document structure. The code of translators.py is clean and well-organized. The mechanism for defining and displaying an "Options Dialog" is simple and (I hope) understandable.

There are parts of the xlate_utils.py module that I'm not quite so proud of. The tokenize() function is going to be fairly heavily used, and I confess there are parts of it that are rather ad hoc, not to say downright kludgy. And not superbly well tested, yet. However, in coming days I'll be writing an HTML translator that will test it.

That done, on Monday and Tuesday I turned to my list of issues and resolved most of them. One that came out better than I originally expected was this: PPQT2 expects input files to be encoded in UTF-8. The user can get a Latin-1 file correctly opened by renaming it with a suffix of .ltn, but if that is not done, the file will be read through the UTF-8 codec. Some special characters will not decode correctly and will be replaced with Unicode \ufffd, the "replacement character". If this isn't noticed right away and the file is saved, there is permanent loss of data.
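The failure is easy to reproduce in miniature; a Latin-1 byte that is not valid UTF-8 decodes to the replacement character:

>>> b'caf\xe9'.decode('utf-8', errors='replace')   # 'café' encoded as Latin-1
'caf\ufffd'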

So I very much wanted to catch this error early and warn the user. But how? I researched the methods of QTextStream, QFile, and QTextCodec. I know that while QTextStream is executing a readAll() call, it must use a QTextCodec.toUnicode() function. That function is capable of returning a count of invalid characters, but there doesn't seem to be any way to find it out.

It looked as if the only ways to find out whether the file had decoded properly were, one, to read it with Python, in which case the read would raise an exception; or, two, to use QTextStream.readAll() into a string and search the string for replacement characters. Either method would require me to change the API between the main window and the Book, or else to read the possibly-large document file twice.

Then it dawned on me that the QPlainTextEdit has a perfectly good find() method. All I had to do was, in the Book, just after it has loaded the editor from the file, call the editor's find() to look for a replacement character. One hopes the search fails. But if it does not, I can notify the user with a warning message, including the character position of the first replacement character. I made the warning message detailed and also included a pointer to the Help topic.
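A minimal sketch of that post-load check (the warning helper is an assumption):

if editor.find('\ufffd') :   # QPlainTextEdit.find() returns True on a match
    where = editor.textCursor().selectionStart()   # offset of the bad character
    warn_replacement_char(where)   # hypothetical detailed warning dialog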

Another long-standing issue, more of a major loose end to clean up, was logging. There are lots and lots of log messages being issued all over the program. But the logging output was going nowhere. I had initially thought that I would add argument parsing, and use it to support --log-path= and --log-level= parameters. But that's dumb; I'm packaging the Windows and Mac OS versions as clickable apps, with no command-line input. And the Linux version doesn't have to be launched from a command line. So I did some study and reading and chose writable locations for log files based on the platform: /var/tmp for Linux, ~/Library/Logs in Mac OS, and \Windows\Temp in Windows. Oops, I just realized I committed the code for that, but I should also update the Help file to document it. Or maybe put it in the README for each version?
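The location logic boils down to something like this (a sketch; the log file name and level are assumptions):

import sys, os, logging

if sys.platform.startswith('win') :
    log_dir = os.path.join(os.environ.get('windir', r'C:\Windows'), 'Temp')
elif sys.platform == 'darwin' :
    log_dir = os.path.expanduser('~/Library/Logs')
else : # linux
    log_dir = '/var/tmp'
logging.basicConfig(filename=os.path.join(log_dir, 'PPQT2.log'), level=logging.ERROR)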

Anyway: tomorrow is Museum day. Thursday I need to spend most of my free time studying the docs for a new volunteer gig. I have an online training session for that at 4pm that day and I want to be prepared. But Friday I will start coding an HTML Translator. Should have that done early next week. By the end of next week I should have PPQT2 packaged up ready to announce. Can't wait.

Saturday, June 13, 2015

Translator API complete

So I continued to code in bursts over the week away in Seattle, and today at home I was able to put in several hours and finished the Translator API, including documenting it. This makes PPQT2 just about functionally complete. What I need to do over the next week—to be honest, probably two weeks—is:

  • Commit all this work and check it in to github,
  • Write a real Translator to serve as an example—I think probably an HTML one as it would be the easiest,
  • Clean up all the "issues" I have been posting for myself on the github page,
  • Create 32-bit Win7 dev and test VMs to replace the 64-bit ones I've been using,
  • Bundle the package and release it—as a beta? Or final? No, Beta.

That done, I will spend the early part of July awaiting bug reports, writing another Translator (either fpgen or ppgen), and finally implementing drag-out tabs. By mid-July I intend this project to be done.

Sunday, June 7, 2015

Test Translator running

My coding time has been limited the past few days and will continue to be, as we fly off to Seattle for a few days to visit relatives, although I'll get some in. I hope that by this time next week the complete Translator interface will be working.

What is working now is: document parsing, finding Translator modules, building the submenu, responding to the selection of a Translator from the menu, displaying a Translator's option query dialog, and calling the Translator's initialize(), translate() and finalize() entries with appropriate arguments. A special demo Translator named "Testing" exists and accepts calls at those entries. All it does is produce lines of output documenting what it is called with, including displaying each document "event" it gets.

So far, the output of Testing (or any other Translator, should one exist, but none do) is only dumped with print statements. But a whole lot of machinery had to work in order to get this far.

One bug I had to work through was a classic "lost reference" bug. In most Qt interfaces, when you build a widget and give it to another widget to manage, the manager widget becomes a "parent" and retains the child widget, so it continues to exist. That's not the case with menus. A QMenu does not take ownership of a QAction or a sub-QMenu. I forgot that. I modified the main window to call the translator support module to build the submenu of translators, but didn't save the reference returned. Just handed it to the File menu's addMenu() method.

The result was an interesting bug: the Translators... sub-menu would appear the first time you opened the File menu. Then the second or third time (or sometimes the first time), there would be no submenu. It would disappear. It all depended on when the Python garbage collector got around to demolishing the submenu QMenu object. That took half an hour of trying and thinking before I finally twigged, did a classic Doh! face-palm, and did a one-line change to have mainwindow save a reference to the submenu.

Another problem was my lack of experience with Python generators. A generator is a function that contains a yield statement; the tricky part is that it is not that function you iterate over in your for-loop, it is the generator object that the function returns when called. The function call takes parenthesized arguments, while the iterator it returns does not. I almost had it right, but had to review the Python docs and recode at two different points.

I used a generator in my "scanner" that overrides the built-in token scanner for the YAPPS parser. My scanner class took an iterator—which is anything that responds to the next() built-in function—as an initialization parameter. It calls next(iterator) to get the next line of the document being parsed.

For unit-test purposes, it was initialized with a Python StringIO object loaded with a triple-quoted literal. A StringIO responds to next() with the next line of stuff. But for the real thing, I needed to pass an iterator that yields the next line of the real document.

Months ago I coded such an iterator into the editdata module. It's a simple function:

    def all_lines(self):
        tb = self.begin() # first QTextBlock in document
        while tb.isValid():
            yield tb.text()
            tb = tb.next()

There's the magic yield statement. But when I passed editdata.all_lines to initialize the scanner, I got an error about cannot iterate a method. What I had to pass to the scanner in place of a StringIO was not all_lines but all_lines(), the returned value of calling the generator. That's the iterator that can respond to a next() call.

I made the exact inverse goof in quickly whipping up the Testing Translator. The translate() method of a Translator is passed an iterator that returns "events" it is to process. I was correctly passing it the result of calling my event-generating function with a yield statement. But in the Translator where it invoked the iterator, I coded for (code, text, stuff, lnum) in event_iterator() and got a different error message, about "cannot call an iterator". Had to remove the parens, is all.
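Both goofs in miniature:

def all_lines():
    yield 'line one'
    yield 'line two'

# for line in all_lines : ...   # TypeError: 'function' object is not iterable
lines = all_lines()             # calling the generator function returns the iterator
for line in lines :             # iterate the returned object -- no parentheses
    print(line)
# for line in lines() : ...     # TypeError: 'generator' object is not callable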

I want to modestly point out that when a Translator gets such a Python error, it is caught and displayed with some helpful info in an error dialog to the user. That code's working too.

What's not working? Two tricky bits. I've promised in the Translator API doc that there will be a tokenize function that breaks a line of text up into mini-events so that the Translator doesn't need to duplicate the logic to extract such things as superscripts, subscripts, footnote anchors, and markups like italics. That will take a bit of thinking to make clean and bulletproof.

And the final step of doing translation: taking the text that the Translator writes into a MemoryStream object, combining it with metadata extracted from the source Book object, and creating a new Book object with the translated text and at least some of the source metadata. That's going to take some fiddly code, probably involving JSON editing.

And finally, back in the mainwindow code, installing a new Book and making it visible for editing. That's just a special case of the File>New operation, hopefully.

Saturday, June 6, 2015

Making Translators real -- and why I hate Markdown

"Smelling the barn"—an expression that horsey people use to describe the sudden enthusiasm of a horse when it nears the end of a long ride. Do horses really do that? Whatever; I'm starting to smell the end of this project. I'm alternating between writing the code that will call a Translator and updating the document that tells the Translator's author what will happen. And it all feels like it's coming together at last. I have serious hopes of delivering the 1.0 product by the end of this month!

But the API doc... I started writing it in Markdown. It's simple; it's designed for documenting code; it's also slightly less stupidly inconsistent than reST. But then I tried to process it for display. Aagghhh!

One would expect that John Gruber, the inventor of Markdown, could write an accurate preview widget—wouldn't one? So the first place I tried my document was at his "markdown dingus". And it is terrible! Although the sidebar clearly states that a four-space indent means a code block, my document has numerous indented code blocks and Gruber's "dingus" renders them all in normal text font. It preserves the line breaks in these sections, at least, but does not preserve the white-space indents.

Worse, it does not automatically escape either underscores or HTML codes within a code block. So where the code contains an underscore in a variable name, the underscore disappears and the "code" suddenly turns italic. And where there's a bit of HTML in the code block, it takes effect. It's just a mess. Shame on you, John Gruber.

There's a pretty decent Markdown extension for Chrome. It renders my code blocks fine. But I'm confused, because for Gruber's widget I have to escape underscores inside variable names, and then the Chrome widget renders the backslash. I'm going to assume that the Chrome extension is accurate (whatever "accurate" means when discussing Markdown). But Markdown is a mess, a very unsatisfactory medium. Ordinary manual HTML, like I do in this blog, would be less confusing for sure.

Tuesday, June 2, 2015

Implementing translators 1: API spec

With the document parse coded and tested, I turned my attention to coding the function that will actually invoke a Translator when one is selected from the menu. I spent a couple of hours tidying and rearranging the code of translators.py and coding the framework of the function. My approach to this is to write a comment prolog saying what I'm going to do, then revise it over and over as I realize why I can't do that, at least not in that sequence, so I'm actually going to do this instead, etc. etc.

The first steps were known and I could code them: finding whether the Translator has a user-options dialog and presenting it, and parsing the document. But then it was time to start actually calling the Translator. This brought me face to face with the question of what, exactly, the interface to a Translator is. I had a detailed sketch of a design from weeks ago, but I knew more now. So I changed modes and spent about three intense hours rewriting the API spec. I made several on-the-fly decisions that simplified it from before. Really, any competent Python programmer—who also understands the full range of things that can occur in a DP document, and who also understands in detail the syntax of the target document format—should have no difficulty writing a Translator.

When I put it that way, it suggests a Venn diagram of three circles for which the little triangle where all three overlap might be rather small. But for the select group of people in that happy spot—no problem.

Monday, June 1, 2015

Parsing a document, 7: testing (and ranting at Enum)

I am now beginning to test my document-parsing code and things are going very well. It amounted to about 250 LOC (plus the code generated by YAPPS, another 200 or so). For the first execution of brand-new code, things went well. After I picked off 6 or 8 stupid coding errors (like defining some regexes as class members and forgetting to use "self." to reference them) and a couple of small logic errors, it is happily parsing a simple document.

One problem I ran into that took a bit of finagling was this. The generated parser comes from a "grammar" file. I've shown some preliminary grammar code in previous posts. One tricky production is the one for heads:

rule HEAD:      EMPTY {{print('head...')}} ( PARA {{ print( "...3") }}
                            | EMPTY EMPTY PARA+ EMPTY {{ print("...2") }}
                            )

The items in {{double braces}} are Python statements which YAPPS will insert into the generated parser code at the point where parsing reaches that part of the production. In that code the statements are print() calls. But what I really needed was this:

rule HEAD:      EMPTY {{ open_head() }} ( PARA {{ close_head(3) }}
                            | EMPTY EMPTY PARA+ EMPTY {{ close_head(2) }}
                            )

In other words, call functions of mine that will set up a start-heading work unit, and, when the type of heading is known—only after processing the paragraph(s) of text within the heading—back-patch the open-head unit with the type of head it turned out to be, and append the close-head unit.

Well, that code died with an exception because "function open_head() not found." Wut? I was importing the parser with:

from dpdocumentsyntax import DPDOC

which should make the parser class part of the active namespace where the functions like open_para() were defined. But no. I tried several ways to work around this. You can include blocks of code in the generated parser, but if I defined the helpers like open_para() there, they could not see the globals like the WORK_UNITS list they had to modify. Eventually I had to do it in a not very pretty way,

import dpdocumentsyntax
# manually plant each helper function in the generated parser's namespace
dpdocumentsyntax.open_para = open_para

That is, manually inserting those definitions into the imported namespace.

Anyway, as it parses, the code builds a list of "work unit" objects that will eventually be fed to a Translator as "events". A typical sequence of work units, or events, would be,

  • Open head(2) (Chapter head)
  • Open paragraph
  • Line (text: "CHAPTER ONE")
  • Close paragraph
  • Close head(2)
  • Open paragraph
  • Line (text)
  • Line (text)
  • Close paragraph

And so forth. There are all told 30 different possible "events" and I expect to pass each to a Translator with a code signifying what kind of event it is, e.g. Open Paragraph, close BlockQuote, or Open Illustration Caption, etc. So how should these codes be defined? Obviously there must be names for them, like OPEN_PARA, CLOSE_FNOTE and so forth. And obviously these will be in a module the Translator can include, perhaps so:

from xlate_utils import EVENTS

Then the coder can make decisions by comparing to EVENTS.OPEN_PARA and the like.

Looks like a job for an Enum, right? The Enum "type"—it isn't a type—was added to Python in version 3.4, and having played with it, I cannot fathom why they bothered. It has to be the most useless piece of syntax ever. But check this out.

>>> from enum import Enum
>>> class ECode( Enum ):
...   VAL1 = '1'
...   VAL2 = '2'
...
>>> ECode.VAL1
<ECode.VAL1: '1'>
>>> '1' == ECode.VAL1
False
>>> edict = { ECode.VAL1: 1, ECode.VAL2: 2 }
>>> edict
{<ECode.VAL2: '2'>: 2, <ECode.VAL1: '1'>: 1}
>>> edict['1']
Traceback (most recent call last):
  File "<string>", line 1, in <fragment>
builtins.KeyError: '1'
>>> ECode.VAL1 < ECode.VAL2
Traceback (most recent call last):
  File "<string>", line 1, in <fragment>
builtins.TypeError: unorderable types: ECode() < ECode()

Now, for something completely different:

>>> class HCode( object ) :
...   VAL1 = '1'
...   VAL2 = '2'
...
>>> HCode.VAL1
'1'
>>> HCode.VAL1 == '1'
True
>>> hdict = { HCode.VAL1 : 1, HCode.VAL2 : 'to' }
>>> hdict
{'1': 1, '2': 'to'}
>>> hdict[ '2' ]
'to'
>>> hdict[ HCode.VAL1 ]
1
>>> HCode.VAL1 < HCode.VAL2
True

What I'm saying is that a simple class definition accomplishes everything that the "Enum" class does, and also has "real" values that can be compared and ordered. There is the one tiny drawback that a user could assign to HCode.VAL1 but that can, I believe, be prevented by adding a decorator.

So I will be providing the 30 event codes as an EVENTS class that really is just a class, doing the job a C header file does: giving names to arbitrary literals.