Sunday, June 29, 2014

Design Breakthrough, and July Off

A couple of posts back I agonized about how PPQT should relate to the new, competing markup styles. Today I figured out how to handle this.

The recommended and expected workflow will be that the PPer first brings the text to a state of completion with respect to the DP formatting guidelines. Heal all page breaks, fix all typos, deal with all proofer notes, move and renumber footnotes, process all gutcheck/bookloupe diagnostics. Delete all page separator lines. Also, use a new dialog to set metadata for the book: title string, author string at least.

At this point the PPer will choose File>Generate... and get a dialog with a menu of possible "translators", each able to translate a DP text into some other markup. From the user's point of view, now something magical happens: after a moment of high CPU usage, a new book appears. Its filename is "Untitled-n" (the same name File>New makes) and its contents are the translated text of the starting book. The user then uses File > Save As to save it under some appropriate name and suffix.

The magic that happens under the covers is this. Code that I will write parses the input text, extracting all possible information out of the DP markup conventions. Using a simple and rather elegant API that I have in mind, it will pass these data to the chosen translator. This API will make it so stupidly simple to generate a translated document that anybody could write one.

I will write a plain-text translator and an HTML translator. These can be used as models. Anybody else who wants to write a translator is welcome to do so and to issue a pull request. A translator for fp; a translator for fpgen; a translator for XML; a translator for Markdown or reStructuredText or whatever, I don't care, let a thousand goddam flowers bloom.
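
To make the idea concrete, here is a minimal sketch of the kind of API this could be. Everything in it is hypothetical: the names Translator, PlainTextTranslator and parse_dp_text are invented for illustration, the real API is not designed yet, and the toy parser understands nothing but paragraphs separated by blank lines.

```python
class Translator:
    """Hypothetical base class: the parser calls one method per unit of text."""

    def start_book(self, title, author):
        pass

    def paragraph(self, text):
        pass

    def end_book(self):
        return ""


class PlainTextTranslator(Translator):
    """Toy translator that just collects the pieces as plain text."""

    def __init__(self):
        self.parts = []

    def start_book(self, title, author):
        self.parts.append(f"{title}\nby {author}")

    def paragraph(self, text):
        self.parts.append(text)

    def end_book(self):
        return "\n\n".join(self.parts) + "\n"


def parse_dp_text(text, translator, title="Untitled", author="Unknown"):
    """Toy parser: split on blank lines and hand each paragraph over."""
    translator.start_book(title, author)
    for block in text.split("\n\n"):
        block = " ".join(block.split())   # join the lines of one paragraph
        if block:
            translator.paragraph(block)
    return translator.end_book()


result = parse_dp_text("First line\nmore.\n\nSecond.", PlainTextTranslator(), "T", "A")
print(result)
```

The point of the shape is that a translator author only fills in small methods and never has to parse DP markup.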

Holidays

In a couple of days we are off for a month in Scandinavia: Copenhagen, Bergen, Trondheim, Stockholm. In the unlikely event that any reader of this blog wants to follow our very mild adventures, you can do so in our Scandinavian travel blog.

Posting in this blog will resume in August. See you then.

Thursday, June 26, 2014

Doing It "Pythonically" (with added thought)

I'm starting work on the Find module. One of the features of that is that each of the input fields—the Find text and each of three independent Replace texts—has a memory pop-up: a button that pops up a menu of the ten most recent strings used in that field. It's a very nice help when you are alternating between two or three complex regex searches. (And stolen from Guiguts and BBEdit.)

The details of this widget were of course encapsulated in a class; for V.2 it is class RecallMenuButton. In V.1 this was a QComboBox because that is easy to present. I maintained the recent strings in a QStringList, and any time the list was updated the widget could reload itself in one call, self.insertItems(list). However, I wanted it to look like a square button, not a list, so I had it set its own max width to 29px. Under Windows that didn't work with the default style, so I had to set it to a non-native style "CleanLooks", and that no longer exists.

Anyway for V.2 I am using a "command button" which is a button that has an associated menu. I maintain a python list of QActions, one for each string. When the menu for the button emits the aboutToShow signal, I clear the menu and add the list of actions to it.

So I was starting to code the remember() method of this class, which takes a string and, if it is in the list now, deletes it; then adds it to the front of the list. But some considerations:

  • The string might very likely be in the list already as the first item, because it will be frequent to Find or Replace the same string over and over.
  • The list might be empty, at least the first time.
  • The string might not be in the list at all.
  • The list should never exceed MAX_STRINGS in length.

So the V.1 code is about a dozen lines, with a Fortran-like loop over the list to find and remove the string if it exists. But I'd like to do it more "pythonically" this time. I came up with this:

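    def remember(self, string):
        # Drop any existing action for this string, wherever it is.
        new_stack = [act for act in self.string_stack if string != act.text()]
        # Slice assignment needs an iterable, hence the one-item list;
        # the new action is parented to this button.
        new_stack[0:0] = [QAction(string, self)]
        self.string_stack = new_stack[0:self.MAX_STRINGS]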

After composing which I said,

Now, this does not try to short-cut the presumptively common case of the string already being on top of the stack. If the added string is at the front of the stack it will be removed and added back in the same position. Is this a waste of time? Yes, but the test for that special case would look like:

    def remember(self, string):
        if len(self.string_stack):
            if string == self.string_stack[0].text() :
                return

...and that much extra code would surely waste as much time as it saved, or nearly. Or would it? Hmmm.

Edit: no, it would not be worth it and here's why. The Find UI is only going to "remember" a find or replace string if that string is manually edited by the user. When the Find or Replace line-edit field is filled by the program (from the remembered-strings popup menu or from a user-loadable macro button) a flag is cleared. Said flag is only set when the user edits in the field (the textChanged signal). When the field is actually used (Find or Replace action done), the flag is tested and remember() is called only if the flag is set. Thus remember() is only ever called for strings that the user has entered or altered. This greatly reduces both the number of calls to remember() and the chances that a remembered string already exists at any position in the stack. So there is no point in guarding against the input string being at stack[0].
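
The flag logic and the stack update can be modelled without Qt. This is only a sketch under stated assumptions: FieldMemory and its method names are made up for illustration, plain strings stand in for the QActions, and load(), edit() and use() stand in for the program filling the field, the textChanged signal, and a Find or Replace action.

```python
MAX_STRINGS = 10


class FieldMemory:
    """Qt-free stand-in for one Find/Replace field and its recall list."""

    def __init__(self):
        self.text = ""
        self.stack = []            # most recent string first
        self.user_edited = False   # set only by manual edits

    def load(self, string):
        """Field filled by the program (recall menu or macro button)."""
        self.text = string
        self.user_edited = False

    def edit(self, string):
        """Field changed by the user typing in it."""
        self.text = string
        self.user_edited = True

    def use(self):
        """Find or Replace done: remember only user-entered strings."""
        if self.user_edited:
            self.remember(self.text)

    def remember(self, string):
        kept = [s for s in self.stack if s != string]
        self.stack = ([string] + kept)[:MAX_STRINGS]


memory = FieldMemory()
memory.edit("a"); memory.use()
memory.edit("b"); memory.use()
memory.load("a"); memory.use()     # program-filled, so not remembered again
print(memory.stack)
```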

Tuesday, June 24, 2014

Notes functional but ugly

I finished and pushed the notesview module and a cursory unit test for it. I suppose I should test it more thoroughly but I exercised all its features including errors, so pffft.

Part of this job was a bit of refactoring back in the editview module. It has a line-number widget that the user can type into. On hitting Enter, the editor jumps to that line number. Similarly in the image name widget: type '125' and hit enter to jump to the page for image 125.png. These were coded as slots to receive the returnPressed signals from those widgets.

Well, about the only feature the Notes panel has, other than being a plain text editor, is that you can hit shift-ctl-M to record the current edit line number in your notes as {nnn}. And then you can put the cursor in or by a line number and hit ctl-M to jump the editor to that line.

n.b. it started out as [shift-]ctl-L, mnemonic for "line", but that turned out to have a dedicated use in Ubuntu. Locking the screen, I seem to recall; embarrassing thing to have happen when you only meant to note a line number.

Also, shift-ctl-P records the current page's image name as [xxx] in your notes, and you can put the cursor in or near a page name and hit ctl-P to jump to that page.

n.b. yes, this is the accelerator for Print and Print Setup in many apps. PPQT doesn't support printing, so too bad.

Jumping to lines or pages of course necessitates that the Notes panel code interact with the Edit panel code. And only belatedly did it occur to me, that the ctl-M action was identical to hitting Enter in the line number widget box, and ctl-P was the same as hitting Enter in the image name widget box. So I had to go into the editor and factor out the guts of each action from the signal-slot it was buried in, make them separate and public methods, so the Notes panel could call them.
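
The shape of that refactoring, sketched without Qt. The class and method names here (EditView, go_to_line_number, NotesPanel, jump_from_note) are illustrative assumptions, not the actual PPQT code: the slot keeps only the signal plumbing, and the work moves into a public method that another panel can call.

```python
import re


class EditView:
    """Qt-free stand-in for the edit panel."""

    def __init__(self, line_count):
        self.line_count = line_count
        self.current_line = 1

    def go_to_line_number(self, text):
        """Public method: jump to the line named by text; True on success."""
        try:
            number = int(text)
        except ValueError:
            return False
        if not 1 <= number <= self.line_count:
            return False
        self.current_line = number
        return True

    def _line_number_enter(self, widget_text):
        """Slot for the line-number widget: now just a thin wrapper."""
        self.go_to_line_number(widget_text)


class NotesPanel:
    """Qt-free stand-in for the Notes panel."""

    def __init__(self, edit_view):
        self.edit_view = edit_view

    def jump_from_note(self, note_text):
        """ctl-M: find a {nnn} in note_text and ask the editor to jump there."""
        match = re.search(r"\{(\d+)\}", note_text)
        return bool(match) and self.edit_view.go_to_line_number(match.group(1))


editor = EditView(line_count=200)
notes = NotesPanel(editor)
notes.jump_from_note("check the table at {125} again")
print(editor.current_line)
```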

UI issues

Both the edit widget and the notes are derivatives of QPlainTextEdit. However they behave differently, and not for any reason I can find.

The editview editor displays selected text under the default highlight color, a nice lemon yellow. Then when you focus out of it (click in some other widget), the selection highlight changes to medium gray.

The notesview editor does not do this. It always displays selected text using the mid-gray "inactive" highlight color, even when it definitely has the focus and should be active. More puzzling, I put code in the focusInEvent() member to get its QPalette and print out the "current color group" and the name of the highlight color. The group was "Active" and the color was the same (#fbed73) lemon-yellow as the other editor. But the actual display is still gray. So this is baffling and I have posted a query at the Qt forums to see if I can get some help.

Meanwhile, on to experimenting with using the Qt Designer to lay out the complex Find panel. Probably won't work, but maybe.

Sunday, June 22, 2014

More Thoughts on Pandoc, etc. [Updated]

Some additional considerations that have occurred to me since writing the previous post.

Plain Text Output

One output format is absolutely required of the Post-Processor: the complete book as a plain text file. Formerly this had to be ASCII, not even Latin-1, with accented characters in an expanded format like [:u] or [c~]. Nowadays, PPers usually provide the ASCII, plus a Latin-1 or UTF-8 version with accented characters in place.

Regardless of the encoding, the plain etext has no formatting markup. It is simply the text with headings set off by newlines, paragraphs wrapped to a 72-character margin, and any other formatting, like poetry, tables, or centered text implemented with spaces and newlines.

PPQT V.1, like Guiguts before it, does quite a nice job of converting DPM to plain etext. I took considerable pride in implementing the Knuth-Plass algorithm for optimal paragraph reflow, as a point of differentiation from Guiguts. And PPQT reflows tables (coded with the unique PPQT markup, of which more later) nicely also.
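
For comparison, Python's standard textwrap module will reflow a paragraph to a 72-character margin in a couple of lines. It fills each line greedily, so treat this as a baseline sketch and not as the optimal-fit reflow described above; the sample text is made up.

```python
import textwrap


def reflow(paragraph, width=72):
    """Greedy reflow of one paragraph: collapse whitespace, wrap to width."""
    return textwrap.fill(" ".join(paragraph.split()), width=width)


sample = ("It was the best of times, it was the worst of times, " * 4).strip()
wrapped = reflow(sample)
print(wrapped)
```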

It occurred to me to wonder how well Markdown or any other markup accepted by Pandoc did at this task. And to my surprise, it appears that none of them do it at all!

The venerable Markdown for example plainly states that "Markdown is a text-to-HTML conversion tool." Not a plain-text generating tool, an HTML one, which means it hands off all responsibility for paragraph reflow to the web browser. Similarly AsciiDoc and reStructuredText mention only output to HTML, PDF, EPub and the like. (Well, rST mentions output to Python Docstrings, but doesn't say whether it reflows paragraphs for them.)

It seems quite likely—although I would love to be corrected on this!—that it is not possible to go into Pandoc with any markup and come out with a plain UTF-8 text file acceptable to Project Gutenberg!

Edit: According to someone on the Pandoc mailing list, there is indeed a plain-text output "writer", and pandoc -t plain my-input.txt should produce what I want. I haven't installed an actual pandoc so I can't try this; for example, I don't know what the paragraph reflow is like, or whether there is any way to control widths. So I'm still not certain whether plain text is really feasible from pandoc. To be investigated later this year.

Further edit: my innocent question on the Pandoc list has produced this interesting thread with some knowledgeable comments about the history of PG and its format (esp. its ambiguities, which make it very hard to back-convert PG to some markup, one example, use of CAPS for emphasis), and this from John MacFarlane (Pandoc author): "I think a pg writer is a nice idea. It would be fairly easy, I think, to do.... [it] would involve a new option and a few different behaviors." From this I deduce that the existing "plain" output mode is not fully PG-compliant.

And another edit: On the same message thread linked above, John MacFarlane (Pandoc author) now says, "I've started a gutenberg branch on github. It should be fairly easy to add a writer that uses PG conventions." A Fred Zimmerman, presumably a PG or DP contributor? adds "i'm very interested in the gutenberg branch -- great idea." So this may develop into something good, and soon!

This considerably reduces the value of Pandoc to DP. The plain etext is not a negotiable requirement. Not surprisingly the Python programs that process fpn and fpgen do promise plain-text outputs in addition to Epub, HTML, etc.

The question now is: should PPQT V.2 continue the ability to convert DPM to etext? There's a fair amount of code and GUI widgetry behind it. Or should I just assume everyone will be converting to fp[ge]n markup and getting their etext from the batch programs that support those markups?

Translation UI

The relationship between PPQT and the three competing markups (DPM, fpn, fpgen) is quite unclear to me, as should be apparent. I'm thinking I badly need to know what the potential user community actually needs (indeed, whether a user community actually exists at all!). If I don't get some useful comments on this blog, I need to go to the forums and make a nuisance of myself to get some answers.

But supposing a community exists, here's what I think I know. First, DPM will not go away. What I'm calling DPM is the sum of the rules in the Distributed Proofreaders Formatting Guidelines. It is deeply embedded in the whole DP infrastructure. I don't think DP will ever rewrite their guidelines to make the volunteer proofers in the Formatting rounds insert fp[ge]n syntax instead.

If I'm right about that, then DPM is what the PPers will continue to receive as their input. The initial stages of PP work—fixing up things separated by page breaks, double-checking bold and italic markups, renumbering and moving footnotes, and running spellcheck—will be done in the context of a DPM text.

Then, most likely, the PPer will want to do a one-time bulk conversion to one of the fp[ge]n markups. This, PPQT could facilitate, in the following ways.

First, provide "File > Export to" command options. As I noted in the prior post, it would not be difficult to convert DPM to either fp* markup. This command would act like File > Save As, bringing up a file-save dialog to pick a name, and writing a new or replacement file consisting of the active (DPM) document translated to another markup: mybook.txt is saved as mybook.fpn.

Second, have a way the user can opt to save metadata with the translated file, mybook.fpn.meta. The meta file would include the pointers to where page boundaries were (adjusted to still be accurate in the translated source file), as well as the notes, bookmarks, vocabulary etc.
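
The file naming in these two steps is easy to pin down with pathlib. A small sketch; the function name export_paths is an assumption made for illustration.

```python
from pathlib import Path


def export_paths(book_path, markup_suffix):
    """Return (translated file, its metadata file) for an exported book."""
    book = Path(book_path)
    translated = book.with_suffix("." + markup_suffix)              # mybook.txt -> mybook.fpn
    meta = translated.with_name(translated.name + ".meta")          # mybook.fpn.meta
    return translated, meta


paths = export_paths("mybook.txt", "fpn")
print(paths[0], paths[1])
```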

This pretty much means that you could now open the fp* markup file and still have your scan images, your notes, word and character tables... not sure about the footnotes. But you could go on editing as before.

Saturday, June 21, 2014

Looking ahead to Pandoc

This is to order my thoughts about output formats and markup systems, and to gather links to these in one convenient place.

History

PPQT is intended to support the work of volunteers finishing etexts for Distributed Proofreaders, aka PGDP. PGDP was one of the first "crowd-sourced" volunteer sites on the internet, organizing thousands of volunteers to find the typos in OCR images of public-domain texts, one page at a time. At the end of the process, a different set of volunteers, the "post-processors" or PPers, have the job of splicing together the individually-proofed pages of each book to make one smooth etext. That's the task that PPQT aimed to assist.

The original PGDP workflow ended with an ASCII etext, no more. There are hundreds (thousands?) of PGDP-proofed etexts at Project Gutenberg. By 2002 or so, most PPers also prepared HTML versions of their texts. And in recent years there's been demand for other formats such as EPUB.

Markup Systems

A text passing through PGDP gets formatted with a particular markup style documented in the Formatting Guidelines. Although PGDP did not label the guidelines as a "markup system" that is what they constitute: a set of rules for representing a book's typography and layout in a plain text document. PGDP never gave their markup system a catchy name; let's call it DPM.

dpm

DPM can be compared to other plain-text markups such as Markdown and reStructuredText. It comes off quite well in these comparisons. The other markups were devised by (mostly) programmers for use in (mostly) documenting code; they don't support typography beyond emphasis, or layout beyond code-blocks. Some of the things that DPM supports and others do not include footnotes, poetry (in the sense of being able to specify line breaks and indentation), and simple right-alignment of text, as in a citation within a block quote.

fpn

In recent years, PGDP volunteers motivated in part by the need to auto-convert etexts to new formats such as EPUB (and in part, I'm sure, by simple N.I.H. syndrome), have devised new markup styles. One is fpn devised by Robert Frank (rfrank at PGDP) and announced in February 2014. This markup uses different syntax to support the features of DPM, and adds a number of minor features. In general Robert Frank favored a terse syntax reminiscent of 1980s TROFF syntax. It would not be difficult to convert a DPM-marked text to one that is marked up with basic fpn using search and replace; for example chapter heads in DPM are marked with four newlines, and in fpn with a leading .h2.

fpgen

A bit earlier, in July 2013, the independent volunteers of PGDP Canada announced their own new markup style, fpgen. Documented in the DP-canada WIKI, fpgen is also the work of an "rfrank", in this case Roger Frank, who tended to favor an XML-like bracketed syntax. Again it would not be difficult to convert a DPM text to an fpgen one; for example a DPM chapter head marked with four newlines would become <heading level='1' id="ch01">Head Text</heading>
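
A sketch of how mechanical that conversion could be. It assumes that a chapter head is a single line preceded by at least four blank lines, and it numbers the id attributes ch01, ch02, ... in order of appearance; both are assumptions for illustration, and the function name is invented. Multi-line heads are not handled.

```python
def dpm_heads_to_fpgen(text):
    """Wrap each line that follows 4+ blank lines in an fpgen-style heading."""
    out = []
    blanks = 0      # consecutive blank lines seen so far
    chapter = 0
    for line in text.split("\n"):
        if not line.strip():
            blanks += 1
            out.append(line)
            continue
        if blanks >= 4:
            chapter += 1
            line = "<heading level='1' id=\"ch%02d\">%s</heading>" % (
                chapter, line.strip())
        blanks = 0
        out.append(line)
    return "\n".join(out)


converted = dpm_heads_to_fpgen("Intro.\n\n\n\n\nCHAPTER I\n\nText.")
print(converted)
```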

my-dpm (blush)

I am not immune to N.I.H. and the temptation to define markup syntax. In PPQT version 1 I supported a number of extensions to DPM, including right-aligned text and a syntax for tables. I designed these features based off of PPQT's model, Guiguts, which had its own simple extensions of DPM. For example, in DPM a block quote is

/Q
Quote text...
Q/

Guiguts extended this to allow specifying the first, left, and right indents so:

/Q[8,4,12]
Quote text with 8-char first indent, 4-char left indent, 
and 12-char right indent...
Q/

My version supported in PPQT V.1 allowed instead,

/Q F:8 L:4 R:12
Quote text with 8-char first indent, 4-char left indent, 
and 12-char right indent...
Q/

Guiguts had a simple ASCII table markup; I extended it with additional syntax for column alignments and widths. I also added /R..R/ for right-aligned text and /C..C/ for centered text.
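
Reading the F, L and R values off a /Q opening line takes one regular expression. This is a sketch, not the V.1 parser: the function name, the default of 0 for a missing value, and the tolerance for a missing colon are all assumptions. The bracketed Guiguts form is not handled.

```python
import re


def parse_quote_open(line):
    """Return {'F': n, 'L': n, 'R': n} for a /Q opening line, else None."""
    if not line.startswith("/Q"):
        return None
    indents = {"F": 0, "L": 0, "R": 0}    # defaults are an assumption
    for key, value in re.findall(r"\b([FLR]):?(\d+)", line[2:]):
        indents[key] = int(value)
    return indents


print(parse_quote_open("/Q F:8 L:4 R:12"))
```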

What to support with PPQT?

In a way, the choice of markup hardly matters, because the markup disappears before the book reaches its destination at Gutenberg.org. A marked-up document is a transient state between the original OCR text and the final etext/html/EPUB files. So the choice of markup is merely a convenience for the PPer. It is a way for her to encode decisions about how the book should be formatted: these lines are a poem, these lines are a table; this is emphasized text, etc.

However, the choice of markup is controlled by the software used for creating the final output. Robert Frank has a Python program to convert an fpn text to EPUB and HTML. Roger Frank of PGDP Canada has a, guess what, Python 3 program to convert fpgen to EPUB and HTML.

And both Guiguts and PPQT V.1 have code to convert DPM to HTML.

What should PPQT V.2 do? Should it contain code to convert fpn or fpgen to some other output format? Should it retain the V.1 HTML converter? Or should it be markup-agnostic?

Agnosticism

By markup-agnostic I mean, have only the features needed to make a clean job of finalizing an etext, including:

  • Support for image display alongside text,
  • proofer's notes saved in the metadata,
  • an extensive find/replace,
  • the character and word tables (so important for spell-check and finding other missed errors),
  • the Footnote panel with essential aids for cleaning and renumbering footnotes,
  • automatic calling of gutcheck or (better) bookloupe and a tabular display of the resulting diagnostics,

And just stop there, and say: ok, PPer, now you have a smooth DPM text, you can go on to use the editor and regex find/replace to convert this to any markup you like, and save the file, and process it using software from whomever.

Translators

Another option would be to offer automated translation from DPM to fpn and/or fpgen. And further, it would be possible to foist the job of coding those translators off on the people who want those markups. I've already floated this as an idea to DP Canada: that they could write the fpgen-erator to some API that I could provide.

Enter Pandoc

And then, there's Pandoc. Pandoc is a universal markup-translator. It reads texts in a variety of markup styles, and it writes output in an even wider variety including EPUB, LaTeX, and PDF. It is widely used and widely praised, and in principle, could completely replace the programs written by both rfranks, generating any possible desired output format from code that is widely used and supported by an active community.

All that is needed is a way to get a post-processed etext into Pandoc. Unfortunately although Pandoc accepts a number of markups, that list does not include dpm, fpgen or fpn.

Pandoc does offer two general input formats. One is its own internal format, represented as JSON. The other is its own extended Markdown. This markup (let's call it pem, for Pandoc's Extended Markdown) supports everything that dpm supports. It does not support quite everything that fpn supports, and I'm not sure whether it is a superset of fpgen or not.

I can picture PPQT supporting a batch conversion of dpm to pem, for example as a command under the File menu: File > Save to Pandoc, and this as a replacement for the old HTML conversion step. This would convert a dpm text to a pem text and write it. However that wouldn't be much use to a PPer who didn't have access to Pandoc, because only Pandoc supports pem.

If I can work out how to distribute a Pandoc executable with PPQT on any platform, I can also imagine having automated "Save to Epub" and "Save to HTML" commands, that would generate a pem stream and feed it down a pipe to a Pandoc command, with the output to the designated file.
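
The pipe itself would be simple. A minimal sketch, assuming a pandoc executable can be found on the PATH; the function names are invented for illustration, and only the standard -f, -t and -o options are used. With no input file named, Pandoc reads its standard input.

```python
import shutil
import subprocess


def pandoc_command(out_format, out_path):
    """Build a Pandoc command that reads Markdown from stdin."""
    return ["pandoc", "-f", "markdown", "-t", out_format, "-o", str(out_path)]


def save_via_pandoc(pem_text, out_format, out_path):
    """Feed the generated Markdown down a pipe to Pandoc."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc executable not found on PATH")
    subprocess.run(pandoc_command(out_format, out_path),
                   input=pem_text.encode("utf-8"), check=True)


print(pandoc_command("html", "mybook.html"))
```

A "Save to Epub" command would then amount to save_via_pandoc(text, "epub", "mybook.epub").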

What About HTML?

PPQT V.1 has only one aid for HTML, the HTML Preview panel. When editing an HTML document, you can get a rendering of it in a QWebFrame. It's only a bit more convenient than saving the file and opening it in a separate browser (you avoid having to do a ctl-s in PPQT, click in the browser, click the reload button, then click back to PPQT to edit).

But should V.2 have any HTML support at all? The point in editing HTML inside PPQT was that the auto-converted HTML is awful to look at and needed a lot of hand-tweaking and customizing. But when HTML conversion is pushed off to an external program, whether that is Pandoc or an effort by one of the rfranks, is there any point in editing the resulting HTML output? Or is one supposed to confine one's tweaking to the marked-up fp[ge]n file, and treat the HTML as a write-only output?

And supposing PPQT retained its HTML preview panel, should it also have a preview panel for EPUB? (Is that possible?) It would be kind of slick if, when you opened a file with an html suffix, you automatically got an HTML preview panel on the right, while if you opened a file with an .epub suffix, you got an EPUB preview panel...

Also given HTML support of any kind, what about W3C Validation? Back when I assumed V.2 would have its own HTML conversion, I also speculated that it should support an automatic upload to the validation site, automatic download of the error list, and display of that in a panel such that you could click on a diagnostic and jump the editor to the referenced line. Is that still useful when HTML is being generated by an external program? Or is validation and W3C conformity now the responsibility of the external program?

I welcome the comments of any of my readers on these issues.

Friday, June 20, 2014

Now you see it, now you don't

First I added about 4 more tests to the mainwindow script shown in a video in the prior post. One of these revealed an interesting bug. The test was, to use File>Open and pick a file that was already open. The result should be to cause the existing edit view of that file to pop to the front (not to open a second copy of the file).

That worked as it should, but then I combined it with another test. The other test was to choose, not somebook.txt but somebook.txt.meta. This should check that somebook.txt exists, and if it does, open it (treating the book and its .meta file as identical). That test also worked as it should. But when I combined the two, opening somebook.txt.meta when somebook.txt was already open, I got two open copies of somebook.txt. Huh?

It was a matter of when the different checks were made, of course, and a small rearrangement of code fixed it. But it's fun taking an attack stance toward my own code and breaking it.
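
The ordering issue in miniature, as a Qt-free sketch and not the actual mainwindow code: the class and attribute names are invented, and the check that the book file exists is left out. The point is only that the .meta name is mapped to the book name before asking whether that book is already open.

```python
import os


class OpenBooks:
    """Toy model of the main window's list of open edit views."""

    def __init__(self):
        self.views = []       # paths of open books, in tab order
        self.front = None     # path of the book whose tab is in front

    def open(self, path):
        # Normalise first: a .meta name stands for its book file.
        if path.endswith(".meta"):
            path = path[:-len(".meta")]
        path = os.path.abspath(path)
        # Only then ask whether that book is already open.
        if path not in self.views:
            self.views.append(path)
        self.front = path
        return path


books = OpenBooks()
books.open("somebook.txt")
books.open("otherbook.txt")
books.open("somebook.txt.meta")
print(len(books.views), books.front)
```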

Then I moved on to the notes panel, and the first test of a disappearing Edit menu and it works a treat. I had to modify the mainwindow so that it offered a module method for retrieving the Menu Bar. Then the notes widget could do this:

        ed_menu = QMenu(C.ED_MENU_EDIT,self)
        ed_menu.addAction(C.ED_MENU_UNDO,self.undo,QKeySequence.Undo)
        ed_menu.addAction(C.ED_MENU_REDO,self.redo,QKeySequence.Redo)
        ...cut, copy, paste...
        self.edit_menu = mainwindow.get_menu_bar().addMenu(ed_menu)
        self.edit_menu.setVisible(False)

Adding a menu to the menu bar returns a "menu action", and a QAction has a property visible which, if it is set to False, means the action, or in this case the menu, is not shown. Further down in the notes panel code is:

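    def focusInEvent(self, event):
        self.edit_menu.setVisible(True)
        # pass the event on so the base class still does its own focus work
        super().focusInEvent(event)
    def focusOutEvent(self, event):
        self.edit_menu.setVisible(False)
        super().focusOutEvent(event)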

And it works just loverly. Click in the Notes panel, the Edit menu appears. Click in the edit view, it disappears. When I finish building notes view I will go back into Edit view and give it an Edit menu of its own, with quite a few more actions.

Tuesday, June 17, 2014

Mainwindow test and Sikuli movie

Finally completed a Sikuli script to drive the mainwindow code through much of its function. In the course of this, found several bugs, but that's what testing is for he said through gritted teeth. Then used Screen Cast O Matic to make a 1-minute movie of the test running, because Sikuli is kind of fun to watch.

Roadmap

Where to now? Well, the next piece of code is the Notes panel. When that is in place I will have two text editors that will contend for the use of the Edit menu, and I can experiment with my tentative plan for operating multiple Edit menus.

After that, the Find panel, which if you know PPQT V.1, is a rather complex chunk of UI, and a lot of its code is dependent on [Py]Qt4 features and has to change. No more QStrings, no more QRegExps. All Python strings and the regex library. So that'll be a bit of work.

With Notes and Find, V.2 will be functional enough that a person could use it for editing a book. However, I don't expect to get to that point in this month of June. Unfortunately (for PPQT but not for me!) I'm going to be on holiday all July, so I would guesstimate completing Find around mid-August.

Saturday, June 14, 2014

Well, that was easy -- I think

The issue with the focus-in was bugging me during the night. Lying in the dark with my eyes closed, debugging. Not productive! But got up this morning and modified the event handler in my editview, the one that up until a little while ago read:

    def eventFilter(self, obj, event):
        if event.type() == QEvent.KeyPress :
            return self._editorKeyPressEvent(event)
        if event.type() == QEvent.FocusIn :
            self.focusser()
        return False

Slight digression here. Because the actual QPlainTextEdit is part of a layout created with the Qt Designer, there is no way to add any kind of event handler to it. In particular no way to add an overriding keyPressEvent() method, which this editor needs in order to support things like the go-to-bookmark and zoom-font keystrokes. Also no way to provide an overriding focusInEvent() handler which I needed as a signal that it was time to display all the "panels" associated with that particular book (or so I thought).

So instead, the editview module, after initializing its Designer-written UI, installs an "event filter" on it, the above code. An event filter is a "man in the middle" that returns True if it has dealt with the event, or False to say, send it along to the original address for handling. All events directed to the UI widgets—the editor and the line-number and page-name widgets—pass through this code first. It picks off the key presses and looks for special keys, returning False for those it didn't handle.

It also used the FocusIn event as a sign it should call the focussing function it was passed by the Book when instantiated. (Which calls the main window, which displays all the related panels, long story.)

That's what I found yesterday wasn't reliably working. When the editview was created and assigned to a QTabWidget tab, it got a FocusIn. Then it got no more of those until the user manually clicked in the editor. Even if you brought another tab to the front, hiding this one, then brought it back: no events unless it had received at least one real click. You could tell it was in this state because the QPlainTextEdit widget did not display a blinking cursor. After a click, it would. Or, I found out, after a tab key. If you press tab an unpredictable number of times, eventually the new editview would get a FocusIn. I'd tried adding calls to self.setFocus(Qt.TabFocusReason) in several places during initialization, to no avail.

So what I did this morning was add to the above code a call to a routine I'd written some time back, to display the event stream. Here's the event sequence, after opening two books, but before clicking in either of them. Just clicking the tabs to alternate between the two editviews, the events are:

2two_test.txt event type  Show 
2two_test.txt event type  UpdateLater 
2two_test.txt event type  Paint 
2two_test.txt event type  Hide 
3three_test.txt event type  Show 
3three_test.txt event type  UpdateLater 
3three_test.txt event type  Paint 
3three_test.txt event type  Hide 

After having manually clicked in each widget, the one last clicked upon gets FocusOut and FocusIn events, but the others don't. All told, it seems that the FocusIn event is nowhere near as consistent or reliable as I'd assumed. There are no doubt various window-system-related issues that I'm not aware of. (And given that thought, very probably there are platform differences as well!)

What the event trace printout does show to be reliable, is the Show and Hide events. So I simply changed my event filter to say, if event.type() == QEvent.Show... and things began to work the way I expect them to.

Friday, June 13, 2014

Me 'n Sikuli are good; Recursive bug

So it turned out that Sikuli actually had a feature that made all my laborious capture code yesterday unnecessary. Completely so. And on top of that the feature is discussed in their top-level "hello world" tutorial. Maybe I'm losing it. Anyway you can capture parts of another app very nicely, interactively. Which I started doing today and got about a third of the way through a mainwindow test before I got distracted by a couple of bugs that turned up. Which is what testing is for, ok.

One bug was quite interesting although hard to describe. The symptom was that, when two books were open and one opened a third, suddenly the second and third books were sharing the same set of "panels". Same image display, primarily, as that's the only functioning panel, although the other placeholder panels were also swapped. So after opening book 3, it and book 2 had the identical image displays, even though they should show different images. Verrrry peculiar.

Well, long story short, after about a half hour of debugging I realized what had to be happening was that the "focus me" function, which is called when any book's edit window gets the focus in order to display that book's other panels, was being called recursively! It would be half-way through focusing the newly opened book when it got called again to focus the newly opened book. A little more tracing and I figured out why that happened: when the new book's edit window got added to its tabset, it received a focus-in event from Qt, so it called the focus-me routine which was still unfinished. Hilarity ensued.
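Here is a minimal sketch of that failure mode and one common guard against it, in plain Python with no Qt involved. The names Book, focus_me and add_to_tabset are hypothetical stand-ins, the direct call-back only simulates Qt delivering a focus-in event, and the flag-based guard is an assumption about how such a bug can be handled, not necessarily the fix that went into PPQT.

```python
class Book:
    """Hypothetical stand-in for a PPQT book; not the real class."""
    def __init__(self, name, log):
        self.name = name
        self.log = log            # shared list recording what happened
        self._focusing = False    # True while focus_me is still running

    def focus_me(self):
        if self._focusing:        # re-entrant call: ignore it
            self.log.append(self.name + ": nested call ignored")
            return
        self._focusing = True
        try:
            self.log.append(self.name + ": begin focus")
            self.add_to_tabset()  # triggers the simulated focus-in event
            self.log.append(self.name + ": end focus")
        finally:
            self._focusing = False  # always clear the flag, even on error

    def add_to_tabset(self):
        # In the real app Qt would deliver a focus-in event here;
        # this sketch just calls back into focus_me directly.
        self.focus_me()

log = []
Book("book3", log).focus_me()
```

Without the flag, the nested call would run the whole routine a second time (and, in this simulation, recurse without end).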

That was easy to fix, once figured out. But another is still pending. This is another issue of getting the Qt focusIn event. As just recounted, one of those events is delivered to an edit window the instant it is added to a tab set. And later on, when you click on the tab for a book's edit window, it should get another focus-in event. And it does—but only if it has had at least one real user mouse-click.

I put a print in the focus-event handler to prove it. Here's the sequence: open three books. Each one gets its edit panel added to the edit tabset. And each one at that time gets a focus-in event.

Now, click back and forth on the three tabs. Each edit panel is revealed appropriately. But they do not get a focus-in event. Click all the tabs you want; they are displayed but they don't get the event, and therefore they don't call focus-me and their image and other panels don't get displayed.

Now click the mouse in one of the panels. It gets a focus-in event and from then on it also gets a focus-in event every time its tab comes up. No manual click: no events. One manual click: events.

I've tried variations on calls to QWidget.setFocus() during the creation of the edit view, but it doesn't have any effect. Very mystifying.

Tuesday, June 10, 2014

Tidying up, but trouble with Sikuli

Spent some time cleaning up test files, editing and re-running prior unit tests, and generally making the Tests folder nice. Picked off a couple more minor bugs.

Then started the main effort, a Sikuli script to really test all the mainwindow features accessed by the File menu. I've already been using these in casual tests in coding but I want to see them all exercised in formal fashion. I thought Sikuli could do this, after my previous success in using it with the edit view window. Wow, that was April. How time flies.

So it turns out Sikuli isn't so helpful when testing menus in the Mac OS menu bar. The problem is the interactive capture of target app menus. Running in the Sikuli IDE you want to say "click on this" and it gives you a view of the whole screen minus the Sikuli windows, and you are supposed to be able to drag over it to select where to click. Unfortunately, the left-hand end of the menu bar still has the Sikuli application's rather lengthy name and no File menu. So, where to tell it to click?

Well, it works to drag over a rectangle about where the File menu will be when the target app is in front, so executing that script does cause the target's File menu to open. But now I want to tell Sikuli, "find( - image of open file menu here - )". But when I go back to the Sikuli IDE and activate the "find" dialog and it tells me to select a piece of the screen—the File menu has disappeared.

This wasn't a problem when capturing the pop-up menu in the edit view window. A pop-up context menu stays open until dismissed. But the File menu from the Mac OS Menu Bar does not stay open when you switch away from the app that owns it. So I'm not seeing how to get Sikuli to verify the contents of my File menu or click on items in it. Put a question in and we'll see.

Monday, June 9, 2014

Unit testing

Spent a pleasant 4 hours writing a detailed unit test driver for utilities.py, the module into which I have chucked everything to do with QFile and QFileDialog and friends. The testing turned up several bugs, which is what it's all about.

This was the module for which I developed my variation on defaultdict, of which I was quite proud. Proud enough that I also posted it in reddit/r/python. And today in exercising the functions that use that clever class, I discovered that it wasn't working! Well, it did work in the sense that it always returned correct answers, but it didn't in the sense that it wasn't doing any caching of results. It always calculated the result anew for every key, quite the opposite of a cache.

This was due to my misunderstanding what the Python docs were saying about defaultdict. In my defense I'll say that the Python writeup is not all that clear. But still, just stepping through the code a couple of times to verify it did what I thought, would have revealed instantly that it didn't. Anyway, I went and updated both this blog and the reddit posting.
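The difference between "returns correct answers" and "actually caches" is easy to see by counting how often the factory function runs. A small sketch; the class names NoStore and Store and the factory function expensive are made up for illustration and are not the names used in PPQT.

```python
from collections import defaultdict

class NoStore(defaultdict):
    """The mistaken version: __missing__ returns a value but never stores it."""
    def __init__(self, f_of_x):
        super().__init__(None)   # base class gets no factory
        self.f_of_x = f_of_x
    def __missing__(self, key):
        return self.f_of_x(key)

class Store(NoStore):
    """The corrected version: __missing__ stores the value before returning."""
    def __missing__(self, key):
        value = self.f_of_x(key)
        self[key] = value        # this assignment is what makes it a cache
        return value

calls = []
def expensive(key):
    calls.append(key)            # record each time the factory really runs
    return key.upper()

buggy = NoStore(expensive)
buggy["a"]
buggy["a"]
buggy_calls = len(calls)         # factory ran on both lookups

calls.clear()
fixed = Store(expensive)
fixed["a"]
fixed["a"]
fixed_calls = len(calls)         # factory ran only on the first lookup
```

Both versions hand back the same answer on every lookup, which is why the mistake is invisible unless you watch the factory being called.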

That was one bug. Several others were simple typos and mis-codings in code that hadn't been executed yet, typically the code that logged errors. Then I found what looks a lot like a bug in PyQt 5.3. I've posted it to the mailing list and we'll see. The test case I posted is so simple it's hard to imagine how it could be an error by me.

Saturday, June 7, 2014

Plugging along

In two marathon sessions — well, marathon for me, about four consecutive hours each — I completed functional coding of the mainwindow, all the elements of the File menu (Open, Recent->list of recent, Save, Save As, Close). This entailed major and frequent revisions and additions to the "utilities" module, where I have corralled all uses of QFile, QFileDialog, QMessageBox and the like. And some changes to the Book object, trying to get its relationship to the main window just right, efficient and also clear.

But that's pretty well done now. Next big task is to write unit test drivers for both utilities.py and mainwindow.py to ensure every branch is exercised. For the main window that will mean using Sikuli to do visual testing of the GUI, and that's time-consuming (but mega-fun to watch it run when finished). This will take much of next week.

Edit Menu(s)

Meanwhile I've been mulling how to handle the Edit menu. It was a problem with V.1; I never could, for example, get the Edit menu to work right when one of the panels, like the Word panel, had the focus.

Working on the File menu I realized that the menu actions are key. Each menu consists of a list of QActions; and each QAction has a "triggered" signal that is bound to a slot in some QWidget derivative. Well, what happens when the widget to which, say, the Edit>Cut action is bound, is not the focused widget?

This is particularly important when contrasting the Edit menu when a document (QPlainTextEdit) is focused, versus when, say, the Word table (QAbstractTableModel) has the focus. Edit>Copy in the first case means copying the current selection of text to the clipboard. In the second, the current selection is some number of cells available as a list, and there may well be processing needed to format them before their value(s) go to the clipboard.

For the editor, there are Edit menu actions I want to support like to-lowercase, to-uppercase that are not appropriate for other panels, and shouldn't even appear (or at least, not be enabled).

And then there's the issue of having multiple books open (which changes everything, as I've lamented frequently before). So let us say that the Edit>To &Uppercase menu action is bound to a slot in the editview object for Book 1. And now the user clicks in the edit tab bar to make Book 2 the focus of his typing. Different editor contents, different selection, all handled in a different goddam object. Choice of Edit>To &Uppercase now should affect the current selection in Book 2, but how is Qt to know that? How to keep it from sending the signal to the editview object representing Book 1?

So I've about concluded that every widget that supports Edit actions needs its own unique Edit menu. And somehow (!) when any such widget gets the focus (focus-in event), it puts its own Edit menu into the application's menu bar; and when it loses the focus, it removes it again.

This takes care of the problem of signal-binding. Each widget that wants an Edit menu, creates its own Edit menu and populates it with such actions as it supports, binding each to the slots in its own code where it does things its way, upon its own data.

But how to swap Edit menus in the menu bar, quickly and simply? I note that what the QMenuBar supports is not menus per se but menu Actions. And I note that QAction supports a property "visible" with a method "setVisible(bool)". So tentatively I am thinking I will have any Edit-supporting widget, upon creation, add its own Edit menu to the app's menu bar. There might be a dozen Edit menus in the menu bar! But it adds it with visible False, and sets visible True on focus-in and False on focus-out. So hopefully only one (or possibly zero) Edit menus will actually be visible in the menu bar.

Does QMenuBar actually support such shenanigans? Damn if I know! If it doesn't, plan B is to have each widget add its menu on focus-in, and remove it again on focus-out, which seems uglier.

Stay tuned, it could be a rocky night...

Tuesday, June 3, 2014

Default value of f_of_x

Whoof! What started as a modest revamp of my first-draft file handling code has turned into a major exercise of revising and refactoring. But I also found a cool thing to share!

The main window manages the File menu. An important feature of the File menu, as I mentioned yesterday, is the sub-menu of Recent documents. This has to be built and populated dynamically, upon the aboutToShow signal from the File menu. Which means that the program should not spend much time formatting that menu.

Input to the Recent menu is a list of previously-opened files, held as a list of path-strings. The list is kept in sequence by use-time, with the most-recently-used files first. As I mentioned in the prior post, this list can't be sorted by filename, because it might contain two files with the same name but different folder paths.
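Keeping such a list in use-time order takes only a few lines. This is a sketch, not the PPQT code: the function name note_file_used and the constant MAX_RECENT are hypothetical, and the limit of 10 is assumed from the figure quoted elsewhere in this post. Entries are compared as whole path strings, so two files with the same name in different folders stay distinct.

```python
MAX_RECENT = 10  # assumed limit on the length of the recent-files list

def note_file_used(recent, path, limit=MAX_RECENT):
    """Return a new list with path moved (or added) to the front."""
    updated = [path] + [p for p in recent if p != path]
    return updated[:limit]   # drop the oldest entries beyond the limit

recent = []
recent = note_file_used(recent, "/books/a/text.txt")
recent = note_file_used(recent, "/books/b/text.txt")  # same name, other folder
recent = note_file_used(recent, "/books/a/text.txt")  # reopened: moves to front
```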

When populating the menu, the code cannot assume that these files exist. One might have existed yesterday, but it was on a USB stick that is not mounted at this time. So that file should not be shown in the menu. However, a file should not be discarded from the list just because it is not available one time. Before the next time the File menu is shown, the USB stick could be inserted. Then the file should be shown as available.

So there's the situation: at the instant the user clicks on File in the menu bar, the code must run down a list of as many as 10 filepath strings and determine for each,

  • Does it exist?
  • If so, make a menu action for it

Of what should the menu action text string consist? In BBEdit, the string is "Filename emdash folderpath". In the Wing IDE that I use, it's "Filename (folderpath)". Mac default apps like TextEdit and Numbers don't bother with paths; they just show filenames. OpenOffice has "n: fullpathstring" for n from 1 to 9. PPQT V1 also displays an index number, "n filename" (no colon). The index number is also set as the accelerator key for Windows, so a really hot Windows user could key alt-F alt-R alt-3 to open the 3rd most recent file. A pretty silly feature IMO. I believe for V2 it will be "n filename (folderpath)".

This means that the logic of populating the sub-menu is, for each pathstring in the list:

  • if the file isn't accessible, continue
  • split the path into (filename, folderpath)
  • make the string "{0} {1} ({2})".format(index,file,folder)
  • make a QAction and add it to the menu
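The loop above can be sketched without Qt at all. In this version os.path and os.access stand in for the QFileInfo tests, the function name recent_menu_labels is hypothetical, and numbering only the entries that are actually shown is my assumption about what the index should mean. Unavailable paths are skipped but never removed from the input list, so they can reappear the next time the menu is built.

```python
import os

def recent_menu_labels(paths):
    """Return (label, path) pairs for the paths that are usable right now."""
    items = []
    for path in paths:
        # Skip anything that is missing, not a regular file, or unreadable.
        if not (os.path.isfile(path) and os.access(path, os.R_OK)):
            continue
        folder, filename = os.path.split(path)
        index = len(items) + 1   # number only the entries that are shown
        label = "{0} {1} ({2})".format(index, filename, folder)
        items.append((label, path))
    return items
```

Each pair would then become one menu action, with the label as its text and the path kept for use when the action is triggered.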

Caching

ALERT! The following had a major although un-obvious error, which is corrected below.

Checking for accessibility means asking, does the file exist, is it really a file (not a directory), and is it readable? These are methods of the QFileInfo object, as is the ability to fetch the filename and folderpath separately. But that means instantiating a QFileInfo, a moderately expensive process. Since the same set, or an overlapping set, of path strings will be checked again and again, it makes sense to cache the QFileInfo objects. The first time a path is checked, make a QFileInfo. If (when) it is checked again, re-use that QFileInfo. Good concept; how to implement?

There exist nice "memoization" solutions for Python, generic code that lets you put a decorator @memoize on a function to automatically cache the result that corresponds to each unique argument. However in this case, I don't want to cache the result for a given path string, I want to cache an intermediate value, the QFileInfo for that string, so I can use it in a different function instead of recreating it. So memoization is out.

What I need is something like collections.defaultdict: a dictionary whose keys are pathstrings, and whose values are QFileInfo objects upon those pathstrings. When the dictionary is queried for a given key, and the key is not present, it should set a default value of QFileInfo(key).

Unfortunately, this is not what collections.defaultdict provides. It allows you to provide a "default factory", which must be a callable taking no arguments, typically a class name such as list; in other words the default factory is a constant function f(k), one that never sees the key. What I want is a defaultdict that provides a default factory f(key), where the default value is a function of the missing key. And indeed, defaultdict provides a way to do that, by overriding its __missing__() method. Update: When I first wrote this, I thought that __missing__ did the whole thing: provided a default value for the missing key, and stored that value as the value of the key. This was not so! It is necessary for __missing__ to save the default value explicitly, as shown below.
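To see the limitation of the stock defaultdict concretely, here is a small illustration using only the standard library. The variable names are arbitrary. The factory is always called with no arguments, so a one-argument function such as len cannot serve as one.

```python
from collections import defaultdict

counts = defaultdict(list)    # factory is called as list(), with no arguments
counts["a"].append(1)         # missing key "a" gets a fresh empty list

lengths = defaultdict(len)    # len needs one argument, but will get none
try:
    lengths["abc"]            # the key "abc" is never passed to the factory
    outcome = "no error"
except TypeError:
    outcome = "TypeError"
```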

Without further ado, here is a generic default dictionary that does this:

from collections import defaultdict
class key_dependent_default(defaultdict):
    def __init__(self,f_of_x):
        super().__init__(None) # base class doesn't get a factory
        self.f_of_x = f_of_x # save f(x)
    def __missing__(self, key): # called when key is not defined
        ret = self.f_of_x(key)  # calculate the default value of key
        self[key] = ret         # save for future uses
        return ret

To use this, create an object of this class passing the name of a function of one argument that returns an appropriate default value. In my case I used it so:

from PyQt5.QtCore import QFileInfo

_FI_DICT = key_dependent_default( QFileInfo )

# Test whether a path names an existing, readable, regular file
def file_is_accessible(path):
    qfi = _FI_DICT[path]
    return qfi.exists() and qfi.isFile() and qfi.isReadable()

# Split a full path into a tuple (filename, folderpath)
def file_split(path):
    qfi = _FI_DICT[path]
    return ( qfi.fileName(), qfi.canonicalPath() )

The mainwindow will call file_is_accessible for some path, and soon after, will call file_split for the same path. The first time, a QFileInfo is created. On every subsequent call, the same QFileInfo is interrogated.

The code for key_dependent_default imposes no limit on the size of the dictionary. In my case, there will be at most 10 recent files in the settings when the app starts up, plus as many unique files as the user opens during the session. So I am not concerned about the size of the dictionary. But it would be possible to code the class so it limited its own size. When it reached the limit, it could simply stop caching new items, or it could delete one of its own members at random, or with more work, it could delete the least recently used member.
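For the record, here is a sketch of the last option, evicting the least recently used member. It is not part of PPQT; the class name lru_key_default and its max_size parameter are hypothetical. It wraps an OrderedDict rather than subclassing a dictionary, which keeps the bookkeeping in one place.

```python
from collections import OrderedDict

class lru_key_default:
    """Hypothetical size-limited cache of f_of_x(key) results."""
    def __init__(self, f_of_x, max_size):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.f_of_x = f_of_x
        self.max_size = max_size
        self._items = OrderedDict()          # oldest use first, newest last

    def __getitem__(self, key):
        if key in self._items:
            self._items.move_to_end(key)     # mark as most recently used
            return self._items[key]
        value = self.f_of_x(key)             # compute the default for a new key
        self._items[key] = value
        if len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict the least recently used
        return value

    def __len__(self):
        return len(self._items)

    def __contains__(self, key):
        return key in self._items
```

Used as lru_key_default(QFileInfo, 20), it would behave like the dictionary above for lookups, except that a path evicted from the cache gets a fresh QFileInfo the next time it is asked for.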

Monday, June 2, 2014

Nesting in Parentheses

Today I meant to tear into both the Book and Mainwindow modules for major changes. As mentioned in Friday's post, mainwindow needs to stop keeping track of files as a dictionary {filename:path-to-file}. Instead, I worked out over the weekend, it must keep lists strictly of entire, absolute (non-relative) path-strings. Nevertheless, there are often times when the code, given a filepath, needs to get the basename from it and similar tasks handled, in Python, by members of os.path. There is an interesting comparison to be drawn between the facilities of os.path and Qt's QFile, QDir, and QFileInfo. Perhaps I'll find time to get into that tomorrow.

Because today I sat down at my desktop machine with its Cinema Display to do some development there. Most of the coding I've done so far this year has been performed slouched on my spine with my laptop on my tummy. But for making major changes spread over two biggish files, I wanted the big screen, better keyboard, and trackball of the desktop machine.

Almost immediately this ran into problems. Running any kind of test produced, first, an error on import regex, the extended regular expression module. Oh. Hadn't installed that on the big machine. Ok, do that: and then followed a half hour's digression attempting to get easy_install (misnomer!) and pip (it's not a pip) to work, and finally downloading the module from pypi and manually running its setup.

Restart and immediately... No module named hunspell. Oh, right. I never installed the hunspell interface on this machine. So do that. Almost immediately after, No module named blist. Oh. Right. So back to pypi and get the blist package and install it, and it produces a baffling error message. Its __init__.py imports blist._blist which is right there next to blist itself but Python can't seem to find it. In trying to figure that one out, I happened to notice that in /Library/Frameworks/Python.framework/Versions, the link Current pointed to 2.7, not to 3.3, although 3.3 is being used. Hmm. sudo rm Current and suddenly blist imports like a charm.

Well, all righty then. Back to trying to run book_test. Oops, an error from code that's been working for weeks,

self.scroll_area.setSizeAdjustPolicy(QAbstractScrollArea.AdjustToContents)

...produces an error, "QScrollArea object has no member setSizeAdjustPolicy". Well, it certainly does, or I wouldn'a coded it. Verify that by looking in the Qt Assistant. Yup, there it is, oh wait, "added in Qt 5.2". Ooooohhhh. Hastily enter in the Wing IDE Python window,

from PyQt5.Qt import PYQT_VERSION_STR
PYQT_VERSION_STR
'5.1.1'

So although this machine has Qt 5.2, it has a back-level PyQt. Which means, it's time to upgrade the machine to PyQt/Qt 5.3. Download the latest Qt, Sip, PyQt, run their configure, make, and install steps. And there goes most of the rest of the afternoon. Sure glad I don't have an anxious manager wanting to know if I'm going to meet my delivery goal...