Tuesday, April 29, 2014

Faster, faster!

In the preceding post I described the fairly naive algorithm I've been using to find the white borders of a scanned page's image, in order to automatically scale it to fill the image display window. The time taken to scan about a million white pixels was rather distressingly long. In fact it was worse than I described there.

A New Baseline

I was testing the find_image_margins() function by calling it through the profiling function: cProfile.run('find_image_margins(qimage)', 'profdata') but I hadn't actually looked at the margins it was returning. When I did look, I discovered that the left margin was only 2, when to the eye it should be at least an inch's worth, 150 pixels or more. So I looked closer at the test image and found a little patch of black at the extreme lower left corner. The test image originally had some black crud on the left side, and I'd cleaned it up in Photoshop but had missed this little patch.

As a result, at the end of the first inner loop, from the middle of the left side down, it found an unrealistically small left margin of 2. Then the second inner loop, from the top to the middle, never looked past pixel 2, which made it unrealistically fast.

After erasing the speck on the image, I made one logical change to the inner_loop code. When it stops, it has found a black patch 3 pixels wide, and its margin variable indexes the innermost pixel of that patch. It was returning that value, but it ought to return the index of the outermost pixel of the three. So it now read:

        pa, pb = 255, 255 # virtual white outside column
        for row in row_range:
            for col in range(col_start, margin, col_step):
                pc = color_table[ ord(bytes_ptr[row+col]) ]
                if (pa + pb + pc) < 24 : # black or dark gray trio
                    margin = col # new, narrower, margin
                    break # no need to look further on this row
                pa, pb = pb, pc # else shift 3-pixel window
        return margin - (2*col_step) # allow for window width

With that change and a realistic test image, cProfile now returned the following numbers:

         1171608 function calls in 2.348 seconds
   Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        4    2.282    0.571    2.348    0.587 imagetest.py:20(inner_loop)
  1171591    0.066    0.000    0.066    0.000 {built-in method ord}
        1    0.000    0.000    2.348    2.348 imagetest.py:12(find_image_margins)

Bottom line: 2.35 seconds to examine 1.17 million pixels.

Getting rid of ord()

One thing that bugged me about the above code is the need to take the ord() of the pixel byte in order to use it as a list index. This is because Python, for reasons best known to itself, gives an error if you try to use a byte value as an index (and not an IndexError, either, but a TypeError; try it: [1,2][b'0']). Well, what structure will accept a byte as an index? A dictionary. I changed the list comprehension that created the color table into a dict comprehension:

    color_table = { bytes([c]): int((image.color(c) >> 8) & 255)
                     for c in range(image.colorCount()) }

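The reason a dict works where a list fails can be illustrated in plain Python 3, with made-up values standing in for the bytes a voidptr returns:

```python
# A length-1 bytes value (like those returned by indexing a voidptr)
# cannot index a list, but serves perfectly well as a dict key.
try:
    [10, 20, 30][b'\x01']          # TypeError, not IndexError
except TypeError:
    print("lists reject bytes indices")

# A dict keyed by single bytes, analogous to the color table:
table = {bytes([c]): c * 2 for c in range(256)}
print(table[b'\x05'])              # → 10
```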
The bytes() function requires an iterable, hence it is necessary to write bytes([c]), converting the scalar integer c into a list so that bytes() will make it into a scalar byte. But whatever; the extra code at this point is executed only once per color. The overhead is trivial compared to code that is executed once per pixel. That code could now read:

                pc = color_table[ bytes_ptr[row+col] ]

This uses the byte returned by the voidptr directly to look up the color. Did it save any time?

  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        4    2.179    0.545    2.179    0.545 imagetestB.py:20(inner_loop)
        1    0.000    0.000    2.179    2.179 imagetestB.py:12(find_image_margins)
        1    0.000    0.000    2.179    2.179 {built-in method exec}

Yes, a little. Total time dropped from 2.348 to 2.179 seconds, a saving of about 7%. I thought and thought about a better way to code the three-pixel window scan, and could not find one. If I've missed something, please tell me in a comment! But now I turned my attention to the other line of attack: reducing the number of pixels examined.

Skipping rows

To look at single pixels is to examine a page image at extremely fine detail. Is there a valid character that is less than three pixels tall? No. So why am I looking at every row? To look at every row of pixels means looking at every line of characters at least four times, more likely eight or ten times. Let's skip a few!

It turned out to be trivially easy to look at every second row of the image. Recall that the call to the inner_loop passes a range iterator:

    left_margin = inner_loop(
                    range(int(rows/2)*stride, (rows-1)*stride, stride),
                    0, int(cols/2), 1
                    )

The first argument to range is the starting value, in this case the byte offset to the middle row of the image. The second is the stop value, which the range output will never reach; in this example, that's the offset to the last row of the image. The third is the step value, the number of bytes from one row of pixels to the next. In order to look at only every second row, I added two characters to that statement:

    left_margin = inner_loop(
                    range(int(rows/2)*stride, (rows-1)*stride, stride*2),
                    0, int(cols/2), 1
                    )

This had a good effect on the cProfile stats:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        4    1.125    0.281    1.125    0.281 imagetestB2.py:20(inner_loop)
        1    0.000    0.000    1.125    1.125 imagetestB2.py:12(find_image_margins)
        1    0.000    0.000    1.125    1.125 {built-in method exec}

From 2.179 seconds down to 1.125, a reduction of 49%. Not a surprise: do half the work, take half the time; but still, nice. And the returned margin values were almost the same.
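The effect of the doubled step can be seen in the length of the range itself, using made-up page dimensions of 2700 rows at a stride of 1800 bytes:

```python
# Doubling the step of the row range halves the offsets generated,
# and hence the rows examined. Dimensions here are hypothetical.
rows, stride = 2700, 1800
every_row = range(int(rows/2)*stride, (rows-1)*stride, stride)
every_other = range(int(rows/2)*stride, (rows-1)*stride, stride*2)
print(len(every_row), len(every_other))   # → 1349 675
```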

Shrinking the image

It would be easy to try skipping three of every four rows, but that might result in missing something like a wide horizontal rule. Instead, I thought, what about scaling the image down by half, using a smooth transformation? That would have something like the effect on the eye of holding the page at arm's length: shrink it and blur it but retain the outline. To run the inner loops on a half-size image would mean looking at 1/4th the pixels (half the columns of half the rows). The returned margins could be scaled up again.

I added the following code to the setup:

    scale_factor = 2
    orig_rows = image.height() # number of pixels high
    orig_cols = image.width() # number of logical pixels across
    image = image.scaled(
        QSize(int(orig_cols/scale_factor),int(orig_rows/scale_factor)),
        Qt.KeepAspectRatio, Qt.SmoothTransformation)
    image = image.convertToFormat(QImage.Format_Indexed8,Qt.ColorOnly)

I found that the QImage.scaled() method could change the format from the Indexed8 that it started with, so it was necessary to add the convertToFormat() call to restore the expected one-byte-per-pixel format. (Which meant it was no longer necessary to enforce that format before calling this find_image_margins function.)

The rest of the setup was just as before, setting the row and column counts and the stride, but for this reduced image. The final line was no longer return left_margin, right_margin but this:

    return left_margin*scale_factor-scale_factor, right_margin*scale_factor+scale_factor

This worked, and returned almost the same margin values as before, different by only a couple of pixels, much less than 1%. And the total execution time was now 0.333 seconds, distributed as follows:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        4    0.294    0.074    0.294    0.074 imagetestC.py:20(inner_loop)
        1    0.032    0.032    0.032    0.032 {built-in method scaled}
        1    0.006    0.006    0.006    0.006 {built-in method convertToFormat}
        1    0.000    0.000    0.333    0.333 imagetestC.py:12(find_image_margins)

That was such a success, running in 30% of the previous best time, that I tried increasing the scale_factor to 4, with this result:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        4    0.167    0.042    0.167    0.042 imagetestC.py:20(inner_loop)
        1    0.031    0.031    0.031    0.031 {built-in method scaled}
        1    0.001    0.001    0.001    0.001 {built-in method convertToFormat}
        1    0.001    0.001    0.001    0.001 imagetestC.py:52(<dictcomp>)
        1    0.000    0.000    0.200    0.200 imagetestC.py:12(find_image_margins)

Total time of 0.2 seconds. That's a reduction of only about 40% from the scale factor of 2, so clearly we are seeing diminishing returns. But the total is now only 8% of the execution time of the original algorithm. Not too shabby! The returned margins were still the same. The total time is now small enough that the contribution of the dictionary comprehension for the color table is noticeable. And the largest component, after the inner loop, is the QImage.scaled() call.

This is fast enough that there should be no objectionable delay on clicking the To Width button, even on slow hardware. I will proceed to integrate this into the imageview module and its unit-test. When that's done, I will be able to proceed to a preliminary version of the main window!

Monday, April 28, 2014

Scanning the pixels

Last post I described how you can get a "voidptr" through which you can PEEK the bytes of an image. This is needed to implement zoom-to-width and zoom-to-height buttons on the imageview. The code that I wrote for this in V.1 never felt right. It's a pure CPU-burning FORTRAN-style loop process; even more so than the character or word census of a large text. The scan image for a typical page is several megapixels (the test image used below is about 1800x2700, or 5Mpx) and the only process I could find for determining the margin has to access a good fraction of them. Clicking "To Width" or "To Height" on an older laptop incurs a noticeable pause of a second or more. I want to shorten that.

(Note that in what follows, the code for to-width and to-height is virtually the same, just substituting "top" and "bottom" for "left" and "right". I'm writing about to-width; whatever lessons I learn are immediately applicable to to-height.)

Starting point

Let's look at the code as it is now. The point is to find out how much of the left and right margins of the image are all-white, then scale and center the image to exclude them. To find the left margin, I look at the pixels of each row from left to right, stopping at a black spot. Once I've found a black spot, I never have to scan farther to the right than that point. I keep going to successive rows, hoping to find a black spot even more to the left. After looking at all rows I know the pixel count to the left of the leftmost black spot. The same logic applies to the right margin: look at all rows from right to left, in order to find the rightmost black spot.

An early complication was that finding a single black pixel threw up false positives. Scan images often have one-pixel, even two-pixel "fly specks" outside the text area. So I modified the code to stop only on a dark spot of at least 3 pixels, which added some overhead. I added a heuristic. Noting that many pages have large white areas at the top and bottom, I started the scan with the middle row, hoping to quickly find some black pixels nearer the left margin.

The following code is heavily rewritten and refactored. You can view the original if you like, but don't bother. The original has four nearly-identical loops for scanning the left margin, middle to end then top to middle, then the right margin the same. These loops—I realized yesterday—differ only in the start, stop and step values of their controlling ranges. So I factored the inner loops out to this:

    def inner_loop(row_range, col_start, margin, col_step):
        '''
        Perform inner loop over columns of current margin within rows of
        row_range. Look for a better margin and return that value.
        '''
        pa, pb = 255, 255 # virtual white outside column
        for row in row_range:
            for col in range(col_start, margin, col_step):
                pc = color_table[ ord(bytes_ptr[row+col]) ]
                if (pa + pb + pc) < 24 : # black or dark gray trio
                    margin = col # new, narrower, margin
                    break # no need to look further on this row
                pa, pb = pb, pc # else shift 3-pixel window
        return margin

The key statement is pc = color_table[ ord(bytes_ptr[row+col]) ]. From inside, out: bytes_ptr[row+col] peeks the next pixel value which is an index into the color table. ord() is needed because the voidptr access returns a byte value and (for reasons that escape me) Python will not permit a byte type as a list index. The color_table is a list of the possible GG values from RRGGBB, smaller numbers being darker. I'll talk about it in a minute.

The operation of this code should be fairly clear: slide a 3-pixel window along the pixels of one row up to a limit; when they comprise a dark spot, break the loop and set a new, more stringent limit. Return the final limit value.
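A toy version of that window scan, run over a single made-up row of brightness values (0 is black, 255 is white), might look like this:

```python
# Sketch of the 3-pixel sliding window over one row; values,
# threshold, and row contents are invented for illustration.
def first_dark_spot(row_bytes, threshold=24):
    pa, pb = 255, 255              # virtual white before the row starts
    for col, pc in enumerate(row_bytes):
        if pa + pb + pc < threshold:
            return col - 2         # outermost pixel of the dark trio
        pa, pb = pb, pc            # shift the window along
    return None                    # row is effectively all white

row = bytes([255] * 150 + [0, 0, 0] + [255] * 50)
print(first_dark_spot(row))        # → 150
```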

Now let's look at the code that sets up for and calls that inner loop.

def find_image_margins(image):
    '''
    Determine the left and right margin widths that are (effectively)
    all-white in an image, returning the tuple (left,right). The image is
    presumed to be a scanned book page in the Indexed-8 format, one byte per
    pixel, where a pixel value is an index into a color table with 32-bit
    entries 0xAARRGGBB (AA=alpha channel).
    '''
    rows = image.height() # number of pixels high
    cols = image.width() # number of logical pixels across
    stride = (cols + 3) & (-4) # scan-line width in bytes
    bytes_ptr = image.bits() # uchar * a_bunch_o_pixels
    bytes_ptr.setsize(stride * rows) # make the pointer indexable
    # Get a reduced version of the color table by extracting just
    # the GG values of each entry. If the image is PNG-1, this
    # gives [0,255] but it could have 8, 16, even 256 elements.
    color_table = [ int((image.color(c) >> 8) & 255)
                     for c in range(image.colorCount()) ]

The first five lines of code set up the voidptr to the image bytes. The stride value is used because the Qt documentation notes that every pixel row occupies a whole number of 32-bit words.
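The (cols + 3) & (-4) expression rounds the pixel width up to the next multiple of four bytes; for instance, with a few invented widths:

```python
# Rounding scan-line widths up to whole 32-bit words: masking off
# the low two bits of (cols + 3) gives the next multiple of 4.
for cols in (1796, 1797, 1800):
    print(cols, (cols + 3) & (-4))
```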

Note the remark about the "presumed" image format? In early testing I forgot to force the image to Indexed-8. The calculation stride * rows yielded about 5MB but that was much larger than the true memory size of a compressed 1-bit PNG file. The result was that the first time the inner loop tried to access a byte of a middle row—Python seg-faulted. Yes, you can bring down, not just your own app, but the whole interpreter, by mis-using a voidptr.

If the image file is really monochrome and stored as PNG-1, the color table will have two entries. But I can't require that or assume that. It will be a PNG but it might be a PNG-8 with actual colors. So make a color table of just the GG values (Green is commonly used as a proxy for pixel brightness) and index it by the pixel value.
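For example, with a hypothetical two-entry palette such as a monochrome PNG-1 would produce:

```python
# Extracting the GG byte from 0xAARRGGBB palette entries; this
# two-entry palette mimics a black-and-white PNG-1 color table.
palette = [0xFF000000, 0xFFFFFFFF]            # opaque black, opaque white
color_table = [(entry >> 8) & 255 for entry in palette]
print(color_table)                            # → [0, 255]
```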

Now, to work:

    # Some pages start with many lines of white pixels so in hopes of
    # establishing a narrow margin quickly, start at the middle, go to
    # the end, then do the top half. Begin: left side from the middle down.
    left_margin = inner_loop(
                    range(int(rows/2)*stride, (rows-1)*stride, stride),
                    0, int(cols/2), 1
    )
    # With hopefully narrower margin, scan from the top to the middle:
    left_margin = inner_loop(
                    range(0, int(rows/2)*stride, stride),
                    0, left_margin, 1
                    )
    # Now do exactly the same but for the right margin, taking columns
    # from the rightmost, inward.
    right_margin = inner_loop(
                    range(int(rows/2)*stride, (rows-1)*stride, stride),
                    cols-1, int(cols/2), -1
                    )
    right_margin = inner_loop(
                    range(0, int(rows/2)*stride, stride),
                    cols-1, right_margin, -1
                    )
    return left_margin, right_margin

Each call to the inner loop passes a range of rows to examine—in the form of pre-calculated byte offsets—and the three factors for a range of columns. The end result is the width of the white margins of the page.
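The middle-out row ranges can be illustrated with a tiny hypothetical image, 6 rows high with an 8-byte stride:

```python
# The two half-image row ranges, expressed as byte offsets;
# dimensions are made up to keep the output readable.
rows, stride = 6, 8
bottom_half = range(int(rows/2)*stride, (rows-1)*stride, stride)
top_half = range(0, int(rows/2)*stride, stride)
print(list(bottom_half))   # → [24, 32]
print(list(top_half))      # → [0, 8, 16]
```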

Timings of this code

When I use the cProfile module to time one application of find_image_margins() to a representative page image, I get these not awfully helpful stats:

         871938 function calls in 1.921 seconds
   Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        4    1.815    0.454    1.921    0.480 imagetest.py:20(inner_loop)
   871922    0.106    0.000    0.106    0.000 {built-in method ord}
        1    0.000    0.000    1.921    1.921 {built-in method exec}
        1    0.000    0.000    1.921    1.921 imagetest.py:12(find_image_margins)

The remaining lines are all-zero times. So: running under cProfile, this takes almost two seconds to run. On a 2.8GHz CPU, that's over five billion machine cycles. Wow! This would have been completely infeasible on, say, my old Mac Plus, where it would have taken something like 20 minutes to complete on its 8-megahertz CPU. On my current Macbook Pro, it's a minor annoyance.

The count of 871,922 calls to ord() tells how many pixels were examined in the inner loop. The tottime of 1.815 seconds tells how much time the inner loop spent on those pixels: about 1,815,000 microseconds over roughly 872,000 pixels, or about 2 microseconds per pixel examined.

Optimizing

I see two approaches to optimizing this program. One: reduce the couple of microseconds the inner loop spends on each pixel. Two: reduce the number of pixels examined in the inner loop. Reducing either number by half would halve the program execution time.

I believe I will spend a bit of time exploring both approaches, and report in the next post.

Saturday, April 26, 2014

Remembering PEEK and POKE

Remember PEEK and POKE? Back in the Precambrian (or pre-PC) era of computing, the primary way to program the Apple and TRS-80 and Commodore machines was with BASIC. And really the only way to do the cool stuff was to get in and diddle system memory and hardware registers. The way to read memory was PEEK addr, which returned the value of the byte at address addr. And with POKE addr, val you could store a value into a byte of memory. Ahhh... good times...

Well, crazily enough, PyQt offers an equivalent: a way to get direct access to data in memory. It's called a voidptr and is a type defined by SIP, PyQt's interface to the C++ world of Qt. You can get a voidptr from PyQt wherever the Qt documentation says a method returns uchar *. Such places are not common, but one emerged when I first implemented PPQT's scan image display.

A major feature of PPQT is that it shows the scan image from which the book's text was OCR'd, alongside the text itself in the editor window. As you move the edit cursor from page to page, the image viewer tracks it, flipping from scan image to image. It's one feature that PPQT has over its predecessor Guiguts, which requires the user to have a separate image-display app.

Here is the V2 imageview undergoing unit test. As you see it offers an adjustable zoom. Sometimes, especially when proofing Greek or a small-print footnote, the user needs to zoom in and peer closely. But usually you just want the whole page visible so you can scan for italics and bold, or check hyphenations.

The To-Width and To-Height buttons are supposed to set the zoom automatically so that the printed part of the page just fills the window side-to-side or top-to-bottom. When I first implemented these buttons back in V.1 I found it quite the coding challenge.

Here's what To-Width has to do:

  1. Scan the pixels of the image to find the width of the nonwhite area.
  2. Get the ratio of that width to the image's viewport width.
  3. Set that ratio as the zoom factor and redraw the image.
  4. Set the scroll position of the scroll area so as to center the nonwhite block.

Implementing these steps took me into back-alleys of Qt where I'd never been before, and introduced me to the SIP voidptr. In order to do step 1, I have to inspect the pixels of the image row by row, looking for the left and right edges of the text. I make sure the image is in the 8-bit Indexed Color mode, so that each pixel is one byte. Then the method QImage.bits() returns a uchar * pointing to the bytes/pixels that comprise the image. The PyQt translation of uchar * is sip.voidptr.

You can't use the voidptr as returned. First you must define the amount of memory it represents:

    vp = my_qimage.bits()
    vp.setsize(my_qimage.width() * my_qimage.height())

Now that PyQt knows the bounds of the addressed memory—and one wonders: does it check the size; or would it let you define a very large size and potentially examine things you shouldn't?—anyway once you have set the bounds, you can index the voidptr as if it were a Python bytes string:

    if vp[j] < b'\xff': # nonwhite pixel

Which is exactly PEEK j! plus ça change, plus c'est la même chose...
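The comparison works because indexing the voidptr hands back a length-one bytes value, and bytes compare lexicographically. A plain-Python stand-in, with a made-up pixel row:

```python
# Length-1 bytes values compare against b'\xff' just as the
# voidptr bytes do; the pixel row here is invented.
pixels = b'\xff\xff\x00\xff'       # white, white, black, white
for j in range(len(pixels)):
    if pixels[j:j+1] < b'\xff':    # slicing keeps it a bytes value
        print("first nonwhite pixel at", j)
        break
```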

Next post: performance tuning, or how fast can you finger the pixels?

Monday, April 21, 2014

GUI Testing with Sikuli

Sikuli is an open-source tool for automating tests of GUI applications. It is platform-independent (has Mac OS, Linux, and Windows versions) and is not linked in any way to a GUI package such as Qt. It uses little screen-grabs to locate targets of operation. Here, for example, are four lines from a Sikuli test script:

Line 28 finds the image of my editview context menu somewhere on the screen. That verifies that the menu is active. The next line sends the mouse to click on the third line of the menu. That opens a Mac OS file selection dialog, and I have arranged that the current working directory at this time will include the file I want. So line 30 of the script finds and clicks-on that filename, and the next line finds and clicks on the Open button of the file dialog.

This exactly answers the problems I was listing in the immediately prior blog post. You can specify mouse actions in terms of the look of their targets, regardless of whether these are Qt widgets or operating system dialogs. You can make it "find" the exact image of what you expect to have on the screen; for example, to find the image of a checkable menu item with its check mark in place, or to find the image of a word with or without a spelling-error underline.

A Sikuli script is a Python 2.7 script, but that's not a problem even though my code uses Python 3. The Sikuli app is a Java app which embeds its own Jython 2.7! So Sikuli's Python is completely unconnected from the target app's Python.

The test setup needed to combine Sikuli tests and ordinary Python test scripts is a bit complex. Here's the folder setup:

ppqt
   all app source scripts
   Tests
      all unit-test scripts run by pytest
      Sikuli
         all Sikuli test scripts

The unit-test driver, Tests/editview_sikuli.py reads like this:

import os
path_to_self = os.path.realpath(__file__)
path_to_Sikuli = os.path.join(os.path.dirname(path_to_self),'Sikuli')
import subprocess
r = subprocess.call(['/Applications/SikuliX-IDE.app/Contents/runIDE',
                 '-r',
                 os.path.join(path_to_Sikuli,'editview.sikuli')])
assert not(r)

So this simple script just fires off the Sikuli app asking it to run Sikuli/editview.sikuli. That is actually a folder containing a python script and all the little .png files that represent target images.

The Sikuli script (which is in Python 2.7 syntax, remember) starts out by firing off a copy of the test GUI:

import subprocess
subprocess.Popen(['/usr/local/bin/python',
            '/Users/dcortes1/Dropbox/David/PPQT/V2/ppqt/Tests/Sikuli/editview_runner.py'])

The new subprocess based on editview_runner.py sets up and launches the editview test window. That's the window where Sikuli will find visual matches as it runs its "click" and "find" statements. It does that, and if everything matches, the script ends with "type('q',KeyModifier.META)", in other words, command-Quit. That terminates the test app. The Sikuli script ends with a 0 return code. Back in the test driver, the subprocess call ends and assert not(r) is satisfied.

The Sikuli interactive IDE is a bit clumsy to use, and preparing a useful test is a bit tedious. (But on the other hand, it also revealed a couple of small bugs.) I expect to get good use out of it the rest of the way.

As of now, the editview module is functionally complete. As are several auxiliary modules: colors (keeps track of default colors and highlights, eventually will link up with a Preferences dialog); dictionaries (keeps track of available spelling dictionaries and defines the Speller class of spellcheck objects); and the basic skeleton of the Book class. So: on to imageview! When that's working, it will be time to set up the main window and actually have a whole app to play with.

Friday, April 18, 2014

Context Menus are Hard

Really, context menus are the orphan children of Qt. First, you cannot define one using Qt Designer. So even if all the rest of your UI (including the "real" menus) is designed with the designer, your context menu has to be hand-coded. And second, it appears to be impossible to test one using QTest.

As reported yesterday, it doesn't seem possible to use QTest mouse events to invoke a context menu—at least for a QTextEdit under Mac OS. So today I'm constructing a QContextMenuEvent object my ownself, and passing it to the editor to process. This works, kind of, but not in a useful way for testing. Here's some code.

ev = the_book.editv.Editor
ev_w = ev.width()
ev_h = ev.height()
ev_m = QPoint(ev_w//2, ev_h//2) # integer division: QPoint wants ints

cxe = QContextMenuEvent(
    QContextMenuEvent.Mouse,
    ev_m, ev.mapToGlobal(ev_m),
    Qt.NoModifier)

app = QCoreApplication.instance()
app.notify(ev,cxe)
cxm = ev.childAt(ev_w+5,ev_h+5)

QCoreApplication.notify(target,event) "Sends event to receiver... Returns the value that is returned from the receiver's event handler." My supposition was that the context menu would pop up, and I'd get control back and use childAt() to get a reference to that menu, and then I could feed it keystrokes, for example a couple of down-arrows and a return to select the third item.

Well as so often happens, my supposition was wrong. Because I don't get control back! Notice what the documentation says, "the value that is returned from the receiver's event handler"? But that event handler consists entirely of, quote:

    def contextMenuEvent(self, event):
        self.context_menu.exec_(event.globalPos())

It won't be returning until the menu's exec_() method returns! The nice editview window sits there with its context menu open in the middle, and waits for user input. It won't finish until the menu is satisfied with a Return or dismissed with Escape. At which time, there is no longer a menu at any location to be captured or tested.

Another problem I realized yesterday evening was this. Suppose I could get a reference to the menu object and feed it keystrokes. Two of its actions are toggle-able or check-able choices; in order to verify they had worked, I'd have to somehow sample their toggle status. How does one ask, does this menu choice have a check-mark on it?

The other two choices are worse. One invokes a native OS file-selection dialog, which is certainly beyond the reach of QTest or anything based on the Qt event mechanism. So how could my unit-test code browse to a test scanno file and click OK? And then how could it verify that (at least one of) the expected words was now highlighted?

And the other pops up a dialog with a drop-down menu of available dictionary tags. I'd like to have a test that verified that this dialog appeared, that it had the expected number of entries, that the last-chosen dictionary tag was the selected one, that another could be chosen, and so forth. But my test code doesn't get control back while the context menu is up, and it won't end until that dictionary dialog ends, so... bleagh.

Thursday, April 17, 2014

What's Happening?

The last couple of days have been a long digression into the world of QTest and QEvents. It was motivated by my desire to make an automated unit test particularly of the context menu I've added to the editview.

The motivation for a context menu, like so many changed features of PPQT2, is that there can be multiple books open at once. In V1, there was only one book and accordingly only one:

  • Primary spelling dictionary
  • File of "scannos" (common OCR errors like "arid")
  • Choice of whether or not to highlight spelling errors
  • Choice of whether or not to highlight scannos

But now these choices have to be individualized per book. In V1, there could be a File menu action, "Choose scanno file..." but in V2, if that action was in the File menu, there'd have to be a convention about which of the (possibly several) open books those scannos should apply to. The one with the keyboard focus? Suppose the keyboard focus is in the Help panel? Similarly for the V1 View > Choose Dictionary menu item. And for the View > Highlight Scannos/Spelling toggles. All these choices need to be clearly associated to a single book. Hence, a context menu in the editor panel with four actions. When you have to right-click on the editor in order to choose a scanno file, you know where those scannos will be applied.

The little context menu is in and working, at least to casual testing. But I'm trying to do this shit right; and that means, an automated unit test. And that meant, I presumed, using QTest's mouse actions to simulate a control-click on the edit widget.

So I wrote up a test case that went in part like this:

ev = the_book.editv.Editor # ref to QPlainTextEdit widget
ev_w = ev.width()
ev_h = ev.height()
ev_m = QPoint(ev_w//2, ev_h//2) # the middle; QPoint wants ints
QTest.mouseClick(ev, Qt.LeftButton, Qt.ControlModifier, ev_m, 1000)

...and, it didn't work. Nothing. I tried all sorts of mouse actions using both QTest's methods (mouseClick, mouseDblClick, mousePress, mouseRelease) and actually composing my own QMouseEvent objects and pushing them in with QApplication.postEvent(). Fast forward through about six hours of fiddle-faddling over three days. Sometimes I could get a double-click to work and sometimes not. Mouse presses or clicks with any button and modifier: nada, zip.

Now as it happens, I have an "event filter" on the editor. This is because I want to handle certain keystrokes, as described previously. But the edit widget is created by code generated from Qt Designer. That means it can only be a standard QPlainTextEdit. The normal way to intercept keystrokes is to subclass a widget and override its keyPressEvent() method. I don't think there's a way to get Qt Designer to plug in a custom subclass of a widget type.

However there's a way that you can install an "event filter" on any widget. That directs all events for that widget through the filter function. If it handles the event it returns True; if not, it returns False and the event is presented to the widget in the normal way. So editview puts a filter on events to the edit widget, picks off just the keyEvents it wants, and passes the rest on.

So I took advantage of this to just print out all the events passing through the edit widget so I could find out just what the heck mouse events it was getting when I clicked to bring up the context menu.

Surprise! It doesn't get any!

The Qt docs would have one believe that as a mouse moves over and clicks or drags on a widget, there's a constant flow of QMouseEvent objects to it. Nope. Not on my Macbook, anyway.

There are Enter and Leave events as the mouse pointer comes into and out of the frame of the widget, but those aren't mouse events as such, and there are lots of other sorts of events like Paint and ToolTip. Yet almost no mouse events are posted. What does appear while the mouse is active, on every click and streaming during any drag, exactly when I'd expect a flow of QMouseEvent postings, is a stream of QInputMethodQuery events.

This peculiar class has only one property, a query with a not very helpful list of values. Of these possible "queries" only one is being sent in my system, the query IMEnabled meaning "The widget accepts input method input". The receiver is supposed to set something from a set of even less-interesting values in the event. Of course, my event filter doesn't see what is being set; it only sees the event on its way in.

Something nefarious is going on here. Perhaps it is only in the Mac OS; perhaps it only affects QTextEdit and derivatives (QTest mouse actions directed to other widgets seem to work). But for the editor, on my macbook, the whole mouse event architecture is effectively being ignored, replaced by something only minimally documented and not amenable to code introspection for unit-testing.

There are also a few InputMethodQueries issued just before any keyPressEvent and I am deeply suspicious that this is related to the inconsistent handling of the Mac keyboard I noted earlier.

That aside, the net of all this investigation is to realize that I don't really need to simulate the mouse at all. All I need to do is fabricate a QContextMenuEvent with a given position in the middle of the editor. Post that; then use childAt() with that same position to get a reference to the context menu, and then I can send it keystrokes using QTest.

To be tried tomorrow.

Tuesday, April 15, 2014

Current Line Revisited

A few days ago I described my progress on editview, but just today I stumbled on a big improvement. Here's how it looks now.

If you click through you'll find that's quite a large image. The reason is, it's from a Retina MacBook, so what looks like quite a modest window comes out 1500px wide when captured. Here are the improvements from the prior version.

  • The current-line highlight now extends the full width of the window. Before it was only as long as the text on that line.
  • Scanno highlighting (the lilac highlights) is implemented. You can load a file of common OCR errors and they are marked wherever they appear.
  • Spellcheck highlighting (wiggly magenta underlines) is implemented, including alternate dictionaries. Note the line with <span lang='fr_FR'>; those words get checked against the French dictionary instead of the default one.

Pretty much all that remains is to finish an automated unit test of these features. I have one simple unit test driver now that uses QTest to automate a number of keystrokes, but I need to also automate exercising a pop-up context menu. That'll be an adventure I'm sure.

In the previous post I kvetched about how, although a QTextBlock has a format (QTextBlockFormat), you could only interrogate it, and modifying it didn't change the format. As a result, what I expected would be a simple way to set a current-line highlight, by setting the background brush of the current text block, didn't work.

Then today, browsing around the QTextCursor documentation, what should my eye fall upon but a setBlockFormat method! You can ask a QTextBlock for its format, but in order to set it, you have to aim a QTextCursor at that block, and then tell the cursor to set the block's format.

Bizarre.

Well, at any rate, not how I'd have designed it. But I didn't, so...

So I realized that my previous method of highlighting the current line using the extraSelections mechanism was over-complicated. I changed the logic to set a background brush on the current block. The cursor-moved logic now reads as follows:

Note: The following is still not the correct way to set a current-line highlight. Do not emulate this code. See this post for the problem with it and a later post for the correct approach.

    def _cursor_moved(self):
        tc = QTextCursor(self.Editor.textCursor())
        self.ColNumber.setText(str(tc.positionInBlock()))
        tb = tc.block()
        if tb == self.last_text_block:
            return # still on same line, nothing more to do
        # Fill in line-number widget, line #s are origin-1
        self.LineNumber.setText(str(tb.blockNumber()+1))
        # Fill in the image name and folio widgets
        pn = self.page_model.page_index(tc.position())
        if pn is not None : # the page model has info on this position
            self.ImageFilename.setText(self.page_model.filename(pn))
            self.Folio.setText(self.page_model.folio_string(pn))
        else: # no image data, or cursor is above page 1
            self.ImageFilename.setText('')
            self.Folio.setText('')
        # clear any highlight on the previous current line
        self.last_cursor.setBlockFormat(self.normal_line_fmt)
        # remember this new current line
        self.last_cursor = tc
        self.last_text_block = tb
        # and set its highlight
        tc.setBlockFormat(self.current_line_fmt)

Monday, April 14, 2014

Case of the Shredded Data

A persistent source of PyQt bugs is that as soon as a variable "goes out of scope"—that is, can no longer be referenced from any statement—it gets garbage-collected, and its underlying memory is freed for re-use. Newbies to PyQt get bitten by this early, often, and hard. It usually shows up as a segmentation fault that takes down the Python interpreter and your app. And there's no obvious bread-crumb trail back to the problem.

The usual problem is that you build one object A based on another object B. Then you pass object A around and try to use it in another part of the program. Meanwhile, object B, which was just input material to making A, has gone out of scope and been shredded. Then any use of A references non-existent memory and segfaults, or produces weird results because it is accessing memory that doesn't contain what it should.

Case in point: QTextStream. This is a useful class, very handy for reading and writing all kinds of files. You could use Python's built-in file objects instead, but you need to standardize on one paradigm or the other, Python files or Qt's QTextStreams. I've gone with the latter, but they have this little problem of segfaulting if you are not careful with them.

A QTextStream is built upon some source of data, either a QFile or an in-memory QByteArray. The class constructor takes that source object as its only argument, as in this perfectly innocuous function:

# assumes PyQt5: from PyQt5.QtCore import QFile, QIODevice, QTextStream
def get_a_stream(path_string):
    '''Return a QTextStream based on a path, or None if invalid path'''
    if not QFile.exists(path_string):
        return None
    a_file = QFile(path_string)
    if not a_file.open(QIODevice.ReadOnly):
        return None
    return QTextStream(a_file)

Lovely, what? Simple, clear—and wrong. Because a_file goes out of scope as soon as the function returns. Many statements away in another part of the program, the next use of the returned stream crashes the program. This is (in my oh-so-humble opinion) a stupid design error in Qt (it affects C++ users too) but fortunately it is easy to work around. You just use the following instead of QTextStream:

class FileBasedTextStream(QTextStream):
    def __init__(self, qfile):
        super().__init__(qfile)
        self.save_the_goddam_file_from_garbage_collection = qfile

That's it! An object of this class FileBasedTextStream can be used anywhere a QTextStream would be used, but it does not require you to find some way to save the QFile from the garbage collector. The single reference to the QFile keeps it alive until the stream object itself is freed.
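The keep-alive trick is pure Python, so it can be demonstrated without Qt at all. In this sketch, Stream and Source are invented stand-ins for QTextStream and QFile; a weakref shows exactly when the source object gets collected.

```python
import gc
import weakref

class Stream:
    '''Stand-in for QTextStream: keeps no Python reference to its source.'''
    def __init__(self, source):
        pass  # a real QTextStream holds only a C++-side pointer

class SafeStream(Stream):
    '''Stand-in for FileBasedTextStream: one attribute pins the source.'''
    def __init__(self, source):
        super().__init__(source)
        self._source = source  # the single reference that prevents collection

class Source:
    '''Stand-in for QFile.'''

# Without the extra reference, the source dies as soon as it leaves scope:
src = Source()
gone = weakref.ref(src)
stream = Stream(src)
del src
gc.collect()
# gone() is now None: the stream's source has been shredded

# With the extra reference, it survives as long as the stream does:
src2 = Source()
kept = weakref.ref(src2)
safe = SafeStream(src2)
del src2
gc.collect()
# kept() still returns the Source object
```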

I solved this same issue earlier, for memory-based text streams. These are very handy for nonce files, and my unit test code builds lots of them.

class MemoryStream(QTextStream):
    def __init__(self):
        # Create a byte array that stays in scope as long as we do
        self.buffer = QByteArray()
        # Initialize the "real" QTextStream with a ByteArray buffer.
        super().__init__(self.buffer)
        # The default codec is codecForLocale, which might vary with
        # the platform, so set a codec here for consistency. UTF-16
        # should entail minimal or no conversion on input or output.
        self.setCodec( QTextCodec.codecForName('UTF-16') )
    def rewind(self):
        self.seek(0)
    def writeLine(self, text): # 'text', not 'str': don't shadow the builtin
        self << text
        self << '\n'

It's just a QTextStream based on an in-memory buffer, but the buffer can't go out of scope as long as the object exists. It adds a couple of minor features that QTextStream lacks.
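For readers who want the shape without Qt, the same two conveniences can be sketched over Python's io.StringIO; MemStream is an invented name for this illustration, not part of PPQT.

```python
import io

class MemStream(io.StringIO):
    '''Plain-Python sketch of MemoryStream's rewind/writeLine conveniences.'''
    def rewind(self):
        self.seek(0)
    def writeLine(self, text):  # 'text' rather than 'str' avoids shadowing
        self.write(text)
        self.write('\n')

ms = MemStream()
ms.writeLine('alpha')
ms.writeLine('beta')
ms.rewind()
```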

Friday, April 11, 2014

Why Can't Huns Spell?

Lordy but I hate the kind of work described below. It's really stressful. (I know, kvetch, kvetch, kvetch.)

I'm to the point where I want to test the ability to mark words that fail spellcheck. To do that, I need the ability to check spelling, duh!

PPQT version 1 went through stages of spell-checking, each representing many hours of effort. First, trying to send words over a pipe to Aspell running as a subprocess. Then I wrote my own all-Python spell-checker to use the Myspell/OpenOffice dictionary format. That was a useful learning exercise. I learned:

  • All about the format and content of the .dic/.aff dictionary files.
  • That German is a damned hard language to spell-check.
  • That there are a lot of subtleties to the spell-check algorithms.

In the end German defeated my code. I just couldn't get it to handle multiple affixes properly. In the nick of time I found this Python wrapper for Hunspell. Like a lot of FOSS, it was created by someone who needed it a few years ago, and that person has apparently moved on and left it dangling unmaintained. But it can be made to work, with effort—for Python 2.x. And it was suuu-weet once I got it going, blazing fast and reliable. I made it work for Mac OS and for Linux, but blew many hours failing to make it work for my Windows distribution. Eventually I went on ELance and paid a dude $150 to make the Hunspell wrapper work on Windows. Money well spent.

But PPQT2 is built on Python 3.3 (well, probably 3.6 by the time it's done) and the Hunspell wrapper doesn't work for that. However, another user posted a diff file that, he claimed, made it work with Python 3. So I spent some hours today getting it compiled and installed.

It should be a one-liner, python setup.py install, but of course that don't work because there are things in the setup.py script that assume Linux, and a prior release of Hunspell. So you tweak that a while. Reviewing my notes from last fall, you run python setup.py build and it fails, then you manually run a compile command that works to actually create the module, then python setup.py install will install it. The compile command that worked for 2.7 (contributed to that wiki by another user, bless her heart) was:

gcc -fno-strict-aliasing -fno-common -g -fwrapv\
 -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64\
 -pipe -D_LINUX -I/usr/include/hunspell\
 -I/usr/include/python2.7 -lpython -lhunspell-1.2\
 -shared hunspell.c -o build/hunspell.so

But that doesn't work after the diff was applied for Python 3. It coughed up an unresolved symbol _PyModule_Create2 for no apparent reason. So, what's a search engine for if not to find obscure error messages? And Da Google turned up many people with this problem dating back to 2010. A stackoverflow response, although not directly responsive, pointed to lack of inclusion of the python3.3 library, and that was it. Here's the command that actually compiles and links hunspell for Python 3:

P=/Library/Frameworks/Python.framework/Versions/Current
gcc -fno-strict-aliasing -fno-common -g -fwrapv\
 -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64\
 -pipe -D_LINUX -I/usr/local/include/hunspell\
 -I$P/include/python3.3m\
 -L$P/lib -lpython3.3 -lhunspell-1.3 -shared\
 hunspell.c -o build/hunspell.so

So I now have a working Hunspell that I can start playing with, on Mac OS at least. Such a relief! OK, Ghu alone knows what it will take to make it work on other platforms, but that's months away. For now, my Huns can spell.

Wednesday, April 9, 2014

Which Line Is It, Anyway?

The editview module is getting pretty complete. The only missing function is the dreaded syntax-highlighter to highlight scannos or spelling errors. Here's what it looks like now.

Today I added the code to highlight the current line. That's why one line has a sort of pale-lemon background. In V1, there was no current line highlight, and it was quite easy to lose sight of the cursor, and have to rattle the arrow keys to find it. (The string shown in dark gray is selected text and is actually bright yellow; the Grab utility did something to the colors.)

Qt's method of doing this was surprising to me.

In a QPlainTextEdit, there is a 1:1 correspondence between text blocks and logical lines. Each line of text is in one QTextBlock. Now, QTextBlock has a property blockFormat which is a QTextBlockFormat, which is itself a QTextFormat derivative, i.e. it can be used to set the background brush, margins, and so on. So when I started looking at how to make the current line a different color, I saw this and supposed it would be a matter of, each time the cursor moved:

  • Get the text block containing the cursor, a single method call,
  • Clear the background brush of the previous line's text block,
  • Set the current text block's blockFormat to a different background brush

But in fact QTextBlock lacks anything like a setBlockFormat, so the property is read-only. And setting the background property of the returned QTextBlockFormat object was accepted but had no visible effect.

Sigh, back to the googles to find a number of places in the Qt docs, stackoverflow and the like, where the question is raised and answered.

QPlainTextEdit supports a property extraSelections, which is a list of QTextEdit::ExtraSelection objects. This is the first and, I think, only time I've seen a class documented as nested inside another class. And it's a weird little class; it has no methods (not even a constructor), just two properties, cursor and format. So it's basically the C++ version of a Python tuple.

What you do is, you get a QTextCursor to select the entire line, and you build an ExtraSelection object with that cursor and the QTextCharFormat you want to use, and assign that to the edit object's list of extra selections. This is a lot of mechanism to just highlight one line. Apparently the intent is to support an IDE that, for example, wants to put a different color on each line set as a breakpoint, or such.

Note: The following is not the correct way to set a current-line highlight. Do not emulate this code. See this post for the problem with it and a later post for the correct approach.

Anyway for the curious, this is the code that executes every bloody time the cursor moves:

    def _cursor_moved(self):
        tc = QTextCursor(self.Editor.textCursor())
        self.ColNumber.setText(str(tc.positionInBlock()))
        tb = tc.block()
        ln = tb.blockNumber()+1 # block #s are origin-0, line #s origin-1
        if ln != self.last_line_number:
            self.last_line_number = ln
            self.LineNumber.setText(str(ln))
            tc.movePosition(QTextCursor.EndOfBlock)
            tc.movePosition(QTextCursor.StartOfBlock,QTextCursor.KeepAnchor)
            self.current_line_thing.cursor = tc
            self.Editor.setExtraSelections([self.current_line_thing])
            pn = self.page_model.page_index(tc.position())
            if pn is not None : # the page model has info on this position
                self.ImageFilename.setText(self.page_model.filename(pn))
                self.Folio.setText(self.page_model.folio_string(pn))
            else: # no image data, or positioned above page 1
                self.ImageFilename.setText('')
                self.Folio.setText('')

In sequence this does as follows:

  • Get a copy of the current edit cursor. A copy because we may mess with it later.
  • Set the column number in the column number widget.
  • Get the QTextBlock containing the cursor's position property (note 1 below).
  • Get the line number it represents.
  • If this block is a change from before (note 2):
    • Set the line number in the line number widget.
    • Make the cursor selection be the entire line ("click" at the end, "drag" to the front)
    • Set that cursor in a single ExtraSelection object we keep handy.
    • Assign that object as a list of one item to the editor's extra selections.
    • Get the filename of the current image file, if any; and if there is one, display it and the logical folio for that page in the image and folio widgets.

Note 1: If there's no selection, a text cursor's position is just where the cursor is. But if the user has made a selection, the position property might be at either end of it. Drag from up-left toward down-right and the position is the end of the selection. Drag the other way, it's at the start. Drag a multi-line selection that starts and ends in mid-line. One of the lines will have the faint current-line highlight: the top line if you dragged up, the bottom line if you dragged down. I don't think anyone will notice, or care if they do. I could add code to set the current line on min(tc.position(),tc.anchor())—but I won't.

Note 2: Initially, there was no "if ln != self.last_line_number" test; everything was done every time the cursor moved. And actually performance was fine. But I just could not stand the idea of all that redundant fussing about happening when it didn't have to.

Friday, April 4, 2014

Further on the Mac Option Key

The Qt Forum post I made about the Option-key problem, after 22 hours, has been viewed 32 times but drawn no responses. I also posted a respectful query on the pyqt list this morning (after obsessing about the issue some of the night).

I also spent a couple more hours delving deeply into the QCoreApplication, QGuiApplication, and QApplication docs, hoping to find some kind of magic switch to change the behavior of the key interface. I speculate that Qt5 has better Cocoa integration and as a result is getting the logical key from a higher-level interface than before.

Supposing it can't be fixed or circumvented, what I will have to do is: In constants.py where the key values and key sets are determined, check the platform and use Qt.MetaModifier instead of Qt.AltModifier when defining keys for Mac. This substitutes the actual Control shift for the Option shift.
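That single platform check might look something like this in constants.py. This is a hedged sketch: the integer values are Qt's documented enum values for Qt.AltModifier and Qt.MetaModifier, spelled out as plain ints here so the sketch runs without Qt installed, and BOOKMARK_MODIFIER is an invented name.

```python
import sys

# Qt's documented modifier values, written as plain ints so this sketch
# runs without Qt; real code would use Qt.AltModifier and Qt.MetaModifier.
ALT_MODIFIER  = 0x08000000   # the Option key on the Mac
META_MODIFIER = 0x10000000   # the actual Control key on the Mac

# On Mac OS, substitute the real Control key for the dead Option key.
if sys.platform == 'darwin':
    BOOKMARK_MODIFIER = META_MODIFIER
else:
    BOOKMARK_MODIFIER = ALT_MODIFIER
```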

That would be the only module with a platform dependency. Others just use the names of keys and key-sets defined in constants.py. For the user, I will have to have separate documentation about bookmarks, for Mac and non-Mac. For non-Mac, it'll remain "Press control and alt with a number 1-9 to set that bookmark." For Mac it will be "Press the Control key and the Command key together with a number 1-9..." And the beautiful consistency ("where you see 'alt' think 'option'" at the front and never mention it again) is gone.

Another issue is the use of ctl-alt-M and ctl-alt-P in the Notes panel, to insert the current line or image number. Possibly I can just change the key definitions in constants to whatever the Mac keyboard generates for option-M and option-P (mu and pi, it seems). Or keep the directions consistent, and completely wipe out any use of Option-keys in Mac.


Also today I tested and committed the zoom keys, which work a treat. The unit test module buzzes up 10 points and down 15, looks great.

Thursday, April 3, 2014

A Bump in the Road

Today I thought I'd add in the special keystrokes to the editview. There are three groups of them: a set that interact with the Find dialog (^f, ^g, etc), and these I'm deferring until I actually work on the Find panel; a bookmark set, (ctl-1 to 9 to jump to a bookmark, ctl-alt-1 to 9 to set one); and ctl-plus/minus to zoom. All of these were implemented and working in version 1, using the keyPressEvent() method to trap the keys.

So I messed around and tidied up the constants that define the various groups of keys as sets, so the keyPressEvent can very quickly determine if a key is one it handles, or not: if the_key in zoom_set and so on.
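With integer stand-ins for the Qt codes, that set-membership test sketches out as follows. The set names and the classify() helper are assumptions about how constants.py organizes things, not the actual PPQT code; the modifier values match Qt's documented enums.

```python
# Integer stand-ins for Qt's modifier and key enums; the real sets in
# constants.py presumably OR modifier and key values together the same way.
CTL = 0x04000000   # Qt.ControlModifier
ALT = 0x08000000   # Qt.AltModifier
KEY_PLUS, KEY_MINUS = 0x2B, 0x2D
KEY_1 = 0x31       # keys 1-9 are 0x31..0x39

zoom_set = {CTL | KEY_PLUS, CTL | KEY_MINUS}
set_bookmark_set = {CTL | ALT | (KEY_1 + n) for n in range(9)}

def classify(modifiers, key):
    '''Decide in O(1) whether keyPressEvent should handle this key.'''
    code = modifiers | key
    if code in zoom_set:
        return 'zoom'
    if code in set_bookmark_set:
        return 'set-bookmark'
    return None   # not ours: pass to the superclass
```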

With the brush cleared, I copied over the keyPressEvent code from V1 and recoded it (smarter and tighter) for V2 and ran a test, and oops something is not working.

Specifically, it is no longer possible to set bookmark 2 by pressing ctl-alt-2. On a Mac, that's command-option-2, which Qt delivers as Qt.AltModifier plus Qt.ControlModifier with a key of Qt.Key_2.

Or rather, it used to do that. I fired up PPQT version 1 just to make sure. Yup, could set a bookmark using cmd-opt-2. But not in the new version. Put in debug printout. The key event delivered the same modifier values, ctl+alt, but the key value was... 0x2122, the ™ key? And cmd-alt-3 gave me Qt.Key_sterling, 0xA3. And cmd-alt-1 is a dead key.

Pondering ensued. OK, these are the glyphs that you see, if you open the Mac Keyboard viewer widget and depress the Option key. So under Qt5, the keyboard event processor is delivering the OS's logical key, but under Qt4 in the same machine at the same time it delivers the physical key.

Oh dear.

I spent several hours searching stackoverflow and the qt-project forums and bug database, but nothing seemed relevant. I posted a query in the Qt forum. But I have little hope. It looks very much as if I'll have to change the key choices for bookmarks, and make them platform-dependent. In Windows and Linux they can continue to be ctl[-alt]-1 to 9, but in Mac OS this will change. The only reliable special-key modifiers are control (Command) and meta (the Control key!).

In V1 it was great that I could document just once at the top of the docs, that in Mac, "ctl means cmd" and "alt means option". And that was consistent throughout. Now it won't be because the Option key is effectively dead for my purposes. I'll have to tell the mac user, "when I say control I mean command, but when I say alt, I mean control." Won't that be nice? Plus, I'll have to have code that looks at the platform and redefines the key sets for Mac at startup. Very disappointing.