This Page Intentionally: PyQt5

Friday, August 31, 2018

A new PyQt post: Retaining widget shape

Just for fun I'm working on a Tetris game. I based this on the code originally published by Jan Bodnar, cleaning it up and commenting the living bejeezus out of it.

My code is here on Github. The file standard.py is playable, but it has one big flaw. If you start it up and then drag on the corner of the window, the tetris pieces become misshapen, distorted.

So how can I make (Py)Qt5 retain the aspect ratio of a widget while still allowing the user to stretch or shrink its container?

Not surprisingly, I am not the first to wonder this. (It would come up when coding any kind of game, I should think.) Search the Qt General Forum on the string "heightForWidth" (which is the method name that turns up in most answers to this issue) and you'll find several postings from as long as seven years ago, and as recently as one year. A more general search turns up Stack Overflow posts. Almost all the proposed solutions are wrong. However, the solution offered in this S.O. post has code that works.

Most of the solutions say that to make a widget keep its aspect ratio, you

Give it an explicit SizePolicy in which
you call setHeightForWidth(True) and
in the widget itself, override the hasHeightForWidth() and heightForWidth() methods:

    def hasHeightForWidth(self):
        return True
    def heightForWidth(self, width):
        return width

Except some posts say the widget has to be in a layout (e.g. QVBoxLayout), and others seem to say that you have to implement a custom version of a box layout yourself. I've got a simple test case right here in which I implemented all those things in every combination. You're welcome to play with it.

Bottom line is, no heightForWidth() method is ever called, that I could find. The whole approach probably works for some combination of options—it certainly appears to have been devised to do this—but it flat doesn't work in any combination that I could devise.

Like I said, download that test code and see if you can find the magic.

What does work is this: in the widget class that you want to always be square, implement the resizeEvent() handler. In it, check the new size of the widget and adjust its contentsMargins appropriately to compensate. Thus:

    def resizeEvent(self, event):
        # setContentsMargins(left, top, right, bottom)
        d = self.width() - self.height()
        # Split the excess into two margins that differ by at most one pixel.
        mod1 = abs(d) // 2
        mod2 = abs(d) - mod1
        if d > 0: # width is greater, reduce it
            self.setContentsMargins(mod1, 0, mod2, 0)
        else: # height is greater, reduce it; when d is zero this
              # clears any margins left over from an earlier resize
            self.setContentsMargins(0, mod1, 0, mod2)
        super().resizeEvent(event)

The same code, with a little more math, could be used to maintain some other aspect ratio than a square.
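As a rough sketch of that "little more math", the margin calculation can be pulled out into a plain function that takes the target ratio as two integers. The name aspect_margins and its parameters are made up for this illustration; they are not part of Qt or of the game code.

```python
def aspect_margins(width, height, ratio_w=1, ratio_h=1):
    """Return (left, top, right, bottom) margins that leave a centered
    content area of aspect ratio ratio_w:ratio_h inside width x height.
    Hypothetical helper, not a Qt API."""
    # The content can be as wide as the widget, unless the height
    # (scaled by the ratio) runs out first.
    content_w = min(width, height * ratio_w // ratio_h)
    content_h = content_w * ratio_h // ratio_w
    spare_w = width - content_w
    spare_h = height - content_h
    left = spare_w // 2
    top = spare_h // 2
    # Any odd pixel goes to the right or bottom margin.
    return (left, top, spare_w - left, spare_h - top)

square = aspect_margins(300, 200)          # default 1:1, same as the square case
board = aspect_margins(500, 440, 10, 22)   # e.g. a board 10 cells wide, 22 high
```

Inside resizeEvent() the result could presumably be passed straight through, as in self.setContentsMargins(*aspect_margins(self.width(), self.height(), 10, 22)).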

Tuesday, May 6, 2014

Signal from a module

So the fonts module knows all about QFonts. Its main purpose is to keep track of the user's preference for the mono font (used in the editor) and the general font (used by all the UI widgets), getting these from the app settings at startup and recording them in settings at shutdown, and supporting the yet-to-be written preferences dialog. It offers services like

get_fixed() to return a QFont set to the user's preferred monospaced family and size,
scale(updown,qfont) to return a qfont scaled up or down one point,
ask_font(mono=True) to present a QFontDialog with an appropriate title and return a selected QFont.

When the user chooses a different mono font, any editview needs to know, because it has to explicitly set the font of its QPlainTextEdit. There could be one editview, or several, or even none. Similarly the main window needs to know if the user wants to use a different font for the UI (unlikely, but why not?). This is just the right place to use a Qt signal. Any editview, or the main window can connect itself to that signal while initializing.

However this turned out to be tricky because fonts is a module, not a class. Thereby hangs a short boring story...

Modul-ism

When I was writing PPQT V.1 I didn't really understand Python namespaces. I was stuck in the mindset of assembly language and C—hey, I'm old, what do you want?—so I vaguely equated Python import with a C #include. They are sort of alike in that both bring some externally-defined code into the scope of a source module. But as I bet you know, they are very, very different in implementation.

Take a stupid example. Let's say you have the following modules:

common.py:
    GLOB = 1
unit_a.py:
    import common
    common.GLOB = 2
unit_b.py:
    import common
    print(common.GLOB)
main.py:
    import unit_a
    import unit_b

When you run main.py, what happens? The first mental hurdle to get over is to realize that import is an executable statement: any code in the imported thing is executed. Normally that's declarative code like def or class but the assignments and the print() in these modules will also execute. So something is going to be printed: what?

If import was like #include, it would print 1 because the common imported into unit_b would be an independent copy of the file common.py. But that's not what happens. The first time import common is executed—as part of executing import unit_a—Python creates a common namespace with the one entry, GLOB bound to a 1, which unit_a then reassigns to 2. The next time import common is executed—as part of executing import unit_b—all that happens is that unit_b's namespace gets a reference to the same common namespace, from which it prints 2.
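If you'd rather see it than take my word, here is a small self-contained check. It writes three throwaway modules into a temporary folder and imports them. The demo_ prefix on the module names is only there to avoid colliding with anything already installed, and demo_unit_b stores the value it sees instead of printing it so the script can inspect it afterward.

```python
import importlib
import os
import sys
import tempfile

# The example modules, renamed with a demo_ prefix for safety.
sources = {
    "demo_common.py": "GLOB = 1\n",
    "demo_unit_a.py": "import demo_common\ndemo_common.GLOB = 2\n",
    "demo_unit_b.py": "import demo_common\nseen = demo_common.GLOB\n",
}
scratch = tempfile.mkdtemp()
for name, text in sources.items():
    with open(os.path.join(scratch, name), "w") as f:
        f.write(text)

sys.path.insert(0, scratch)
importlib.invalidate_caches()
demo_unit_a = importlib.import_module("demo_unit_a")
demo_unit_b = importlib.import_module("demo_unit_b")

# Both units refer to the one module object cached in sys.modules.
same_object = (demo_unit_a.demo_common is demo_unit_b.demo_common
               is sys.modules["demo_common"])
print(same_object, demo_unit_b.seen)
```

The second import of demo_common never re-runs the file; it just hands back the cached module object, which is why demo_unit_b sees the value that demo_unit_a assigned.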

Although I understood this in a theoretical way, I couldn't quite shake the suspicion that importing the same module more than once was somehow a risky proposition. So I took pains to create a class to hold global values. I instantiated one of that class early, and passed that object into each sub-module. It was over-complicated, unnecessary, and in fact a bad design because the global-holding object ended up an overstuffed portmanteau full of unrelated things.

So, V.2, we do things pythonically. As I said, anything font-related gets handled in the fonts module, which has a number of global values with names like _MONO_FAMILY. These get set when the main window calls fonts.initialize(settings), they may get reset when the yet-to-be-written preferences calls fonts.set_general() or fonts.set_fixed(), and so on. And any module that imports fonts will be using the one and only fonts namespace and the same global values.

Signalling

Fine, but what about that signal? Say that preferences calls fonts.set_fixed() with a new choice of QFont. The fontsChanged signal needs to be emitted. But how, or from what?

The new PyQt5 signal/slot API insists that a signal has to be bound to a QObject instance. Fonts is a module, and it has no need to define a class or make an object. But it wants to emit a signal. So this is what I had to do:

class Signaller(QObject):
    fontChange = pyqtSignal(bool)
    def connect(self, slot):
        self.fontChange.connect(slot)
    def send(self,boola):
        self.fontChange.emit(boola)

_SIGNALLER = Signaller()

def notify_me(slot):
    _SIGNALLER.connect(slot)
def _emit_signal(boola):
    _SIGNALLER.send(boola)

Signaller is a QObject with just one attribute, a class variable fontChange that is a signal. The signal carries one parameter, a boolean. (It's True if the font changed was the mono font, False if it was the UI font.)

Signaller has two methods, one to connect its signal to a slot, and one to emit the signal. One Signaller object is created and saved in a global reference.

Now, a call to fonts.notify_me() can be used to hook up any Python executable to the fontChange signal. Within the fonts module, a function like fonts.set_fixed() can call _emit_signal(True) to send the signal.

This works fine; the unit-test driver hooked up a function, called fonts.set_fixed(), and its function was invoked.

Monday, May 5, 2014

Being Resourceful

According to blogger, after each new post here, there's a bit of a spike in views. So at least a dozen people out there have this blog in their RSS readers. Howdy! I'll try to keep it going.

Another Unknown is Known

One of the unknowns that's been niggling at the back of my mind for weeks is how to use the Qt resource system. I have a font that I want to carry along in the app so I can be sure it will be available. I think there will be some custom icons as well, such as one that may become the icon for a button in the Find dialog that establishes a limited range of text for find/replace.

Anyway all such things shouldn't be carried along as separate files, but should be incorporated right into the program as resources. I knew Qt had a scheme for this and PyQt supported it; and I knew I would need to use it; but I didn't know how it worked and hadn't stirred myself to find out. So today I did.

As usual with these things, once you lay it out clearly it's no big deal. In a nutshell,

You list your resource files using XML syntax, in a file with type .rcc
You run a utility pyrcc5 which writes a python module.
You import that python module.

Then at run time, any file you named in the .rcc is available using the path prefix :/, as in :/hand-gripping.png.

The .rcc file consists of the following lines:

<!DOCTYPE RCC><RCC version="1.0">
<qresource>
   a list of files to embed, one line per file
</qresource>
</RCC>

Each file to embed is described with a line like this:

<file alias='run-time-name-of-resource'>relative-path-to-file</file>

For example, <file alias='hand-gripping.png'>hand-gripping.png</file>
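With more than a handful of files, the listing could be generated rather than typed. The following is only a sketch: build_resource_listing is a made-up helper, and using each file's base name as its alias is my assumption about what one would usually want.

```python
import posixpath
from xml.sax.saxutils import escape, quoteattr

def build_resource_listing(paths):
    """Return the text of a resource listing that names each path,
    using the file's base name as its run-time alias.
    Hypothetical helper, not part of PyQt."""
    lines = ['<!DOCTYPE RCC><RCC version="1.0">', "<qresource>"]
    for path in paths:
        alias = posixpath.basename(path)
        # quoteattr() adds the quotes and escapes the attribute value.
        lines.append("   <file alias=%s>%s</file>"
                     % (quoteattr(alias), escape(path)))
    lines += ["</qresource>", "</RCC>"]
    return "\n".join(lines) + "\n"

listing = build_resource_listing(["hand-gripping.png", "fonts/mono.ttf"])
print(listing)
```

The file name fonts/mono.ttf is a placeholder. The output would be saved as the listing file and handed to pyrcc5 like a hand-written one.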

And that's about it. I set up a folder Resources at the same level as my source modules. In it I put the .rcc file and the various files it named. Then from the command line, at the module level, I gave the command

pyrcc5 -o resources.py Resources/resources.rcc

And magically a 1.5MB file resources.py appeared. That file starts out with:

qt_resource_data = b"\
\x00\x04\xc8\x40\
\x00\
\x01\x00\x00\x00\x12\x01\x00\x00\x04\x00\x20\x46\x46\x54\x4d\x61\
\x13\x1d\x0b\x00\x04\x91\x38\x00\x00\x00\x1c\x47\x44\x45\x46\x79\

And ends, 20,000 lines later, with

def qInitResources():
    QtCore.qRegisterResourceData(0x01, qt_resource_struct, qt_resource_name, qt_resource_data)

qInitResources()

And that, executed upon the first import of the module, presumably tells the QApplication about these resources. It's pretty straightforward. Now I can get on with finalizing initialization of the font resources.

Saturday, May 3, 2014

Remembering

So I've moved on to a big, amorphous chunk of code, the "main window" of the app. This is what creates the main window, has all the menu "actions" like Save-As and Open, and generally sets things up and tears things down. It's a lot of code but I've started by nibbling at the edges, with initializing and shutting down. This involves using QSettings.

A QSettings object provides a way to tuck items into a persistent key-value store and get them back later. One common use is to remember the program's screen geometry. The final steps in the main window's termination code are:

        self.settings.setValue("mainwindow/size",self.size())
        self.settings.setValue("mainwindow/position",self.pos())
        self.settings.setValue("mainwindow/splitter",self.splitter.saveState())

The size, screen position, and the relative position of the "splitter" that divides the window into two sections are stored away. When the program starts up, the final steps in the main window's GUI initialization are:

        self.resize(self.settings.value("mainwindow/size", QSize(400, 400)))
        self.move(self.settings.value("mainwindow/position", QPoint(100,100)))
        self.splitter.restoreState(
            self.settings.value("mainwindow/splitter",QByteArray()))

The call to self.resize() sets the window size to whatever was stored in the settings, or, if this is the very first time the app's been launched on this system, to a default value of 400x400.

Version 1 stored a number of things in the settings, but the list of things stored will change a lot for V.2. This is partly because V.2 has more global items to remember: it will support user preferences for colors and fonts and paths to this and that, all of which are best remembered in settings. But values saved in QSettings are by definition global to the application. In V.1, lots of things were treated as global that were actually unique to the current document. That was because there was only one document. In V.2 we support multiple open Books (documents) and that, as I have said ruefully a number of times lately, changes everything.

For just one example, the Find panel keeps lists of the last ten search-strings and last ten replace-strings. In V.1 it stored those lists in the settings at shutdown. But those lists shouldn't be global; they are document-unique and will change each time the user brings a different Book into focus. So those will get saved and restored from each Book's meta-data, not from the global settings.

On the other hand, in V.2 I want to do what Firefox does when it starts up: offer to restore the previous session. I do not want to do what a number of Mac OS apps do since the Mountain Lion release, and restore the last session automatically. Maybe you don't want to re-open all those documents. But I want to offer the option, and that means, at shutdown, recording the list of open documents and their paths in the settings; and recovering the list at startup.

QVariant vanishes

As documented for C++ (link above) the QSettings API makes heavy use of the QVariant class, a way to wrap any other class as something that can be stored as a byte-string. The first argument to both value() and setValue() is the character-string key, but the second in both cases is supposed to be a QVariant. The value being stored from self.size() is a QSize; the value returned by self.pos() is a QPoint, and the splitter status from self.splitter.saveState() is a 23-element QByteArray. C++ doesn't permit passing such a potpourri of types in the same argument position, so Qt tells you to swaddle them decently as QVariants.

Similarly, the value returned by QSettings.value() is documented as being a QVariant; and you are supposed to use one of that QVariant's many "toXxxx" methods to restore it to whatever class was wrapped in it. For example, in V1 of PPQT, the code to restore the main window splitter's state looked like:

 self.hSplitter.restoreState(
              self.settings.value("main/splitter").toByteArray() )

The call to value() evaluates to a QVariant; that object's toByteArray() method reveals that the value is really a QByteArray, and that is accepted by the splitter's restoreState method.

Well, in PyQt5, Maestro Phil and company did away entirely with QVariant calls. Presumably they are still using them under the covers, in the interface to C++, but the Python programmer is not supposed to use them, and in fact there is no way to import the name QVariant from a PyQt5 module. This simplifies the code of a PyQt5 app, but it adds yet another consideration when moving code from PyQt4.

Where does it all go?

Where does the QSettings object actually put the keys and values you give it? In different places depending on the platform. It's documented under "Platform Considerations" in the QSettings page, but here's the bottom line:

Linux: $HOME/.config/orgname/appname.conf
Mac OSX: $HOME/Library/Preferences/com.orgname.appname.plist
Windows (registry): HKEY_CURRENT_USER\Software\orgname\appname

In each case, orgname and appname are defined at startup to the QApplication, in my case with code like this:

app.setOrganizationName("PGDP")
app.setApplicationName("PPQT2")

Eventually that will be in the top-level module, PPQT2. Since that doesn't exist, it's currently in the unit-test driver, mainwindow_test.py, which prepares an empty settings object, creates the mainwindow object, sends it a close event, and then looks in the settings to verify it stored what it should have.

Actually, QSettings has a multi-level scheme; it will look first for user-level values; then at a system level if necessary. I'm not worrying about that; it's all automatic and platform-independent at the PyQt level. Which is nice.

Monday, April 28, 2014

Scanning the pixels

Last post I described how you can get a "voidptr" through which you can PEEK the bytes of an image. This is needed to implement zoom-to-width and zoom-to-height buttons on the imageview. The code that I wrote for this in V.1 never felt right. It's a pure CPU-burning FORTRAN-style loop process; even more so than the character or word census of a large text. The scan image for a typical page is several megapixels (the test image used below is about 1800x2700, or 5Mpx) and the only process I could find for determining the margin has to access a good fraction of them. Clicking "To Width" or "To Height" on an older laptop incurs a noticeable pause of a second or more. I want to shorten that.

(Note that in what follows, the code for to-width and to-height is virtually the same, just substituting "top" and "bottom" for "left" and "right". I'm writing about to-width; whatever lessons I learn are immediately applicable to to-height.)

Starting point

Let's look at the code as it is now. The point is to find out how much of the left and right margins of the image are all-white, then scale and center the image to exclude them. To find the left margin, I look at the pixels of each row from left to right, stopping at a black spot. Once I've found a black spot, I never have to scan farther to the right than that point. I keep going to successive rows, hoping to find a black spot even more to the left. After looking at all rows I know the pixel count to the left of the leftmost black spot. The same logic applies to the right margin: look at all rows from right to left, in order to find the rightmost black spot.

An early complication was that finding a single black pixel threw up false positives. Scan images often have one-pixel, even two-pixel "fly specks" outside the text area. So I modified the code to stop only on a dark spot of at least 3 pixels, which added some overhead. I added a heuristic. Noting that many pages have large white areas at the top and bottom, I started the scan with the middle row, hoping to quickly find some black pixels nearer the left margin.

The following code is heavily rewritten and refactored. You can view the original if you like, but don't bother. The original has four nearly-identical loops for scanning the left margin, middle to end then top to middle, then the right margin the same. These loops—I realized yesterday—differ only in the start, stop and step values of their controlling ranges. So I factored the inner loops out to this:

    def inner_loop(row_range, col_start, margin, col_step):
        '''
        Perform inner loop over columns of current margin within rows of
        row_range. Look for a better margin and return that value.
        '''
        for row in row_range:
            pa, pb = 255, 255 # virtual white outside column; reset for each row
            for col in range(col_start, margin, col_step):
                pc = color_table[ ord(bytes_ptr[row+col]) ]
                if (pa + pb + pc) < 24 : # black or dark gray trio
                    margin = col # new, narrower, margin
                    break # no need to look further on this row
                pa, pb = pb, pc # else shift 3-pixel window
        return margin

The key statement is pc = color_table[ ord(bytes_ptr[row+col]) ]. From inside, out: bytes_ptr[row+col] peeks the next pixel value which is an index into the color table. ord() is needed because the voidptr access returns a byte value and (for reasons that escape me) Python will not permit a byte type as a list index. The color_table is a list of the possible GG values from RRGGBB, smaller numbers being darker. I'll talk about it in a minute.

The operation of this code should be fairly clear: slide a 3-pixel window along the pixels of one row up to a limit; when they comprise a dark spot, break the loop and set a new, more stringent limit. Return the final limit value.

Now let's look at the code that sets up for and calls that inner loop.

def find_image_margins(image):
    '''
    Determine the left and right margin widths that are (effectively)
    all-white in an image, returning the tuple (left,right). The image is
    presumed to be a scanned book page in the Indexed-8 format, one byte per
    pixel, where a pixel value is an index into a color table with 32-bit
    entries 0xAARRGGBB (AA=alpha channel).
    '''
    rows = image.height() # number of pixels high
    cols = image.width() # number of logical pixels across
    stride = (cols + 3) & (-4) # scan-line width in bytes
    bytes_ptr = image.bits() # uchar * a_bunch_o_pixels
    bytes_ptr.setsize(stride * rows) # make the pointer indexable
    # Get a reduced version of the color table by extracting just
    # the GG values of each entry. If the image is PNG-1, this
    # gives [0,255] but it could have 8, 16, even 256 elements.
    color_table = [ int((image.color(c) >> 8) & 255)
                     for c in range(image.colorCount()) ]

The first five lines of code set up the voidptr to the image bytes. The stride value is used because the Qt documentation notes that every pixel row occupies a whole number of 32-bit words.

Note the remark about the "presumed" image format? In early testing I forgot to force the image to Indexed-8. The calculation stride * rows yielded about 5MB but that was much larger than the true memory size of a compressed 1-bit PNG file. The result was that the first time the inner loop tried to access a byte of a middle row—Python seg-faulted. Yes, you can bring down, not just your own app, but the whole interpreter, by mis-using a voidptr.

If the image file is really monochrome and stored as PNG-1, the color table will have two entries. But I can't require that or assume that. It will be a PNG but it might be a PNG-8 with actual colors. So make a color table of just the GG values (Green is commonly used as a proxy for pixel brightness) and index it by the pixel value.
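The stride and color-table arithmetic can be tried on plain integers without any image at all. The function names and the two-entry table below are invented for the illustration; only the two expressions come from the code above.

```python
def scanline_stride(cols):
    """Bytes per row for one-byte pixels, rounded up to a multiple of 4."""
    return (cols + 3) & -4

def green_levels(color_table):
    """Pull the GG byte out of each 0xAARRGGBB color table entry."""
    return [(entry >> 8) & 0xFF for entry in color_table]

# A two-entry table such as a monochrome page might carry:
# opaque black, then opaque white.
table = [0xFF000000, 0xFFFFFFFF]
levels = green_levels(table)

for cols in (1, 1800, 1801):
    print(cols, scanline_stride(cols))
print(levels)
```

Adding 3 and then masking off the low two bits rounds up to the next multiple of four, so a width that is already a multiple of four is left alone.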

Now, to work:

    # Some pages start with many lines of white pixels so in hopes of
    # establishing a narrow margin quickly, start at the middle, go to
    # the end, then do the top half. Begin: left side from the middle down.
    # The stop value rows*stride makes the range include the last row.
    left_margin = inner_loop(
                    range((rows//2)*stride, rows*stride, stride),
                    0, cols//2, 1
                    )
    # With hopefully narrower margin, scan from the top to the middle:
    left_margin = inner_loop(
                    range(0, (rows//2)*stride, stride),
                    0, left_margin, 1
                    )
    # Now do exactly the same but for the right margin, taking columns
    # from the rightmost, inward.
    right_margin = inner_loop(
                    range((rows//2)*stride, rows*stride, stride),
                    cols-1, cols//2, -1
                    )
    right_margin = inner_loop(
                    range(0, (rows//2)*stride, stride),
                    cols-1, right_margin, -1
                    )
    return left_margin, right_margin

Each call to the inner loop passes a range of rows to examine—in the form of pre-calculated byte offsets—and the three factors for a range of columns. The end result is the width of the white margins of the page.
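To experiment with the scanning idea away from Qt, here is a simplified stand-in that runs on a plain bytearray. It is not the code above: it assumes each byte is already a brightness value (no color table), scans rows top to bottom rather than middle-out, and resets the 3-pixel window at the start of each row. The function name and the tiny 16x4 test image are invented for the sketch.

```python
def find_margin(pixels, stride, rows, col_start, col_limit, col_step,
                dark_sum=24):
    """Slide a 3-pixel window along each row from col_start toward
    col_limit; return the tightest column at which a dark trio ended."""
    margin = col_limit
    for row in range(rows):
        base = row * stride
        pa = pb = 255  # virtual white pixels before the first column
        for col in range(col_start, margin, col_step):
            pc = pixels[base + col]
            if pa + pb + pc < dark_sum:  # three dark pixels in a row
                margin = col             # new, narrower margin
                break
            pa, pb = pb, pc              # shift the window
    return margin

cols, rows = 16, 4
stride = 16                               # already a multiple of 4
image = bytearray([255]) * (stride * rows)  # all white

def darken(row, columns):
    for col in columns:
        image[row * stride + col] = 0

darken(0, [1])                        # a one-pixel fly speck, ignored
darken(1, [4, 5, 6, 10, 11, 12])      # two dark runs on row 1
darken(2, [3, 4, 5])                  # a run farther left on row 2

left = find_margin(image, stride, rows, 0, cols // 2, 1)
right = find_margin(image, stride, rows, cols - 1, cols // 2, -1)
print(left, right)
```

Note that the value returned is the column where the third dark pixel of a run was seen, so it sits two pixels inside the run when scanning in either direction.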

Timings of this code

When I use the cProfile module to time one application of find_image_margins() to a representative page image, I get these not awfully helpful stats:

         871938 function calls in 1.921 seconds
   Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        4    1.815    0.454    1.921    0.480 imagetest.py:20(inner_loop)
   871922    0.106    0.000    0.106    0.000 {built-in method ord}
        1    0.000    0.000    1.921    1.921 {built-in method exec}
        1    0.000    0.000    1.921    1.921 imagetest.py:12(find_image_margins)

The remaining lines are all-zero times. So: running under cProfile, this takes almost two seconds to run. On a 2.8GHz CPU, that's roughly five billion clock cycles. Wow! This would have been completely infeasible on, say, my old Mac Plus where it would have taken something like 20 minutes to complete on its 8 megahertz CPU. On my current Macbook Pro, it's a minor annoyance.
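For anyone who wants a table like that for their own code, this is roughly how it can be produced with the standard cProfile and pstats modules. The busy_loop function is a made-up stand-in for the function being measured, and sorting by "tottime" is what gives the "Ordered by: internal time" heading.

```python
import cProfile
import io
import pstats

def busy_loop(n):
    """Hypothetical stand-in for the function being measured."""
    total = 0
    for _ in range(n):
        total += ord("a")  # one built-in call per pass, like the pixel loop
    return total

profiler = cProfile.Profile()
result = profiler.runcall(busy_loop, 100000)

# Format the collected stats into a string, busiest functions first.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("tottime").print_stats(5)
print(report.getvalue())
```

The timings will vary from machine to machine, but the call counts are exact, which is often the more useful number.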

The count of 871,922 calls to ord tells how many pixels were examined in the inner loop. The total time of 1.815 seconds tells how much time the inner loop spent on those pixels: about 1,815,000 microseconds on roughly 872,000 pixels, or about 2.1 microseconds per pixel examined.

Optimizing

I see two approaches to optimizing this program. One: reduce the roughly 2μsec spent in the inner loop on each pixel. Two: reduce the number of pixels examined in the inner loop. Reducing either number by half would halve the program execution time.

I believe I will spend a bit of time exploring both approaches, and report in the next post.

Tuesday, April 15, 2014

Current Line Revisited

A few days ago I described my progress on editview, but just today I stumbled on a big improvement. Here's how it looks now.

If you click through you find that's a quite large image. The reason is, it's from a retina macbook so what looks like quite a modest window, when captured, comes out 1500px wide. Here are the improvements from the prior version.

The current-line highlight now extends the full width of the window. Before it was only as long as the text on that line.
Scanno highlighting (the lilac highlights) is implemented. You can load a file of common OCR errors and they are marked wherever they appear.
Spellcheck highlighting (wiggly magenta underlines) is implemented, including alternate dictionaries. Note the line with <span lang='fr_FR'>; those words get checked against the french dictionary instead of the default one.

Pretty much all that remains is to finish an automated unit test of these features. I have one simple unit test driver now that uses QTest to automate a number of keystrokes, but I need to also automate exercising a pop-up context menu. That'll be an adventure I'm sure.

In the previous post I kvetched about how, although a QTextBlock has a format (QTextBlockFormat), you could only interrogate it, and modifying it didn't change the format. As a result, what I expected would be a simple way to set a current-line highlight, by setting the background brush of the current text block, didn't work.

Then today, browsing around the QTextCursor documentation, what should my eye fall upon but a setBlockFormat method! You can ask a QTextBlock for its format, but in order to set it, you have to aim a QTextCursor at that block, and then tell the cursor to set the block's format.

Bizarre.

Well, at any rate, not how I'd have designed it. But I didn't, so...

So I realized that my previous method of highlighting the current line using the extraSelections mechanism was over-complicated. I changed the logic to set a background brush on the current block. The cursor-moved logic now reads as follows:

Note: The following is still not the correct way to set a current-line highlight. Do not emulate this code. See this post for the problem with it and a later post for the correct approach.

    def _cursor_moved(self):
        tc = QTextCursor(self.Editor.textCursor())
        self.ColNumber.setText(str(tc.positionInBlock()))
        tb = tc.block()
        if tb == self.last_text_block:
            return # still on same line, nothing more to do
        # Fill in line-number widget, line #s are origin-1
        self.LineNumber.setText(str(tb.blockNumber()+1))
        # Fill in the image name and folio widgets
        pn = self.page_model.page_index(tc.position())
        if pn is not None : # the page model has info on this position
            self.ImageFilename.setText(self.page_model.filename(pn))
            self.Folio.setText(self.page_model.folio_string(pn))
        else: # no image data, or cursor is above page 1
            self.ImageFilename.setText('')
            self.Folio.setText('')
        # clear any highlight on the previous current line
        self.last_cursor.setBlockFormat(self.normal_line_fmt)
        # remember this new current line
        self.last_cursor = tc
        self.last_text_block = tb
        # and set its highlight
        tc.setBlockFormat(self.current_line_fmt)

Monday, April 14, 2014

Case of the Shredded Data

A persistent source of PyQt bugs is the problem that as soon as a variable "goes out of scope"—that is, cannot be referenced from any statement—it gets garbage-collected and either re-used or made unreachable by the virtual memory hardware. Newbies to PyQt get bitten by this early, often, and hard. It usually shows up as a segmentation fault that takes down the Python interpreter and your app. And there's no obvious bread-crumb path back to the problem.

The usual problem is that you build one object A based on another object B. Then you pass object A around and try to use it in another part of the program. Meanwhile, object B, which was just input material to making A, has gone out of scope and been shredded. Then any use of A references non-existent memory and segfaults, or produces weird results because it is accessing memory that doesn't contain what it should.

Case in point: QTextStream. This is a useful class, very handy for reading and writing all kinds of files. You could use the Python file class instead, but you need to standardize on one paradigm or the other, Python's files or Qt's QTextStreams. And I've gone with the latter, but they have this little problem of segfaulting if you are not careful with them.

A QTextStream is built upon some source of data, either a QFile or an in-memory QByteArray. The class constructor takes that source object as its only argument, as in this perfectly innocuous function:

def get_a_stream(path_string):
    '''Return a QTextStream based on a path, or None if invalid path'''
    if not QFile.exists(path_string):
        return None
    a_file = QFile(path_string)
    if not a_file.open(QIODevice.ReadOnly):
        return None
    return QTextStream(a_file)

Lovely, what? Simple, clear—and wrong. Because a_file goes out of scope as soon as the function returns. Many statements away in another part of the program, the next use of the returned stream crashes the program. This is (in my oh-so-humble opinion) a stupid design error in Qt (it affects C++ users too) but fortunately it is easy to work around. You just use the following instead of QTextStream:

class FileBasedTextStream(QTextStream):
    def __init__(self, qfile):
        super().__init__(qfile)
        self.save_the_goddam_file_from_garbage_collection = qfile

That's it! An object of this class FileBasedTextStream can be used anywhere a QTextStream would be used, but it does not require you to find some way to save the QFile from the garbage collector. The single reference to the QFile keeps it alive until the stream object itself is freed.
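The ownership idea can be shown in pure Python as an analogy. Here a weak reference plays the part of the C++ pointer that does not keep its target alive; in real Qt the result of losing the target is a crash, not a tidy None. All the class names are invented for the illustration.

```python
import gc
import weakref

class Source:
    """Stand-in for the QFile: the object the stream depends on."""
    def __init__(self, name):
        self.name = name

class FragileStream:
    """Like the bare stream: refers to its source without owning it."""
    def __init__(self, source):
        self._source = weakref.ref(source)
    def source_alive(self):
        return self._source() is not None

class KeepAliveStream(FragileStream):
    """Same stream plus one strong reference that owns the source."""
    def __init__(self, source):
        super().__init__(source)
        self._keep = source

def make(stream_class):
    source = Source("book.txt")  # this local name vanishes on return
    return stream_class(source)

fragile = make(FragileStream)
kept = make(KeepAliveStream)
gc.collect()  # make sure anything unreachable is really gone
print(fragile.source_alive(), kept.source_alive())
```

The one extra attribute in KeepAliveStream is the whole trick: as long as the stream exists, so does the thing it reads from.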

I solved this same issue earlier, for memory-based text streams. These are very handy for nonce files, and my unit test code builds lots of them.

class MemoryStream(QTextStream):
    def __init__(self):
        # Create a byte array that stays in scope as long as we do
        self.buffer = QByteArray()
        # Initialize the "real" QTextStream with a ByteArray buffer.
        super().__init__(self.buffer)
        # The default codec is codecForLocale, which might vary with
        # the platform, so set a codec here for consistency. UTF-16
        # should entail minimal or no conversion on input or output.
        self.setCodec( QTextCodec.codecForName('UTF-16') )
    def rewind(self):
        self.seek(0)
    def writeLine(self, text):
        # "text" rather than "str", to avoid shadowing the builtin
        self << text
        self << '\n'

It's just a QTextStream based on an in-memory buffer, but the buffer can't go out of scope as long as the object exists. It adds a couple of minor features that QTextStream lacks.

Wednesday, April 9, 2014

Which Line Is It, Anyway?

The editview module is getting pretty complete. The only missing function is the dreaded syntax-highlighter to highlight scannos or spelling errors. Here's what it looks like now.

Today I added the code to highlight the current line. That's why one line has a sort of pale-lemon background. In V1, there was no current line highlight, and it was quite easy to lose sight of the cursor, and have to rattle the arrow keys to find it. (The string shown in dark gray is selected text and is actually bright yellow; the Grab utility did something to the colors.)

Qt's method of doing this was surprising to me.

In a QPlainTextEdit, there is a 1:1 correspondence between text blocks and logical lines. Each line of text is in one QTextBlock. Now, QTextBlock has a blockFormat() method that returns a QTextBlockFormat, which is itself a QTextFormat derivative, i.e. it carries properties such as a background brush. So when I started looking at how to make the current line a different color, I saw this and supposed it would be a matter of, each time the cursor moved:

Get the text block containing the cursor, a single method call,
Clear the background brush of the previous line's text block,
Set the current text block's blockFormat to a different background brush

But in fact QTextBlock lacks anything like a setBlockFormat, so the property is read-only. And setting the background property of the returned QTextBlockFormat object was accepted but had no visible effect.

Sigh, back to the googles to find a number of places in the Qt docs, stackoverflow and the like, where the question is raised and answered.

QPlainTextEdit supports a property extraSelections, which is a list of QTextEdit::ExtraSelection objects. This is the first and I think only time I've seen a class documented as child of another class. And it's a weird little class; it has no methods (not even a constructor), just two properties, cursor and format. So it's basically the C++ version of a python tuple.

What you do is, you get a QTextCursor to select the entire line, and you build an ExtraSelection object with that cursor and the QTextCharFormat you want to use, and assign that to the edit object's list of extra selections. This is a lot of mechanism to just highlight one line. Apparently the intent is to support an IDE that, for example, wants to put a different color on each line set as a breakpoint, or such.

Note: The following is not the correct way to set a current-line highlight. Do not emulate this code. See this post for the problem with it and a later post for the correct approach.

Anyway for the curious, this is the code that executes every bloody time the cursor moves:

    def _cursor_moved(self):
        tc = QTextCursor(self.Editor.textCursor())
        self.ColNumber.setText(str(tc.positionInBlock()))
        tb = tc.block()
        ln = tb.blockNumber()+1 # block #s are origin-0, line #s origin-1
        if ln != self.last_line_number:
            self.last_line_number = ln
            self.LineNumber.setText(str(ln))
            tc.movePosition(QTextCursor.EndOfBlock)
            tc.movePosition(QTextCursor.StartOfBlock,QTextCursor.KeepAnchor)
            self.current_line_thing.cursor = tc
            self.Editor.setExtraSelections([self.current_line_thing])
            pn = self.page_model.page_index(tc.position())
            if pn is not None : # the page model has info on this position
                self.ImageFilename.setText(self.page_model.filename(pn))
                self.Folio.setText(self.page_model.folio_string(pn))
            else: # no image data, or positioned above page 1
                self.ImageFilename.setText('')
                self.Folio.setText('')

In sequence this does as follows:

Get a copy of the current edit cursor. A copy because we may mess with it later.
Set the column number in the column number widget.
Get the QTextBlock containing the cursor's position property (note 1 below).
Get the line number it represents.
If this block is a change from before (note 2):

Set the line number in the line number widget.
Make the cursor selection be the entire line ("click" at the end, "drag" to the front)
Set that cursor in a single ExtraSelection object we keep handy.
Assign that object as a list of one item to the editor's extra selections.
Get the filename of the current image file, if any; and if there is one, display it and the logical folio for that page in the image and folio widgets.

Note 1: If there's no selection, a text cursor's position is just where the cursor is. But if the user has made a selection, the position property might be at either end of it. Drag from up-left toward down-right and the position is the end of the selection. Drag the other way, it's at the start. Drag a multi-line selection that starts and ends in mid-line. One of the lines will have the faint current-line highlight: the top line if you dragged up, the bottom line if you dragged down. I don't think anyone will notice, or care if they do. I could add code to set the current line on min(tc.position(),tc.anchor())—but I won't.

Note 2: Initially, there was no "if ln != self.last_line_number" test; everything was done every time the cursor moved. And actually performance was fine. But I just could not stand the idea of all that redundant fussing about happening when it didn't have to.

Friday, April 4, 2014

Further on the Mac Option Key

The Qt Forum post I made about the Option-key problem, after 22 hours, has been viewed 32 times but drawn no responses. I also posted a respectful query on the pyqt list this morning (after obsessing about the issue some of the night).

I also spent a couple more hours delving deeply into the QCoreApplication, QGuiApplication, and QApplication docs, hoping to find some kind of magic switch to change the behavior of the key interface. I speculate that Qt5 has better Cocoa integration and as a result is getting the logical key from a higher-level interface than before.

Supposing it can't be fixed or circumvented, what I will have to do is: In constants.py where the key values and key sets are determined, check the platform and use Qt.MetaModifier instead of Qt.AltModifier when defining keys for Mac. This substitutes the actual Control shift for the Option shift.
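
A rough sketch of what that platform check might look like. To keep it runnable without PyQt5 it copies the numeric values of the Qt modifier flags into plain constants; the function name bookmark_key_sets and the idea of storing modifier-plus-key values in frozensets are my assumptions here, not the actual contents of constants.py.

```python
import sys

# Numeric values of the Qt.KeyboardModifier flags, copied here so the
# sketch runs without PyQt5; real code would use Qt.ControlModifier etc.
CONTROL_MODIFIER = 0x04000000
ALT_MODIFIER = 0x08000000
META_MODIFIER = 0x10000000
KEY_1 = 0x31  # Qt.Key_1; Key_2..Key_9 follow consecutively

def bookmark_key_sets(platform=sys.platform):
    """Return (jump_set, set_set) of modifier|key values for bookmarks 1-9."""
    # On Mac OS the Option key is unusable, so substitute Meta (the
    # physical Control key) for Alt; everywhere else keep Alt.
    second_shift = META_MODIFIER if platform == 'darwin' else ALT_MODIFIER
    digits = range(KEY_1, KEY_1 + 9)
    jump_set = frozenset(CONTROL_MODIFIER | key for key in digits)
    set_set = frozenset(CONTROL_MODIFIER | second_shift | key for key in digits)
    return jump_set, set_set

mac_jump, mac_set = bookmark_key_sets('darwin')
linux_jump, linux_set = bookmark_key_sets('linux')
```

The jump keys come out identical on every platform; only the set-a-bookmark keys differ, which is exactly the inconsistency the documentation would have to explain.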

That would be the only module with a platform dependency. Others just use the names of keys and key-sets defined in constants.py. For the user, I will have to have separate documentation about bookmarks, for Mac and non-Mac. For non-Mac, it'll remain "Press control and alt with a number 1-9 to set that bookmark." For mac it will be "Press the Control key and the Command key together with a number 1-9..." And the beautiful consistency ("where you see 'alt' think 'option'" at the front and never mention it again) is gone.

Another issue is the use of ctl-alt-M and ctl-alt-P in the Notes panel, to insert the current line or image number. Possibly I can just change the key definition in constants to whatever the mac keyboard generates for option-P and option-M (pi and mu, it seems). Or keep the directions consistent, and completely wipe out any use of Option-keys in Mac.

Also today I tested and committed the zoom keys, which work a treat. The unit test module buzzes up 10 points and down 15, looks great.

Thursday, April 3, 2014

A Bump in the Road

Today I thought I'd add in the special keystrokes to the editview. There are three groups of them: a set that interact with the Find dialog (^f, ^g, etc), and these I'm deferring until I actually work on the Find panel; a bookmark set, (ctl-1 to 9 to jump to a bookmark, ctl-alt-1 to 9 to set one); and ctl-plus/minus to zoom. All of these were implemented and working in version 1, using the keyPressEvent() method to trap the keys.

So I messed around and tidied up the constants that define the various groups of keys as sets, so the keyPressEvent can very quickly determine if a key is one it handles, or not: if the_key in zoom_set and so on.

With the brush cleared, I copied over the keyPressEvent code from V1 and recoded it (smarter and tighter) for V2 and ran a test, and oops something is not working.

Specifically, it is no longer possible to set bookmark 2 by pressing ctl-alt-2. On a mac, that's command-option-2, which Qt delivers as Qt.AltModifier plus Qt.ControlModifier and the key of Qt.Key_2.

Or rather, it used to do that. I fired up PPQT version 1 just to make sure. Yup, could set a bookmark using cmd-opt-2. But not in the new version. Put in debug printout. The key event delivered the same modifier values, ctl+alt, but the key value was... 0x2122, the ™ key? And cmd-alt-3 gave me Qt.Key_sterling, 0xA3. And cmd-alt-1 is a dead key.

Pondering ensued. OK, these are the glyphs that you see, if you open the Mac Keyboard viewer widget and depress the Option key. So under Qt5, the keyboard event processor is delivering the OS's logical key, but under Qt4 in the same machine at the same time it delivers the physical key.

Oh dear.

I spent several hours searching stackoverflow and the qt-project forums and bug database but nothing seemed relevant. I posted a query in the Qt forum. But I have little hope. It looks very much as if I'll have to change the key choices for bookmarks, and make them platform-dependent. In Windows and Linux they can continue to be ctl[-alt]-1 to 9, but in Mac OS this will change. The only reliable special key modifiers are control (Command) and meta (the Control key!).

In V1 it was great that I could document just once at the top of the docs, that in Mac, "ctl means cmd" and "alt means option". And that was consistent throughout. Now it won't be because the Option key is effectively dead for my purposes. I'll have to tell the mac user, "when I say control I mean command, but when I say alt, I mean control." Won't that be nice? Plus, I'll have to have code that looks at the platform and redefines the key sets for Mac at startup. Very disappointing.

Friday, March 28, 2014

A First Look at Linguist

So, where were we? I've been away nearly a week, visiting wonderful Ames, Iowa. Not by choice, but by the whimsy of the NCAA Selection Committee, which in its wisdom chose to make the Stanford Women's Basketball team not only a number-2 seed, but make them play the first two rounds in Ames, on the campus of Iowa State University, where the host school, the 4th-seeded Iowa State Cyclones, draw 10,000 screaming fans to every home game.

Well, fortunately the Cyclones faded to a zephyr before the defense of the FSU Seminoles, so for the second game the Cardinal played in front of a half-empty and subdued arena, and won comfortably. Meanwhile we dealt with snow flurries and the difficulties of passing time among the limited amusements of Ames and Des Moines. If you'd rather know about that versus Qt Linguist, check the pictures.

A couple posts back I described the process of designing a widget with Qt Designer and how any string in the design could be designated "Translatable", and how that left distinct code in the generated Python of the widget class. With the result that, when the widget initializes itself, every translatable string will pass through the bowels of the QtCore.QCoreApplication.translate method before being assigned to its QLabel, push button, menu item or whatever its use.

That method is usually just written tr() in the Qt documentation, but for an arcane reason having to do with the relationship of Python classes to C++ classes, PyQt5 code should always call the Core version, not the one inherited by every QObject. Its output is either the original string or, if a translation for that string exists for the current Locale, the translated string.
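
The fall-back behavior is easy to picture with a toy model. This is only a sketch of the lookup contract, using a plain dict keyed by (context, source); the table and the function here are hypothetical stand-ins and say nothing about how Qt actually stores or loads translations.

```python
# Hypothetical translation table keyed by (context, source text).
FRENCH = {
    ('EditViewWidget', 'Document filename'): 'nom du document',
}

def translate(context, source, table=FRENCH):
    """Return the translation for source in context, else source itself."""
    return table.get((context, source), source)

translated = translate('EditViewWidget', 'Document filename')
untouched = translate('EditViewWidget', 'Current line number')
```

A string with no entry simply comes back unchanged, which is why an untranslated app still shows its original English text.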

But that leaves the question, where do translations come from? From work done by a Translator (a human) using Qt Linguist. I pursued the link between the widget code and Linguist a little further.

The bridge is the PyQt5 utility, pylupdate5. Its use is described in the PyQt5 online docs. One must create a minimal Qt project description file, in this case ppqt2.pro. Actually a make file for the Qt Make program, this file lists the relevant source files and the name of the translation file. Here is what I used:

SOURCES = editview_uic.py
TRANSLATIONS = ppqt2.ts

Listing just one source file now; later there would be many on that line.

Then you turn pylupdate5 loose on the .pro file and it fills up ppqt2.ts with a bunch of XML items like this:

    <message>
        <location filename="editview_uic.py" line="153"/>
        <source>Document filename</source>
        <translation type="unfinished"></translation>
    </message>
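
Since the .ts file is plain XML, it can be inspected with nothing but the standard library. Here is a small sketch that lists the strings still marked unfinished. The sample text wraps the message shown above in TS, context and name elements; that wrapper is my assumption about the surrounding file layout, and unfinished_sources is a hypothetical helper, not part of pylupdate5.

```python
import xml.etree.ElementTree as ET

# Assumed shape of a small .ts file around the <message> item above.
SAMPLE_TS = """<TS version="2.0">
<context>
    <name>EditViewWidget</name>
    <message>
        <location filename="editview_uic.py" line="153"/>
        <source>Document filename</source>
        <translation type="unfinished"></translation>
    </message>
</context>
</TS>"""

def unfinished_sources(ts_text):
    """Return (context, source) pairs whose translation is still unfinished."""
    pending = []
    root = ET.fromstring(ts_text)
    for context in root.iter('context'):
        name = context.findtext('name', default='')
        for message in context.iter('message'):
            translation = message.find('translation')
            if translation is not None and translation.get('type') == 'unfinished':
                pending.append((name, message.findtext('source', default='')))
    return pending

pending = unfinished_sources(SAMPLE_TS)
```

Something like this could serve as a quick check of how much translation work remains before a release.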

Now I could launch Qt Linguist from the Qt distribution, and use it to open the ppqt2.ts file. It presents me with a window whose top is like this:

Every string from every widget (just one widget for now) is shown. Click on one and prepare a translation for it in the bottom part of the window.

For some reason, spaces are shown as gray dots in this part of the window. There exist Qt "phrase books" for many languages, and the French one is open in the above image. It is offering "document" as a translation for "document". Fair enough, but I would have thought "document name" would be a common phrase. Apparently not. I typed in nom du document.

Anyway, that's what the Translator person works with. The texts for the given language would be saved back into the ppqt2.ts file. And somehow become available via the Core Translate method at run-time.

I'm not going to worry further about that last step, for now. I can see how translation would be done. I don't mean to actually do any translations (or request anyone to do any) until the whole app is in near-final state. But at least I know how it all works, I've seen it can work on my system, so that's one Unknown that's Known and I can relax about it.

Thursday, March 27, 2014

Assisting the Upgradement

I got burned by one of what turned out to be quite a list of small incompatibilities between PyQt4 and PyQt5. Just fooling around I tried upgrading one of Mark Summerfield's utilities to PyQt5. It contained the following code:

        path = QFileDialog.getOpenFileName(self,
                "Make PyQt - Set Tool Path", label.text())
        if path:
            label.setText(QDir.toNativeSeparators(path))

Pretty obviously Mark expected getOpenFileName to return a path or a null string. But when executed, and I clicked Cancel in the file dialog, it caused an error in the label.setText statement. Whatever got into path evaluated to True, but wasn't a string.

It turned out to be a tuple with two strings. I documented this to the pyqt mailing list and was embarrassed when Phil just replied with the above link to the list of incompatibilities, one of which is a change to the API of the whole family of five "get..." methods supported by QFileDialog. What had happened to cause this seemingly arbitrary breaking of an existing API? It seems that PyQt4 had introduced some variant methods "to avoid the need for mutable strings". Now these extra "get...Filter" methods were being dropped and their function folded into the basic "get..." methods. And that entailed changing the return value of getOpenFileName from a simple string to a tuple of two strings.
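
The reason the old "if path:" test passed is plain Python truthiness: a tuple of two empty strings is not itself empty. A minimal sketch, with a hypothetical stand-in function in place of the real dialog so it runs without a GUI:

```python
def fake_get_open_file_name(cancelled):
    """Stand-in for QFileDialog.getOpenFileName under PyQt5 (hypothetical)."""
    return ('', '') if cancelled else ('/tmp/tool', 'All Files (*)')

# PyQt4-style code: treat the return value as the path itself.
result = fake_get_open_file_name(cancelled=True)
old_test_passes = bool(result)   # a non-empty tuple, so the test passes

# PyQt5-style code: unpack the tuple, then test the path string.
path, _selected_filter = fake_get_open_file_name(cancelled=True)
new_test_passes = bool(path)     # an empty string, so Cancel is detected
```

So the fix to Mark's code is just to unpack the return value before testing it.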

It still seems arbitrary to me, breaking existing code in an unexpected way for no very good reason. But it's a done deal, so how to make sure that this incompatibility, and all the other subtle incompatibilities in the list, don't get overlooked? (And don't miss the fact that one item in the list is open-ended, saying "PyQt5 does not support any parts of the Qt API that are marked as deprecated or obsolete in Qt v5.0." What are those? Are they numerous?)

I decided it wouldn't be hard to write a tool to find and point out all, or anyway a lot of, these issues. In two afternoons of work I put together q45aide.py (click the link to see the Readme and get the code from Github). This is a straightforward source scanner that copies a program and inserts comments above any line that looks as if it will have an upgrade problem.

I'm particularly pleased with two features of this program. One is the way of finding out the modules that contain every Qt class. I needed this because one annoying change from Qt4 to Qt5 is that many classes moved from one import module to another. That invalidates most existing from PyQt4.module import (class-list) statements. I wanted to generate correct, minimal import statements from the class-names used in the program. But that meant having a dictionary whose keys were all the valid Qt class-names (over 880 of them, it turns out) and whose values were the module names that contain them.

I pondered quite a while over how to get such a list of class-names by module. I thought about manually or programmatically scraping some pages from qt-project.org. But finally I realized, I could build a complete, accurate list dynamically in the program.

When you import a module, Python creates a namespace. And the names defined in a namespace can be interrogated by querying namespace.__dict__. So the program contains code like this:

    def load_namespace( ):
        global module_dict, import_dict
        # pick off QtXxxx from "PyQt5.QtXxxx"
        module_name = namespace.__name__.split('.')[1]
        for name in namespace.__dict__ :
            if name.startswith('Q') : # ignore e.g. __file__
                module_dict[name] = module_name

    import PyQt5.Qt as namespace ; load_namespace()
    import PyQt5.QtBluetooth as namespace ; load_namespace()
    ...

This loads module_dict with exactly the 880+ class-names related to their include modules, automatically updating should PyQt5 be updated with new or changed class-names.
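
The same technique can be tried without PyQt5 installed. In this sketch two standard-library modules stand in for the PyQt5 modules, importlib replaces the repeated import statements, and load_names is a hypothetical name of mine rather than the function in q45aide.py.

```python
import importlib

def load_names(module_names, prefix=''):
    """Map every public name with the given prefix to the module defining it."""
    name_dict = {}
    for module_name in module_names:
        namespace = importlib.import_module(module_name)
        for name in vars(namespace):  # same as namespace.__dict__
            if name.startswith(prefix) and not name.startswith('_'):
                name_dict[name] = module_name
    return name_dict

# Standard-library modules stand in for PyQt5.QtCore, PyQt5.QtGui and so on.
stdlib_names = load_names(['json', 'fractions'])
```

With the real modules one would pass prefix='Q' to pick out just the Qt class-names.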

The other thing that I got a kick out of writing was the way to write a list of class-names in either of two formats, in one statement. The program will generate one "from PyQt5.modulename import (class-name-list)" for each module that the input requires. A program option is -v, asking for the list to be stacked vertically. The only difference is that the class-name-list is either punctuated with comma-space, or with comma-newline-indent. And this is how it comes out:

                out_file.write('from PyQt5.{0} import ('.format(mod_name))
                join_string = ',\n    ' if arg_v else ', '
                out_file.write(join_string.join(sorted(class_set)))
                out_file.write(')\n')
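
Here is the same idea as a self-contained sketch that writes to an in-memory buffer, so both formats can be compared side by side. The function name write_import and its vertical parameter are hypothetical. Note that this version opens the parenthesis on the import line itself, since Python only allows a statement to continue onto the next line once a bracket is open.

```python
import io

def write_import(out_file, mod_name, class_set, vertical=False):
    """Write one 'from PyQt5.<module> import (...)' statement."""
    # The parenthesis opens on the import line so the statement may
    # legally continue onto following lines.
    out_file.write('from PyQt5.{0} import ('.format(mod_name))
    join_string = ',\n    ' if vertical else ', '
    out_file.write(join_string.join(sorted(class_set)))
    out_file.write(')\n')

flat = io.StringIO()
write_import(flat, 'QtCore', {'QFile', 'QDir'})
stacked = io.StringIO()
write_import(stacked, 'QtCore', {'QFile', 'QDir'}, vertical=True)
```

Only join_string changes between the two formats; everything else is shared.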

Badda-boom.

Thursday, March 20, 2014

Using Qt Designer

Two unknowns I was fretting about have become less foggy: the work-flow that connects the graphic layout one creates with Qt Designer to the executable code; and the connection between user-visible strings defined to Qt Designer and the Qt translation mechanism.

Qt Designer to code

This part was of course laid out clearly in Summerfield's book. That book was my bible through the early days of building PPQT in 2011-12. Now it shows its age a little because there are minor differences between the PyQt4/Python 2 syntax of its examples, and the PyQt5/Python 3 syntax I'm using. But chapter 7, "Using Qt Designer", covered it all. Here's the sequence.

Run Qt Designer, select a template (in this case, just plain QWidget), and start dragging widgets onto it and laying them out. It's reasonably intuitive, especially if you know the names and uses of most of the widgets and their properties, as I do. It helps a lot to have a big screen. Qt Designer is almost unusable on the macbook because with all its various windows there's no room left for the widget you're designing. I used my desktop system with a 23-inch monitor and it was fine. It took an hour to lay out a satisfactory Edit panel as I described previously. Much of that time was spent in the Properties Editor, checking and specifying and re-specifying the many, many properties of each widget.

In the course of this I quickly discovered part of the answer to the question on translation. The Property Editor entry for any user-visible string—label text, tool-tip, status-tip, whats-this?—has a check-box "Translatable". I checked most of them. This widget has two QLabels and both will have their text filled in dynamically. But all tool-tips need to be translatable.

Having the check-box there keeps one aware that the English text you compose now will have to be translated. My experience in writing for translation, and writing tech material for people who have English as a second language, goes way back, to 1975-6 when I was at IBM World Trade in London and writing material to be read by colleagues who were Brits, Swedes, Dutch and Italians. I learned then to keep going over my text to make sure it was simple, terse, and unambiguous; used no colloquialisms or metaphors; used the smallest vocabulary that would express the thought. (Which isn't a bad mindset for any expository writing.)

You save your design to a file of type .ui and invoke pyuic5, a command-line utility that reads it and writes some python source. (Summerfield invokes it by way of a version of make, makepyqt.pyw, but I don't see that in my installation. To be investigated.) The contents of this source were at first a little baffling to me. Here's the start:

class Ui_EditViewWidget(object):
    def setupUi(self, EditViewWidget):
        EditViewWidget.setObjectName("EditViewWidget")
        # ...and 140 more lines of setting-up code such as
        self.DocName = QtWidgets.QLabel(self.frame)
        # ...and the other sub-widgets...

What do I do with this? I wondered. Do I need to instantiate an object of this class? But it doesn't have an __init__; and what's this EditViewWidget being passed to this setupUi method?

Read the manual, doofus. Or in this case, keep reading in chapter 7 of Summerfield.

What I am supposed to do with this is to invoke it in—for the first time in my Pythonic career—a multiple-inheritance class definition, like this:

class EditView(QWidget, Ui_EditViewWidget):
    def __init__(self, my_book, parent=None):
        super().__init__(parent) # initialize QWidget
        self.setupUi(self) # invoke the initializer in class Ui_EditViewWidget

I am creating a QWidget subclass that also incorporates the code prepared by pyuic5. Following the call to setupUi, "self" incorporates all the widgets I defined to Qt Designer, under the object names I gave them in the Properties editor. Further initialization code is needed, for example to connect signals, set up the syntax highlighter, etc. But 150-odd lines of detailed GUI initialization are taken care of separately and, more important, can be reviewed and altered at any time with Qt Designer, with no impact on the code.
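
The mechanics of the mix-in can be seen with stand-in classes, no Qt required. In this sketch FakeWidget and FakeLabel are hypothetical substitutes for QWidget and QLabel, and Ui_Form is shaped like pyuic5 output: no __init__, just a setupUi() that is handed the widget to populate.

```python
class FakeLabel:
    """Stand-in for QLabel (hypothetical)."""
    def __init__(self, parent):
        self.parent = parent

class FakeWidget:
    """Stand-in for QWidget (hypothetical)."""
    def __init__(self, parent=None):
        self.parent = parent
        self.object_name = ''
    def setObjectName(self, name):
        self.object_name = name

class Ui_Form(object):
    """Shaped like pyuic5 output: no __init__, just setupUi()."""
    def setupUi(self, Form):
        Form.setObjectName('Form')
        self.DocName = FakeLabel(Form)  # child widget stored on the Ui object

class View(FakeWidget, Ui_Form):
    def __init__(self, parent=None):
        super().__init__(parent)  # initialize the widget base class
        self.setupUi(self)        # self is both the Ui object and the widget

view = View()
```

Because the same object is passed as both self and Form, the child widgets end up as attributes of the widget that owns them, which is why later code can just say self.DocName.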

Translation de-fog

To implement translation, the generated class ends with this method, which setupUi calls before it returns:

    def retranslateUi(self, EditViewWidget):
        _translate = QtCore.QCoreApplication.translate

...which is followed by a line like this for every string that got the "Translatable" tick-mark:

        self.DocName.setToolTip(_translate("EditViewWidget", "Document filename"))

That pushes the fog of unknowns back a bit. It remains for me to learn how translation actually works.

Tuesday, March 18, 2014

Unknown Unknowns

Yesterday I checked off pagedata as coded and tested. That's the last remaining support or background module needed to allow the editor to run. So the next thing to tackle is editview.py, the visual face of the editable document. I imagine this as a widget containing, principally, the QPlainTextEdit, and below it a bar with five items:

A QLabel containing the filename of the document. This will change its font style with the document's modified status, becoming perhaps bold and magenta when a save is needed (for the document or for metadata).
A QLabel with the current folio number—a label because it isn't changeable by the user; folios are changed by modifying the folio rules in the Pages panel.
A numeric text entry field with room for four digits, displaying the scan image number corresponding to the cursor position. Editable; you can type a new number to effect a jump.
A numeric text entry field with room for 6 digits, displaying the current text block (line) number, updating as the cursor moves. Again, type a new number to jump in the document.
A QLabel with the current "column" number in the current line. Not editable. (Use an arrow key or just click.)

The latter four items of course update dynamically as the cursor moves, on receipt of the cursor movement signal from the editor.

So, if I understand all this (not difficult given it's almost the same as the status bar area of the PPQT main window), where are the Unknown Unknowns of the title? Hah. Well, they are actually known, at least by category, but just the same I feel anxious about launching into this phase. They are:

1. Using Designer

The Qt Designer is a graphic tool for designing a layout. I played with it a bit a couple of years ago when starting PPQT but ended up doing all my widget layouts "by hand" with explicit code like this (one of the simplest ones)

        vbox = QVBoxLayout()
        # the image gets a high stretch and default alignment, the text
        # label hugs the bottom and doesn't stretch at all.
        vbox.addWidget(self.txLabel,0,Qt.AlignBottom)
        vbox.addWidget(self.scarea,10)
        vbox.addLayout(zhbox,0)
        self.setLayout(vbox)

It makes the __init__() rawther lengthy. With Qt Designer you supposedly separate your UI design from the code. Designer saves a file of UI info; you apply a PyQt utility to convert this into something Python can execute; you import it and execute it. Covered in Summerfield's book and in the Qt docs. But I have lots of questions, like: how are signals connected between elements; how are elements connected to the methods that update them; how do label texts set in Designer get tr() translated; just generally a fog of unknowns. But I'd like to give it a try and the editview widget should be a good test case.

I18N

Does anybody use that term any more? "I18N" was a thing back in the 80s, late 70s even at IBM. (It means "Internationalization", duh.) Anyway, I've committed to PGDP Canada that PPQT2 will be translatable. Meaning every damn user-visible text string has to be wrapped in a tr() call. And editview is the first module that has user-visible strings (log messages don't count). The tr() call is all I know about. I am anxious about the whole rest of Qt's I18N system. How do the tr'd strings get collected; how does a translator create an alternate translation; will I have to start using Qt Make; what the heck is Locale and how do I control it for testing purposes... gaaahhh. Much reading to do.

GUI Unit Tests

With this first UI module I enter the world of automated UI testing and I haven't a clue. Well, one clue: QTest. There's a multi-chapter writeup on QTest and simulation of GUI events. I presume I'll use that. There are third-party packages like FrogLogic's "Squish" but they are very expensive, at least that one is. There are open-source packages for test automation but the ones I've seen are single-platform. So I suppose I'll be rolling my own using QTest. But I really have no idea.

So: the known Unknowns are just packed with unknown unknowns. I will be learning as I go, and I will be posting what I learn to this blog. Because that's one way I have of consolidating what I've learned. You're welcome.

Sunday, March 16, 2014

Getting useful info from QFontDatabase

I'll get back to the Last(goddam)Resort font issue shortly. But in learning about it I've had to look a little closer at what Qt knows about fonts, and as a result I've got a function that might be useful to others.

The QFontDatabase object contains whatever Qt knows about fonts that are "available in the underlying window system". However it makes this information available in a rather awkward way. You can call its families() method to get a list of all font names:

from PyQt5.QtWidgets import QApplication
from PyQt5.QtGui import QFont, QFontDatabase
the_app = QApplication([])
f_db = QFontDatabase()
for family in f_db.families():
    print(family)

My macbook is rather over-endowed with fonts, and this prints a list of 238 names.

You can also ask for the recommended or default fixed and "general" fonts,

qf_fixed = f_db.systemFont(QFontDatabase.FixedFont)
print( 'Fixed font:',qf_fixed.family() )
qf_general = f_db.systemFont(QFontDatabase.GeneralFont)
print( 'General font:',qf_general.family() )

Which on my macbook prints

Fixed font: Monaco
General font: .Lucida Grande UI

Here's a surprise: what the heck is that leading dot doing in the name of the general font? There is definitely no font named ".Lucida Grande UI" in the list of families, nor displayed by the Fontbook app, nor (using find /System -name ".Lucida*") in the System folder. However, the font db will create a QFont for it:

>>> lgf = f_db.font('.Lucida Grande UI','',12)
>>> lgf.family()
'.Lucida Grande UI'
>>> lgf.pointSize()
13

So it's real in some sense, and your code can use it. Moving on...

You can ask the font db things about any single family name, for example isFixedPitch(family). But font family names aren't consistent from system to system. And often what you want is a font that meets certain criteria, such as: it's "Times" by some name, or it has a bold or italic variant, or it scales to 36pt. So I put together this little function that will return a (possibly empty) list of fonts that meet given criteria:

def font_filter( name = None, style = None, fixed = False, sizes = None, language = None ) :
    db = QFontDatabase()
    selection = db.families()
    if name :
        selection = [family for family in selection if name in family ]
    if style :
        selection = [family for family in selection if style in db.styles(family) ]
    if fixed :
        selection = [family for family in selection if db.isFixedPitch(family) ]
    if language :
        selection = [family for family in selection if language in db.writingSystems(family) ]
    if sizes :
        size_set = set(sizes)
        selection = [family for family in selection
                     if size_set.issubset( set( db.smoothSizes(family, '' ) ) ) ]
    return selection

For example,

>>> print( font_filter( name = 'Helvetica' ) )
['Helvetica', 'Helvetica CY', 'Helvetica Neue']
>>> print( font_filter( fixed = True ) )
['Anonymous Pro', 'Courier', 'Courier New', 'Inconsolata', 'PCMyungjo', 'PT Mono']
>>> print( font_filter(fixed = True, style='Bold', sizes=[12, 18]) )
['Courier', 'Courier New', 'PT Mono']
>>> font_filter(fixed=True, language=QFontDatabase.Hebrew)
['Courier New']

Next time: back to the issue of the Last(goddam)Resort font problem.

Saturday, March 15, 2014

Last (goddam) Resort Font

I wrote CoBro back in 2012 with the original intent of "releasing" it for public use (i.e. making it known on the comics subreddit) but because of QWebKit instabilities as well as problems packaging it with either PyInstaller or cx_freeze, I put that off until I could re-do it for Qt5 and Python 3.

Now that I've done that, it is still plagued by QWebKit issues (and still breaks cx_freeze, and PyInstaller still doesn't support Python 3). Among the instabilities is a very intermittent tendency to throw a blizzard of messages like this: 2014-03-14 15:06:59.270 Python[6570:d07] Critical failure: the LastResort font is unavailable.

This is a Python log message, not a Qt one that I could maybe stifle with the technique mentioned in the prior post. And "Critical" at that! So what's going on?

Turning to The Google I find that a lot of people encountered this message with a variety of different apps, but mostly back in 2010 or 2011 when Snow Leopard was new. But some information emerged.

What is the Last Resort font? The official explanation from Unicode.org is that it is a font to use when the system needs to print a Unicode glyph and all available fonts lack values for that code position.

if the font cannot represent any particular Unicode character, the appropriate "missing" glyph from the Last Resort font is used instead. This provides users with the ability to tell what sort of character it is, and gives them a clue as to what type of font they would need to display the characters correctly.

This explains why the message comes only rarely but in a flurry: it happens when a comic web page has a glyph that isn't in any available font. That doesn't happen all the time, but when it does, the site is likely to have a string of such glyphs; hence the multiple messages. But why can't the WebKit browser find the font? With a little poking around on my system (and absolutely no help from the Finder, whose search function insists there is nothing to match that name) I did find that /System/Library/Fonts/LastResort.ttf exists and has 644 permissions. So...

Finally I found this FontGeek post about Safari having the same problem. They claim that because of "sandboxing" the browser can't access the folder where LastResort resides. I'm dubious about the explanation; and the fix they offer isn't directly applicable to the Qt WebKit code as far as I can tell. But it's the best explanation I've seen.

What to do? One possibility: get a copy of LastResort.ttf and include it in the app. I've already got code in PPQT V1 to load a font bundled with the app and add it to the QFontDatabase. I need to think about this, also I need to find a web comic that will trigger the issue reliably.

Thursday, March 13, 2014

Trapping Qt log messages

Using PyQt5, you can install your own handler for Qt's log messages and do with them as you wish, for example diverting them to a Python log file. There are (as usual) some surprises, but by and large, it works.

The context is CoBro, my little web-comic browser. It uses QWebKit to display single HTML pages. After displaying one particular comic (Two Guys and Guy) the webkit code likes to emit a couple of log messages like "error: Internal problem, this method must only be called once." Annoying since there is nothing you can do about it, and it doesn't seem to cause any harm. (BTW this is a known problem, see Qt Bug #30298.)

So I add the following code to CoBro, after creating the App and the main window and everything, when we are just about ready to show the main window and enter the event loop:

    from PyQt5.QtCore import qInstallMessageHandler, QMessageLogContext
    from PyQt5.Qt import QtMsgType

    def myQtMsgHandler( msg_type, msg_log_context, msg_string ) :
        print('file:', msg_log_context.file)
        print('function:', msg_log_context.function)
        print('line:', msg_log_context.line)
        print('  txt:', msg_string)

    qInstallMessageHandler(myQtMsgHandler)

Now what comes out on stderr is this:

file: access/qnetworkreplyhttpimpl.cpp
function: void QNetworkReplyHttpImplPrivate::error(QNetworkReplyImpl::NetworkError, const QString &)
line: 1929
  txt: QNetworkReplyImplPrivate::error: Internal problem, this method must only be called once.

Ta-daaaa! We have intercepted a Qt log message and analyzed it to show where in the Qt code it originated. One surprise is that the "function" member of the QMessageLogContext object is not a simple function name, but the full C++ signature. Another surprise is that, when I stop on this code in a debugger and look at the msg_log_context item, its members are not strings but "sip reference" objects. Nevertheless by the magic of PyQt5 they print as strings.

Well, printing this bumf isn't a lot of use. What would be more useful, is to divert it into the Python log stream, like this:

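    import logging

    def myQtMsgHandler( msg_type, msg_log_context, msg_string ) :
        # Convert Qt msg type to logging level. The list is indexed by the
        # QtMsgType value: QtDebugMsg=0, QtWarningMsg=1, QtCriticalMsg=2,
        # QtFatalMsg=3, and QtInfoMsg=4 (added in Qt 5.5).
        log_level = [logging.DEBUG,
                     logging.WARN,
                     logging.ERROR,
                     logging.FATAL,
                     logging.INFO] [ int(msg_type) ]
        # The context members can be None, so format them instead of
        # concatenating them to a string.
        logging.log(logging.DEBUG,
                    'Qt context file is {0}'.format(msg_log_context.file)
                    )
        logging.log(logging.DEBUG,
                    'Qt context line and function: {0} {1}'.format(
                        msg_log_context.line, msg_log_context.function)
                    )
        logging.log(log_level, 'Qt message: {0}'.format(msg_string))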

In other words, log the gritty details of the QMessageLogContext at the DEBUG level, but log its actual text at its own self-assigned severity, as translated into Python logging's values. The above code works and now I can redirect QWebKit's annoying messages into CoBro's log file.

Tuesday, March 11, 2014

Making a bad QTextCursor and a promising find

Working today on the unit testing and code finalization of pagedata, the module that keeps track of where the scanned OCR pages of a book each start. This module acts as the data model for several clients: every time the user moves the edit cursor, a bunch of different widgets will ask pagedata "which page is the cursor on now?" The imageview module asks it for the filename of the .png file to display, the scan# and folio# widgets under the edit window ask it for those items, and of course the Pages panel calls on it for the rows of data it displays as a table.

I thought I'd finalized the code but of course as soon as I started adding test calls into it from its unit-test module I found not only bugs but also things I hadn't thought of. It's schizophrenic, flipping back and forth between coding and testing. "What will it do if I throw it this?" the tester asks, and the coder is thinking "Oh shit, why didn't I plan for that?"

Anyway, one of the pieces of crap the test-monkey in me flang at the pagedata module opened a wonderful new prospect in Qt error control! It went down like this.

An important method in the PageData object is read_pages, which processes the page-data lines from the .meta file. When a book is saved, all the metadata goes in the bookname.meta file, including everything we know about page boundary locations. So at load time, read_pages gets called to rip through these saved lines and rebuild the page table as it was when the book was saved.

There are six items in each line, the first being the character offset to the start of the page. That gets turned into a QTextCursor so that Qt will maintain the position as it changes under user editing actions. The code is simple:

try:
    (P, fn, pfrs, rule, fmt, nbr) = line.split(' ')
    tc = QTextCursor(self.document)
    tc.setPosition(int(P))

and so forth. The test case had already flung a non-integer position P, and the failure of int(P) was caught by the try/except fine. So the next nastiness was a bad position value, first 1000000, much larger than the document, next -1. But neither of these tripped an exception! All that happened was that a message appeared on stderr, "QTextCursor::setPosition: Position '100000' out of range" and the QTextCursor was unchanged.

This opened two new questions: (1), how the heck can read_pages detect that it got a bad position?, and (B), how can we avoid having that message, about an error we've anticipated and dealt with, cluttering up stderr and getting the user all upset?
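For question (1), one possibility (a sketch only, not something tried against Qt here) is to range-check the position before it ever reaches setPosition, since Qt itself will not raise. The function name parse_page_line and the doc_length parameter are made up for illustration; in the real code the length would presumably come from the document, for instance from QTextDocument.characterCount(). The field values in the sample lines are invented too.

```python
def parse_page_line(line, doc_length):
    """Parse one saved page-data line.

    Returns a tuple of the six fields, with the position as an int,
    or None if the line is malformed or the position is out of range.
    doc_length is assumed to be the number of valid cursor positions.
    """
    fields = line.split(' ')
    if len(fields) != 6:
        return None
    try:
        position = int(fields[0])
    except ValueError:
        return None
    # Qt only prints a message for a bad position, so check the range here.
    if not 0 <= position < doc_length:
        return None
    return (position, *fields[1:])


print(parse_page_line('120 p001.png a b c 1', 500))      # in range
print(parse_page_line('1000000 p001.png a b c 1', 500))  # too large: None
print(parse_page_line('-1 p001.png a b c 1', 500))       # negative: None
```

Checking first also means the out-of-range message is never produced for the cases the code has already anticipated.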

It was too late in the day to investigate (1), but (B) is a problem I've been plagued by for a long time. Qt is just full of unhelpful debugging errors. My other app, a simple web browser based on QWebKit, likes to throw out stuff like this:

QEventDispatcherUNIXPrivate(): Unable to create thread pipe: Too many open files
QEventDispatcherUNIXPrivate(): Can not continue without a thread pipe
QNetworkReplyImplPrivate::error: Internal problem, this method must only be called once.

And I've been wondering in a dazed sort of way if there mightn't be some way to stifle those. But PPQT2 is using proper Python logging. Every module creates its own logger and writes diagnostics, warnings and errors with the logging API. And I definitely plan not to let those log messages dribble out on stderr; they will eventually go into a file.

So it suddenly occurred to me, is there maybe some way to divert the Qt log messages into the Python logging system? I wasn't aware of anything but I started browsing in the index of the Qt Assistant and turned up this: qInstallMessageHandler. This appears to offer a way to accept and process all of Qt's messages.

I'm quite excited about this. If it is accessible under PyQt5, I see a clear path to capturing all Qt messages and converting them into Python log entries. That will let me stifle the WebKit chatter from my comics browser and also this text cursor message in PPQT. Tomorrow I spend the day at the museum but Thursday afternoon I get to dig into this!
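Assuming the Qt messages can be turned into ordinary Python log records, the stifling part could be done with a standard logging.Filter. This is a minimal sketch: the class name DropKnownNoise, the logger name qt_demo and the choice of phrase are all made up for illustration, and the two messages are logged by hand rather than coming from Qt.

```python
import io
import logging


class DropKnownNoise(logging.Filter):
    """Drop log records whose text contains any known-harmless phrase."""

    def __init__(self, phrases):
        super().__init__()
        self.phrases = tuple(phrases)

    def filter(self, record):
        text = record.getMessage()
        # Returning False tells logging to discard the record.
        return not any(phrase in text for phrase in self.phrases)


# Log to an in-memory stream so the result is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(DropKnownNoise(['this method must only be called once']))

demo_logger = logging.getLogger('qt_demo')
demo_logger.setLevel(logging.DEBUG)
demo_logger.addHandler(handler)
demo_logger.propagate = False  # keep the demo output off the root logger

demo_logger.warning('QNetworkReplyImplPrivate::error: Internal problem, '
                    'this method must only be called once.')
demo_logger.warning("QTextCursor::setPosition: Position '-1' out of range")

print(stream.getvalue())  # only the setPosition message survives
```

Putting the filter on the handler rather than on a logger means it applies to every record that handler sees, whichever module's logger produced it.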

Tuesday, February 18, 2014

Model-View design and user expectations of performance

PPQT presents:

A table of the words in the document with their properties such as uppercase, numeric, misspelled.
A table of the characters in the document with their counts and Unicode categories.
A table of the book pages, derived from the original PGDP page-separator lines.

Each of these tables is derived from a "census" in which every line, word-token, and character in the document is counted. In v.1 this census is done the first time a book is opened, and any time after when the user needs to "refresh" the display of word or character counts. It's very time-consuming, 5 to 20 seconds for a large book. Getting the time down for v.2 would be a good thing. So would avoiding a big delay during first opening of a new book.

In v.1 the census is done in one rather massive block of code that fetches each line from the QTextDocument in turn as a QString and parses each, counting the characters and using a massive regex to pick out wordlike tokens. This process is complicated by:

The need to handle the PG codes for non-Latin-1 chars such as [oe].
The need to recognize HTML-like productions: some like <i> and <sc> are common from the start, and later in the book-production process there might be thousands of HTML codes; we count them for characters but not for "words".
But also the need to spot the lang=code property embedded in HTML codes, to signal use of an alternate spelling dictionary.
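To make those three complications concrete, here is a much-simplified sketch of a tokenizer. It is not the v.1 regex: the patterns TOKEN and LANG and the function scan_line are invented for illustration, and they assume that PG codes are two letters in square brackets and that the lang property may or may not be quoted.

```python
import re

# Either an HTML-like tag, or a "word" made of letters and [xx] PG codes.
TOKEN = re.compile(
    r"(?P<tag><[^>]*>)|(?P<word>(?:[^\W\d_]|\[[a-zA-Z]{2}\])+)")
# The lang=code property inside a tag, with or without quotes.
LANG = re.compile(r"""lang=['"]?([A-Za-z_-]+)""")


def scan_line(line):
    """Return (words, langs): the word tokens in line, and any
    language codes found inside HTML-like tags."""
    words, langs = [], []
    for match in TOKEN.finditer(line):
        tag = match.group('tag')
        if tag:
            # Tags are not words, but they may carry a lang property.
            found = LANG.search(tag)
            if found:
                langs.append(found.group(1))
        else:
            words.append(match.group('word'))
    return words, langs


words, langs = scan_line('The <i lang="fr">c[oe]ur</i> of it')
print(words)  # ['The', 'c[oe]ur', 'of', 'it']
print(langs)  # ['fr']
```

A real census would also have to remember which words fall between a lang tag and its closing tag, which this sketch does not attempt.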

For v.2 I want to break up the management of all these metadata along MVC lines, with a "data" module and a "view" module for each type, so worddata.py manages the list of words while wordview.py contains the code to present that data using a QTableView and assorted buttons. Similarly for chardata/charview and pagedata/pageview. But will this complicate the census process? Will it slow it down?

Complicate it? Not exactly; more like "distribute" it. I will move each type of census to its data model: worddata will take a word census, chardata a char census, pagedata a page census. So a full census could potentially entail three passes over the document.

However, when this separation is done, it becomes clear that the only census that really needs to be done the first time a book is opened, is the page census. That's because the module that displays the matching page scan image as the user moves through the text, needs to know the position of each page's start. In other words, pagedata is the data model for both the page table and the image-display panel. Images need to be displayed immediately, so the page data needs to be censused the first time a book is opened.

The word and char censuses, however, can wait. The char data is the model only for the Char panel. If that panel is showing an empty table, the user knows to click its "Refresh" button to make a char census happen, so the table updates.

The word data is the model for the Word panel, and again, if the user opens a new book and goes to the Word panel, and sees an empty table, it's a no-brainer to click Refresh and update the table. In either case, the user knows they've asked for something, and should be content to wait while the progress bar turns and the census finishes.

The word data is also the model, however, for the display of misspelled words with a red underline, and the display of "scannos", highlighted document words that appear in a file of likely OCR errors. These features of the editor are turned on with a menu choice (? or perhaps a check box in v.2? TBS). If either highlighter is set ON when a new book is opened, the highlights won't happen because the word data isn't known until a census is taken.

Easy solution: we know when we are opening a new book (we don't see a matching metadata file from a prior save), and in that case we force OFF the spellcheck and scanno highlight choices. Then if/when the user clicks spelling or scanno highlights ON, we can run a census at that time. Again the potentially slow process is initiated by an explicit user action.

What about (perceived) performance? It should be snappier. If you Refresh the Chars panel it will rip through the document counting characters, but not spend time on the big word-token regex html-skipping process. Refresh the Words panel and its census will at least not be slowed by counting characters.
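Roughly, the split might look like the sketch below. Everything in it is a stand-in: the class and method names are hypothetical, the "document" is a plain list of strings instead of a QTextDocument, and the word pattern is a trivial one rather than the big token regex.

```python
import re
from collections import Counter


class CharData:
    """Char model: counts characters, and only when asked to refresh."""

    def __init__(self, lines):
        self.lines = lines
        self.counts = Counter()

    def refresh(self):
        self.counts = Counter()
        for line in self.lines:
            self.counts.update(line)  # one count per character


class WordData:
    """Word model: counts words on refresh, or on demand when a
    highlighter that needs the word data is switched on."""

    def __init__(self, lines):
        self.lines = lines
        self.counts = Counter()
        self.censused = False
        self.highlight = False

    def refresh(self):
        self.counts = Counter()
        for line in self.lines:
            self.counts.update(re.findall(r"[^\W\d_]+", line))
        self.censused = True

    def set_highlight(self, on):
        # The slow census runs only as a result of an explicit user action.
        if on and not self.censused:
            self.refresh()
        self.highlight = on


lines = ['the cat sat', 'the end']
chars = CharData(lines)
words = WordData(lines)

words.set_highlight(True)   # triggers the word census
chars.refresh()             # the char census never touches the word regex
print(words.counts['the'], chars.counts['t'])
```

Opening a book creates both models with empty tables; nothing is counted until the user clicks Refresh or turns a highlighter on.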

Great, but I already started coding worddata on the assumption that it would be the basis for both chars and words. Now I have to split it up.

Monday, February 17, 2014

Python logging and unit testing

PPQT 2 is to be pretty much a complete rewrite of version 1. I built the first version in an ad-hoc way, adding features one at a time to the basic editor, and as a result its software structure is rather ramshackle. Information about different data structures and formats leaks all over. Now that I know where it's going, the next version can be properly compartmentalized and structured.

And better-tested! V1 got "tested" by my using it. V2, I am determined, will have a separate unit-test driver for each module, and every added function means adding test code to exercise it. We be professional here!

And logging! V1 has no logging of any kind. There may be one or two places where an except clause has a print statement in it (blush) but that's it. So I read up on Python logging, and each module will have its named logger and log some occasional INFO lines, always WARN lines where the module is working around some problem, and occasionally ERROR lines.

So the first module finished (yay!) is metadata.py and it has several places where it detects and logs errors. So how, in the matching metadata_test.py, can I test whether the module wrote the expected thing to the log?

There may be better ways, but this is how I'm doing it. First, at the top of the test module is this, which I expect will be boilerplate repeated in every test driver.

# set up logging to a stream
import io
import logging
log_stream = io.StringIO()
logging.basicConfig(stream=log_stream,level=logging.INFO)
def check_log(text):
    "check that the log_stream contains text, rewind the log, return T/F"
    log_data = log_stream.getvalue()
    # rewind and empty the stream so the next test starts with a clean log
    log_stream.seek(0)
    log_stream.truncate()
    return text in log_data

During execution of the unit test, log output is directed to an in-memory stream. In the test code, the module under test is provoked into seeing an error that should cause it to write a log line. Then you can just code assert check_log('some text the test should have logged'). The assertion fails if the string isn't in the log. If it succeeds, execution continues with the log cleared out for the next test.

Looking at it now, I think maybe check_log() should take two parameters, the text and the level, so as to verify that the message is at the expected level:

assert check_log('whatever',logging.WARN)

I'll leave that as an exercise. Meaning, I'm too lazy to do it now.
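For what it's worth, here is one way the exercise might go. It is a sketch under a couple of assumptions: the handler is attached to the root logger explicitly with a LEVEL:name:message format (the same layout basicConfig uses by default), and each log message fits on one line. The logger name metadata just mirrors the module under test.

```python
import io
import logging

# Send log output to an in-memory stream, one "LEVEL:name:message" per line.
log_stream = io.StringIO()
handler = logging.StreamHandler(log_stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)


def check_log(text, level=None):
    """Return True if some log line contains text (and, when level is
    given, was logged at that level). Always clears the log afterward."""
    log_data = log_stream.getvalue()
    log_stream.seek(0)
    log_stream.truncate()
    for line in log_data.splitlines():
        if text in line:
            if level is None:
                return True
            # getLevelName(logging.WARN) is 'WARNING', matching the prefix.
            if line.startswith(logging.getLevelName(level) + ':'):
                return True
    return False


logging.getLogger('metadata').warning('whatever went wrong')
assert check_log('whatever', logging.WARN)
```

Leaving level optional keeps the existing one-argument assertions working unchanged.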

Incidentally, another goal of V2 is to have localized (e.g. translated) text in the visible UI. Perhaps log messages should also be translated but... nah.