Sunday, March 15, 2015

Building a case against QSortFilterProxy

This is part 1 of a multi-part post on speeding up the sorting of tables in Qt.

I've been disturbed by the apparent slowness of the Words table in PPQT2. It takes 15 or more seconds to refresh the table (rebuild it from scratch), and 8-10 seconds of "spinning beach-ball" cursor to re-sort on any column.

Then this week my α-test user reported that, for a large book, the bookloupe refresh was adding at least 12 seconds over the time that the bookloupe program required when run from the command line. That's consistent with the word-table delay: about a second for every 1000 rows of the table. (Not that I think the delay is a linear function of N; it might well be exponential!)

Nothing that a word-processing app does using local data, should take more than a second of elapsed time. Not on a modern computer. Maybe there are some natural-language algorithms, or neural-network or genetic problems, that might tie up a 64-bit, 2.8gigahertz CPU with eight gigabytes of RAM for substantial time. (Parenthetically, those numbers still boggle my mind; I spent so many hours at the keyboard of a 64-kilobyte CP/M system with a 2megahertz Z80...) A modern laptop should not spend any perceptible time sorting 10,000 rows of 3-column text. It does, and that means something's not right.

I did some initial fiddling around a couple months back, stuck timing statements around the Word table refresh operation and so on. From that I verified that it was taking my code much less than a second to sweep a large document and collect all the words in a vocabulary table. That included a few tens of thousands of calls to a fairly elaborate regex, and execution of another 30-odd lines of Python for each token it found.

Then my code told the QAbstractTableModel that it was time to endModelReset(), i.e. the data is all good now, redisplay the table. And that's when several seconds would elapse.

I didn't find any way to instrument the sort that happens when the user clicks on a column header. But I timed it manually at 8-10 seconds from click to redisplay of the inverted sort.

At the time I posted a query in the Qt forum. The answers I got were that, one, beginResetModel/endResetModel was a bad thing to do and should be avoided, and two, that sortFilterProxy is going to be slow and,

If you need quick sorting and you have access to the raw data in the model, you should just implement the sorting in your model itself. Just reimplement the sort function.

I wasn't very happy with this, because I thought I had revealed some pretty horrendous behavior from the sortFilterProxy. But I didn't have time to really nail the case for the prosecution. So I put the problem off. But now my user has shown the problem exists in the Loupe table also, I've gotta do something.

The first thing to do is to build a simple test-bed where I can experiment with the table code. I want to use that first to prove to myself that Qt's sort is actually slow, and not simply because I've done something stupid to slow it down. If that's the case, then I'll use that code to work out how to do my own sort.

In the next post of this sequence I will introduce that test-bed code.

No comments: