Monday, March 16, 2015

Testing table with QSortFilterProxy

With the test framework set up as described in the previous post, it is quite simple to implement real sorting in a table. The modified code is here in Pastebin. It is almost identical to the basic code.

The process of adding a proxy is described in the doc page for QSortFilterProxyModel. The proxy model object is given to the table view as its model. The proxy is given the actual table model as its source model. Here is the added code to do the job. First define the proxy class.

from PyQt5.QtCore import QSortFilterProxyModel
class Proxy( QSortFilterProxyModel ) :
    def __init__ ( self, parent=None ) :
        super().__init__( parent )
        self.setSortCaseSensitivity( True )
        self.setSortLocaleAware( True )

In the setup code in the main window, interpose it between the model and the view.

        self.table_view = View( parent=self )
        self.table_model = Model( parent=self )
        self.table_proxy = Proxy( parent=self )
        self.table_proxy.setSourceModel( self.table_model )
        self.table_view.setModel( self.table_proxy )

That's all! Now clicking column heads makes actual sorting happen.

When you run this program it looks just like the basic version. And if you do not click a column head, it performs just like the basic one as you can see in this capture.

Just as with the unsorted table, the view is only fetching the rows it needs to fill the window. The time to complete a refresh is a few milliseconds longer than the database rebuild.

Things change drastically if one simply clicks once on the head of column 1, to request a change of sort order. First, there is an agonizing pause of 9 seconds. The Mac OS "beachball" cursor is on through most of this. Then the data displays in the new sort order.

(I still do not know a good way to insert instrumentation into the process of a column sort. The QHeaderView emits a signal when the sort order changes, but not when the sort ends. Perhaps I could override the sort() method of the proxy? But it is not necessary.)

From now on, with sorting in effect, clicking Refresh causes a very different reaction than before.

I think we have a result here.

If the sort is moved to column 2 (by clicking once in the column 2 header), there is again a 9-second delay. Then clicking Refresh gets similar but not quite the same results:

The number of data() calls is lower, possibly reflecting the presence of a significant number of duplicate values in column 2. The elapsed time is proportional to the fewer calls. But the numbers are still extraordinarily big.

To me there is no way to account for the huge number of calls to the model data() function. A sane implementation would call 10,000 times to fetch all the key values into an array; sort the array using any available standard library routine; save a vector of indexes internally; and make a further 12 calls to fill up the display. Total, 10012 calls to the sorted column and 12 to the others.

That is clearly not what the proxy code is doing. A standard sort should make O(N*log2(N)) comparisons; in this case, about 133,000 comparisons. Even if we suppose that every comparison involved fetching two values from the model—which it certainly should not—that would still be only about 266,000 calls. I simply cannot image what the proxy is doing to require so many calls to the model.

Well, whatever it is doing, it is not doing it because I mis-coded something. This is absolutely minimal code. (Yes, I tried turning off case sensitivity and local-aware sorting. It didn't make a difference to the sorting in column 1, the words. It might have slightly speeded sorting column 2, integers.)

In the next post, I will explore ways to do my own sorting, taking advantage of the abilities of the SortedContainer package.

No comments: