Sunday, March 22, 2015

Getting the sort right

I mentioned a couple of posts back that the natsort package wasn't behaving correctly for me. I opened an issue on the GitHub page and got an immediate and helpful reply from the author. He underscored what he had already noted in his doc page: locale support in both Mac OS X and BSD seems to be broken, and a simple fix is to install the ICU package (International Components for Unicode) and its Python binding, PyICU. It seems natsort attempts to import PyICU and, if that succeeds, uses it; otherwise it falls back to the native locale support.
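For the curious, the "try PyICU, else fall back" pattern can be sketched roughly as follows. This is only my illustration of the idea, not natsort's actual code: the function name make_collation_key and the hard-coded 'en_US' locale are assumptions made for the example.

```python
import locale

def make_collation_key():
    """Return a key function for locale-aware string sorting.

    Hypothetical helper: prefers ICU (through PyICU) when it can be
    imported, otherwise falls back to the C library via locale.strxfrm.
    """
    try:
        import icu  # PyICU, an optional third-party module
    except ImportError:
        # Native locale support; this is the part that seems broken on OS X.
        return locale.strxfrm
    # 'en_US' is an assumption for this sketch; use whatever locale applies.
    collator = icu.Collator.createInstance(icu.Locale('en_US'))
    return collator.getSortKey

key = make_collation_key()
print(sorted(['epple', 'apple', 'Apple'], key=key))
```

Either branch hands back a one-argument function that can be passed as key= to sorted(), so the calling code does not need to know which collation backend it got.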

I was reluctant to do this because I didn't want to have to support Yet Another Goddam Third Party Module (YAGTPM, which when pronounced, conveys my feelings) on all the intended platforms for PPQT. But in fact, it appears it is only needed on OSX. (Hopefully Linux and Windows will Just Work.) So I went ahead and did it. Here's the sequence of operations you need if you have Homebrew installed, slightly expanded from the natsort author's suggestions:

brew install icu4c
CFLAGS=-I/usr/local/opt/icu4c/include
export CFLAGS
LDFLAGS=-L/usr/local/opt/icu4c/lib
export LDFLAGS
pip install pyicu

The CFLAGS and LDFLAGS settings tell pip where Homebrew leaves the ICU headers and libraries.

After doing this, I tested natsort and got the desired results:

import natsort
words = ['apple', 'åpple', 'Apple', 'Äpple', 'Epple', 'Èpple', 'épple', 'epple']
key_func_L = natsort.natsort_keygen( alg = natsort.ns.LOCALE )
print( sorted( words, key=key_func_L ) )

['apple', 'Apple', 'åpple', 'Äpple', 'epple', 'Epple', 'épple', 'Èpple']

key_func_L = natsort.natsort_keygen( alg = (natsort.ns.LOCALE | natsort.ns.IGNORECASE ) )
print( sorted( words, key=key_func_L ) )

['apple', 'Apple', 'åpple', 'Äpple', 'Epple', 'epple', 'épple', 'Èpple']

This is what Qt's built-in table sort does. So that's solved.

Tomorrow I am going to do several things. One, I am going to modify my table sorting test platform to use natsort as the key generator in the sorted dicts it creates. That should produce proper sorting in the table. (That might be a bit hard to see given the nature of the generated "words" but it should be visible.)

Two, I am going to add filtering to the test platform in the manner of the Word table of PPQT and make sure that my plan for doing that works.

Three, if time permits, I am going to build a better test case to prove the problem with QSFPM (QSortFilterProxyModel). I believe I know how to time exactly and only the sort operations, so it will not be necessary to display the time used for endResetModel. The idea of resetting the table model seemed to perplex the responders on the Qt forum, so I'd like to eliminate it as a consideration and just show drastically long sort times.
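The general shape of "time only the sort" is simple enough to sketch without Qt. In the sketch below, plain sorted() stands in for the proxy model's sort call, and the helper name time_sort and the random test words are made up for the illustration; the real test case would wrap the Qt sort instead.

```python
import random
import time

def time_sort(items, key=None):
    """Sort items and return (sorted_list, elapsed_seconds).

    Hypothetical helper: only the sorted() call sits between the two
    clock readings, so setup and display costs are excluded.
    """
    start = time.perf_counter()
    result = sorted(items, key=key)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Generate repeatable pseudo-words, similar in spirit to a test table.
random.seed(0)
words = [''.join(random.choice('abcdefgh') for _ in range(6))
         for _ in range(10000)]

result, elapsed = time_sort(words, key=str.lower)
print('sorted {} items in {:.4f} s'.format(len(result), elapsed))
```

Because the clock is read immediately before and after the sort, nothing else (model reset, view repaint) can leak into the reported number.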

If I can do that in a fairly clean test case, I will file a bug on QSFPM. I've not had good results filing bugs when the demonstration case is PyQt code, but maybe it will work. Either way, I won't spend more than a couple of hours on that.

Tuesday I hope I will be able to put what I've learned from this exercise back into PPQT, first in the loupeview table and then in the Words table.
