Thursday, March 19, 2015

Locally frustrated

A good part of the day's coding time went to fixing a bad pull request I'd offered to PyInstaller. In the course of testing that, I discovered another issue. Since the last time I tried PyInstaller on PPQT2, I've added the helpview module, which calls on QWebView, which lives in QtWebKitWidgets. The only problem was that the Python 3 branch of PyInstaller didn't know about QtWebKitWidgets, so I had to create a "hook" for it. Then it ran, making a working app. So that's good, but it chewed up some time.
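For readers who haven't met PyInstaller hooks: a hook is a small Python file, named hook-<module>.py, that tells PyInstaller about imports it cannot find by itself. The sketch below shows the general shape only. The file name, the assumption that the binding is PyQt5, and the list of dependent modules are all illustrative guesses, not the hook that was actually written.

```python
# Hypothetical file name: hook-PyQt5.QtWebKitWidgets.py
# PyInstaller reads the module-level name `hiddenimports` from a hook file
# and bundles every module listed there along with the hooked module.

# Assumed dependencies of QtWebKitWidgets (illustrative, not verified
# against any particular PyInstaller release):
hiddenimports = [
    'sip',
    'PyQt5.QtCore',
    'PyQt5.QtGui',
    'PyQt5.QtWidgets',
    'PyQt5.QtNetwork',
    'PyQt5.QtWebKit',
    'PyQt5.QtPrintSupport',
]
```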

Then I turned to the issue of making SortedDict sort its keys in a locale-aware fashion. Here I've run into a problem, or at least a difference between native Python and Qt, where for once Qt looks smarter.

Recently I added self.setSortLocaleAware(True) to the sort filter proxy (a QSortFilterProxyModel) that is currently the sorting mechanism in PPQT2. Immediately it began to sort "correctly", in that the following sequence holds:

apple Apple ápple åpple Äpple bapple

In other words, the accented forms of a sort immediately after the unaccented a, and ahead of b.

I've spent the last couple of hours trying to get Python sorted() to emulate that, without success. The default of course is not what's wanted.

sorted( ['apple','Apple','ápple','åpple','Äpple','bapple'] )
['Apple', 'apple', 'bapple', 'Äpple', 'ápple', 'åpple']
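A quick check of the first character of each word shows where that order comes from. This is only an illustration using the built-in ord(); the variable names are made up for the example.

```python
words = ['apple', 'Apple', 'ápple', 'åpple', 'Äpple', 'bapple']

# Pair each word, in default sorted() order, with the code point
# of its first character.
first_code_points = [(w, ord(w[0])) for w in sorted(words)]

for word, code_point in first_code_points:
    print(word, code_point)
# Apple 65
# apple 97
# bapple 98
# Äpple 196
# ápple 225
# åpple 229
```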

It's basically ordering by the numerical values of the code points. Localization is supposedly supported by the locale module, in particular by locale.strxfrm(), which "Transforms a string to one that can be used in locale-aware comparisons." It can be given to sorted() as a key function.

import locale
sorted( ['apple','Apple','ápple','åpple','Äpple','bapple'], key=locale.strxfrm)
['Apple', 'apple', 'bapple', 'Äpple', 'ápple', 'åpple']

No change. Shall we try it in French?

locale.setlocale(locale.LC_ALL,'fr_FR.UTF-8')
'fr_FR.UTF-8'
sorted( ['apple','Apple','ápple','åpple','Äpple','bapple'], key=locale.strxfrm)
['Apple', 'apple', 'bapple', 'Äpple', 'ápple', 'åpple']

And no change. So it doesn't seem to do anything.
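One caveat on the first attempt: a Python program starts with collation in the "C" locale until setlocale() is called, so strxfrm could not have changed anything before the switch to French. To find out whether it has any effect on a given platform, a small check like the following could be used. It is a sketch: the helper name is invented, and the result depends entirely on the locale data installed on the machine.

```python
import locale

def strxfrm_changes_order(words, locale_name):
    """Return True if sorting with locale.strxfrm under locale_name differs
    from plain code-point sorting, False if it does not, and None if the
    locale is unavailable or strxfrm fails on this platform."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None  # locale not installed here
    try:
        return sorted(words, key=locale.strxfrm) != sorted(words)
    except OSError:
        return None  # strxfrm itself failed on this platform
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the old setting

words = ['apple', 'Apple', 'ápple', 'åpple', 'Äpple', 'bapple']
print(strxfrm_changes_order(words, 'fr_FR.UTF-8'))
```

A result of False would match what the experiments above show: the locale is accepted, but its collation rules change nothing.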

The NatSort package aims to support "natural" sorting including grouping characters. It also can mint a key function to give to sorted. Let's see.

import natsort
ns_key = natsort.natsort_keygen( alg = natsort.ns.LOCALE )
sorted( ['apple','Apple','ápple','åpple','Äpple','bapple'], key = ns_key )
['Apple', 'apple', 'bapple', 'Äpple', 'ápple', 'åpple']
ns_key = natsort.natsort_keygen( alg = natsort.ns.GROUPLETTERS )
sorted( ['apple','Apple','ápple','åpple','Äpple','bapple'], key = ns_key )
['Apple', 'apple', 'bapple', 'ápple', 'Äpple', 'åpple']

Interesting. The "group letters" option changed the order of the accented letters although not in any obviously useful way. The accented forms still sort after the b.

NatSort is supposed to support ignore-case, and this at least is a function I need. I totally forgot, in my pleasure at working out how to do my own table sorting, that I need to allow for re-sorting when the user changes the Respect Case switch. So let's see if NatSort handles that at least.

ns_key = natsort.natsort_keygen( alg = natsort.ns.IGNORECASE )
sorted( ['apple','Apple','ápple','åpple','Äpple','bapple'], key = ns_key )
['apple', 'Apple', 'bapple', 'ápple', 'Äpple', 'åpple']

Yes and no; at least it now sorts a ahead of A, but it is ignoring the Unicode definition which should surely put Ä next to A; they are both "LATIN CAPITAL LETTER" types. Adding the locale-aware flag doesn't help, either:

ns_key = natsort.natsort_keygen( alg = (natsort.ns.IGNORECASE | natsort.ns.LOCALE) )
sorted( ['apple','Apple','ápple','åpple','Äpple','bapple'], key = ns_key )
['apple', 'Apple', 'bapple', 'ápple', 'Äpple', 'åpple']

This is all with the locale still set to 'fr_FR.UTF-8', by the way.
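On the point that Unicode itself relates Ä to A: the standard library's unicodedata module exposes both the character name and the canonical decomposition, which splits an accented letter into a base letter plus a combining mark. A small illustration (the helper name describe is made up for this example):

```python
import unicodedata

def describe(ch):
    """Return (Unicode name, base letter) for a single character.
    The base letter is the first code point of the NFD decomposition."""
    decomposed = unicodedata.normalize('NFD', ch)
    return unicodedata.name(ch), decomposed[0]

for ch in ['A', '\u00c4', 'a', '\u00e1', '\u00e5']:  # A, Ä, a, á, å
    print(ch, describe(ch))
# A ('LATIN CAPITAL LETTER A', 'A')
# Ä ('LATIN CAPITAL LETTER A WITH DIAERESIS', 'A')
# a ('LATIN SMALL LETTER A', 'a')
# á ('LATIN SMALL LETTER A WITH ACUTE', 'a')
# å ('LATIN SMALL LETTER A WITH RING ABOVE', 'a')
```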

There is a note in the NatSort documentation to the effect that Python's locale support is broken on some platforms, notably Mac OS X, and recommending that one install the IBM ICU package instead. I am not going to do this, for two reasons. One, it is a huge package with lots of C++ code, so it would have to be built separately on every platform and included in a distribution package. And two, Qt is doing it right without any help.

However, I'm not sure what to do next to make my hand-made table sort work.
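One possible direction, needing neither the locale module nor ICU, is a key function built on unicodedata: compare first on the base letters with case folded, then on the accents, then on case. This is a rough sketch under those assumptions, not the Unicode Collation Algorithm and not what Qt does internally. The function name is invented, and the accents are ranked naively by the code points of their combining marks.

```python
import unicodedata

def accent_aware_key(text):
    """Sort key: base letters (case-folded), then accents, then case."""
    decomposed = unicodedata.normalize('NFD', text)
    # Base letters: everything that is not a combining mark.
    base = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Accents: the combining marks alone, in order of appearance.
    marks = ''.join(ch for ch in decomposed if unicodedata.combining(ch))
    # swapcase() makes lowercase compare ahead of uppercase as a tie-break.
    return (base.casefold(), marks, base.swapcase())

words = ['apple', 'Apple', 'ápple', 'åpple', 'Äpple', 'bapple']
print(sorted(words, key=accent_aware_key))
# ['apple', 'Apple', 'ápple', 'Äpple', 'åpple', 'bapple']
```

The accented forms now land between the plain a forms and the b, as wanted, although Ä and å come out in the opposite order from the Qt result shown earlier, because the diaeresis mark has a lower code point than the ring.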
