Monday, March 23, 2015

Implementing locale aware sorting

So first thing I set out to implement the locale-aware sort in my test program. And right away realized that some changes were needed.

The nut of the matter is to use a key-generator function while building the SortedDict, so as to ensure that the items are stored in the proper order. However, that turned out not to be possible.

The basic database is built on a primary key (a concept I know well from my days documenting the SQL at Informix) and a row of data. This is implemented as a SortedDict. In the case of the Words panel the primary key is a word-token (it may include hyphens and/or apostrophes), and the data are its count and its properties.

It's all very nice knowing that the SortedDict keeps the word tokens in a default ascending sorted order, but what if that order is not the order the View wants to display? I was using QSFPM to solve this, but now I have to address the question directly. What if the View (representing the user) wants the data sorted ignoring case? Or locale-aware in (for example) the de_DE locale? I don't want to have to propagate that knowledge down to where the database is built.

In my prior trial, I had assumed that the default sort imposed by SortedDict was the correct basic ascending-order sort. So I used a simple range(len(database)) as the default sort vector. But that assumption is bad. The default database order can't be trusted to be what the default View wants to display.

For non-default orders—descending order, other column order—I interposed a sort vector. Now I realized I need to interpose a sort vector even for the simple primary key ascending sort.

Well, that turned out not to be too difficult. I required that the sort() method of the model must be called when the database is refreshed. Not a problem, that was already happening. And, I initialized my collection of sort order vectors, which previously started out with the ascending-column-0 sort in place, to all-nulls. And finally, I made the magic get_column_vector() method take a key-function argument.

I had to modify this function considerably. Before, it never had to build a sort vector for column 0; that was assumed to be made when the database was initialized. Now it had to do that. I won't post the whole program until the filtering is done, but here's how that one function looks now (it will change yet more):

def get_column_vector( col, order, key_func=None ):
    global DATABASE, VALUES_BY_ROW, SORT_UP_DICTS, SORT_UP_VECTORS, SORT_DOWN_VECTORS
    vector = SORT_UP_VECTORS[ col ] if order == Qt.AscendingOrder else SORT_DOWN_VECTORS[ col ]
    if vector is None : # we need to make one
        if SORT_UP_VECTORS[ col ] is None : # we have no up-vector
            # Create a key_getter function to extract the key for this column
            if col == 0 :
                # key for column 1 is just the primary key string
                key_getter = lambda j : VALUES_BY_ROW[j][0]
            elif col == 1 :
                # key for column 1 is nnnnnPRIMARYKEY
                key_getter = lambda j : '{:05}{}'.format( VALUES_BY_ROW[j][1], VALUES_BY_ROW[j][0] )
            else : # col == 2
                # key for column 2 is XXXXXXXXPRIMARYKEY
                key_getter = lambda j : '{}{}'.format( VALUES_BY_ROW[j][2], VALUES_BY_ROW[j][0] )
            # Create the sorteddict, it is happy with key_func of None
            sort_dict = SortedDict( key_func )
            # Sweep through the actual database collecting keys and relating them
            # to their index position.
            for j in range( len(DATABASE) ) :
                k = key_getter( j )
                sort_dict[ k ] = j
            # Store the dict for reference.
            SORT_UP_DICTS[ col ] = sort_dict
            # Cache the values-view of it for use by the view
            SORT_UP_VECTORS[ col ] = sort_dict.values()
        # We have a sort-up vector, or was it sort-down we need?
        vector = SORT_UP_VECTORS[ col ] # be optimistic
        if order == Qt.DescendingOrder : # ok, make the reverse
            SORT_DOWN_VECTORS[ col ] = [j for j in reversed( SORT_UP_VECTORS[ col ] ) ]
            vector = SORT_DOWN_VECTORS [ col ]
    return vector

This does display the random keys of the fake database in proper order, all lowercase variants of a letter together, all uppercase variants of the same letter together. Or all mixed when case is ignored. I'm thinking that now, when all the ordering is imposed by vectors, is there any need to make the bottom-level database a SortedDict at all? Maybe it can be a normal Python dict.

With the extra code, sorting now takes around 1 second for a 10,000 row database. That's still a tenth the QSFPM time, and when I actually implement it in PPQT I think I can make it faster.

Now on to filtering. The above function will get yet another parameter!

No comments: