Tuesday, June 3, 2014

Default value of f_of_x

Whoof! What started as a modest revamp of my first-draft file handling code has turned into a major exercise of revising and refactoring. But I also found a cool thing to share!

The main window manages the File menu. An important feature of the File menu, as I mentioned yesterday, is the sub-menu of Recent documents. This has to be built and populated dynamically, upon the signal aboutToOpen from the File menu's action. Which means that the program should not spend much time formatting that menu.

Input to the Recent menu is a list of previously-opened files, held as a list of path-strings. The list is kept in sequence by use-time, with the most-recently-used files first. As I mentioned in the prior post, this list can't be sorted by filename, because it might contain two files with the same name but different folder paths.

When populating the menu, the code cannot assume that these files exist. One might have existed yesterday, but it was on a USB stick that is not mounted at this time. So that file should not be shown in the menu. However, a file should not be discarded from the list just because it is not available one time. Before the next time the File menu is shown, the USB stick could be inserted. Then the file should be shown as available.

So there's the situation: at the instant the user clicks on File in the menu bar, the code must run down a list of as many as 10 filepath strings and determine for each,

  • Does it exist?
  • If so, make a menu action for it

Of what should the menu action text string consist? In BBEdit, the string is "Filename emdash folderpath". In the Wing IDE that I use, it's "Filename (folderpath)". Mac default apps like TextEdit and Numbers don't bother with paths; they just show filenames. OpenOffice has "n: fullpathstring" for n from 1 to 9. PPQT V1 also displays an index number, "n filename" (no colon). The index number is also set as the accelerator key for Windows, so a really hot Windows user could key alt-F alt-R alt-3 to open the 3rd most recent file. A pretty silly feature IMO. I believe for V2 it will be "n filename (folderpath)".

This means that the logic of populating the sub-menu is, for each pathstring in the list:

  • if the file isn't accessible, continue
  • split the path into (filename, folderpath)
  • make the string "{0} {1} ({2}).format(index,file,folder)"
  • make a QAction and add it to the menu

Caching

ALERT! The following had a major although un-obvious error, which is corrected below.

Checking for accessibility means asking, does the file exist, is it really a file (not a directory), and is it readable? These are methods of the QFileInfo object, as is the ability to fetch the filename and folderpath separately. But that means instantiating a QFileInfo, a moderately expensive process. Since the same set, or an overlapping set, of path strings will be checked again and again, it makes sense to cache the QFileInfo objects. The first time a path is checked, make a QFileInfo. If (when) it is checked again, re-use that QFileInfo. Good concept; how to implement?

There exist nice "memoization" solutions for Python, generic code that lets you put a decorator @memoize on a function to automatically cache the result that corresponds to each unique argument. However in this case, I don't want to cache the result for a given path string, I want to cache an intermediate value, the QFileInfo for that string, so I can use it in a different function instead of recreating it. So memoization is out.

What I need is something like collections.defaultdict: a dictionary whose keys are pathstrings, and whose values are QFileInfo objects upon those pathstrings. When the dictionary is queried for a given key, and key is not present, it should set a default value of QFileInfo(key).

Unfortunately, this is not what collections.defaultdict provides. It allows you to provide a "default factory" that is a fixed value, or a classname such as list, in other words the default factory is a constant function f(k). What I want is a defaultdict that provides a default factory f(key), where the default value is a function of the missing key. And indeed, defaultdict provides a way to do that, by overriding its __missing__() method. Update: When I first wrote this, I thought that __missing__ did the whole thing: provided a default value for the missing key, and stored that value as the value of the key. This was not so! It is necessary for __missing__ to save the default value explicitly, as shown below.

Without further ado, here is generic default dictionary that does this:

from collections import defaultdict
class key_dependent_default(defaultdict):
    def __init__(self,f_of_x):
        super().__init__(None) # base class doesn't get a factory
        self.f_of_x = f_of_x # save f(x)
    def __missing__(self, key): # called when key is not defined
        ret = self.f_of_x(key)  # calculate the default value of key
        self[key] = ret         # save for future uses
        return ret

To use this, create an object of this class passing the name of a function of one argument that returns an appropriate default value. In my case I used it so:

_FI_DICT = key_dependent_default( QFileInfo )

def file_is_accessible(path):
    global _FI_DICT
    qfi = _FI_DICT[path]
    return qfi.exists() and qfi.isFile() and qfi.isReadable()

# Split a full path into a tuple (filename, folderpath)
def file_split(path):
    global _FI_DICT
    qfi = _FI_DICT[path]
    return ( qfi.filename(), qfi.canonicalPath() )

The mainwindow will call file_is_accessible for some path, and soon after, will call file_split for the same path. The first time, a QFileInfo is created. On every subsequent call, the same QFileInfo is interrogated.

The code for key_dependent_default imposes no limit on the size of the dictionary. In my case, there will be at most 10 recent files in the settings when the app starts up, plus as many unique files as the user opens during the session. So I am not concerned about the size of the dictionary. But it would be possible to code the class so it limited its own size. When it reached the limit, it could simply stop caching new items, or it could delete one of its own members at random, or with more work, it could delete the least recently used member.

No comments: