Monday, April 14, 2014

Case of the Shredded Data

A persistent source of PyQt bugs is this: as soon as the last reference to an object disappears (typically because the variable holding it "goes out of scope", so that no statement can reference it any more), the object is garbage-collected, and PyQt deletes the underlying C++ object along with it. That memory may then be reused for something else, or become unreachable altogether. Newbies to PyQt get bitten by this early, often, and hard. It usually shows up as a segmentation fault that takes down the Python interpreter and your app, and there's no obvious breadcrumb trail back to the problem.

The usual scenario is that you build one object A based on another object B. Then you pass A around and try to use it in another part of the program. Meanwhile B, which was only input material for making A, has gone out of scope and been shredded. Any use of A then touches freed memory and either segfaults or produces weird results, because that memory no longer contains what A expects.
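To see the mechanism without Qt, here is a rough pure-Python analogy; it is a sketch, not a description of what PyQt does internally. The classes Source and Derived and the function make_derived are hypothetical stand-ins for B and A. The assumption is that the pointer a Qt object keeps to its source behaves like a weak reference: it does nothing to keep the Python object alive. Under CPython's reference counting, the Source is freed the moment make_derived returns.

```python
import weakref


class Source:
    """Hypothetical stand-in for object B (e.g. a QFile)."""

    def __init__(self, data):
        self.data = data


class Derived:
    """Hypothetical stand-in for object A (e.g. a QTextStream).

    Analogy only: like a C++ object holding a raw pointer, it keeps a
    weak reference to its source, which does not keep the source alive.
    """

    def __init__(self, source):
        self._source = weakref.ref(source)

    def read(self):
        source = self._source()
        if source is None:
            # In C++ this would be a dangling pointer and likely a segfault;
            # Python at least reports that the source is gone.
            raise RuntimeError("source object has been destroyed")
        return source.data


def make_derived():
    b = Source("hello")
    return Derived(b)  # the local b was the only strong reference; it dies here


a = make_derived()
try:
    a.read()
except RuntimeError as exc:
    print(exc)  # under CPython prints: source object has been destroyed
```

The difference from the real bug is that Python notices the dead weak reference and raises an exception, whereas the C++ pointer inside a real stream simply dangles.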

Case in point: QTextStream. This is a useful class, very handy for reading and writing all kinds of files. You could use Python's built-in file objects instead, but you need to standardize on one paradigm or the other: Python's files or Qt's QTextStreams. I've gone with the latter, but QTextStreams have this little problem of segfaulting if you are not careful with them.

A QTextStream is built on some source of data, typically a QFile (or another QIODevice) or an in-memory QByteArray. The constructor takes that source object as its argument, as in this perfectly innocuous function:

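# assumes PyQt5
from PyQt5.QtCore import QFile, QIODevice, QTextStream

def get_a_stream(path_string):
    '''Return a QTextStream based on a path, or None if invalid path'''
    if not QFile.exists(path_string):
        return None
    a_file = QFile(path_string)
    if not a_file.open(QIODevice.ReadOnly):
        return None
    # a_file is the only Python reference to the QFile; it vanishes on return
    return QTextStream(a_file)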

Lovely, what? Simple, clear, and wrong. The local a_file goes out of scope as soon as the function returns. The stream keeps only a C++ pointer to the file, which does not count as a Python reference, so nothing keeps the QFile alive and it is destroyed. Many statements later, in another part of the program, the next use of the returned stream crashes the program. This is (in my oh-so-humble opinion) a stupid design error in Qt (it affects C++ users too), but fortunately it is easy to work around. Just use the following class instead of QTextStream:

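# assumes PyQt5
from PyQt5.QtCore import QTextStream

class FileBasedTextStream(QTextStream):
    def __init__(self, qfile):
        super().__init__(qfile)
        # Hold a Python reference so the QFile lives as long as the stream.
        self.save_the_goddam_file_from_garbage_collection = qfile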

That's it! An instance of FileBasedTextStream can be used anywhere a QTextStream can, but you no longer have to find some other way to save the QFile from the garbage collector. The stream's own reference to the QFile keeps it alive until the stream object itself is freed. In get_a_stream above, the last line simply becomes return FileBasedTextStream(a_file).
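The same fix can be sketched as a pure-Python analogy, again without Qt. Source, Derived, KeepAliveDerived and make_stream are hypothetical names; Derived plays the role of QTextStream by holding only a weak (non-owning) reference, and the subclass adds a strong reference exactly the way FileBasedTextStream does. This assumes CPython's reference counting.

```python
import weakref


class Source:
    """Hypothetical stand-in for a QFile."""

    def __init__(self, data):
        self.data = data


class Derived:
    """Hypothetical stand-in for QTextStream: holds only a weak reference."""

    def __init__(self, source):
        self._source = weakref.ref(source)

    def read(self):
        source = self._source()
        if source is None:
            raise RuntimeError("source object has been destroyed")
        return source.data


class KeepAliveDerived(Derived):
    """Same pattern as FileBasedTextStream: also store a strong reference."""

    def __init__(self, source):
        super().__init__(source)
        # Never read; it exists only so the source's reference count
        # cannot drop to zero while this object is alive.
        self._keep_alive = source


def make_stream():
    b = Source("hello")
    return KeepAliveDerived(b)


a = make_stream()
print(a.read())  # prints: hello
```

When the KeepAliveDerived object is itself freed, its attribute goes with it and the Source is released too, mirroring how the QFile dies along with the stream.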

I solved this same issue earlier, for memory-based text streams. These are very handy for nonce files, and my unit test code builds lots of them.

# assumes PyQt5
from PyQt5.QtCore import QByteArray, QTextCodec, QTextStream

class MemoryStream(QTextStream):
    def __init__(self):
        # Create a byte array that stays in scope as long as we do
        self.buffer = QByteArray()
        # Initialize the "real" QTextStream with a QByteArray buffer.
        super().__init__(self.buffer)
        # The default codec is codecForLocale, which might vary with
        # the platform, so set a codec here for consistency. UTF-16
        # should entail minimal or no conversion on input or output.
        self.setCodec(QTextCodec.codecForName('UTF-16'))

    def rewind(self):
        # Go back to the start of the buffer, e.g. to read back what was written.
        self.seek(0)

    def writeLine(self, text):
        # Write text followed by a newline (QTextStream has no writeLine).
        self << text
        self << '\n'

It's just a QTextStream based on an in-memory buffer, except that the buffer can't go out of scope as long as the stream exists. It also adds two minor conveniences that QTextStream lacks: rewind() and writeLine().
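For comparison with the Python-files paradigm the post sets aside, here is a rough standard-library analogue of MemoryStream. It is a hypothetical sketch (PyMemoryStream is an invented name), not the author's code and not a drop-in QTextStream replacement. It relies on a documented property of io.TextIOWrapper: the wrapper keeps a strong reference to its underlying binary buffer (exposed as its buffer attribute), so the io.BytesIO cannot be collected out from under it and no keep-alive attribute is needed.

```python
import io


class PyMemoryStream(io.TextIOWrapper):
    """Hypothetical stdlib analogue of MemoryStream.

    TextIOWrapper holds a strong reference to the BytesIO it wraps,
    so the buffer lives exactly as long as the stream does.
    """

    def __init__(self):
        # newline='\n' disables newline translation on write and read,
        # so the stream behaves the same on every platform.
        super().__init__(io.BytesIO(), encoding='utf-16', newline='\n')

    def rewind(self):
        # Flushes pending writes and returns to the start of the buffer.
        self.seek(0)

    def writeLine(self, text):
        self.write(text)
        self.write('\n')


stream = PyMemoryStream()
stream.writeLine('first line')
stream.writeLine('second line')
stream.rewind()
print(stream.read())
```

The UTF-16 encoding only mirrors the codec choice in MemoryStream; for a purely in-memory text buffer, io.StringIO alone would do.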
