Saturday, January 30, 2016

Byte-playing, a new project to doodle on

So. Been a while.

Going to introduce a different project I've been messing with for a few weeks: byteplay3.

More on that in a minute. First the status of old projects.

PPQT2 continues to have one or two regular users, and a minor UI issue was posted a couple of weeks ago. No real bugs have been reported; is that good news? It says something about the code quality, but I suspect it says more about how few people are using it. I would be a regular user if I were still doing PGDP Post-Processing. Unfortunately, the switch to EPUB, with its unavoidable compromises on book quality, has killed my interest in that, so I'm not PP'ing any more.

Anyway, I want to fix that UI issue and bring PPQT2 up to the latest levels of its dependencies. Qt 5.6 is due out shortly, with PyQt 5.6 to follow soon after. When I can install those levels, I'll make the code update to the Find panel and rebuild on all platforms. That will probably be the end of PPQT2. Well, it was a most satisfying hobby project to design and build.

I remain a daily user of CoBro, and it definitely needs to be refreshed. It embeds the Qt WebEngine. Recently a couple of the comics I read have stopped loading, I think because they insist on a higher or different level of HTTPS encryption than this old WebEngine module supports. So I will also rebuild CoBro with PyQt/Qt 5.6 and, one hopes, it will stop giving an obscure error when it tries to load Penny Arcade.

But that's all to do in a couple of months, whenever the Qt and PyQt 5.6 upgrades happen.

The new project, byteplay3, needs Python 3.5. I've been making do with 3.4 for a while, and there's no feature in 3.5 that would benefit PPQT, but for byteplay I need to test against the new async/await coroutines and such.

Byteplay

The original byteplay, written by Noam Raph, was uploaded to PyPI in 2010. Briefly, the point of byteplay is to make it easier to diddle with the bytecodes generated by the Python compiler. You, gentle reader, are a highly competent Python user, so I will only show the meat of the example code. I'm sure you can figure out what's going on.

>>> def f(a, b):
...   print(a, b)
...
>>> f(3, 5)
    3 5
>>> from byteplay3 import *
>>> # convert code object of function f to a Code object
>>> c = Code.from_code(f.__code__)
>>> c
    <byteplay3.Code object at 0x1030da3c8>
>>> print(c.code)
2        1 LOAD_GLOBAL          print
         2 LOAD_FAST            a
         3 LOAD_FAST            b
         4 CALL_FUNCTION        2
         5 POP_TOP              
         6 LOAD_CONST           None
         7 RETURN_VALUE         
>>> c.code[4:4] = [(ROT_TWO,None)]
>>> f.__code__ = c.to_code()
>>> f(3,5)
    5 3

A short intro follows. If you understood exactly what happened in that demo, skip ahead. For a longer explanation, see the detailed "about" page I've written (starting from Raph's original).

When Python processes a def statement, a lambda, or a call to compile(), it compiles the source text into an internal form, a code object. The heart (though by no means all) of the code object is a byte string of "bytecodes" which represent machine instructions for a simple stack machine. The bytecode representation is binary and designed first for speed of execution and second for compact storage. Readability and post-compile editing were not goals of that design.
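You can poke at that internal form with nothing but the standard library. This is a minimal sketch: the attributes used (co_code, co_varnames, co_names) are standard CPython code-object attributes, but the actual byte values inside co_code differ from one Python version to the next, so don't rely on them.

```python
def f(a, b):
    print(a, b)

code = f.__code__              # the code object the compiler built for f
print(type(code))              # <class 'code'>
print(type(code.co_code))      # <class 'bytes'>: the raw bytecode string
print(code.co_varnames)        # ('a', 'b'): names of the local variables
print(code.co_names)           # ('print',): global names the code refers to
print(len(code.co_code), "bytes of bytecode")  # length varies by version

# compile() produces the same kind of object directly from source text.
expr = compile("x * 2 + 1", "<string>", "eval")
print(type(expr))              # <class 'code'>
print(eval(expr, {"x": 20}))   # 41
```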

The standard module dis will display the bytecode of a function or compiled expression in much the same format as shown in the example. In fact, since Python 3.4, dis has a nice facility, get_instructions(), that lets you read the bytecode one instruction at a time from an iterator. (I think this preempts a couple of the usual uses of byteplay, but byteplay still has some value.)
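A small sketch of that iterator interface follows. dis.get_instructions() and the fields of its Instruction tuples (offset, opname, argval, argrepr) are standard-library API; the filtering at the end is only an illustration. Opcode names and offsets change between Python versions, so the listing you get will not match the 3.5-era listing above line for line.

```python
import dis

def f(a, b):
    print(a, b)

# get_instructions() returns an iterator of dis.Instruction named tuples.
for ins in dis.get_instructions(f):
    print(ins.offset, ins.opname, ins.argrepr)

# Because it is just an iterator, ordinary Python can filter it, for
# example to list every global name the function loads.
globals_used = [ins.argval for ins in dis.get_instructions(f)
                if ins.opname == "LOAD_GLOBAL"]
print(globals_used)
```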

That takes care of displaying the bytecode of a function. But what if you want to modify it? In the preceding example, a ROT_TWO instruction is inserted into the sequence just before the call, swapping the two arguments on the stack, and that changes the function's behavior. More realistically, there are many opportunities for peephole optimizations, where you look for inefficient code sequences and shorten them. Ryan Kelly built the promise package on top of the original byteplay; it provides decorators that optimize certain code sequences in your functions.
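To make the peephole idea concrete, here is a toy pass over a byteplay-style list of (opcode, argument) pairs, like the list edited in the demo above. It is a hypothetical sketch, not code from byteplay3 or promise: opcodes are plain strings, the two patterns are just examples, and it ignores jump targets. (A real pass must not delete a pair if some jump lands between its two instructions.)

```python
# Hypothetical byteplay-style instruction list: (opname, argument) tuples.
# Opcode names are plain strings here purely for illustration.

# Adjacent instruction pairs that have no net effect and can simply be deleted.
REDUNDANT_PAIRS = {
    ("ROT_TWO", "ROT_TWO"),     # swapping the top two items twice is a no-op
    ("LOAD_CONST", "POP_TOP"),  # push a constant, then immediately discard it
}


def peephole(code):
    """Return a new list with redundant adjacent pairs removed.

    Repeats until nothing changes, because deleting one pair can bring
    two more instructions together that form another redundant pair.
    """
    changed = True
    while changed:
        changed = False
        out = []
        i = 0
        while i < len(code):
            window = tuple(op for op, _ in code[i:i + 2])  # next two opnames
            if window in REDUNDANT_PAIRS:
                i += 2          # skip both instructions
                changed = True
            else:
                out.append(code[i])
                i += 1
        code = out
    return code
```

For example, given [("LOAD_CONST", 1), ("POP_TOP", None), ("LOAD_FAST", "x"), ("RETURN_VALUE", None)], it returns just the last two entries. The input list is never modified; each pass builds a fresh one.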

Or I can imagine wanting to generate a bytecode sequence starting from some other notation. You could design a little domain-specific language, compile it down to bytecodes, and call it from a Python function. I'm considering doing a demo of byteplay3 in which I implement, say, Tiny Basic by compiling it into bytecodes. Python bytecode: the poor man's LLVM!
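As a taste of that idea, here is a toy "language" compiled down to a real Python code object. It is a hypothetical sketch with a deliberate swap: instead of emitting bytecodes directly (which is the job byteplay3 would do), it builds a Python syntax tree with the standard ast module and lets compile() produce the bytecode, which keeps it working across Python versions. The reverse-Polish notation and the name compile_rpn are made up for illustration, and the ast.arguments fields used here assume Python 3.8 or later.

```python
import ast

# Toy DSL (hypothetical): reverse-Polish arithmetic such as "a b + 2 *".
# Tokens are non-negative integers, variable names, or one of + - * /.
OPS = {"+": ast.Add, "-": ast.Sub, "*": ast.Mult, "/": ast.Div}


def compile_rpn(src, argnames):
    """Compile an RPN expression into a Python function of argnames."""
    stack = []  # holds AST expression nodes, just as RPN would hold values
    for tok in src.split():
        if tok in OPS:
            if len(stack) < 2:
                raise SyntaxError("operator %r needs two operands" % tok)
            right = stack.pop()
            left = stack.pop()
            stack.append(ast.BinOp(left=left, op=OPS[tok](), right=right))
        elif tok.isdigit():
            stack.append(ast.Constant(value=int(tok)))
        elif tok.isidentifier():
            stack.append(ast.Name(id=tok, ctx=ast.Load()))
        else:
            raise SyntaxError("bad token %r" % tok)
    if len(stack) != 1:
        raise SyntaxError("expression leaves %d values" % len(stack))

    # Wrap the expression in a lambda taking the named arguments.
    params = ast.arguments(
        posonlyargs=[], args=[ast.arg(arg=name) for name in argnames],
        kwonlyargs=[], kw_defaults=[], defaults=[])
    tree = ast.Expression(body=ast.Lambda(args=params, body=stack[0]))
    ast.fix_missing_locations(tree)  # compile() needs line/column info

    # compile() turns the tree into a code object (bytecode); eval() runs
    # that code, which evaluates the lambda and hands back the function.
    return eval(compile(tree, "<rpn>", "eval"))
```

For example, compile_rpn("a b + 2 *", ["a", "b"]) returns a function equivalent to lambda a, b: (a + b) * 2, and dis.dis() on it shows ordinary Python bytecode.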

Kelly's promise module, and the original byteplay, are firmly dependent on Python 2. I thought it would be fun to bring byteplay, and maybe promise, into the world of Python 3. And that's what I've been doing in the odd spare hour for the past month or so.

This is long enough; I'll delve into some of what I've done next time.

Meanwhile... if Noam Raph is out there, I'd love to talk! You aren't on Facebook or LinkedIn, nor a user on GitHub...