Tuesday, February 2, 2016

Function signatures; bytecode lacks "computed go-to"

Well, sure, there was a bug. A good thing I blogged about it, or who knows it would have gone untested and unfound?

The Code class represents a code object, but in a form that can be manipulated. It has two crucial methods. from_code() accepts a Python code object and captures all its contents, returning a new instance of Code class. I had properly updated that method to notice the Python 3 features of varkwargs and the kwonlyargcount.

I had not completely dealt with these in the other method, to_code(). Calling to_code() of a Code object returns a code object that is supposedly equivalent. But I had not included the kwonlyargcount in the calculation of the code.co_argcount. So it was off by 1, causing a TypeError exception, wrong number of arguments, when you called the code.

Testing for that also revealed a bug in my unit-test scaffold code. But it's all good now.

Tiny Basic and the computed go-to

In the first post in this series, I mentioned that one use of byteplay would be to implement domain-specific languages using Python bytecode as a the implementation language. And I suggested I might try to implement a Tiny Basic compiler in this manner. Turns out? Not so much.

Historic background: Tiny BASIC was the term for several, minimal BASIC interpreters for early microcomputers. The first was written by Tom Pittman of Itty Bitty Computers. The more influential version—because it was for the 8080 where Pittman's was for the 1802—was by Li-Chen Wang. I was aware of Tiny BASIC although I never used it. (I programmed my Z80-based CP/M system in assembler, thank you.)

Anyway, Tiny BASIC is such a minimal language that its interpreter, including an adequate editor, can be implemented in a few kilobytes. On a walk yesterday I thought about it and at first got rather excited about the possibility. I quickly arrived at a program structure (a dict keyed by the line number with the text of the line as value) and visualized how I could use the built-in compile() function to reduce expressions to bytecode, and glue those bytecode bits together with more bytecode and thus produce a whole Python function from a BASIC program.

Then, sitting in a coffee shop, I used my phone to look up the syntax of the language...

...and is it not a fabulous age we are living in, where one can be sitting over a capuccino and have the passing notion, "I wonder what was the syntax of a programming language last used over thirty years ago?" and be reading the answer in less time than it takes to describe it? Seriously, people, why are we not all happy as kings?

So, here's the manual. Less than five minutes of skimming on the little screen of the phone revealed a major, major problem.

The problem is that Python bytecode has no real "jump" instruction as found in an assembly language. It has several jump opcodes, for example POP_JUMP_IF_TRUE and JUMP_ABSOLUTE, but the argument to all of these is a fixed integer offset in the bytecode string. The amount to jump forward or backward, or the absolute offset to jump to, is hard-coded in the instruction. There is no way in bytecode language to say, jump to the offset encoded in the top-of-stack item.

Without such a "computed go-to" it becomes much more difficult to implement Tiny BASIC. Because Tiny BASIC has both a computed GOTO and a GOSUB. Of the GOTO, Pittman's manual explicitly says, "the next statement to be executed after a GOTO has the line number derived by the evaluation of the expression in the GOTO statement. Note that this permits you to compute the line number of the next statement on the basis of program parameters during program execution. (my emphasis)". The GOSUB presents the same problem in two ways. First, it takes an expression, so the actual target is determined at execution time. Second, it stores the return location on a stack for use by the RETURN statement. RETURN is effectively a GOTO where the destination is fetched from the call stack.

None of these things can be implemented using a bytecode jump. So that pretty well ends any thought of compiling a Tiny BASIC source file into a single Python function.

I did think of a complicated mechanism, basically each line of the BASIC program would compile into a separate function that could operate on globals (the BASIC variables) and must return the number of the next line to be executed, with None meaning, "next sequential". But it would be kind of ugly and clumsy and not a good demo of byteplay3.

No comments: