Tuesday, May 26, 2015

Parsing a document: 3, trying out YAPPS2

I have winnowed down the list of 34 (yes, you read that correctly) parser generators to a very short list of ones that (a) are pure Python, (b) are documented in a readable and complete fashion, and (c) appear to allow the user to tinker with the tokenizer, as opposed to being locked in to parsing strings or files. That group is:

I've spent the last two afternoons reading parser docs until my eyes bleed, and their features are starting to run together. It would be a fascinating exercise, and useful to the Python community, to spend a few weeks really sorting out those offerings and to put together a paper with comparative code examples, timings and such. I don't have time to do it well even for the short list above. Maybe someday.

Anyway, to commence, I thought I'd generate a parser using YAPPS2, then trace through the code of the generated parser and really get a handle on what it does. So I downloaded it. First thing to note is that the download link on PyPI doesn't really go to a download page, but to a page that points, in a confused manner, in several directions: to a Debian package, to another Debian package "with improvements", and to a GitHub repo that is supposedly Python 3 compatible. But it isn't. There is a link to a set of patches for the GitHub code that fixes quite a few Python 3 issues, notably print statements. But the patch set wasn't complete; very shortly after applying it I ran into an unfixed "except Exception, e" and, soon after, another unfixed print statement. So it's an adventure getting it going.
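For anyone retracing this, here is a minimal sketch of the two Python 2 constructs mentioned above in their Python 3 spelling. The function name `describe_failure` is made up for illustration and is not taken from the YAPPS2 source.

```python
def describe_failure(action):
    """Run action() and report any exception, using Python 3 syntax.

    Hypothetical helper for illustration only; not part of YAPPS2.
    """
    try:
        action()
    except Exception as e:       # Python 2 spelling: except Exception, e:
        print("failed:", e)      # Python 2 spelling: print "failed:", e
        return False
    return True
```

Both of the old spellings are syntax errors under Python 3, so a file containing even one of them fails at import time, before any of its code runs.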

But I got it to where I could begin to try the first example in the manual. The manual is clearly very old: it is supposedly the YAPPS2 manual, but the example has you "import yapps", which has to be "import yapps2" now. And that did not work either, but immediately stopped with an undefined name. Exploring, it turns out that the code is written such that the hand-execution shown in the manual (start python and type "import yapps2; yapps2.generate('filename')") cannot possibly work. A critical statement, "from yapps import grammar", is only executed when yapps2.py is run from the command line.
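The usual way a statement ends up running only from the command line is by sitting under an `if __name__ == "__main__":` guard. The sketch below reproduces that failure mode with a stand-in module. It is an assumption about the mechanism, not the actual yapps2.py source, and the names `grammar_name`, `yapps2_demo` and `calc.g` are hypothetical.

```python
# Stand-in for a module whose critical import sits under the main guard.
MODULE_SOURCE = '''
def generate(filename):
    # Relies on a name that is only bound by the guarded statement below.
    return grammar_name + " would parse " + filename

if __name__ == "__main__":
    grammar_name = "grammar"   # stands in for: from yapps import grammar
'''

def load(as_name):
    """Execute the stand-in module with the given __name__ and return its globals."""
    namespace = {"__name__": as_name}
    exec(MODULE_SOURCE, namespace)
    return namespace

# Case 1: the module is imported, so the guarded statement never runs.
imported = load("yapps2_demo")
try:
    imported["generate"]("calc.g")
    import_result = "ok"
except NameError:
    import_result = "NameError"

# Case 2: the module is run as a script, so the name is bound.
script = load("__main__")
script_result = script["generate"]("calc.g")
```

If this is indeed what is going on, moving the import to the top of the module would let both ways of calling it work.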

OK, the generate step now reads the "calc" (basic calculator) example definition and writes a small and readable Python program. Upon execution, that program reveals several more Python 2/3 issues, including use of raw_input and some more print statements. But when I manually fixed those, it actually worked: reading expressions, parsing them, and printing the results.
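For a sense of what a small calculator parser of this kind does, here is a hand-written sketch in the recursive-descent style, with the tokenizer kept separate from the parser. It is not YAPPS2's generated output; the grammar, the class name `Calculator` and the function names are assumptions made for illustration.

```python
import re

# One token per match: optional whitespace, then a number or a single character.
TOKEN = re.compile(r"\s*(?:(\d+)|(.))")

def tokenize(text):
    """Yield ('NUM', int) and ('OP', char) pairs, then a final ('END', None)."""
    for number, op in TOKEN.findall(text):
        if number:
            yield ("NUM", int(number))
        elif op.strip():              # skip stray whitespace characters
            yield ("OP", op)
    yield ("END", None)

class Calculator:
    """One method per grammar rule: expr, term, factor."""

    def __init__(self, text):
        self.tokens = list(tokenize(text))
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def take(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expr(self):                   # expr: term (('+' | '-') term)*
        value = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.take()[1]
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):                   # term: factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.take()[1]
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):                 # factor: NUM | '(' expr ')'
        kind, value = self.take()
        if kind == "NUM":
            return value
        if (kind, value) == ("OP", "("):
            inner = self.expr()
            if self.take() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError("unexpected token: %r" % (value,))

def calculate(text):
    """Parse and evaluate one expression, rejecting trailing input."""
    parser = Calculator(text)
    result = parser.expr()
    if parser.peek() != ("END", None):
        raise SyntaxError("trailing input")
    return result
```

An interactive loop around `calculate` would read each line with `input()` under Python 3, where Python 2 code used `raw_input()`.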

My brain is a bit fried at this point; gonna take a nap now; tomorrow is Museum work day; resume this on Thursday.
