Tuesday, August 18, 2015

Ppgen translator done, some bugs found

I finished the ppgen translator this afternoon. In order to verify it works I had to download and run the ppgen program itself, which proved quite simple. You just go get it from its github page, a big single Python module, and run it. I started to read it but decided instead to treat it as a black box. While I respect the effort that RFrank continually pours into supporting it, there are things about its design, and its documentation, that irk me as a professional writer and programmer. So if I start to read it I will just be picking nits and thinking of how it should be done, and that's an unproductive use of my time. Worse, I could get sucked into helping maintain it. Run away!

Anyway, during the testing I found some things that couldn't be accounted for in my Translator code (which turned out to be pretty simple, less than 300 lines with lots of comments). Investigation led to two small bugs in the PPQT Translator support itself. There was one logic error that resulted in generating a spurious blank line preceding any no-reflow section. I am not sure why I never noticed that until these tests.

The other had to do with the YAPP-generated document parser. The way it was written, the following perfectly normal input,

...end of a paragraph.

<tb>

Start of next paragraph...

was wrongly parsed as if the second paragraph were a section head. In the DP document format a section head is marked by two preceding empty lines. There was only one preceding blank line here, so why was it being parsed as a head?

Every production (other than the thought-break, which was a late addition to the parser) ended with EMPTY? to absorb any empty line that followed them. For example, a no-reflow section was defined as XOPEN (LINE | EMPTY)* XCLOSE EMPTY?. So if the user wrote

/X
stuff...
X/

New paragraph...

the blank line after the X/ line would be absorbed into the NOFLOW section. Because I did that in all cases, the first of the two empty lines before a head was always consumed by whatever production came before it, so the syntax of a HEAD3 was just EMPTY PARA, or one empty line and a paragraph. When I belatedly remembered the thought-break markup and added it, I forgot to define it as absorbing an optional empty line after it. So the empty line after the <tb> was left over, and together with the paragraph that followed it matched HEAD3. That was easy to fix, a two-line change. With the other bug, a total of 4 or 5 lines changed. But that mandates repackaging the whole app again, sigh. Although that isn't really so bad: a few hours of work, most of which is spent waiting for files to upload to or download from the dropbox, so I can be doing other things.
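To make the failure concrete, here is a minimal sketch of the same grammar idea in Python. It is not the actual YAPP-generated parser. The single-letter token codes, the function names, and the use of a regular expression over a token string are all assumptions made for illustration. It only shows how a thought-break that fails to absorb its trailing empty line leaves an EMPTY PARA sequence that matches HEAD3.

```python
import re

def tokenize(lines):
    """Classify each line as a one-letter token (hypothetical codes)."""
    codes = []
    for line in lines:
        text = line.strip()
        if text == "":
            codes.append("E")   # EMPTY
        elif text == "<tb>":
            codes.append("T")   # thought-break
        else:
            codes.append("L")   # LINE of ordinary text
    return "".join(codes)

def make_grammar(tb_absorbs_empty):
    """Build the toy grammar; the flag switches the two-line fix on or off."""
    tbreak = "T" + ("E?" if tb_absorbs_empty else "")
    # HEAD3 is EMPTY PARA; every production ends with an optional EMPTY.
    return re.compile(
        "(?P<HEAD3>EL+E?)|(?P<PARA>L+E?)|(?P<TBREAK>" + tbreak + ")|(?P<EMPTY>E)"
    )

def parse(lines, tb_absorbs_empty):
    """Return the names of the productions matched, in document order."""
    grammar = make_grammar(tb_absorbs_empty)
    return [m.lastgroup for m in grammar.finditer(tokenize(lines))]

sample = [
    "...end of a paragraph.",
    "",
    "<tb>",
    "",
    "Start of next paragraph...",
]

print(parse(sample, tb_absorbs_empty=False))  # the bug: last item is HEAD3
print(parse(sample, tb_absorbs_empty=True))   # the fix: last item is PARA
```

With the flag off, the empty line after the thought-break is still in the input when the next production starts, so the paragraph is taken for a head. With the flag on, the thought-break swallows it and the paragraph parses as a paragraph.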

That will be Thursday. Before I do it I'll review the issues list; there may be a couple of other easy fixes I should do.

Saturday, August 15, 2015

Translators: Updating HTML, adding PPgen

By PPQT2 to the Woodshed

I used PPQT2 to post-process quite a bulky project, Hawkins Electrical Guide Vol. 3. This is the kind of PP project I've always enjoyed, with varied document structure (not just chapters of paragraphs) and many images. In this case, over 200 images, on which I spent many hours in Photoshop making clear, clean yet very compact .png files.

I had been working on this book ever since PPQT2 was at all usable, a year ago or so. After finishing the ASCII and HTML translators, I could finalize this book using PPQT2. Which I did, and uploaded it, and ran into an eagle-eyed and extremely conscientious PPVer, who kicked it back to me with a list of over 50 issues to correct. Properly taken to the woodshed, I was!

Some of the issues related to the generated HTML, and in the process of correcting them I realized some ways in which the HTML translator could do a better job. Also I had spent some time absorbing the various EPUB advice pages in the DP Wiki, and realized the impact that EPUB has on the post-processor's view of HTML.

EPUB Rant

Parenthetically, DP has a confused relationship to EPUB. Project Gutenberg now routinely does a batch conversion of the submitted HTML book using something called Epubmaker, and it does a number on one's HTML. In prior years I, like many PPers, have spent lots of time on tweaking the HTML to make the ebook look very much like the printed book. But Epubmaker ruthlessly throws away most of that, leaving a flat, boring, ugly etext.

Double-parenthetically, part of the problem is the many restrictions of the EPUB format itself. It doesn't allow floats—so forget about sidebars, side-notes, and running text around small images. It doesn't support pop-up title=texts when you hover the mouse on an element—so forget about showing the original spelling of a typo, or showing the transliteration of a Greek or Cyrillic word. It imposes ridiculous constraints on images; nothing wider than 600px and no image files larger than 200K. Like other stupidly-designed standards, it takes the historical limitations of the ebook readers of 2005 and codifies them for all time. Do you think a retina iPad can't display an image larger than 600px? Or a Kindle Fire? The EPUB standard is very much like the many state laws that codified the design of auto headlights in the 1950s, based on the then state of the art, the sealed-beam unit. So when European cars started using replaceable halogen bulbs, they could not be imported to the U.S. because their headlights were not sealed-beam units. It took decades to get the laws changed so imported cars didn't have to have inferior U.S. headlight units retrofitted before they could be sold. EPUB does exactly the same thing, locking us into an already-outmoded technology. Close inner parenthesis.

DP's response to EPUB has been scattered and slow. There are several different Wiki pages about it, giving conflicting advice and often referring to forum threads that are years old. But the bottom line is, the PPer today who spends any time on how the HTML looks is wasting her energy. The majority of PG downloads are for EPUB, not HTML, and all your pretty CSS will be stripped out by Epubmaker. Close outer parenthesis!

HTML changes

With all this in mind, I went back to the HTML translator and made changes. I simplified the CSS in the header block a lot, removing many options and comments on appearance. I changed the method of encoding visible page numbers from the Guiguts method to a method that was recommended in one of the EPUB Wiki pages, as possibly able to survive Epubmaker.

Another change was from percentage widths to fixed widths. PPQT lets the user specify margins in ASCII space units, for example /Q First:6 Left:4 Right:4. These translate nicely in the ASCII output. But for HTML, I had been converting them to percentages of a 75-character line, so that Right:4 became margin-right:5%. But percent widths are relative to the container, so 5% is less in a nested container than at the outer level.

There was already a historical conversion of 2 ASCII spaces == 1 HTML "em" unit; this had been in use for poetry line indents for years in the Guiguts HTML conversion, and my HTML Translator did the same thing for poetry. Well, why not for all widths? So I changed it to use em units for everything, and Right:4 becomes margin-right:2em; which is the same regardless of context.
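The two conversions can be sketched in a few lines of Python. This is an illustration only, not the Translator's code. The function names are hypothetical, and the rounding to a whole percent is an assumption inferred from Right:4 becoming 5%.

```python
LINE_WIDTH = 75  # the nominal ASCII line length used for percentages

def spaces_to_percent(spaces):
    """Old method: ASCII spaces as a percentage of a 75-character line."""
    return "{}%".format(round(100 * spaces / LINE_WIDTH))

def spaces_to_em(spaces):
    """New method: 2 ASCII spaces == 1 em, regardless of container."""
    return "{:g}em".format(spaces / 2)

def percent_in_chars(percent, container_chars):
    """How many character widths a percentage works out to in a container."""
    return container_chars * percent / 100

print("margin-right:" + spaces_to_percent(4))  # margin-right:5%
print("margin-right:" + spaces_to_em(4))       # margin-right:2em

# The same 5% shrinks as the container narrows, which is the problem:
print(percent_in_chars(5, 75))  # at the outer level
print(percent_in_chars(5, 60))  # inside a narrower, nested container
```

The em value has no container term in it at all, which is why it comes out the same at any nesting depth.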

Ppgen

The afternoon after I posted the updated HTML Translator I was congratulating myself on the PPQT2 design that makes the Translators into separate files, and how easy it was to update just that file without having to repackage the whole app. And then about how nobody has expressed any interest in doing any other Translator. And how there really ought to be a Ppgen one.

I've had a Chrome window open for months, with about six tabs open pointing to different Ppgen docs. (Which, parenthetically, are badly organized and incomplete.) Well, crap, I said to myself, let's see how hard it would be. I pulled up a copy of my skeleton Translator file and started filling in the 30-odd entries in the "computed go-to" list of API "events". And it went very well. A majority of events are either null, or can be handled by a single literal string without any functional logic. For example, the OPEN_H2 event just squirts out .h2.
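For anyone who hasn't looked at a Translator, the "computed go-to" idea is roughly the following. This is a sketch and not the real PPQT2 Translator API. Only OPEN_H2 and its .h2 output come from the description above; every other event name, the (name, data) event shape, and the translate function are hypothetical.

```python
# Each event maps to None (null event), a literal string, or a function.
ACTIONS = {
    "OPEN_H2": ".h2",             # from the post: just emit the literal
    "CLOSE_H2": None,             # hypothetical null event, nothing to emit
    "LINE": lambda text: text,    # hypothetical event that needs logic
}

def translate(events):
    """Walk (name, data) pairs and collect the output lines."""
    out = []
    for name, data in events:
        action = ACTIONS.get(name)
        if action is None:
            continue                  # null or unknown event: emit nothing
        if callable(action):
            out.append(action(data))  # event handled by functional logic
        else:
            out.append(action)        # event handled by a literal string
    return "\n".join(out)

events = [("OPEN_H2", None), ("LINE", "Chapter One"), ("CLOSE_H2", None)]
print(translate(events))
```

When most entries in the table are None or a plain string, filling it in is mostly typing, which matches how quickly the first pass went.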

By end of the day I had almost all of it coded, lacking only the table-related events, and I pretty well see how to implement them.

So early next week I reckon I will be able to announce a trial Ppgen Translator. I'll have to hedge the announcement with many caveats, mostly because I do not have the actual Ppgen batch tools installed so I can't actually test that my translation produces usable output. But if people don't like it, they can fix it. It's just a small Python source file; be my guest.

And when that's finalized, people will be able to use PPQT2 to complete Ppgen-based projects. Which might increase adoption.

Saturday, August 1, 2015

What to do, where to go next?

I've used this blog with the very clever name (well, perhaps not so clever these days, since nobody uses paper manuals any more, and if you've never seen a software manual on printed paper, you might not get the reference) -- used it, I say, to document whatever enthusiasm is monopolizing my attention. Before 2010, I used it as a place to store occasional essays on whatever was bubbling around in my brain. (Here's a really well-written piece, if I do say so myself, from 2009, on The Too-Small God. Here are actual numbers on the economics of a plug-in hybrid car.) For several months in 2010 I used it to record the process of rebuilding my recumbent bike. For the last two years I've used it as a diary as I developed PPQT2.

Well, PPQT2 is pretty well done now. There are several issues still on the github page, some of which would require significant days of effort to close. But I don't feel any urgency to do that work. The existing app is adequate for my personal needs. The "user community" aside from me can be numbered on one hand, I think; and they are very quiet.

So: whither the blog? Probably it will be very quiet for a while. If you have been following it for the PyQt5 stuff, thank you for reading along! I hope you got something useful from it. I don't expect to be doing much with PyQt now, but if I do, I'll post about it. So I suppose you should keep it in your RSS reader. Just move it down to the bottom of the list, next to those other blogs that you used to follow but which have gone quiet of late.

(I have several like that in my RSS reader. You know, that could make an interesting blog post...)