Thursday, July 9, 2015

So back to work...

I released the mostly-baked PPQT2 to a reception that was friendly although very muted. In particular, nobody indicated any interest whatever in writing a Translator.

Then I spent a couple of days completing the basic post-processing work on a large and complex book. That is, I did all the steps that, according to my own "Suggested Workflow" document, should precede translating the book to some other markup like HTML.

In the course of that, I found some issues with the sequence of events in the Suggested Workflow, so I revised it. I also found a few minor usability issues with the app and added them to the Issues list on GitHub.

Then it was time to try translating a real, and large, document to HTML, complete with many Illustrations, a few Footnotes, and many Block Quotes and Unsigned Lists and a few Tables. So, all the stuff that a Translator should recognize.

The first step of Translating is parsing the document, and this threw up many errors. Some were legitimate; others should not happen, but do happen because the automated document syntax parser needs to be tweaked. Several more Issues went onto the stack. After I either fixed or circumvented those, the HTML translator actually got called, and it revealed two problems.

The first was a puzzling crash while processing a footnote. It turned out that I had mis-coded a Footnote in the document. This error was not being caught by the document structure parse, with the result that bad data was being passed to the translator. I had to tighten up a regex in the parser so it would not recognize an ill-formed footnote. It would just become a line of text.
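To illustrate the kind of tightening I mean, here is a minimal sketch, not the actual parser code. It assumes a footnote opens with "[Footnote", a space, a key made of letters or digits, and a colon; the names STRICT_FOOTNOTE and classify_line are made up for the example, and the real mis-coding in my document may have looked different.

```python
import re

# Assumed footnote opening: "[Footnote KEY: text", where KEY is letters or digits.
# A looser pattern such as r'^\[Footnote' would also accept malformed lines.
STRICT_FOOTNOTE = re.compile(r'^\[Footnote ([A-Za-z]+|\d+): ?(.*)$')

def classify_line(line):
    """Return ('footnote', key, text) for a well-formed footnote opening,
    otherwise ('text', None, line) so the line is treated as plain text."""
    match = STRICT_FOOTNOTE.match(line)
    if match:
        return ('footnote', match.group(1), match.group(2))
    return ('text', None, line)

good = classify_line('[Footnote 12: See page 4.')
bad = classify_line('[Footnote12 See page 4.')
```

With the stricter pattern, the malformed line never reaches the translator as a footnote; it simply falls through as ordinary text.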

The next problem was that the alt= and title= properties of most of the images were broken. The cause turned out to be obvious. Whatever text follows the [Illustration: markup, presumably the first line of the caption, is passed to the Translator along with the Open Figure event code. The point was to let the HTML translator use that first caption text as the alt=/title= string.

Unfortunately, for most of the figures in this book, the opening of the caption looks like

[Illustration: <id="Fig_563"><sc>Fig</sc>. 563. Some bumble rumble thing

The <id="Fig_563"> is my optional markup for a link target; it is taken from the Ppgen markup. However, its presence results in building an img statement with:

<img src="images/f563.png" alt="<id="Fig_563"><sc>Fig</sc>. 563. some..."

I fixed this by adding a new utility function to the xlate_utils module: flatten_line(text) returns a text string with everything stripped out except words and spaces. Then I had the HTML translator pass its Open Figure preview text through that, so that it would write HTML like this:

<img src="images/f563.png" alt="Fig 563 some bumble rumble thing"

Yes, flatten_line() even strips periods and other punctuation out. That's intentional, because after all, quote characters are punctuation too.
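For the curious, the idea can be sketched in a few lines of Python. This is an approximation under my own assumptions, not the code in xlate_utils: it assumes that anything in angle brackets is markup to be dropped whole, and that everything that is not a word character or whitespace counts as punctuation. The pattern names are invented for the example.

```python
import re

# Hypothetical helper patterns, not taken from xlate_utils.
_MARKUP = re.compile(r'<[^>]*>')       # anything like <sc>, </sc>, <id="...">
_PUNCTUATION = re.compile(r'[^\w\s]')  # whatever is not a word char or space
_SPACES = re.compile(r'\s+')

def flatten_line(text):
    """Return text reduced to words separated by single spaces."""
    text = _MARKUP.sub(' ', text)        # drop markup, including quoted ids
    text = _PUNCTUATION.sub(' ', text)   # drop periods, quotes, brackets
    return _SPACES.sub(' ', text).strip()

caption = '<id="Fig_563"><sc>Fig</sc>. 563. Some bumble rumble thing'
alt_text = flatten_line(caption)
img_tag = '<img src="images/f563.png" alt="{0}"'.format(alt_text)
```

Because no quote characters survive, the result can be dropped between the double quotes of an alt= or title= attribute without any escaping. Note that \w also admits the underscore, so a stricter version might want to remove that as well.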

With these changes, the HTML translator is working quite nicely. It certainly produces an HTML book that is ready to be edited as HTML, have its CSS tweaked, and so forth.

What next? Several things. First of all, there are 16 open Issues on the GitHub site. Over the next few days I plan to fix at least ten of them. Then, I am going to write the ASCII translator I promised myself. When I have that, I will be able to complete post-processing of the book I'm working on. When that Translator works, I will put up an updated release of PPQT2 and make another plea for Translators, specifically for Ppgen and Fpgen ones. They are needed, and I am really not the right person to write one, as I lack the kind of deep knowledge of those markups that would make it easy. If necessary, I might write a very rudimentary one of each so I can tell the maintainers of those markups, "There, now finish it, please."

After that—which will happen by 30 July—I will dust off my hands and walk away from PPQT, returning only to fix serious bugs.
