Friday, April 11, 2014

Why Can't Huns Spell?

Lordy but I hate the kind of work described below. It's really stressful. (I know, kvetch, kvetch, kvetch.)

I'm to the point where I want to test the ability to mark words that fail spellcheck. To do that, I need the ability to check spelling, duh!
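For the record, here is roughly what that test needs, as a minimal Python sketch and not PPQT's actual code. The names find_misspelled and check are made up for illustration; check stands in for whatever spell-checker ends up behind it, and the word pattern is a simplifying assumption.

```python
import re

# Assumption: a "word" is a run of letters, optionally with internal apostrophes.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")

def find_misspelled(text, check):
    """Return (start, end, word) for each word in text that check() rejects."""
    return [
        (m.start(), m.end(), m.group())
        for m in WORD_RE.finditer(text)
        if not check(m.group())
    ]

# A stand-in checker backed by a tiny word set, just to exercise the function.
good_words = {"my", "huns", "can", "spell"}
marks = find_misspelled("My Huns can spel", lambda w: w.lower() in good_words)
```

Passing the checker in as a plain callable means the marking logic can be tested without any real dictionary installed.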

PPQT version 1 went through several stages of spell-checking, each representing many hours of effort. First I tried sending words over a pipe to Aspell running as a subprocess. Then I wrote my own all-Python spell-checker that used the Myspell/OpenOffice dictionary format. That was a useful learning exercise. I learned:

  • All about the format and content of the .dic/.aff dictionary files.
  • That German is a damned hard language to spell-check.
  • That there are a lot of subtleties to the spell-check algorithms.
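To give a flavor of the file format, here is a small Python sketch of the easy part: one .dic entry and one suffix rule from the .aff file. It is not the code from PPQT version 1. It assumes single-character flags, ignores prefixes, cross-products and stacked affixes (the part that makes German hard), and the function names and the sample rule are illustrative only.

```python
import re
from typing import NamedTuple

class SuffixRule(NamedTuple):
    flag: str       # flag letter that entitles a word to this rule
    strip: str      # characters removed from the end of the word
    add: str        # characters appended after stripping
    condition: str  # regex-like pattern the end of the word must match

def parse_dic_line(line):
    """Split a .dic entry such as 'pony/S' into (word, set of flags)."""
    entry = line.split()[0]  # drop any trailing morphological fields
    word, _, flags = entry.partition("/")
    return word, set(flags)

def parse_sfx_rule(line):
    """Parse a rule line of the form 'SFX flag strip add condition'."""
    _, flag, strip, add, condition = line.split()[:5]
    # A literal 0 means "nothing" in both the strip and add fields.
    return SuffixRule(flag, "" if strip == "0" else strip,
                      "" if add == "0" else add, condition)

def apply_suffix(word, rule):
    """Return the suffixed form, or None if the rule does not apply."""
    if rule.strip and not word.endswith(rule.strip):
        return None
    if rule.condition != "." and not re.search(rule.condition + "$", word):
        return None
    stem = word[:len(word) - len(rule.strip)]
    return stem + rule.add

word, flags = parse_dic_line("pony/S")
rule = parse_sfx_rule("SFX S y ies [^aeiou]y")
plural = apply_suffix(word, rule) if rule.flag in flags else None
```

Checking a word means running this in reverse for every rule, and for every combination of rules once affixes can stack.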

In the end German defeated my code. I just couldn't get it to handle multiple affixes properly. In the nick of time I found this Python wrapper for Hunspell. Like a lot of FOSS, it was created by someone who needed it a few years ago, and that person has apparently moved on and left it dangling unmaintained. But it can be made to work, with effort, for Python 2.x. And it was suuu-weet once I got it going, blazing fast and reliable. I made it work for Mac OS and for Linux, but blew many hours failing to make it work for my Windows distribution. Eventually I went on Elance and paid a dude $150 to make the Hunspell wrapper work on Windows. Money well spent.

But PPQT2 is built on Python 3.3 (well, probably 3.6 by the time it's done) and the Hunspell wrapper doesn't work for that. However, another user posted a diff file that, he claimed, made it work with Python 3. So I spent some hours today getting it compiled and installed.

It should be a one-liner, python setup.py install, but of course that don't work because there are things in the setup.py script that assume Linux, and a prior release of Hunspell. So you tweak that a while. Reviewing my notes from last fall, you run python setup.py build and it fails, then you manually run a compile command that works to actually create the module, then python setup.py install will install it. The compile command that worked for 2.7 (contributed to that wiki by another user, bless her heart) was:

gcc -fno-strict-aliasing -fno-common -g -fwrapv\
 -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64\
 -pipe -D_LINUX -I/usr/include/hunspell\
 -I/usr/include/python2.7 -lpython -lhunspell-1.2\
 -shared hunspell.c -o build/hunspell.so

But that doesn't work after the diff was applied for Python 3. It coughed up an unresolved symbol _PyModule_Create2 for no apparent reason. So, what's a search engine for if not to find obscure error messages? And Da Google turned up many people with this problem dating back to 2010. A Stack Overflow response, although not directly on point, suggested that the python3.3 library wasn't being linked in, and that was it: PyModule_Create2 is part of the Python 3 C API, and the old command still linked against plain -lpython. Here's the command that actually compiles and links hunspell for Python 3:

P=/Library/Frameworks/Python.framework/Versions/Current
gcc -fno-strict-aliasing -fno-common -g -fwrapv\
 -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64\
 -pipe -D_LINUX -I/usr/local/include/hunspell\
 -I$P/include/python3.3m\
 -L$P/lib -lpython3.3 -lhunspell-1.3 -shared\
 hunspell.c -o build/hunspell.so
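The hard-coded paths are the fragile part of that command. As a hedged sketch, Python's own sysconfig module can report the include and library locations for whichever interpreter runs it. The function name and its defaults below are my own invention, and it leaves out the Mac-specific flags (-arch, -DENABLE_DTRACE and friends), so treat it as a starting point and not a drop-in replacement.

```python
import sysconfig

def hunspell_build_command(hunspell_include, hunspell_lib="hunspell-1.3",
                           source="hunspell.c", output="build/hunspell.so"):
    """Build a gcc argument list for the running interpreter (a sketch)."""
    py_include = sysconfig.get_paths()["include"]
    lib_dir = sysconfig.get_config_var("LIBDIR")  # may be None on some platforms
    # LDVERSION is e.g. "3.3m"; fall back to the plain version if it is unset.
    ld_version = (sysconfig.get_config_var("LDVERSION")
                  or sysconfig.get_python_version())
    cmd = ["gcc", "-shared", "-fwrapv", "-Os", "-Wall", "-D_LINUX",
           "-I" + hunspell_include, "-I" + py_include]
    if lib_dir:
        cmd.append("-L" + lib_dir)
    cmd += [source, "-lpython" + ld_version, "-l" + hunspell_lib,
            "-o", output]
    return cmd

cmd = hunspell_build_command("/usr/local/include/hunspell")
print(" ".join(cmd))
```

Run it with the same Python you intend to install the module into, since the paths it prints belong to that interpreter.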

So I now have a working Hunspell that I can start playing with, on Mac OS at least. Such a relief! OK, Ghu alone knows what it will take to make it work on other platforms, but that's months away. For now, my Huns can spell.
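A quick smoke test for the freshly built module might look like the following sketch. It assumes the wrapper exposes a HunSpell class that takes the .dic path and then the .aff path, and a spell() method, which is how the wrapper I built behaves; the function name and the dictionary paths are placeholders.

```python
import os

def smoke_test(dic_path, aff_path, words):
    """Return {word: bool} from Hunspell, or None if it cannot be loaded."""
    if not (os.path.exists(dic_path) and os.path.exists(aff_path)):
        return None  # dictionary files are missing
    try:
        import hunspell  # the compiled wrapper module
    except ImportError:
        return None  # module not built or not on the import path
    # Assumption: constructor signature is HunSpell(dic_path, aff_path).
    checker = hunspell.HunSpell(dic_path, aff_path)
    return {word: bool(checker.spell(word)) for word in words}

# Placeholder paths; point these at a real dictionary pair to get results.
result = smoke_test("/no/such/en_US.dic", "/no/such/en_US.aff",
                    ["spell", "spel"])
```

Returning None for "could not load" keeps the failure quiet, which is handy on platforms where the module isn't built yet.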
