Tuesday, December 23, 2014

Exploring ctypes

Implementing spellcheck has been a constant problem for me. I solved it, awkwardly, using the pyhunspell package (see below for another link). This provides a Python interface to the Hunspell checker.

There is nothing at all wrong with Hunspell itself. It is complete, fast, and still supported. It is superior to Aspell and Myspell in several ways. Most importantly, it supports UTF-8, and so can be used to spellcheck German, Greek, and the like.

My issues are, or were, with the pyhunspell package. It went unsupported for a long time. It didn't support Python3 until a user posted the necessary small changes as a comment on an issue. And getting it compiled and working on Windows was, for me, a huge problem. So I wanted to experiment to see if I could access the Hunspell library directly from Python using ctypes, eliminating the need to compile a wrapper.

Important note: I just discovered that pyhunspell was very recently picked up by a new owner, Benoît Latinier, and rehosted on GitHub: here is its new home. Another user has posted a binary package for Windows on the old site; unfortunately it's for Python 2.7. So things are looking up for pyhunspell. Which is a good thing, because as I will now finally get around to saying, the ctypes experiments are not going super-well.

We start with getting access to the library.

# Find the library -- I know it is in /usr/local/lib but let's use
# the platform-independent way.
import ctypes.util as CU
libpath = CU.find_library( 'hunspell-1.3.0' )
# Get an object that represents the library
import ctypes as C
hunlib = C.CDLL( libpath )
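
One detail to watch in the lookup above: find_library() returns None when it finds nothing, and on POSIX systems CDLL(None) does not fail but quietly opens the running program instead, so the mistake only shows up later as a missing symbol. A minimal sketch of a more defensive lookup follows. The helper name load_first_available and the alternate library names 'hunspell-1.3' and 'hunspell' are assumptions for illustration, not something the Hunspell documentation prescribes.

```python
import ctypes
import ctypes.util


def load_first_available(candidates):
    """Return a CDLL for the first candidate that is found and loads, else None.

    Hypothetical helper: the candidate names are guesses supplied by the caller.
    """
    for name in candidates:
        path = ctypes.util.find_library(name)
        if path is None:
            # Nothing found under this name; try the next one.
            continue
        try:
            return ctypes.CDLL(path)
        except OSError:
            # Found a file but could not load it (wrong architecture, etc.).
            continue
    return None


# Only 'hunspell-1.3.0' comes from the post; the other names are assumptions.
hunlib = load_first_available(['hunspell-1.3.0', 'hunspell-1.3', 'hunspell'])
if hunlib is None:
    print('Hunspell library not found')
```

With this, a missing library is reported at load time rather than at the first function call.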

To do spell-checking, one must create a Hunspell object. The C header declares:

typedef struct Hunhandle Hunhandle;
LIBHUNSPELL_DLL_EXPORTED Hunhandle *Hunspell_create(const char * affpath, const char * dpath);

Converting that to ctypes, we have:

hunlib.Hunspell_create.argtypes = [C.c_wchar_p, C.c_wchar_p]
hunlib.Hunspell_create.restype = C.c_void_p
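
The restype line matters more than it may look. Unless told otherwise, ctypes assumes a foreign function returns a C int, and on a 64-bit platform an int is usually narrower than a pointer, so an undeclared handle could come back with its upper bits missing. A small sketch of the size mismatch, using a made-up address rather than a real Hunhandle:

```python
import ctypes

int_size = ctypes.sizeof(ctypes.c_int)      # bytes in a C int
ptr_size = ctypes.sizeof(ctypes.c_void_p)   # bytes in a C pointer

# Made-up pointer value, for illustration only.
fake_handle = 0x7F123456789A

# The low-order bits that would fit into a C int of this platform's size.
int_mask = (1 << (8 * int_size)) - 1
low_bits = fake_handle & int_mask

print('int bytes:', int_size, 'pointer bytes:', ptr_size)
print('handle:', hex(fake_handle), 'low bits that fit in an int:', hex(low_bits))
```

Where the two sizes differ, the second line shows how much of the handle would be lost without the c_void_p declaration.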

OK, let's call it!

import os
dpath = '/blah/blah...'
daff = os.path.join(dpath, 'en_US.aff')
ddic = os.path.join(dpath, 'en_US.dic')
hun_handle = hunlib.Hunspell_create( daff, ddic )

Well, nothing crashed. At this point we should be able to use methods of the Hunspell object. Back to the C header file:

LIBHUNSPELL_DLL_EXPORTED char *Hunspell_get_dic_encoding(Hunhandle *pHunspell);

In Python/ctypes:

hunlib.Hunspell_get_dic_encoding.argtypes = [C.c_void_p]
hunlib.Hunspell_get_dic_encoding.restype = C.c_char_p
print(hunlib.Hunspell_get_dic_encoding( hun_handle ))

And whoop-de-doo, it prints b'ISO8859-1'. To review: we have successfully loaded the library, created a Hunspell object, and invoked one of its methods. During creation, the object correctly loaded the dictionary that was passed. Ergo, we can pass a Python3 string into a const char * argument. This is looking great!
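
One thing worth checking before leaning on that conclusion: c_wchar_p hands the C side wide characters, while const char * is read one byte at a time up to the first NUL. ISO8859-1 also appears to be Hunspell's fallback encoding when no SET line has been read, so the printed value alone may not prove that the .aff file was opened. The sketch below needs no Hunspell at all. It only compares what a byte-oriented reader would find in a wide buffer with what it would find in an encoded one. The path is hypothetical.

```python
import ctypes

# Hypothetical path, for illustration only.
path = '/usr/share/hunspell/en_US.aff'

# Roughly what a c_wchar_p argument points at: wide (2- or 4-byte) characters.
wide = ctypes.create_unicode_buffer(path)
# string_at() reads bytes up to the first NUL, as a `const char *` reader would.
seen_from_wide = ctypes.string_at(ctypes.addressof(wide))

# What a c_char_p argument points at when the str is encoded first.
narrow = ctypes.create_string_buffer(path.encode('utf-8'))
seen_from_narrow = ctypes.string_at(ctypes.addressof(narrow))

print('via wide buffer:  ', seen_from_wide)
print('via narrow buffer:', seen_from_narrow)
```

If the first line shows at most one byte, a declaration of [C.c_char_p, C.c_char_p] with paths passed as daff.encode('utf-8') and ddic.encode('utf-8') would be the variant to try.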

And this post is getting long. I will continue with actual spell-checking next time. Spoiler alert! It doesn't go well!
