Tuesday, December 23, 2014

Continuing to live-ish blog the hunspell hunt

For reference, here is where the experimental code is now.

import os
# set up path strings to a dictionary
dpath = '/Users/dcortes1/Desktop/scratch'
daff = os.path.join(dpath, 'en_US.aff')
ddic = os.path.join(dpath, 'en_US.dic')
print( os.access(daff,os.R_OK), os.access(ddic,os.R_OK) )
# Find the library -- I know it is in /usr/local/lib but let's use
# the platform-independent way.
import ctypes.util as CU
libpath = CU.find_library( 'hunspell-1.3.0' )
# Get an object that represents the library
import ctypes as C
hunlib = C.CDLL( libpath )
# Define the API to ctypes
hunlib.Hunspell_create.argtypes = [C.c_wchar_p, C.c_wchar_p]
hunlib.Hunspell_create.restype = C.c_void_p
hunlib.Hunspell_destroy.argtypes = [ C.c_void_p ]
hunlib.Hunspell_get_dic_encoding.argtypes = [C.c_void_p]
hunlib.Hunspell_get_dic_encoding.restype = C.c_char_p
hunlib.Hunspell_spell.argtypes = [C.c_void_p, C.c_char_p]
hunlib.Hunspell_spell.restype = C.c_uint
# Make the Hunspell object
hun_handle = hunlib.Hunspell_create( daff, ddic )
# Check encoding
print(hunlib.Hunspell_get_dic_encoding( hun_handle ))
# Check spelling
for s in [ 'a', 'the', 'asdfasdf' ] :
    b = bytes(s,'UTF-8','ignore')
    t = hunlib.Hunspell_spell( hun_handle, b )
    print(t, s)
# Destroy the Hunspell object so the library can free it
hunlib.Hunspell_destroy( hun_handle )

Let's see if changing the create argtypes makes a difference.

Bingo! Made the following changes. First, change the argtypes of Hunspell_create() from c_wchar_p to c_char_p:

hunlib.Hunspell_create.argtypes = [C.c_char_p, C.c_char_p]

That alone caused a ctypes error on the call Hunspell_create(daff, ddic), because a Python 3 str is not compatible with c_char_p, which expects bytes. So, second, encode the path strings:

baff = bytes(daff,'UTF-8','ignore')
bdic = bytes(ddic,'UTF-8','ignore')
hun_handle = hunlib.Hunspell_create( baff, bdic )

Et voilà, the output is:

b'UTF-8'
1 a
1 the
0 asdfasdf
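
As for why the c_wchar_p version found nothing, my best guess is this: the Hunspell C API takes its two paths as plain char* strings, while c_wchar_p hands the library a wide-character buffer. On a little-endian machine, the first wide character '/' is the byte 0x2F followed by zero bytes, so C code reading it as a char* sees a one-character path and never opens the real dictionary files. The sketch below shows that effect with nothing but ctypes (no Hunspell needed). The variable names are mine, and it assumes ctypes.string_at is a fair stand-in for how a C function would read the pointer.

```python
import ctypes as C

path = '/Users/dcortes1/Desktop/scratch/en_US.aff'

# What c_wchar_p passes: a NUL-terminated buffer of wide characters.
wide = C.create_unicode_buffer(path)
# Read that memory the way a char*-expecting C function would:
# byte by byte, stopping at the first zero byte.
seen_from_wide = C.string_at(C.addressof(wide))

# What c_char_p passes: a NUL-terminated buffer of encoded bytes.
narrow = C.create_string_buffer(path.encode('UTF-8'))
seen_from_narrow = C.string_at(C.addressof(narrow))

print(seen_from_wide)    # at most one byte survives
print(seen_from_narrow)  # the whole path, as bytes
```

If that is what was happening, Hunspell_create was quietly being asked to open a path that was not the dictionary, which would explain why every word came back as misspelled until the argtypes changed.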

Most excellent! I have achieved my goal of invoking Hunspell for spell-checking without use of the pyhunspell package. I am not sure if I want to change my existing dictionaries.py to do this in place of relying on the package. For sure, if I have even the slightest trouble installing the package on Windows, I will be quick to fall back on this.
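
If I do fall back on this, the calls above could be folded into something small like the following. This is only a sketch: load_hunspell and HunChecker are names I made up for illustration, the library name is the one that worked on my machine, and using the dictionary's reported encoding to encode each word is an assumption on my part, not something I have tested against a non-UTF-8 dictionary.

```python
import ctypes as C
import ctypes.util as CU

def load_hunspell(names=('hunspell-1.3.0',)):
    """Return a configured CDLL for the first name that resolves, else None."""
    for name in names:
        libpath = CU.find_library(name)
        if libpath is None:
            continue  # not installed under this name; try the next one
        lib = C.CDLL(libpath)
        # Same prototypes as in the experiment, with char* paths.
        lib.Hunspell_create.argtypes = [C.c_char_p, C.c_char_p]
        lib.Hunspell_create.restype = C.c_void_p
        lib.Hunspell_destroy.argtypes = [C.c_void_p]
        lib.Hunspell_destroy.restype = None
        lib.Hunspell_get_dic_encoding.argtypes = [C.c_void_p]
        lib.Hunspell_get_dic_encoding.restype = C.c_char_p
        lib.Hunspell_spell.argtypes = [C.c_void_p, C.c_char_p]
        lib.Hunspell_spell.restype = C.c_int
        return lib
    return None

class HunChecker:
    """Hypothetical wrapper that owns one Hunspell handle."""

    def __init__(self, lib, aff_path, dic_path):
        self._lib = lib
        # Paths must be bytes for c_char_p.
        self._handle = lib.Hunspell_create(
            aff_path.encode('UTF-8'), dic_path.encode('UTF-8'))
        enc = lib.Hunspell_get_dic_encoding(self._handle)
        # Assumption: words should be encoded the way the dictionary is.
        self.encoding = enc.decode('ascii') if enc else 'UTF-8'

    def check(self, word):
        """Return True if Hunspell accepts the word."""
        encoded = word.encode(self.encoding, 'ignore')
        return bool(self._lib.Hunspell_spell(self._handle, encoded))

    def close(self):
        """Destroy the handle once; later calls do nothing."""
        if self._handle is not None:
            self._lib.Hunspell_destroy(self._handle)
            self._handle = None
```

The None return from load_hunspell is the hook for deciding between this and the package: if the library is not found, dictionaries.py can carry on using pyhunspell. Checking for None also matters because the experimental code passes whatever find_library returns straight to CDLL.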
