Tuesday, December 23, 2014

It's like live-blogging, almost

OK, I have figured out one issue. I looked at my own code that creates a dictionary and noticed that it passes the two path arguments to hunspell.HunSpell() with the .dic first and the .aff second. That order is documented in the hunspell package doc. With that change, the hunspell package works. It correctly reports that the Greek dictionary is UTF-8, and it spell-checks a word.

>>> Hgr = hunspell.HunSpell(pdic, paff)
>>> Hgr.get_dic_encoding()
'UTF-8'
>>> Hgr.spell('α')
True

And if I give it my en_US dictionary saved in UTF-8, it opens that correctly too. The argument order is rather bad, though: anyone looking at the Hunspell doc on the Hunspell sourceforge page will see "Hunspell(const char *affpath, const char *dpath);" which is exactly the reverse of what the hunspell package expects. If you pass the files in that order (the correct order per the man page), the Hunspell object is created and no error is reported, but it can't check spelling. It reports every input as misspelled.

What about my ctypes invocation? That definitely calls the C-defined function, which should take the .aff first and the .dic second. I checked the pyhunspell code, and it does pass the aff-path first and the dic-path second to the C function, which is what my ctypes invocation is doing.

I do note a comment in the pyhunspell code, "Some versions of Hunspell_create() will succeed even if there are no dictionary files." So that's probably what's happening: for some reason it is not opening the path strings I am passing, and it silently fails and defaults to a rather useless, and undetectable, no-dictionary condition.
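Since the failure is silent, one way to make it detectable is to probe the new object with a word that is known to be in the dictionary. The following is only a sketch: the function name, the probe words and the two stand-in checkers are made up for illustration, and `spell` is assumed to be any callable that behaves like the `spell` method shown earlier.

```python
def dictionary_seems_loaded(spell, probe_words):
    """Return True if at least one known-good word is accepted.

    `spell` is any callable that takes a word and returns a truthy value
    when the word is spelled correctly, for example the bound `spell`
    method of a hunspell.HunSpell object.
    """
    return any(spell(word) for word in probe_words)


# Stand-in for a checker that silently has no dictionary: rejects everything.
def empty_checker(word):
    return False


# Stand-in for a working checker (the word set is hypothetical).
def greek_checker(word):
    return word in {"α", "και"}


print(dictionary_seems_loaded(empty_checker, ["α", "και"]))  # False
print(dictionary_seems_loaded(greek_checker, ["α", "και"]))  # True
```

With a real object the call would look like `dictionary_seems_loaded(Hgr.spell, ['α'])`, run right after the object is created.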

The likeliest cause of that is it is not getting the path strings in a form it expects. Maybe it can't handle c_wchar_p after all. Before I experiment with that, I am going to add a call to destroy the Hunspell object. Unlike a PyQt object, it isn't known to Python. I may be memory-leaking a Hunspell object every time I run my test code.
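Here is a rough sketch of both ideas, not my actual test code. The first function shows what a C function expecting a const char* would read from a wide-character buffer of the kind c_wchar_p hands over: every ASCII character is followed by NUL bytes, so the string appears to end after one character (or before it, on a big-endian machine). I have not confirmed that this is what is going wrong. The wrapper class is hypothetical; it assumes the caller has already loaded libhunspell as a ctypes.CDLL (the library file name varies by platform and version), and it uses the C API functions Hunspell_create, Hunspell_spell and Hunspell_destroy with byte-string paths.

```python
import ctypes
import os


def bytes_seen_as_char_p(text):
    """Return what a const char* reader would see in a wchar_t buffer."""
    wide = ctypes.create_unicode_buffer(text)  # NUL-terminated wchar_t array
    # Read bytes from the start of the buffer up to the first NUL byte.
    return ctypes.string_at(ctypes.addressof(wide))


class CHunspell:
    """Hypothetical wrapper; `lib` is a ctypes.CDLL for libhunspell."""

    def __init__(self, lib, aff_path, dic_path):
        # Declare the C signatures so ctypes converts arguments correctly.
        lib.Hunspell_create.restype = ctypes.c_void_p
        lib.Hunspell_create.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
        lib.Hunspell_spell.restype = ctypes.c_int
        lib.Hunspell_spell.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
        lib.Hunspell_destroy.restype = None
        lib.Hunspell_destroy.argtypes = [ctypes.c_void_p]
        self._lib = lib
        # C API order: .aff first, .dic second, both as byte strings.
        self._handle = lib.Hunspell_create(
            os.fsencode(aff_path), os.fsencode(dic_path))

    def spell(self, word, encoding="utf-8"):
        """Encode the word to the dictionary's encoding and check it."""
        return bool(self._lib.Hunspell_spell(self._handle,
                                             word.encode(encoding)))

    def close(self):
        """Free the C++ object; safe to call more than once."""
        if self._handle is not None:
            self._lib.Hunspell_destroy(self._handle)
            self._handle = None
```

Calling `close()` explicitly at the end of each test run is the simplest way to avoid leaking the object, since Python's garbage collector knows nothing about the memory behind the handle.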
