This is the first of a few posts about PPQT2, the hobby software project that's been the focus of the blog for two years. Then there will be a post announcing a turn to something completely different.
This week I had to deal with a reported bug in PPQT -- note, the first one in six months or more, which is more a reflection of its very low usage than of its code quality -- and found a couple of other things that needed work.
Dictionary bug
The bug was present on all platforms, but was only visible on Windows. Here's the problem. The user opens a file that contains some non-Latin-1 characters, maybe some words in Greek. The default dictionary is set to en_US and the Greek words are not tagged for an alternate dict. So when the user refreshes the Words panel, every Greek word gets presented to a Hunspell object initialized with the en_US dict.
The encoding of any dict is specified in its name.aff file. The en_US dict is encoded ISO8859-1, and Hunspell expects any word checked against it to be encoded the same. Obviously a Greek word will have non-Latin characters, and Hunspell will correctly suffer an encoding error.
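The encoding step alone can be reproduced in plain Python, without Hunspell. This is only a sketch: the sample word and the helper name can_encode are made up for illustration, and it shows the codec behavior, not what the Hunspell binding does internally.

```python
# Illustration only: "λόγος" is a sample Greek word and can_encode is a
# hypothetical helper, neither comes from PPQT.
word = "λόγος"

def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        # UnicodeEncodeError is a subclass of UnicodeError.
        return False

latin_ok = can_encode("hello", "iso-8859-1")   # plain ASCII fits in Latin-1
greek_ok = can_encode(word, "iso-8859-1")      # Greek letters do not
greek_utf8 = can_encode(word, "utf-8")         # UTF-8 can encode any str
```

Any code that catches UnicodeError, as the handler in this post does, will also catch the UnicodeEncodeError raised here.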
Not a problem! I was ready for this:
    try:
        return dict_to_use.spell(word)
    except UnicodeError as UE:
        dictionaries_logger.error("error encoding spelling word {}".format(word))
        return False
However, in Windows the call to the logger's .error() itself raised an encoding error! The logged string contained the offending Greek word, and the log file had been opened using the default encoding, which in Windows was some wonky code page that couldn't encode Greek.
So an exception was raised inside the except block. But that did not cause a problem! The logging module catches exceptions raised while a handler is writing a record and prints the traceback to stderr instead of propagating it, so after the traceback appeared in the console window the .error() call returned normally and return False ran as intended. The caller used the return value in an if statement, if check(word):, so everything worked, except for a ton of error messages in the console window.
The fix was to go back to the top-level module where I set up the logging handler for the whole program, and add encoding='UTF-8' to that call. A one-line fix and the error messages disappeared.
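For anyone who wants to try the same fix, here is a minimal sketch. It assumes the log is written through a logging.FileHandler; the post doesn't say which handler or setup call PPQT actually uses, and the logger name, temp-file path and sample word below are invented for the demo.

```python
import logging
import os
import tempfile

# Hypothetical setup: logger name and log location are illustrative only.
log_path = os.path.join(tempfile.mkdtemp(), "demo.log")

logger = logging.getLogger("dictionaries_demo")
logger.setLevel(logging.ERROR)
logger.propagate = False  # keep the demo output out of any root handlers

# The explicit encoding is the point: without it the file is opened with
# the platform default, which on Windows may be a legacy code page.
handler = logging.FileHandler(log_path, encoding="utf-8")
logger.addHandler(handler)

# Log a message containing a Greek word, as the except block would.
logger.error("error encoding spelling word {}".format("λόγος"))

# Flush and detach the handler so the file can be read back.
handler.close()
logger.removeHandler(handler)

with open(log_path, encoding="utf-8") as f:
    logged = f.read()
```

With the encoding given explicitly, the Greek word lands in the log file intact no matter what the platform default is.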