Monday, September 22, 2014

New Timeline guessing and Semi-Hiatus

Back in August I made some guesses about what parts I would do in what sequence, and how long those would take. I think I'll review my timeline and see how it's going. Here's what I said I'd do:

  • ✓ Code review of several modules
  • ✓ Revise editview to get rid of Qt Creator glop code
  • ✓ Find panel
  • ✓ Chardata and charview panel
  • ✓ Wordview, including good-words drag'n'drop UI

So while that all took about 3 weeks longer than I planned, it is done. Yay me! Remaining, per the August plan, are

  • pageview, the Page table panel
  • fnotedata and fnoteview, the Footnotes panel
  • loupeview, integration with bookloupe
  • translate, code to parse a DP-formatted book and use it to drive the extremely clever API that I have devised for writing format-translation modules.

I believe I will rearrange the above sequence and do loupeview next. Footnote and pagination are fairly sophisticated proofing tools, while applying bookloupe or some such nit-picking tool is needed when post-processing even simple books. With Find, Word, Char and Loupe you have a toolkit adequate for many simple books.

I remarked in August that regarding bookloupe "there are many unknowns about how to integrate this hunk of somebody else's C code into Python and to display its output in a useful way." That's still very true. However, there are at least two tools that might make integration of "somebody else's C" easier. One is SWIG and the other is Cython. There is also the easy way, which is really not easy at all: save the current file to a temp file, then use subprocess to run bookloupe as a command-line utility and capture its stdout stream. This is fraught with host-dependency issues, such as where to write the temp file.
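A minimal sketch of that "easy way", assuming the tool takes the file path as its last argument. The function name run_checker and the command list are hypothetical; bookloupe's real options are not shown here. The tempfile module takes care of choosing a platform-appropriate place to write.

```python
import os
import subprocess
import tempfile

def run_checker(text, command):
    """Write text to a temp file, run command on it, return its stdout.

    command is a list such as ['bookloupe'] (a hypothetical invocation);
    the temp file path is appended as the final argument.
    """
    # mkstemp picks a suitable temp directory on each host platform.
    fd, path = tempfile.mkstemp(suffix='.txt')
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as f:
            f.write(text)
        result = subprocess.run(
            command + [path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        return result.stdout
    finally:
        os.remove(path)  # clean up even if the tool fails
```

This leaves open the harder questions: locating the executable on each platform and mapping the tool's line numbers back to the document.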

Anyway, all of that will be a bit delayed as I and spouse are about to head out on a 2-week run in the RV. I may get some coding, and even some blogging, done on this trip, but it will be spotty.

Another interfering factor: in late October (barely a month now), Qt 5.4 will be released, and it contains the new QWebEngine browser. I am looking forward to this because of my other project, the web-comic browser CoBro. I read web comics every day in it, and about 1 day in 3, it crashes somewhere in QWebKit. I don't mind; I just restart it and carry on. But I can't make it available to anyone else when it is so unstable. I have high hopes that by upgrading it to Qt 5.4 (and presumably, PyQt5.4 and an upgraded SIP by then), and dumping QWebKit for QWebEngine, I will get a faster and more stable program.

If so, then CoBro will also be the perfect test bed for me to learn how to use pyqtdeploy, and with it make stand-alone CoBro apps for three platforms. Those two challenges are likely to eat up at least a week, probably more.

But with that done, especially if I can master pyqtdeploy, I will be in position to release a stand-alone, multi-platform alpha of ppqt2 sometime quite early in 2015.

Saturday, September 20, 2014

How often does QTableView update cells?

The answer to the title question is, more often than you thought, probably. Here's how I found out.

So a PGDP thing is the "good-words" file. During the multi-stage proofing process, the proofers can nominate words that fail the online spellcheck for the good-words list, meaning they are correct. Proper nouns, technical terms, archaisms, etc. It's a familiar concept in spell-checking, sometimes called a local dictionary.

A book downloaded by a post-processor usually has a good-words.txt file. When PPQT opens a book for the first time (there's no metadata file to be found), it looks for the good-words file, reads it, and makes sure that every word in it gets a free pass around the spellchecker. The good-words list gets written into the metadata file for later.

A feature of the words (vocabulary) panel is that you can select a word or words and have them added to the good-words list. In V1, this was done via a context menu command. You selected a word or words and right-clicked, and selected "Add to good words" from the popup menu. Then you had to confirm you wanted to do this to an OK/Cancel dialog, because there was no way to undo this step.

It was somewhat tedious. Clearing out all the misspellings is a major task during post-processing. There are often a hundred or more, very few of them actual misspellings. Verifying that they are not, and adding them to good-words, can take a while. So for V2 I was determined there would be a simpler drag-and-drop interface for this. I am showing the actual good-words list as a one-column table alongside the vocabulary table, and the user can just drag a word (or words, complex selections are allowed) into good-words.

When this is done, the "X" in the vocabulary window should immediately disappear, indicating removal of the misspelled status. Here's a short video of how it looks currently.

The play-by-play is: the word LOWENHEIM is marked as misspelled (the X in the Features column). The user drags it to the Good Words list. Immediately the X should go away, but it doesn't. Why not? Because I have not added any code to tell the Words Table View it needs to update that word. I meant to, but hadn't got around to it. However, the instant the words table is scrolled—you can tell there's a scroll happening because OS X displays the scrollbar temporarily—the X does disappear. Why?

Then, the user clicks on LOWENHEIM in the good words list and hits the Delete key. It is deleted from the list. Immediately the X should reappear in the Words table, but again, I haven't yet added code to signal a change from the list widget to the table widget. But again, upon the tiniest scroll movement, the X does reappear. Is it magic?

I conclude that whenever there is even a tiny bit of scrolling, the QTableView calls the data() method of the table model for fresh data. It wouldn't know to do that for the LOWENHEIM row in particular, so it must be doing it for every visible cell! The call to data() fetches the latest info about the word, including its new status as correctly, or incorrectly, spelled.

I had not anticipated this. I had supposed that the X's would not change until the user clicked the Refresh button, causing a table model reset. This would not be a nice UI, so I had expected I would need to add a user-defined signal, "wordChanged" or such, to the Good Words list view widget, and catch that signal in the Words View widget. There it would have to look up the word, get its physical index, get the sort/filter proxy to translate that to the sorted row index, and then it could issue the dataChanged signal to get the table View to call the table Model for fresh data.

Now, I dunno. Do I need to add that machinery? If the user can get the current status just by scrolling even a tiny bit (or by hiding the app and revealing it; that does it also), should I bother?

Hunspell dicts and encodings

I've almost completed testing the word view panel, which displays the vocabulary of words (and word-like tokens) in the document, with their counts and some "properties" such as, are they spelled correctly? The table can be filtered various ways, and in particular the user can opt to show only the misspelled words.

So my test document had some French phrases, which I'd marked with <span lang='fr_FR'>, to request spell-check using the fr_FR dictionary. And this was working beautifully; words from phrases like je suis jeune fille showed up in the vocabulary list as properly spelled.

Except for words with accents: était, majesté and so on were shown as misspelled. Why?

Well, the whole thing reeks of character encoding issues, dunnit? Somewhere in the interface between a call in Python 3 and the C++ wrapper around Hunspell, there has to be an encoding step to get from Python's however-many-bit Unicode (16? 32? variable?) character string to a C++ char *.

I experimented with encoding the word that I passed, but that only caused more problems. The hunspell call wanted a string, and word.encode(encoding='ISO-8859-1',errors='replace') produces a bytes object. So an immediate TypeError happened.

Then I looked at the hunspell wrapper code, and it uses PyArg_ParseTuple() to receive the word-string from Python. And per its doc (at the link if you care) it says "Unicode objects are converted to C strings using 'utf-8' encoding..."

So my Unicode word était is being properly passed into Hunspell as a UTF-8 string, without effort on my part. Hmmm.

Oh.

I remembered (from the month or so I spent buried in spellcheck technology in 2012, struggling to get spellcheck working in version 1) that the .aff file of a dictionary includes an encoding, specifying the encoding of the matching .dic file. I checked, and the fr_FR.aff I had picked up (sometime or other, from OpenOffice.org, I think) had this as its opening line: SET ISO8859-15.

Now, if I was writing a spellchecker these days, I imagine I would use that to decode the file but store the decoded words in full Unicode or UTF-8. But just maybe Hunspell wasn't that smart. So I opened the two files in BBEdit (which has a convenient UI for changing the file encoding), changed that line to SET UTF-8 and saved both files in UTF-8.

Problem gone; now all French words from the test doc checked as correct, even those with accents.

So Hunspell was storing the dictionary words as ISO-8859-15 strings, then comparing them to UTF-8 strings, and not surprisingly, getting mismatches. Making the dictionary file encoding match the Python wrapper interface fixed the problem.
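The mismatch is easy to see at the byte level. This small demonstration uses only the standard library, not Hunspell itself:

```python
word = 'était'

latin9_bytes = word.encode('iso8859-15')  # how the .dic file stored it
utf8_bytes = word.encode('utf-8')         # what the wrapper hands to Hunspell

print(latin9_bytes)                # b'\xe9tait'
print(utf8_bytes)                  # b'\xc3\xa9tait'
print(latin9_bytes == utf8_bytes)  # False

# An unaccented word has the same bytes either way, so it still matched.
print('jeune'.encode('iso8859-15') == 'jeune'.encode('utf-8'))  # True
```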

Not quite! I can distribute some dictionaries with the program (which I also did with V1) but the user can get more or other dicts from anywhere. As long as they are Myspell/Hunspell compatible, they should work. Except, if they are not encoded UTF-8, they won't. I foresee problems here.
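One possible way to cope, sketched under assumptions: read the SET line of the .aff file, decode both files with that encoding, and re-save them as UTF-8, which is what the BBEdit steps above did by hand. The function names are hypothetical. The ISO8859-1 fallback is my understanding of Hunspell's default, not something confirmed here. Hunspell encoding names that Python's codecs do not recognize would need a mapping table that this sketch omits.

```python
import re

SET_LINE = re.compile(rb'^SET[ \t]+(\S+)', re.MULTILINE)

def aff_encoding(aff_bytes, default='ISO8859-1'):
    """Return the encoding named on the SET line of raw .aff data.

    The default is an assumption about what Hunspell does when the
    SET line is missing.
    """
    m = SET_LINE.search(aff_bytes)
    return m.group(1).decode('ascii') if m else default

def to_utf8(aff_bytes, dic_bytes):
    """Return (aff, dic) re-encoded as UTF-8 with the SET line updated."""
    enc = aff_encoding(aff_bytes)
    aff_text = aff_bytes.decode(enc)
    dic_text = dic_bytes.decode(enc)
    if SET_LINE.search(aff_bytes):
        aff_text = re.sub(r'^SET[ \t]+\S+', 'SET UTF-8', aff_text,
                          count=1, flags=re.MULTILINE)
    else:
        aff_text = 'SET UTF-8\n' + aff_text  # no SET line: add one
    return aff_text.encode('utf-8'), dic_text.encode('utf-8')
```

Running something like this once, when a user installs a dictionary, would keep every dictionary the program loads in the encoding the wrapper expects.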

Saturday, September 13, 2014

Qt Unclear on the Concept

Qt makes a big deal of using the Model-View architecture for its tables. One creates a table model by customizing QAbstractTableModel. Then one makes the table visible by creating a QTableView and linking it to the customized model.

All well and good, but unfortunately their design leaks view considerations into the model, as I only realized while finishing the character panel. It occurred to me that I had implemented a little character database in chardata.py (as described in the preceding post); so why had I based that class on QObject instead of QAbstractTableModel? I actually started to change this, and then stopped.

You customize QAbstractTableModel by adding overriding definitions of these methods:

  • rowCount() to return the number of unique rows.
  • columnCount() to return the number of columns.
  • data(index, role) to return both data and metadata for one cell.
  • headerData(section, orientation, role) to return both data and metadata for one header cell.

There's no debate about rowCount(); it is certainly the job of the data model to know how many primary keys there are to show. The trouble starts with columnCount(). The number of columns is a matter of how the data are to be presented to the user. As it says in Wikipedia, "the model captures the application's behavior in terms of its problem domain, independent of the user interface." The data model can know how many items are in the tuple related to one key, but how many of those are to be shown in this table, and in what sequence? That's the view's domain.

Things get worse with the data(index, role) method. There's no issue when the "role" passed is Qt.DisplayRole; then the return is one datum from the row. (Although one might quibble that the same datum could be displayed different ways; and this design forces the model to decide how to format each datum.) The issue is that the "role" code passed to data() can also be Qt.ToolTipRole, Qt.StatusTipRole, Qt.TextAlignmentRole, Qt.ForegroundRole and several other "roles" all related strictly to the display of the data.

It is (in my humble opinion) no business of the data model to know whether a given datum should be shown in red or black, left- or right-aligned, or what its tooltip should say.

The real breakdown of MVC is in headerData(section, orientation, role). The name at the top of a table column has nothing to do with the data model. I am especially sensitive to this because I am trying to make sure that all user-visible strings pass through QCoreApplication.translate(), so there is some hope of a properly-localized UI. Column header titles like "Symbol", "Count", and "Value" need to be translated. Same for tooltip and statustip strings! But (again, in my so-humble opinion), nothing the data model knows about should ever need translation. Translation should only ever be needed by the user-facing View component.

TL;DR: The Qt Model-View architecture forces the table model to perform many view-related things: deciding how to display each datum, providing column header and tooltip texts, and knowing presentation attributes such as color and alignment. That's just wrong.

Not that anything can or should be done about it at this point. I implemented the table model in the charview module.

Thursday, September 11, 2014

Little performance pick-up

The character panel (that I start work on tomorrow) will feature a button named "Refresh" meaning, bring the census of characters in the book up to date. This is implemented in the chardata module I worked on today. Initially I coded refresh() in the simplest way:

        editm = self.my_book.get_edit_model()
        c = self.census # save a few lookups
        self.k_view = None
        self.v_view = None
        self.census.clear()
        for line in editm.all_lines() :
            for char in line :
                n = self.census.setdefault(char,0)
                self.census[char] = n+1
        # Recreate the views used for fast access
        self.k_view = self.census.keys()
        self.v_view = self.census.values()

Get rid of the key- and value-views just in case sorteddict wants to try to update them as keys are added. Clear the sorteddict. Brute-force count all the characters. (editm.all_lines() is an iterator returning the lines of text in the document in order from first to last, as Python strings.) Recreate the views.

When the document managed by the edit model is about 25K characters, calling timeit on this method for four iterations took 0.75 seconds.

When the user opens a book for the first time, there is no metadata, and the character census sits empty until the user clicks Refresh. Then the above logic runs, loading the sorteddict. On a save, the list of characters and counts is written to the meta file, and reloaded when the book is opened again. The user clicks Refresh only after editing, to get an updated list of characters. Thus, almost every time Refresh is clicked, a dictionary exists that is almost complete. Possibly the user has added or eliminated a few characters (converted some non-Latin-1 characters to entity notation, for example); and the counts will be different. But the dictionary exists.

So it occurred to me to wonder whether this might not benefit from a trick I used in the word data Refresh method. If the dictionary exists, i.e. this is not the first time the document has been opened and a character census has previously been taken, don't throw the dictionary away. Go through it and zero all the counts; then take the census; then go through and delete any entries with a zero count. Applying this results in the much more complex method here:

        editm = self.my_book.get_edit_model()
        c = self.census # save a few lookups
        if len(c) : # something in the dict now
            for char in self.k_view:
                c[char] = 0
            for line in editm.all_lines() :
                for char in line :
                    n = self.census.setdefault(char,0)
                    c[char] = n+1
            mtc = [char for char in self.k_view if c[char] == 0 ]
            for char in mtc :
                del c[char]
        else : # empty dict; k_view and v_view are None
            for line in editm.all_lines() :
                for char in line :
                    n = self.census.setdefault(char,0)
                    self.census[char] = n+1
            # Restore the views for fast access
            self.k_view = self.census.keys()
            self.v_view = self.census.values()

Four iterations on the 25K book: 0.21 seconds. Keeping the dictionary and its views intact rather than recreating them saved considerable time. The Refresh operation should take only a barely perceptible delay even in a large book.
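Stripped of the book and view plumbing, the zero-count-and-prune idea can be sketched with a plain dict standing in for the blist sorteddict. That swap is an assumption for illustration: the timing gain reported above comes from not rebuilding the sorteddict and its views, and need not carry over to an ordinary dict. The function name is hypothetical.

```python
def refresh_census(census, lines):
    """Update census (a dict of char -> count) in place from lines."""
    for char in census:
        census[char] = 0          # keep the keys, reset the counts
    for line in lines:
        for char in line:
            census[char] = census.get(char, 0) + 1
    # Collect first, then delete: a dict must not change size
    # while it is being iterated.
    gone = [char for char, n in census.items() if n == 0]
    for char in gone:
        del census[char]
    return census

census = {'a': 5, 'z': 2}
refresh_census(census, ['aab', 'b'])
print(census)  # {'a': 2, 'b': 2}
```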

Chardata Done, But TIL...

Today I started and finished the chardata module, the data store for the character table. This went very quickly because I had actually started to do this function as part of the worddata module that was the second module I completed. I quickly realized then that char data deserved its own module, and I yanked out the partly-written chunks of it to its own module. So today I had to tidy that up and finish it, and of course in the process, rewrote a good bit. Then I wrote a unit-test driver to force it through all its error conditions, and in the course of that, I found out something I never knew about Python!

One key function of the char data module is get_tuple(j), which returns (to the character table code still to be written) a tuple, (character, count) for the j'th item in the sorted sequence of characters. That lets the table view build itself by just calling in a loop from 0 to the size of the table, and getting characters in sorted order.

The magic behind that is a sorteddict from the blist module. I put all the characters as keys into a sorteddict, with their counts as values. Then I get a KeyView and a ValueView object on the dict, and I can index them by an integer for O(1) access to the j'th key or value in sorted order. Slick.

    def get_tuple(self,j):
        try :
            return (self.k_view[j], self.v_view[j])
        except :
            cd_logger.error('Invalid chardata index {0}'.format(j))
            return ('?',0)

So I'm thinking like a QA person writing my unit test driver, and I have coded tests of the normal function of get_tuple(), and want to exercise all the things that can make it get into the except clause. So I code this:

assert ('?', 0) == cd.get_tuple(3) # there are only 3 chars in the database
check_log(etxt,logging.ERROR) # check for the error message in the log
assert ('?', 0) == cd.get_tuple('x')
check_log(etxt,logging.ERROR)
assert ('?', 0) == cd.get_tuple(-1)
check_log(etxt,logging.ERROR)

And the test fails, assertion error on assert ('?', 0) == cd.get_tuple(-1). What? So a little experimenting shows that I can index the dictionary key and value views with -1 and -2, returning the third and second characters of the three in it.

So I innocently file an issue on blist, asking "is this expected behavior?". The reply comes in an hour. Daniel Stutzbach, owner of blist, writes "Yes. This is how all sequences in Python work, such as list() and tuple()."

Oops. The blist doc (linked above) clearly says that key and value views support all Python sequence operations, and the 3.3 docs say of indexing a sequence, "If i or j is negative, the index is relative to the end of the string: len(s) + i or len(s) + j is substituted."

I had no idea! I understand and use negative indices in slices, but had always supposed that was special to slicing. I have never had occasion to use a negative index on a list, and cannot imagine a case where I would want to do so. But there it is.
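If negative indexes are to be treated as errors, the range has to be checked explicitly, since no exception will be raised for them. A minimal sketch using plain lists in place of the sorteddict views, with the logging call omitted; this standalone get_tuple signature is hypothetical:

```python
keys = ['a', 'b', 'c']
values = [10, 20, 30]
print(keys[-1])   # c (the last item, not an IndexError)

def get_tuple(keys, values, j):
    """Return (key, value) for index j, rejecting negative or bad indexes."""
    if isinstance(j, int) and 0 <= j < len(keys):
        return (keys[j], values[j])
    return ('?', 0)

print(get_tuple(keys, values, 0))    # ('a', 10)
print(get_tuple(keys, values, -1))   # ('?', 0)
```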

Monday, September 8, 2014

Another helping of Lambda Stew

Back a few days I posted about methods of connecting the clicked() signal of a button to a slot, when there are a lot of otherwise-anonymous buttons all handled from the same slot.

I am never shy to admit being wrong, and there was something wrong with that code, although it did work. I will now admit to being wrong, wrong, and wrong.

Let's make the problem more general. Say you are using PyQt to build a software simulation of an old-time jukebox.

Part of your UI is a set of 26 pushbuttons, visible in the image as numbers 1-26, as well as four others, A-B-C-D. When any of the latter four are clicked, all that happens (besides a juicy ka-chunk sound effect) is storing its letter. When any of the 26 numbered buttons are pressed, the whole simulation is set in motion to load and play selection B-13 or whatever.

So you have 30 QPushButtons in two lists, and they can be handled by just two slots, one for the four letter buttons and one for the 26 numbered ones. It would be stupid to write 30 slot methods. You want to connect the clicked() signal of each button to one of two slot methods, and somehow the slot has to know which button was clicked.

Wrong method #1

This is what I did in PPQT V1. Although it is tricky and it worked, it is quite wrong and nobody should use it.

    for i in range(number_of_buttons):
        self.connect(self.button_list[i], SIGNAL("clicked()"),
                                lambda b=i : self.button_click(b) )

Note this is using the "old" signal API. It says, connect the variant of clicked that has no parameter to the anonymous function lambda b=i : self.button_click(b). That creates for each button an anonymous function with the signature def anon(b=<a button number>). Python's rule for default arguments is that they are evaluated once, when the function definition (here, the lambda expression) is executed; hence, the button number was baked into the argument list as a default value. Because the clicked() signal passed no argument, the default was used, and passed to the slot as its parameter.
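The default-argument trick is plain Python and can be seen without any Qt at all. A small demonstration of why the b=i default is needed, compared with a lambda that simply refers to the loop variable:

```python
# Late binding: every lambda looks up j when it is called,
# by which time the loop has finished.
late = [lambda: j for j in range(3)]

# Default argument: the current j is evaluated and stored
# at the moment each lambda is created.
early = [lambda b=j: b for j in range(3)]

print([f() for f in late])    # [2, 2, 2]
print([f() for f in early])   # [0, 1, 2]
```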

This broke in PyQt5 because, using the new signal API, I didn't know how to ask for the no-parameter variant of the signal. The default version passes a boolean value, which overrode the clever default button-number value. But then I found out how to do it, resulting in...

Wrong method #2

This seemed to work when testing on my laptop, but it's wrong, don't use it.

        for j in range(number_of_buttons):
            self.button_list[j].clicked[()].connect(
                lambda b=j: self.button_click(b)
                )
 

When I tested it on my desktop, it failed with a message about no such overloaded signal. The difference? Laptop had PyQt5.2, the desktop had PyQt5.3. Between the two point releases, Phil had deliberately removed support for the no-argument overload of clicked(). I got around that with...

Wrong method #3

Even trickier, and still wrong:

        for j in range(number_of_buttons):
            self.button_list[j].clicked.connect(
                lambda f, b=j: self.button_click(b)
                )

This connects each button to an anonymous function that has the signature def anon(f, b=j), where f receives the unneeded boolean value and ignores it. No second value is passed, so the default-argument button number can be passed.
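The way the boolean clobbers the default, and the way the extra f parameter soaks it up, can be imitated in plain Python. Here emit_clicked is a hypothetical stand-in for a signal that passes one boolean; it is not part of PyQt.

```python
def emit_clicked(slot, checked=False):
    """Stand-in for a signal that passes one boolean to its slot."""
    slot(checked)

received = []
# One-parameter lambda: the boolean lands in b and replaces the default.
emit_clicked(lambda b=7: received.append(b))
# Method #3: f absorbs the boolean, so b keeps its default.
emit_clicked(lambda f, b=7: received.append(b))

print(received)  # [False, 7]
```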

All this was just so the slot could know which button was calling it. But there is already a way to know that! Back two-plus years ago when writing version 1, apparently I didn't know that, or thought it was not kosher to use it, or something. And now working on V2, I was just trying to make the old code work instead of rethinking it.

Correct method

The right way is this way:

        for j in range(number_of_buttons):
            self.button_list[j].clicked.connect( self.button_click )

Just connect the damn signal. Then, in the slot method,

        button = self.sender() # object generating the signal
        button_number = self.button_list.index(button)

The QObject.sender() method, in a slot, returns a reference to the object that generated the signal. That's the button. If it's important to know which button, Python will happily tell you its index in your list of buttons.

So slow to get smart, I am.

Friday, September 5, 2014

Global replace

Today I put in the last missing bit of function in the Find panel, global replace. The user has found something, set the checkbox that is labelled "All!", and clicked the Replace button beside one of three replace input fields. Just as in V1, the code finds all occurrences of the current search string in the search range, and asks the user, "OK to replace 41 occurrences of Blarg with Bluh?"

I have always thought well of this feature. I don't know any other editor that does it. It gives you a quick feel for whether this global replace is indeed what you intended. You know right away if the count is unexpectedly large or small. And it comes before anything has been changed. The closest feature to it that I know is BBEdit, which after a global replace tells you "OK I have replaced 437 occurrences of Original". And if the number looks odd, you can control-z the operation to back out. But I'd rather be told beforehand.

Anyway, with the list of all matches in hand, the code then marches through them replacing each with the replace string. This is surrounded with calls to QTextCursor beginEditBlock() and endEditBlock() so it's all one undo.

I had been going to split the processing between regex and non-regex, as in V1, but then I found the lovely finditer() method of regex, which returns all matches in a range of text in one operation. So instead, I convert a non-regex find and replace string to regex format, as follows. For the find string, apply this regex:

    RE_MAGIC_CHARS = regex.compile(r'([\[\]\(\)\*\.\?\+])') # raw string, so the backslashes reach regex intact

That matches to any single character that is magic to an RE. For the replace string, the only thing that is potentially magic is the backslash. So here's the code, with several error-checks edited out for simplicity.

        r_pattern = self.replace_fields[button].text()
        f_pattern = self.find_field.text()
        if self.sw_regex.isChecked() :
            rex = self.find_field.regex
        else : # not a regex pattern, make it one.
            r_pattern = r_pattern.replace('\\','\\\\')
            f_pattern = RE_MAGIC_CHARS.sub('\\\\\\1',f_pattern)
            rex = regex.compile(f_pattern)
        range_tc = self.editv.get_find_range() # cursor over find range
        full_text = self.editm.full_text() # entire document as Python string
        # In one statement get a match for every hit in the range.
        mlist = [ m for m in rex.finditer(full_text,range_tc.selectionStart(),range_tc.selectionEnd())]

Isn't that slick? A one-line list comprehension to match potentially hundreds of hits, or just a few in a small range. Anytime I get to use a list comprehension I feel like a real pythonista.
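The same count-first, replace-after flow can be sketched on a plain string. Two assumptions here: the standard re module stands in for regex (compiled patterns in both accept the pos and endpos arguments to finditer), and string slicing stands in for the QTextCursor edits. The function names are hypothetical.

```python
import re

def find_in_range(rex, text, start, end):
    """Return every match of compiled pattern rex between start and end."""
    return list(rex.finditer(text, start, end))

def replace_matches(text, matches, template):
    """Replace each match, working backward so earlier offsets stay valid."""
    for m in reversed(matches):
        text = text[:m.start()] + m.expand(template) + text[m.end():]
    return text

text = 'Blarg one Blarg two Blarg three'
hits = find_in_range(re.compile('Blarg'), text, 6, 25)
print('OK to replace {0} occurrences?'.format(len(hits)))  # count is 2
print(replace_matches(text, hits, 'Bluh'))  # Blarg one Bluh two Bluh three
```

Working from the last match to the first matters because each replacement can change the length of the text, which would shift the offsets of any matches after it.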

What? That expression RE_MAGIC_CHARS.sub('\\\\\\1',f_pattern)? Yeah, there are six consecutive backslashes in that, so what?

OK, Python will compile that literal to an actual string of \\\1. That gets processed by the regex.sub() code as "for every match to RE_MAGIC_CHARS, replace it with a backslash followed by match group 1". So, for every regex-significant character in the f_pattern, replace it with itself preceded by a backslash. So just in case the find-string included something like [A]. it will become \[A\]\. and will be treated as literal characters.
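To check the backslash arithmetic, here is the same substitution run with the standard re module standing in for regex (an assumption made only to keep the example self-contained). For this input it gives the same result as the library's own escape() helper, which also covers characters the hand-rolled class leaves out, such as ^, $, |, braces, and the backslash itself.

```python
import re

RE_MAGIC_CHARS = re.compile(r'([\[\]\(\)\*\.\?\+])')

literal = '[A].'
# Six backslashes in source become three in the actual string:
# an escaped backslash followed by a reference to group 1.
escaped = RE_MAGIC_CHARS.sub('\\\\\\1', literal)

print(escaped)                         # \[A\]\.
print(escaped == re.escape(literal))   # True
print(re.compile(escaped).search('see [A]. here') is not None)  # True
```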

Find is now almost functionally complete. I need to figure out what UI to provide to load and save the user buttons to/from a file. After that it has a rather rich array of behaviors and I need to write some test cases. Not sure whether to use QTest or Sikuli. Decide those two things over the weekend.

Thursday, September 4, 2014

Regex and backward searching

In the seemingly endless process of getting the Find/Replace panel done (now 3 weeks into its 1-week planned schedule) I have finally gotten to the part where I code some actual finding, and today had a bit of a surprise.

The Find panel offers four buttons, First, Next, Prior and Last. This is my design response to whether find should "wrap around" at end of document. To me it makes much more sense to find the First match and proceed to the Next, and the Next..., or to find the Next after the current cursor and then the Next, and the Next..., or to find the very Last match and walk backward to the Prior, and Prior...

These have key equivalents, and in my most common use case, it's click Next and then (because the focus is pushed to the document on a match) ctl-g, ctl-g, ctl-g... walking forward through the document match by match. Or, click Last and ctl-shift-g, ctl-shift-g, backward through the document. And not infrequently, alternating ctl-g and ctl-shift-g to bounce backward and forward between two (or more) matches.

It wasn't hard to get this working for non-regex finds. This just uses the QTextDocument find() method. This takes a string to match and a starting point (expressed as a QTextCursor), and optionally a flag word with optional Case Sensitive, Whole Word, and Backward bits. It searches forward or backward from that cursor's position. This was pretty easy to set up, and it works nicely. I can set the Respect Case and Whole Word check-boxes, click First, and then hold down ctl-g and it rattles through the document from match to match as fast as the key repeats. Same for Last and ctl-shift-g.

Adding regex was a bit more complex. If the Regular Expression checkbox is on, the current value of the Find text field gets compiled as a regex while it is being edited. So when starting a search, if that regex is not None, the find string is a syntactically-valid regex. But the backward part...

As I mentioned some time back, I'm using the superb regex module, not the Python default re module and not Qt5's new QRegularExpression class. One advantage of regex is that it supports backward searching, which neither re nor QRegularExpression does.

However, it supports it as a global flag, similar to regex.IGNORECASE. That means it needs to be specified at compile time. But the compile is happening while the user is editing the pattern string. By the time she clicks Last or Prior, asking for a backward search, the compile is finished. The code that recompiles the regex every time the user edits it (in order to turn the input field pink if the syntax is bad) cannot know if the regex will be used in the forward direction, or reverse.

So in the actual find code, in the case that the regex checkbox is on, I have to ask, is this a reverse search? And if so, recompile the regex with the REVERSE flag. Well, that would be a waste when Prior is being clicked or ctl-shift-g pressed repeatedly. So I had to cache the recompiled regex and only recompile it if needed.

But the first time I hit the Prior button (search backward from the current location), I was surprised that the match occurred at the end of the document, instead of just left of the cursor. As if I'd clicked Last, but I hadn't. What?

The problem was that regex does something reasonable but not what I expected. The QTextDocument find() method, if asked to go backward, goes backward from the position of the given cursor. The equivalent argument for a regex.search() is the pos argument: regex.search(string,pos). That works for a forward search; the search starts at offset pos in string and goes forward. I supposed that an re compiled with REVERSE would be like find(), it would go backward from pos.

Nunh-unh. A reverse search goes backward from the end of the string. Which is why a "Prior" search acted like a "Last" search. Well, how to make it do what I wanted?

Careful reading of the re documentation reminds one that the regex.search() method has two optional arguments, pos and endpos. Not unreasonably, regex doing a backward search starts at the endpos offset (default end of string), and scans backward toward pos (default start of string). So my regex find code ends up with this:

                if flag & FindPanel.SEARCH_BACKWARD :
                    # reverse search begins at "endpos"
                    fp.match = re.search(text,0,start_tc.position())
                else : # forward search begins at "pos" argument
                    fp.match = re.search(text,start_tc.position())
                if fp.match : # is not None, we have a match...
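To see the pos and endpos window in isolation, here is a small sketch using only the standard re module. Since re has no REVERSE flag, the backward case is emulated by taking the last forward match that ends at or before the cursor; that is a stand-in technique, not what the regex module does internally, and the two can differ for some patterns. The function names find_next and find_prior are made up for the example.

```python
import re

def find_next(pattern, text, cursor_pos):
    """Forward search: first match starting at or after cursor_pos (the pos argument)."""
    return pattern.search(text, cursor_pos)

def find_prior(pattern, text, cursor_pos):
    """Emulated backward search: last match that ends at or before cursor_pos.

    cursor_pos is passed as endpos, so nothing to the right of the cursor
    is ever considered.
    """
    last = None
    for m in pattern.finditer(text, 0, cursor_pos):
        last = m
    return last

text = "a1 b22 c333"
digits = re.compile(r"\d+")
cursor = 6  # just after "22"

print(find_next(digits, text, cursor).span())   # (8, 11)
print(find_prior(digits, text, cursor).span())  # (4, 6)
```

The point is the same in both versions: the cursor position goes in as pos for a forward search and as endpos for a backward one.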

I started to file an Issue with the developer, but thought better of it. What he's doing makes sense. It merely failed to match my preconceptions.

Monday, September 1, 2014

Sometimes things just work...

In PPQT V1 I emulated a great feature of Guiguts: the ability to limit find/replace to a range of text. For the savvy user this is very useful. Say you are formatting a big table, or a particular range of text like the index of a book. You'd like to do a multiple replace operation within that block of text. A typical example for a table of contents or an index would be to change all \b(\d+)\b page numbers into links to #Page_\1 anchors. You don't want to change every number in the book that way, only the ones in the block of text you are working on.

So you select a block of text and through some feature of the UI, tell the program that this is the range for find/replace operations. Then you can safely do a global replace and know that it will only affect that span.
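A range-limited replace can be sketched with plain string slicing. This is a minimal illustration and not how PPQT does it (PPQT works on a QTextDocument through cursors). The function name replace_in_range, the sample text, and the exact HTML form of the link are all assumptions made for the example.

```python
import re

PAGE_NUMBER = re.compile(r"\b(\d+)\b")
# Assumed link markup; the post only says "links to #Page_\1 anchors"
PAGE_LINK = r'<a href="#Page_\1">\1</a>'

def replace_in_range(text, start, end, pattern, replacement):
    """Apply the replacement only inside text[start:end]; the rest is untouched."""
    return text[:start] + pattern.sub(replacement, text[start:end]) + text[end:]

book = "Preface ... 5\nIndex ... 210\nIn 1914 the war began."
range_end = book.index("In 1914")  # the "selected" block is the first two lines

result = replace_in_range(book, 0, range_end, PAGE_NUMBER, PAGE_LINK)
print(result)
```

The page numbers 5 and 210 become links, while 1914 in the body text is left alone because it lies outside the range.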

I forget what the UI for this is in Guiguts. It was something arbitrary and tricky. PPQT V1 was no less arbitrary and tricky. You clicked the "In Selection" checkbox of the Find panel and then clicked either the First or Last search button. At that point the current selection got set as the range.

You had to trust that this was the case, however. Neither V1 nor its model Guiguts showed any visible indication that a search range had been set. You just had to hope you'd set it to cover all, but only, the text you wanted to work on.

For V2, I want to make the setting of the search range more intuitive and also give a visible sign of it. After the great success using the Extra Selections feature of the editor for the current line, I realized that this was the way to put a visible background color under the chosen find range. So what I did this morning was to make this happen:

What you see is a pale blue background showing the extent of a limited find/replace range. Superimposed on it is the pale yellow current-line highlight, and therein lies a tale.

First, the UI. For V2, the way you say "I want a restricted search range" is to click the "In Selection" checkbox making it checked. When that checkbox goes from unchecked to checked, I look at the current selection and if it is "large enough" (a rather arbitrary 100 characters or 4 lines) that selection is made the find-range. And it turns light blue and stays that way until you click "In Selection" off again. So much more obvious than the V1 rule, which I don't even want to think about, it's so dumb.
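The "large enough" test might look something like this. It is a guess at the shape of the rule, not the V2 code: the function name, the use of "at least" for both thresholds, and the way lines are counted are all assumptions. One known Qt detail is included: QTextCursor.selectedText() separates paragraphs with U+2029 rather than a newline, so both are counted.

```python
def is_large_enough(selected_text, min_chars=100, min_lines=4):
    """Hypothetical rule: a selection qualifies as a find-range if it has
    at least min_chars characters or spans at least min_lines lines."""
    if not selected_text:
        return False
    # Count line breaks in either form: "\n" or Qt's paragraph separator U+2029
    breaks = selected_text.count("\n") + selected_text.count("\u2029")
    line_count = breaks + 1
    return len(selected_text) >= min_chars or line_count >= min_lines
```

A selection that fails the test would simply leave the find-range unset when the checkbox is clicked.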

Displaying the blueness required changes in two other modules. First, all the colors are localized in colors.py, so I had to add get/set_find_range_brush methods to that, and to save and restore the find-range color choice from settings.

Second, the visible display of the document is up to editview.py, which was already handling the current-line mechanism. So for both practicality and MVC purity, that's where display of the range should be. So I generalized that code to always have not one, but two "extra" selections.

Recall that an extra selection is basically a tuple of a QTextCharFormat and a QTextCursor. Anytime any extra selection is altered, you have to call setExtraSelections with a list of them. Previously the list had only one element, the current-line selection. But it was easy to change that so the list always included two selections, one for current line and one for find-range.

For the current line, the related cursor gets updated whenever the edit cursor moves. But for the find-range selection, the related cursor normally has no selection, and thus has no visible effect. The findview code calls into the editview code to set_find_range(cursor) and that puts a text selection in the find-range cursor. And a clear_find_range() call takes it out again.

With that in place I could add the code to the half-done Find panel to use it, and test it and it worked! Well, except for one thing. When the cursor moved through the blue area, the yellow highlight of the current line disappeared. The blue background had priority over the yellow background. Awwww. But wait! The editview was initializing its list of extra selections so:

self.extra_sel_list = [self.current_line_sel, self.range_sel]

Just maybe... I reversed the order of the list,

self.extra_sel_list = [self.range_sel, self.current_line_sel]

and the current line brush now took priority over the blue range brush!

This doesn't seem to be documented but it's good to know: the "extra selection" list is prioritized left to right. Or maybe it goes back-most to front-most in visual depth?

Whatever. Find still has no code for finding anything, but all its widgets work and the In Selection highlight is purty.