Friday, September 5, 2014

Global replace

Today I put in the last missing bit of function in the Find panel, global replace. The user has found something, set the checkbox that is labelled "All!", and clicked the Replace button beside one of three replace input fields. Just as in V1, the code finds all occurrences of the current search string in the search range, and asks the user, "OK to replace 41 occurrences of Blarg with Bluh?"

I have always thought well of this feature. I don't know any other editor that does it. It gives you a quick feel for whether this global replace is indeed what you intended. You know right away if the count is unexpectedly large or small. And it comes before anything has been changed. The closest thing to it that I know of is in BBEdit, which after a global replace tells you "OK I have replaced 437 occurrences of Original". And if the number looks odd, you can control-z the operation to back out. But I'd rather be told beforehand.

Anyway, with the list of all matches in hand, the code then marches through them replacing each with the replace string. This is surrounded with calls to QTextCursor beginEditBlock() and endEditBlock() so it's all one undo.
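Here is a minimal sketch of that count-first, replace-second flow on a plain Python string. It is not the editor's actual code: it uses the standard re module rather than regex, leaves out Qt entirely, and the names plan_replace and apply_replace are made up for illustration. Working from the last match back to the first is my own assumption, chosen so the earlier match offsets stay valid while the string changes.

```python
import re

def plan_replace(text, pattern, start, end):
    """Collect every match of pattern between start and end; change nothing."""
    return list(re.compile(pattern).finditer(text, start, end))

def apply_replace(text, matches, template):
    """Replace each match, last to first, so earlier offsets stay valid."""
    for m in reversed(matches):
        # expand() fills in any group references in the replace template
        text = text[:m.start()] + m.expand(template) + text[m.end():]
    return text

doc = "Blarg one, Blarg two, Blarg three"
# Search only the first 21 characters, which leaves out the third Blarg.
hits = plan_replace(doc, "Blarg", 0, 21)
prompt = "OK to replace {} occurrences of {} with {}?".format(
    len(hits), "Blarg", "Bluh")
result = apply_replace(doc, hits, "Bluh")
```

The prompt string can be built before any text is touched, which is the whole point: the user sees the count while backing out still costs nothing.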

I had been going to split the processing between regex and non-regex, as in V1, but then I found the lovely finditer() method of regex, which returns all matches in a range of text in one operation. So instead, I convert a non-regex find and replace string to regex format, as follows. For the find string, apply this regex:

    # Raw string; also covers { } ^ $ | and the backslash itself.
    RE_MAGIC_CHARS = regex.compile(r'([\[\]\(\)\{\}\*\.\?\+\^\$\|\\])')

That matches any single character that is magic to an RE. For the replace string, the only thing that is potentially magic is the backslash. So here's the code, with several error-checks edited out for simplicity.

        r_pattern = self.replace_fields[button].text()
        f_pattern = self.find_field.text()
        if self.sw_regex.isChecked() :
            rex = self.find_field.regex
        else : # not a regex pattern, make it one.
            r_pattern = r_pattern.replace('\\','\\\\')
            f_pattern = RE_MAGIC_CHARS.sub('\\\\\\1',f_pattern)
            rex = regex.compile(f_pattern)
        range_tc = self.editv.get_find_range() # cursor over find range
        full_text = self.editm.full_text() # entire document as Python string
        # In one statement get a match for every hit in the range.
        mlist = [ m for m in rex.finditer(full_text,range_tc.selectionStart(),range_tc.selectionEnd())]

Isn't that slick? A one-line list comprehension to match potentially hundreds of hits, or just a few in a small range. Anytime I get to use a list comprehension I feel like a real pythonista.

What? That expression RE_MAGIC_CHARS.sub('\\\\\\1',f_pattern)? Yeah, there are six consecutive backslashes in that, so what?

OK, Python will compile that literal to an actual four-character string: three backslashes and the digit 1, or \\\1. That gets processed by the regex.sub() code as "for every match to RE_MAGIC_CHARS, replace it with a backslash followed by match group 1". So, for every regex-significant character in the f_pattern, replace it with itself preceded by a backslash. So just in case the find-string included something like [A]. it will become \[A\]\. and will be treated as literal characters.
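To see the backslash arithmetic for yourself, here is a small sketch using the standard re module in place of regex (the two behave the same for this case, as far as I know). The character class is a cut-down stand-in for RE_MAGIC_CHARS, and the names MAGIC, substitution, and escaped are only for illustration. The last lines compare against re.escape(), the standard library's own helper for the same job.

```python
import re

# A cut-down stand-in for RE_MAGIC_CHARS: brackets, parens, and quantifiers.
MAGIC = re.compile(r'([\[\]\(\)\*\.\?\+])')

substitution = '\\\\\\1'   # six backslashes in the source code...
# ...but only three in the actual string, followed by the digit 1
assert substitution == r'\\\1' and len(substitution) == 4

# Each magic character is replaced by a backslash plus itself.
escaped = MAGIC.sub(substitution, '[A].')

# The escaped pattern now matches only the literal text [A].
literal = re.compile(escaped)
found = literal.search('x[A].y')
missed = literal.search('Ab')   # unescaped, [A]. would have matched this

# re.escape() gives the same result for this input.
same_as_stdlib = (re.escape('[A].') == escaped)
```

Writing the substitution as the raw string r'\\\1' says the same thing with half the backslashes, if six in a row is too much to look at.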

Find is now almost functionally complete. I need to figure out what UI to provide to load and save the user buttons to/from a file. After that it has a rather rich array of behaviors and I need to write some test cases. Not sure whether to use QTest or Sikuli. Decide those two things over the weekend.
