Submitted by: techie at: 2006-08-18T02:00:28+00:00

Leo 4.4.1b3 fails to recognize word boundaries correctly in Russian sentences written in UTF-8. Try this text, for example:

80-е годы, ГДР. Один немецкий переводчик хвастался, что идеально знает русский язык, переведет любую фразу. Ну, ему и предложили перевести на немецкий:«Косил Косой косой косой». До сих пор мучается.

... --edreamleo, Mon, 21 Aug 2006 07:54:58 -0700 reply

> Leo 4.4.1b3 fails to correctly recognize word boundaries in russian sentences written in UTF-8.

Thanks for this report. I assume you mean that Alt-f (forward-word) does not work properly. It doesn't work for me on XP either.

I'm not sure whether I can fix this: it may be a Tk bug. The relevant code is in leoPy.leo in the node:

Code-->@thin leoEditCommands.py-->class editCommandsClass-->move cursor... (leoEditCommands)--> helpers-->moveWordHelper

That is, Leo is using Tk's notion of what a word is:

if forward:
    ind = w.search('\w','insert',stopindex='end',regexp=True)
    if ind: nind = '%s wordend' % ind
    else:   nind = 'end'
else:
    ind = w.search('\w','insert -1c',stopindex='1.0',regexp=True,backwards=True)
    if ind: nind = '%s wordstart' % ind
    else:   nind = '1.0'
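
For comparison outside Tk, here is a minimal sketch of the same "find the next word character, then go to the end of that word" idea using Python's standard re module. It assumes Python 3, where \w is Unicode-aware for str patterns by default. The name next_word_end is hypothetical and this is not Leo's code; it only shows that Cyrillic letters count as word characters in Python's own regex engine, which may help tell a Tk problem from a Leo problem.

```python
import re

# Compiled once; \w+ matches a run of Unicode word characters in Python 3.
WORD_RUN = re.compile(r'\w+')

def next_word_end(text, pos):
    """Return the index just past the next word at or after pos.

    Hypothetical helper, not part of Leo. Returns len(text) if no
    word character follows pos (mirrors the 'end' fallback above).
    """
    match = WORD_RUN.search(text, pos)
    return match.end() if match else len(text)

if __name__ == '__main__':
    sample = "Косил Косой косой косой"
    print(next_word_end(sample, 0))
```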

Edward

... --techie, Mon, 21 Aug 2006 23:22:57 -0700 reply

Sorry, I can't help there. I can hardly imagine what Tcl/Tk is, and I have too little Python experience to try anything at the moment. But I googled another way of detecting word boundaries (rather obscure to me): http://www.openmash.org/lxr/source/library/word.tcl?c=tcl8.3 To find out whether it is a Tcl/Tk bug, it seems all that is needed is the correct Tcl/Tk version and a small hello-world editor in that language.

... --techie, Mon, 21 Aug 2006 23:34:05 -0700 reply

Another Google result: http://wiki.tcl.tk/3512 - wordstart/wordend - I do not know whether they work by default. And if Tcl/Tk is responsible for drawing windows, I have noticed that text wrapping in the edit pane works correctly.

... --edreamleo, Tue, 22 Aug 2006 06:28:03 -0700 reply

> I googled another way of detecting word boundaries.

This is actually the same approach the code uses, but it revealed the problem: the pattern must be

\\W rather than \w

The fix is on cvs, and affects both the forward-word and backward-word commands.

... --edreamleo, Tue, 22 Aug 2006 06:32:22 -0700 reply

Status: open => closed

... --techie, Wed, 23 Aug 2006 00:34:05 -0700 reply

Great! Tnx. =)

... --techie, Thu, 31 Aug 2006 06:06:54 -0700 reply

I modified the script to jump to the start of the word, as the majority of other editors do. See http://sourceforge.net/forum/message.php?msg_id=3892876

... --techie, Sat, 09 Sep 2006 00:39:09 -0700 reply

Name: '#9 Russian unicode words are not detected properly in body pane' => '#9 Jump dest for prev/next word is not detected properly' Status: closed => pending

I have reopened this issue, because my patch for the issue was incomplete and didn't cover all possible cases. There is a corrected version in the thread at the link below.

http://sourceforge.net/forum/message.php?msg_id=3901275

As none of us has much free time, I hope this tracker issue will help get things done in the simplest way. =)

... --techie, Sun, 01 Oct 2006 05:03:34 -0700 reply

This time it is a patch for 4.4.2 beta 1 that makes Leo jump to the start of the word when moving backwards. This is the default behavior in Windows editors:

--- E:/ENV/Leo/src/leoEditCommands_old.py   Sun Oct 01 14:35:38 2006
+++ E:/ENV/Leo/src/leoEditCommands_var2.py  Sun Oct 01 14:42:48 2006
@@ -3302,12 +3302,18 @@
         i = toPython(w.index('insert'))
         delta = g.choose(forward,1,-1)

-        if not forward: i -= 1
-        while 0 <= i < n and isWordChar(s[i]):
-            i += delta
-        while 0 <= i < n and not isWordChar(s[i]):
-            i += delta
-        if not forward: i += 1;
+        if not forward:
+            i -= 1
+            while 0 <= i < n and not isWordChar(s[i]):
+                i += delta
+            while 0 <= i < n and isWordChar(s[i]):
+                i += delta
+            i += 1
+        else:
+            while 0 <= i < n and isWordChar(s[i]):
+                i += delta
+            while 0 <= i < n and not isWordChar(s[i]):
+                i += delta

         self.moveToHelper(event,toGui(i),extend)
     #@nonl
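
To try the patched logic outside Leo, here is a minimal standalone sketch of the same two-loop scan. The names move_word and is_word_char are hypothetical, and treating letters, digits and underscore as word characters is an assumption about what isWordChar does; it is written for Python 3, where str.isalnum() accepts Cyrillic letters.

```python
def is_word_char(ch):
    # Assumption: a word character is a letter, digit, or underscore.
    return ch.isalnum() or ch == '_'

def move_word(s, i, forward):
    """Return the new cursor index after a forward-word or back-word move.

    Hypothetical stand-in for the patched loop; s is the body text and
    i is the current insert position as a Python index.
    """
    n = len(s)
    if forward:
        # Skip the rest of the current word, then the gap after it.
        while 0 <= i < n and is_word_char(s[i]):
            i += 1
        while 0 <= i < n and not is_word_char(s[i]):
            i += 1
    else:
        # Look at the character before the cursor, skip the gap,
        # then skip the word itself to land on its first character.
        i -= 1
        while 0 <= i < n and not is_word_char(s[i]):
            i -= 1
        while 0 <= i < n and is_word_char(s[i]):
            i -= 1
        i += 1
    return i

if __name__ == '__main__':
    text = "Косил Косой косой косой."
    print(move_word(text, 0, True), move_word(text, len(text), False))
```

In both directions the cursor ends on the first character of a word (or at the end of the text), which is the behavior the patch aims for.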