Usernames with Cyrillic characters aren't supported in 2.1.3... (Bugs)

by Urfin® ⌂, Russia, Tuesday, February 02, 2010, 10:19 (5169 days ago)

After updating from 2.1.1 to 2.1.3, previously registered users with Cyrillic characters in their usernames can't log in to the forum. The error message "The user name contains invalid characters" appears... :(
How can I fix the problem?

--
no bees - no honey,
no business - no money

Usernames with Cyrillic characters aren't supported in 2.1.3...

by Alex ⌂, Tuesday, February 02, 2010, 10:44 (5169 days ago) @ Urfin®

After updating from 2.1.1 to 2.1.3, previously registered users with Cyrillic characters in their usernames can't log in to the forum. The error message "The user name contains invalid characters" appears... :(
How can I fix the problem?

This check is only performed when registering. New user names are checked for some control characters. This is done by the function contains_special_characters() (includes/functions.inc.php). What happens if you change it like this?

function contains_special_characters($string)
 {
  return false;
 }

My description of the problem was wrong...

by Urfin® ⌂, Russia, Tuesday, February 02, 2010, 12:44 (5169 days ago) @ Alex

After updating from 2.1.1 to 2.1.3, previously registered users with Cyrillic characters in their usernames can't log in to the forum. The error message "The user name contains invalid characters" appears... :(
How can I fix the problem?

A correction: a registered user with a Cyrillic nickname can log in to the forum, but can't create a new topic or reply to other messages; the error message appears.

This check is only performed when registering. New user names are checked for some control characters. This is done by the function contains_special_characters() (includes/functions.inc.php). What happens if you change it like this?

function contains_special_characters($string)
{
return false;
}

This code bypasses the check, and the "Cyrillic user" can post messages.

--
no bees - no honey,
no business - no money

Test posting...

by Проверять, Tuesday, February 02, 2010, 13:42 (5169 days ago) @ Urfin®

- No text -

My description of the problem was wrong...

by Alex ⌂, Tuesday, February 02, 2010, 13:58 (5169 days ago) @ Urfin®

A correction: a registered user with a Cyrillic nickname can log in to the forum, but can't create a new topic or reply to other messages; the error message appears.

I see. Actually it isn't necessary to check registered users. But that's not the main problem...

This code bypasses the check, and the "Cyrillic user" can post messages.

The original function looks like this:

function contains_special_characters($string)
 {
  if(preg_match("/([[:cntrl:]]|\255)/", $string)) return true; // control characters and soft hyphen
  if(preg_match("/(\x{200b})/u", $string)) return true; // zero width space
  return false;
 }

I'm afraid the u modifier causes the problem on your server (cannot reproduce it here, see this posting).

What about this modification?

function contains_special_characters($string)
 {
  if(preg_match("/([[:cntrl:]]|\255)/", $string)) return true; // control characters and soft hyphen
  return false;
 }

Anyway, this check isn't really important. The idea was to stop users from choosing names that look identical to already registered names by rejecting invisible characters. However, this isn't very effective as long as Unicode characters are allowed (for example, you could post under my name using the Cyrillic "А").
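
To illustrate the look-alike point, here is a minimal sketch (same_bytes() is a hypothetical helper, not part of the forum code). It only shows that the Latin and the Cyrillic letter are different byte sequences in UTF-8, so a plain comparison treats the two names as distinct even though they render the same.

```php
<?php
// same_bytes() is a hypothetical helper used only for this illustration.
function same_bytes($a, $b)
 {
  return $a === $b; // strict, byte-for-byte string comparison
 }

$latin = "A";            // U+0041, one byte: 41
$cyrillic = "\xD0\x90";  // U+0410 (Cyrillic "А"), two bytes: d0 90

echo bin2hex($latin), ' vs ', bin2hex($cyrillic), "\n";
echo same_bytes($latin, $cyrillic) ? "identical\n" : "different names\n";
```

Rejecting invisible characters does nothing about this case, which is why the check gives only limited protection.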

Alex

My description of the problem was wrong...

by Urfin® ⌂, Russia, Tuesday, February 02, 2010, 15:04 (5169 days ago) @ Alex

The original function looks like this:

function contains_special_characters($string)
{
if(preg_match("/([[:cntrl:]]|\255)/", $string)) return true; // control characters and soft hyphen
if(preg_match("/(\x{200b})/u", $string)) return true; // zero width space
return false;
}

The original function was:

function contains_special_characters($string)
 {
  #if(!preg_match("/^[a-zA-Z0-9_\- ]+$/", $string)) return true; // only alphanumeric characters, "-", "_" and " " allowed
  if(preg_match("/([[:cntrl:]]|\255)/", $string)) return true; // not allowed: control characters and soft hyphen
  else return false;
 }

I'm afraid the u modifier causes the problem on your server (cannot reproduce it here, see this posting).

no, not 'u', but maybe [[:cntrl:]]?

What about this modification?

function contains_special_characters($string)
{
if(preg_match("/([[:cntrl:]]|\255)/", $string)) return true; // control characters and soft hyphen
return false;
}

The modification above still doesn't let the 'Cyrillic' user post a message on my server...
so I use only this part of the code:

function contains_special_characters($string)
 {
  if(preg_match("/(\x{200b})/u", $string)) return true; // zero width space
  return false;
 }
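
One possible reason a pattern without the u modifier is fragile on UTF-8 names, as a sketch only (the thread does not confirm that this is the cause on this server): without u the subject is scanned byte by byte, so the soft hyphen byte 0xAD can match in the middle of a multi-byte character. The two helper names below are hypothetical.

```php
<?php
// Hypothetical helpers for illustration; not part of the forum code.
function has_soft_hyphen_bytewise($string)
 {
  // No "u" modifier: the subject is scanned byte by byte for 0xAD.
  return preg_match('/\xAD/', $string) === 1;
 }

function has_soft_hyphen_utf8($string)
 {
  // With "u": the subject is read as UTF-8, so only U+00AD matches.
  return preg_match('/\x{AD}/u', $string) === 1;
 }

$name = "\xD0\xAD\xD1\x85\xD0\xBE"; // "Эхо" in UTF-8; "Э" is the bytes D0 AD

var_dump(has_soft_hyphen_bytewise($name)); // true: second byte of "Э" is 0xAD
var_dump(has_soft_hyphen_utf8($name));     // false: no U+00AD in the name
```

The names tested in this thread ("Алек©ей", "тест") do not contain the byte 0xAD, so this shows the general pitfall rather than an explanation of the reported error.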

Anyway, this check isn't really important. The idea was to stop users from choosing names that look identical to already registered names by rejecting invisible characters. However, this isn't very effective as long as Unicode characters are allowed (for example, you could post under my name using the Cyrillic "А").

Yes, some 'bad guys' on several Russian forums exploit similar weaknesses for their own purposes... until a moderator notices ;)

--
no bees - no honey,
no business - no money

[[:cntrl:]]?

by Alex ⌂, Tuesday, February 02, 2010, 15:43 (5169 days ago) @ Urfin®

no, not 'u', but maybe [[:cntrl:]]?

Hm... I'd really like to reproduce this! Could you check what this script outputs (was "Алек©ей" the user name that triggered the error?)?

<?php
function matches($string)
 {
  if(preg_match("/([[:cntrl:]])/", $string)) return true; // control characters
  return false;
 }
 
if(matches('Алек©ей')) echo 'matches!';
else echo 'doesn\'t match!';
?>

I get "doesn't match!".

Alex

Re: [[:cntrl:]]?

by Urfin® ⌂, Russia, Wednesday, February 03, 2010, 12:46 (5168 days ago) @ Alex

I get "doesn't match!".

I get "doesn't match!" too...
but when the function contains_special_characters($string) in functions.inc.php contains a line like

if(preg_match("/([[:cntrl:]])/", $string)) return true; // control characters


a user with a Cyrillic name ("Алек©ей" or the specially created user "тест") cannot post comments.

And when I run the script

<?php
function matches($string)
 {
  if(preg_match("/([[:cntrl:]])/", $string)) return true; // control characters
  return false;
 }
 
if(matches('тест')) echo 'matches!';
else echo 'doesn\'t match!';
?>


I also get "doesn't match!"

Some info about the web server:
Operating system: Linux
Apache version: 2.2.12 (Unix)
PHP version: 5.2.10
MySQL version: 5.0.51a
MySQL encoding: UTF-8 Unicode (utf8)
Unicode collation: utf8_general_ci

...maybe it would be better to change the last option to utf8_unicode_ci?

--
no bees - no honey,
no business - no money

Re: [[:cntrl:]]?

by Auge ⌂, Wednesday, February 03, 2010, 15:55 (5168 days ago) @ Urfin®

Hello

if(matches('тест')) echo 'matches!';
else echo 'doesn\'t match!';
?>
I also get "doesn't match!"

Some info about the web server:
Operating system: Linux
Apache version: 2.2.12 (Unix)
PHP version: 5.2.10
MySQL version: 5.0.51a
MySQL encoding: UTF-8 Unicode (utf8)
Unicode collation: utf8_general_ci

...maybe it would be better to change the last option to utf8_unicode_ci?

The collation of a database column (e.g. utf8_general_ci vs. utf8_unicode_ci) controls language-dependent comparison and sorting of strings (for example, how special characters like umlauts are ordered), so it shouldn't affect this check.

An explanation of regular expressions in SELFHTML (in German!) says that in Perl-style regular expressions (PHP's preg_* functions), /\w/ stands for alphanumeric characters including "_" (for the Latin writing system: /[a-zA-Z0-9_]/). Together with "u" (/\w/u) it should cover letters of any writing system in Unicode, digits and "_".

Tschö, Auge

--
Trenne niemals Müll, denn er hat nur eine Silbe!

Re: [[:cntrl:]]?

by Alex ⌂, Wednesday, February 03, 2010, 19:41 (5167 days ago) @ Auge

Together with "u" (/\w/u) it should cover letters of any writing system in Unicode, digits and "_".

<?php
if(preg_match("/[^\w]/u", 'тест')) echo 'Non-alphanumeric character(s) found!';
?>

→ Non-alphanumeric character(s) found!

/[^\w]/u

by Auge ⌂, Wednesday, February 03, 2010, 20:14 (5167 days ago) @ Alex

Hello

Together with "u" (/\w/u) it should cover letters of any writing system in Unicode, digits and "_".


<?php
if(preg_match("/[^\w]/u", 'тест')) echo 'Non-alphanumeric character(s) found!';
?>

→ Non-alphanumeric character(s) found!

Isn't that the wrong logic? If 'тест' matches /[^\w]/u, which means all alphanumeric characters and "_" (/[^\w]/) in the Unicode range (u), the test passes. Or am I wrong?

Tschö, Auge

--
Trenne niemals Müll, denn er hat nur eine Silbe!

/\w/u

by Alex ⌂, Wednesday, February 03, 2010, 20:29 (5167 days ago) @ Auge

Isn't that the wrong logic? If 'тест' matches /[^\w]/u, which means all alphanumeric characters and "_" (/[^\w]/) in the Unicode range (u), the test passes. Or am I wrong?

[^...] negates it. So it should match if there is at least one non-alphanumeric character, shouldn't it?

But let's just do it the other way round:

<?php
if(preg_match("/\w/u", 'тест')) echo 'Alphanumeric character(s) found!';
?>

→ Doesn't output anything on my server.

/\w/u

by Auge ⌂, Wednesday, February 03, 2010, 20:38 (5167 days ago) @ Alex

Hello

Isn't that the wrong logic? If 'тест' matches /[^\w]/u, which means all alphanumeric characters and "_" (/[^\w]/) in the Unicode range (u), the test passes. Or am I wrong?


[^...] negates it. So it should match if there is at least one non-alphanumeric character, shouldn't it?

Yes, it should; I was wrong. But to negate a shorthand class (like /\w/, or /\d/ for digits) you can use the uppercase form /\W/ (/\D/) instead of [^...].
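
A small sketch of that equivalence, using ASCII-only samples so the Unicode question doesn't interfere (negations_agree() is a hypothetical helper for illustration):

```php
<?php
// negations_agree() is a hypothetical helper used only for this illustration.
function negations_agree($string)
 {
  $bracket = preg_match('/[^\w]/', $string);  // negated character class
  $shorthand = preg_match('/\W/', $string);   // uppercase shorthand
  return $bracket === $shorthand;
 }

foreach (array('abc_123', 'abc 123', '42') as $sample)
 {
  // Prints 1 where a non-word (\W) or non-digit (\D) character is found, else 0.
  echo $sample, ': ', preg_match('/\W/', $sample), ' ', preg_match('/\D/', $sample), "\n";
 }
```

Of these samples only 'abc 123' contains a non-word character (the space), and only '42' is free of non-digits.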

Tschö, Auge

--
Trenne niemals Müll, denn er hat nur eine Silbe!

/\w/u

by Alex ⌂, Wednesday, February 03, 2010, 20:47 (5167 days ago) @ Auge

Yes it should, ...

And since /[^\w]/u matches on "тест", Cyrillic characters are evidently not treated as alphanumeric characters. :-(
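
If \w stays ASCII-only even with the u modifier, as the tests above suggest, Unicode property classes are one possible alternative. This is a sketch under assumptions, not something tried in this thread: it assumes PCRE was built with Unicode property support and that the file is saved as UTF-8, and is_word_like() is a hypothetical helper.

```php
<?php
// is_word_like() is a hypothetical helper, not part of the forum code.
function is_word_like($string)
 {
  // \p{L} = any letter, \p{N} = any number; "u" makes PCRE read UTF-8.
  return preg_match('/^[\p{L}\p{N}_]+$/u', $string) === 1;
 }

var_dump(is_word_like('тест'));   // Cyrillic letters only
var_dump(is_word_like('test_1')); // ASCII letters, digit, underscore
var_dump(is_word_like('te st'));  // contains a space
```

With property support available, the first two calls should report true and the last one false.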

Alex

so...

by Urfin® ⌂, Russia, Friday, February 05, 2010, 12:49 (5166 days ago) @ Alex

Yes it should, ...


And since /[^\w]/u matches on "тест", Cyrillic characters are evidently not treated as alphanumeric characters. :-(

[irony]ethnic discrimination detected![/irony] ;)

--
no bees - no honey,
no business - no money