Warning: Non-US characters in user name (General)

by danielb987, Tuesday, April 18, 2017, 12:43 (2789 days ago)

Hello,

I have a Swedish forum there some of the users have Swedish characters in their user names. I have now noticed that if two users have user names that looks alike but are different, they will not be able to login.

Example:
One user has the user name "daniel" and the other user has the user name "däniel".

The mysql database thinks that this is the same user name, which means that a database search on one of these user names will result in two rows. But My Little Forum requires that only one single user has a particular user name.

I think MySQL has changed how the database thinks of this sometimes in the past. Maybe it's due to the transition to UTF-8 that causes this but I'm not sure. My forum has had a very old version of My Little Forum until a month ago and when I did the upgrade, the tables where changed to UTF-8.

I suggest that the field "user_name" should have a unique index, but if doing so, we might have other forums with duplicate user names.

There is a simple way to see if this is a problem. This SQL query will show any dublicate user names. If there is any dublicate user names in the database, these users will not be able to log in to the forum.

SELECT `user_name`, COUNT(`user_name`) FROM `mlf2_userdata` GROUP BY `user_name` HAVING COUNT(`user_name`) > 1

If an upgrade script is adding a uniqe index to a text field in the database, I strongly recommends that the query above is used before to check if the field is uniqe. It is much better to get an error before the upgrade script does any changes than to get the error in the middle of the upgrade.

Best regards,
Daniel

Avatar

Warning: Non-US characters in user name

by Auge ⌂, Tuesday, April 18, 2017, 03:01 (2788 days ago) @ danielb987
edited by Auge, Wednesday, April 19, 2017, 09:41

Hello

I have a Swedish forum there some of the users have Swedish characters in their user names. I have now noticed that if two users have user names that looks alike but are different, they will not be able to login.

Example:
One user has the user name "daniel" and the other user has the user name "däniel".

I can confirm this misbehaviour in one of <del>my accessible</del> servers <ins>I have access to</ins>. The reason is the use of the collation utf8_general*. A blog entry (in german language) offers a solution, the use of the collation utf8_bin. But this has the same issues like your unique index and has to be implemented equally careful.

Tschö, Auge

--
Trenne niemals Müll, denn er hat nur eine Silbe!

Warning: Non-US characters in user name

by danielb987, Tuesday, April 18, 2017, 04:46 (2788 days ago) @ Auge
edited by Auge, Wednesday, April 19, 2017, 09:42

Hello

I have a Swedish forum there some of the users have Swedish characters in their user names. I have now noticed that if two users have user names that looks alike but are different, they will not be able to login.

Example:
One user has the user name "daniel" and the other user has the user name "däniel".


I can confirm this misbehaviour in one of <del>my accessible</del> servers <ins>I have access to</ins>. The reason is the use of the collation utf8_general*. A blog entry (in german language) offers a solution, the use of the collation utf8_bin. But this has the same issues like your unique index and has to be implemented equally careful.

Tschö, Auge

In the case with the user name, it is a problem. But in general I think this is a very useful feature. For example, a danish user may search for a word with the letter Ø then the word contains the Swedish letter Ö.

Best regards
Daniel

Avatar

Warning: Non-US characters in user name

by Auge ⌂, Wednesday, April 19, 2017, 09:44 (2788 days ago) @ danielb987

Hello

I have a Swedish forum there some of the users have Swedish characters in their user names. I have now noticed that if two users have user names that looks alike but are different, they will not be able to login.


I can confirm this misbehaviour in one of <del>my accessible</del> servers <ins>I have access to</ins>. The reason is the use of the collation utf8_general*. A blog entry (in german language) offers a solution, the use of the collation utf8_bin.


In the case with the user name, it is a problem. But in general I think this is a very useful feature. For example, a danish user may search for a word with the letter Ø then the word contains the Swedish letter Ö.

It should be no problem to only change the collation of fields, where it is necessary. :-)

Tschö, Auge

--
Trenne niemals Müll, denn er hat nur eine Silbe!

Avatar

Warning: Non-US characters in user name

by Auge ⌂, Wednesday, May 03, 2017, 08:10 (2773 days ago) @ danielb987

Hello

I suggest that the field "user_name" should have a unique index, but if doing so, we might have other forums with duplicate user names.

There is a simple way to see if this is a problem. This SQL query will show any dublicate user names.

SELECT `user_name`, COUNT(`user_name`) FROM `mlf2_userdata` GROUP BY `user_name` HAVING COUNT(`user_name`) > 1

I build a (until now) non functional prototype of the check for duplicates/non-unique names.

if (empty($update['errors']) && in_array($settings['version'],array('2.3.5', '2.3.6', '2.3.6.1', '2.3.7', '2.3.99.1', '2.3.99.2', '2.3.99.3', '2.4', '2.4.1', '2.4.2'))) {
 $resCountNames = mysqli_query($connid, "SELECT `user_name`, COUNT(`user_name`) AS cnt FROM `mlf2_userdata` GROUP BY `user_name` HAVING COUNT(`user_name`) > 1");
 if ($resCountNames === false) {
  $update['errors'][] = 'Database error in line '.__LINE__.': ' . mysqli_error($connid);
 }
 if (empty($update['errors'])) {
  if (mysqli_num_rows($connid, $resCountNames) > 0) {
   # list the non-unique user names
   $update['errors'][]  = '<h3><strong>Attention</strong>: Found non-unique user names!</h3>';
   $update['errors'][] .= '<p>Please make the names unique and inform the users in question about the changes.</p>';
   $update['errors'][] .= '<pre>';
   while ($row = mysqli_fetch_assoc($resCountNames)) {
    $update['errors'][] .= htmlspecialchars($row['user_name']) .": ". htmlspecialchars($row['cnt']) ."\n";
   }
   $update['errors'][] .= '</pre>';
   mysqli_free_result($result);
  } else {
   if(!@mysqli_query($connid, "ALTER TABLE `".$db_settings['userdata_table']."` MODIFY `user_name` varchar(255) COLLATE utf8_bin;")) {
    $update['errors'][] = 'Database error in line '.__LINE__.': ' . mysqli_error($connid);
   }
   /* a few queries to add indexex to a few fields in a few tables */
   if(!@mysqli_query($connid, "ALTER TABLE `".$db_settings['userdata_table']."` ADD INDEX(`user_type`);")) {
    $update['errors'][] = 'Database error in line '.__LINE__.': ' . mysqli_error($connid);
   }
   if(!@mysqli_query($connid, "ALTER TABLE `".$db_settings['userdata_table']."` ADD INDEX(`user_name`);")) {
    $update['errors'][] = 'Database error in line '.__LINE__.': ' . mysqli_error($connid);
   }
  }
 }
}

In the block if (mysqli_num_rows($connid, $resCountNames) > 0), where I want to list the questionable names, we should add an additional info about names, that can be listed as identic but are not. This can be a case like you described it ("Daniel" vs. "Däniel" (wrote it with German "ä")) or similar. Do you or anyone else have a proposal for a few suitable sentences?

If an upgrade script is adding a uniqe index to a text field in the database, I strongly recommends that the query above is used before to check if the field is uniqe. It is much better to get an error before the upgrade script does any changes than to get the error in the middle of the upgrade.

FACK, thatswhy I defined the check as the first step of the upgrade process.

Tschö, Auge

--
Trenne niemals Müll, denn er hat nur eine Silbe!

Warning: Non-US characters in user name

by danielb987, Thursday, May 04, 2017, 06:06 (2773 days ago) @ Auge

Very good. But I think that the SQL query that changes the user_name field to utf8_bin should be run before the duplicate test. By doing that, Daniel and Däniel would be different usernames.

Second, the user_name field needs to have a UNIQUE index. Even if the user_name field is utf8_bin, there may still be duplicates. For example if someone is manually editing the database and messing things up. So we need to protect the user_name field from duplicates in the database by a UNIQUE index.

If we find duplicates, we should show a list of all the duplicates and a link to the administrator control panel to edit the user name:
http://www.example.com/forum/index.php?mode=admin&edit_user=18

This list should show the user id, user name and email address, since the user id is the only thing that is guaranteed unique and the email address is usefull if the administrator wants to contact the user. In order to do this, we need to do a second SQL query for each duplicate user name.

Best regards,
Daniel

Avatar

Warning: Non-US characters in user name

by Auge ⌂, Thursday, May 04, 2017, 06:28 (2773 days ago) @ danielb987

Hello

Very good. But I think that the SQL query that changes the user_name field to utf8_bin should be run before the duplicate test. By doing that, Daniel and Däniel would be different usernames.

*grmpf* Yes, you are right.

Second, the user_name field needs to have a UNIQUE index.

The operation would fail, if there are (for wich reason ever) duplicates. Therefore the check has to be performed before the unique index take place. One step after the other. :-)

If we find duplicates, we should show a list of all the duplicates and a link to the administrator control panel to edit the user name:
http://www.example.com/forum/index.php?mode=admin&edit_user=18

That's, what the prototype adumbrates. If there are rows in the result (if (mysqli_num_rows($connid, $resCountNames) > 0)) the script will render a list of the non-unique names. At the moment it lacks a link to the admin interface for user handling (by the way: a good idea).

This list should show the user id, user name and email address, since the user id is the only thing that is guaranteed unique and the email address is usefull if the administrator wants to contact the user. In order to do this, we need to do a second SQL query for each duplicate user name.

This might be doable in a single query. If we create a list, we can enrich this one list with all necessary informations.

Thank you so far for your useful input.

Tschö, Auge

--
Trenne niemals Müll, denn er hat nur eine Silbe!

Avatar

Warning: Non-US characters in user name

by Micha ⌂, Thursday, May 04, 2017, 06:38 (2773 days ago) @ Auge

Hello,

$update['errors'][] .= '<p>Please make the names unique and inform the users in question about the changes.</p>';

Hmm, I don't like this proposal. If once the user-name was valid, it should be valid all the time (like the Password: md5 --> sha1).

/Micha

--
applied-geodesy.org - OpenSource Least-Squares Adjustment Software for Geodetic Sciences

Warning: Non-US characters in user name

by danielb987, Thursday, May 04, 2017, 06:51 (2773 days ago) @ Micha

$update['errors'][] .= '<p>Please make the names unique and inform the users in question about the changes.</p>';


Hmm, I don't like this proposal. If once the user-name was valid, it should be valid all the time (like the Password: md5 --> sha1).

Please note that we are talking about a bug here. If the forum has two users Daniel and Däniel, only one of them will be able to log in to the forum. This is how I discovered the problem.

I think that the forum in the past had Latin1 encoding but this was changed to UTF-8. (I might be wrong, since I'm new to MyLittleForum). And with Latin1, Daniel and Däniel is two different user names.

With utf-8 encoding and the use of the collation utf8_general*, the MySQL database treats similar characters as the same character, for example A and Ä. And therefore, Daniel and Däniel is now the same user name.

If two users has the same user name, one of them will not be able to log in, so this issue must be resolved.

Best regards,
Daniel

Avatar

Warning: Non-US characters in user name

by Micha ⌂, Thursday, May 04, 2017, 10:23 (2773 days ago) @ danielb987

Hi,

If the forum has two users Daniel and Däniel, only one of them will be able to log in to the forum. This is how I discovered the problem.

Okay, it was a misinterpretation.

regards
Micha

--
applied-geodesy.org - OpenSource Least-Squares Adjustment Software for Geodetic Sciences

Avatar

Warning: Non-US characters in user name

by Auge ⌂, Thursday, May 04, 2017, 07:35 (2773 days ago) @ Micha

Hello Milo

$update['errors'][] .= '<p>Please make the names unique and inform the users in question about the changes.</p>';


Hmm, I don't like this proposal. If once the user-name was valid, it should be valid all the time (like the Password: md5 --> sha1).

The problem with it is the possibility of non-uniqueness of names. It came up, that the used collation utf8_general_ci does not distinct between "a" and "ä" or "o" and "ø" and so on. So two different names could be realised as identically by the database. To change this I want to change the collation of only the column mlf2_userdata.user_name to utf8_bin.

Because of the existing check for uniqueness of a user name during the registration it is very unlikely to duplicate a user name, but it is not impossible. It's possible, even it's very improbable, that the check failed. Thatswhy, only for safety reasons, I want to add the check in the update script. I do not expect to find doublettes at all because (Daniels example) after the change of the collation the users "daniel" and "däniel" would never again found as identical and a second user with the name "daniel" or "däniel" would already now be rejected during the <del>installation</del><ins>registration</ins> when other user(s) are registered under this/these name(s).

So this would be a safety net for the hypothetical case of a bitwise non-unique user name. This case has to be prevented because the users with the questionable name have the real chance of not to be able to log in again.

Again, this is a very hypothetical case in my eyes.

Tschö, Auge

--
Trenne niemals Müll, denn er hat nur eine Silbe!

RSS Feed of thread