Cannot register, always get: The e-mail address is invalid (General)

by WorldofBB, Wednesday, February 05, 2020, 12:01 (1504 days ago) @ Auge
edited by WorldofBB, Wednesday, February 05, 2020, 12:06

Hello

As of PHP 7.3, PCRE2 is stricter in pattern validation. So hyphens now need to be escaped or put at the beginning or the end of a character class.

After running a quick test, escaping the hyphen does seem to resolve the problem in PHP 7.3. Replacing line 393 in functions.inc.php with the following code appears to resolve the issue:

if (!preg_match("/^([\w\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,}|[0-9]{1,3})(\]?)$/", $email)) {
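For anyone who wants to try the pattern outside the forum code, here is a minimal sketch. The function name is_valid_email_syntax is hypothetical and not part of functions.inc.php; only the pattern itself is taken from the line above.

```php
<?php
// Hypothetical helper for trying the pattern in isolation;
// it is NOT the forum's own validation function.
function is_valid_email_syntax(string $email): bool
{
    // Pattern copied from the replacement line: the hyphen in the
    // first character class is escaped with a backslash.
    $pattern = '/^([\w\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,}|[0-9]{1,3})(\]?)$/';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $email) === 1;
}

var_dump(is_valid_email_syntax('first-last@example.com')); // bool(true)
var_dump(is_valid_email_syntax('not-an-email'));           // bool(false)
```

Comparing against 1 instead of negating the return value keeps a compilation failure (false) from being confused with a plain non-match (0).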


I have two questions about it (I have trouble understanding regular expressions myself).

1. The syntax checks are stricter with PCRE2, which was introduced with PHP 7.3. Is it possible that the stricter syntax rules break the code in older PHP versions (up to 7.2.x) that use PCRE1?

As far as I understand it, the problem is that PCRE1 was far more forgiving of syntax errors than PCRE2 is. Unless a hyphen is at the beginning or the end of a character class, it is supposed to be escaped whenever it is meant as a literal hyphen and not as part of a range. PCRE1 would assume that an unescaped hyphen was literal if reading it as a range caused a syntax error, whereas PCRE2 no longer makes that assumption. As of 10.33 it will always follow the syntax as written. An escaped hyphen is valid in both libraries, so the escaped pattern should not break anything on older PHP versions.

Since PCRE follows Perl behaviour, it is documented like this:

The minus (hyphen) character can be used to specify a range of characters in a character class. For example, [d-m] matches any letter between d and m, inclusive. If a minus character is required in a class, it must be escaped with a backslash or appear in a position where it cannot be interpreted as indicating a range, typically as the first or last character in the class, or immediately after a range. For example, [b-d-z] matches letters in the range b to d, a hyphen character, or z.
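The positions named in that passage can be tried out with a small sketch. The helper name hyphen_is_literal and the sample string 'c-d' are my own choices for illustration; the character classes follow the documentation text quoted above.

```php
<?php
// Hypothetical helper: true if the pattern matches the sample 'c-d',
// which can only happen when the class accepts a literal hyphen.
function hyphen_is_literal(string $pattern): bool
{
    return preg_match($pattern, 'c-d') === 1;
}

// Each class has the hyphen in a position where it cannot start a range.
$classes = [
    'escaped'        => '/^[\w\-.]+$/',
    'first in class' => '/^[-\w.]+$/',
    'last in class'  => '/^[\w.-]+$/',
    'after a range'  => '/^[b-d-z]+$/',
];

foreach ($classes as $label => $pattern) {
    echo $label, ': ', var_export(hyphen_is_literal($pattern), true), PHP_EOL;
}
```

All four variants should report true, since each one reads the hyphen as a literal character.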

"/^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,}|[0-9]{1,3})(\]?)$/"
"/^([\w\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,}|[0-9]{1,3})(\]?)$/"

2. In the first block (([\w-\.]+) versus ([\w\-\.]+)) you escaped the hyphen (-) with a backslash. In the block for the domain name ((([\w-]+\.)+)) the hyphen was left unescaped only because it is at the end of the character class (see your sentence: "So hyphens now need to be escaped or put at the beginning or the end of a character class.")?

Correct. A hyphen that cannot be syntactically part of a range will still be interpreted as literal, so there is no need to escape the hyphen at the beginning or the end of the character class (or immediately after a range, as in the example above, [b-d-z]). But if the hyphen appears anywhere else and isn't escaped with a backslash, it will now ALWAYS be considered part of a range, and it will raise an error whenever that range is invalid, whereas the same code in PHP 7.2 and earlier would not have.
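A quick way to see the difference on a given PHP installation is to check whether each variant of the first character class compiles at all. This is only a sketch; pattern_compiles is a hypothetical helper name, and the two short patterns are cut down from the full e-mail pattern for readability.

```php
<?php
// Hypothetical helper: reports whether PCRE accepts the pattern.
function pattern_compiles(string $pattern): bool
{
    // preg_match() returns false when the pattern fails to compile;
    // @ only hides the accompanying "Compilation failed" warning.
    return @preg_match($pattern, '') !== false;
}

$unescaped = '/^[\w-\.]+$/';  // hyphen between \w and \. (old code)
$escaped   = '/^[\w\-\.]+$/'; // hyphen escaped (fixed code)

echo 'PHP ', PHP_VERSION, PHP_EOL;
echo 'escaped compiles:   ', var_export(pattern_compiles($escaped), true), PHP_EOL;
echo 'unescaped compiles: ', var_export(pattern_compiles($unescaped), true), PHP_EOL;
```

On PHP 7.3 and later the unescaped variant should report false, while the escaped variant should report true on old and new versions alike.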

At least, that's the way that I understand it.

Thank you again for your investigations.

Tschö, Auge

I'm happy to help in any way that I can.

Thank you for all of the time and effort you've put into this forum over the years. It is very much appreciated.

