Facebook crawler (Technics)

by Auge ⌂, (192 days ago) @ Alfie
edited by Alfie

Hello Alfie

Hmm, shouldn't the forum's internal spam protection work similarly to the workaround you described?

MLF1 (version 1.7.9, inc.php line #58 ff.)

if (trim($data['list']) != '') {
    $banned_ips_array = explode(',', trim($data['list']));
    if (in_array($_SERVER["REMOTE_ADDR"], $banned_ips_array)) {
        die($lang['ip_no_access']);
    }
}


Not sure. AFAIK, the FB crawler has 500+ IPv4 and 2,000+ IPv6 addresses. According to my access.log, my forum was crawled from 67 different IPs within three days.

Did you see the marked edit in my last posting before you wrote yours? After sending my posting to the forum, I realised that MLF1 doesn't have a ban list for user agents.

Handling this via IPs or IP ranges is a pointless task, not least because MLF1 does not support IP ranges.

The function only searches for an exact string. That is not ideal; a search for a partially matching string would be better, because it would also match the following search strings.

+http://www.facebook.com/externalhit_uatext.php
facebookexternalhit
facebookexternalhit/1.1

Recognising facebookexternalhit in particular would be nice, because it would make the check independent of the version string. Currently the match would break if Facebook switched to a UA string with a version other than 1.1 (provided the rest of the UA string stayed unchanged).
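
To make the idea of a partial match concrete, here is a minimal sketch that uses plain string functions. It is not MLF1 code: the function name is_banned_user_agent() and the array of banned fragments are assumptions made up for illustration, and the calling code would have to supply the user agent itself (for instance from $_SERVER['HTTP_USER_AGENT'], if it is set).

```php
<?php
// Hypothetical helper, NOT part of MLF1: returns true when the user agent
// contains any of the banned fragments (case-insensitive substring match).
function is_banned_user_agent($userAgent, $bannedFragments)
{
    foreach ($bannedFragments as $fragment) {
        $fragment = trim($fragment);
        // Skip empty entries so that a stray comma in a ban list
        // cannot turn into a fragment that matches every visitor.
        if ($fragment !== '' && stripos($userAgent, $fragment) !== false) {
            return true;
        }
    }
    return false;
}
```

With a single entry "facebookexternalhit" this should catch all three strings listed above as long as they appear inside the UA string, regardless of the version number that follows.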


Right, makes sense. However, regexes are not my friends.

Not mine either, as you know. :-)

In principle it would be possible to search for the strings with string functions, but in general such a regex has to be implemented only once. For testing regular expressions I always (in fact: every now and then) use regex101.com.
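
For comparison, a rough sketch of the regex variant. Again, this is not MLF1 code: build_user_agent_pattern() and user_agent_matches() are made-up names, and the comma-separated list format is only assumed by analogy with the IP list in inc.php. The pattern is built once from the list, and preg_quote() takes care of characters such as "." and "+" so that nobody has to write the expression by hand.

```php
<?php
// Hypothetical helper, NOT part of MLF1: builds one case-insensitive
// pattern from a comma-separated list of user agent fragments.
function build_user_agent_pattern($list)
{
    $parts = array();
    foreach (explode(',', $list) as $entry) {
        $entry = trim($entry);
        if ($entry !== '') {
            // Escape regex metacharacters and the "~" delimiter.
            $parts[] = preg_quote($entry, '~');
        }
    }
    // An empty list yields null, meaning "nothing is banned".
    return $parts ? '~(?:' . implode('|', $parts) . ')~i' : null;
}

// Hypothetical helper: true if the user agent matches the built pattern.
function user_agent_matches($userAgent, $pattern)
{
    return $pattern !== null && preg_match($pattern, $userAgent) === 1;
}
```

Whether this is worth it compared with plain string functions is a matter of taste; the regex mainly pays off once the list grows or other checks are needed.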

Bye, Auge

--
Never split "Müll" (rubbish), for it has only one syllable!

