
Facebook crawler (Technics)

by Alfie ⌂, Vienna, Austria, (192 days ago) @ Auge
edited by Alfie

Hi Auge,

Thank you for your report.

Welcome.

Oha.
Oha again.

Right?

Hmm, shouldn't the forum's internal spam protection work similarly to the workaround described?

MLF1 (version 1.7.9, inc.php line #58 ff.)

    if (trim($data['list']) != '') {
        $banned_ips_array = explode(',', trim($data['list']));
        if (in_array($_SERVER["REMOTE_ADDR"], $banned_ips_array)) {
            die($lang['ip_no_access']);
        }
    }

Not sure. The check above only matches exact IP addresses, and AFAIK the FB crawler has 500+ IPv4 and 2,000+ IPv6 addresses. According to my access.log, my forum was crawled from 67 different IPs within three days.
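With that many addresses, a list of single IPs gets unwieldy. One conceivable alternative would be matching against address ranges in CIDR notation. A rough sketch, IPv4 only; the function name ipv4_in_cidr is hypothetical (not part of MLF1), and the ranges are reserved documentation ranges, not real crawler addresses:

```php
<?php
// Hypothetical helper, not part of MLF1. IPv4 only: ip2long() returns
// false for IPv6 addresses, so those are never matched here.
function ipv4_in_cidr($ip, $cidr)
{
    // "192.0.2.0/24" -> network "192.0.2.0", prefix length 24.
    // A bare address without "/" is treated as a /32 (exact match).
    $parts = explode('/', $cidr, 2);
    $bits = isset($parts[1]) ? (int) $parts[1] : 32;

    $ip_long = ip2long($ip);
    $net_long = ip2long($parts[0]);
    if ($ip_long === false || $net_long === false || $bits < 0 || $bits > 32) {
        return false;
    }
    if ($bits === 0) {
        return true; // a /0 covers every IPv4 address
    }

    // Keep only the leading $bits bits of both addresses and compare them.
    $mask = -1 << (32 - $bits);
    return ($ip_long & $mask) === ($net_long & $mask);
}

// Example ranges from the documentation blocks (assumed, for illustration only).
$banned_ranges = array('192.0.2.0/24', '198.51.100.0/24');
foreach ($banned_ranges as $range) {
    if (ipv4_in_cidr('192.0.2.77', $range)) {
        echo "blocked by $range\n";
    }
}
```

The ranges would still have to be maintained by hand, and IPv6 would need a separate comparison (e.g. based on inet_pton()), so this only shrinks the list rather than solving the problem.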

MLF2

Not my cup of tea… ;-)

The function searches only for an exact string. Not ideal; a search for a partially matching string would be better. That way it would also match the following search strings.

+http://www.facebook.com/externalhit_uatext.php
facebookexternalhit
facebookexternalhit/1.1

Recognising facebookexternalhit in particular would be nice, because it would make the check independent of the version string. Currently the match would break if Facebook ran a version with a UA string other than 1.1 (provided that the UA string otherwise remained unchanged).

Right, makes sense. However, regexes are not my friends.
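That said, a partial match would not need a regex at all. A minimal sketch, assuming the banned user agents are available as a plain array of fragments; the function name ua_is_banned and the array are hypothetical and not taken from the MLF1 or MLF2 code:

```php
<?php
// Hypothetical helper, not part of MLF1 or MLF2.
// Returns true if any banned fragment occurs anywhere in the UA string.
function ua_is_banned($user_agent, array $banned_fragments)
{
    foreach ($banned_fragments as $fragment) {
        $fragment = trim($fragment);
        if ($fragment === '') {
            continue; // skip empty entries instead of searching for them
        }
        // stripos() is a case-insensitive substring search. It returns the
        // position (which may be 0) or false, hence the strict !== false.
        if (stripos($user_agent, $fragment) !== false) {
            return true;
        }
    }
    return false;
}

$banned = array('facebookexternalhit');
$ua = 'facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)';
var_dump(ua_is_banned($ua, $banned)); // bool(true)
```

With only facebookexternalhit in the list, the check would keep matching if the version after the slash changed, as long as the rest of the UA string stayed the same.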

--
Cheers,
Alfie (Helmut Schütz)
BEBA-Forum (v1.8β)

