I am also experiencing this problem.
The requests have these properties:
- They do not have a bot-identification
- They do not respect robots.txt
- They have a sid in the url
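Requests with these three properties can be picked out of a web server access log. Below is a minimal sketch, assuming the common combined log format and a hypothetical (non-exhaustive) list of well-behaved bot User-Agent substrings; it flags hits that carry a sid in the URL without identifying as a bot.

```python
import re

LOG_RE = re.compile(r'"(?:GET|POST) (?P<url>\S+) HTTP/[\d.]+" .* "(?P<agent>[^"]*)"$')
KNOWN_BOTS = ("Googlebot", "bingbot", "DuckDuckBot")  # assumed list, extend as needed

def is_suspicious(log_line: str) -> bool:
    """Sid in the URL but no recognizable bot identification."""
    m = LOG_RE.search(log_line)
    if not m:
        return False
    has_sid = "sid=" in m.group("url")
    looks_like_bot = any(b in m.group("agent") for b in KNOWN_BOTS)
    return has_sid and not looks_like_bot

line = '1.2.3.4 - - [12/Mar/2025:11:53:00 +0100] "GET /viewtopic.php?t=42&sid=deadbeef HTTP/1.1" 200 5120 "-" "Mozilla/5.0"'
print(is_suspicious(line))  # → True
```

(Checking robots.txt compliance would additionally require correlating each client's requests against the paths your robots.txt disallows, which a one-line filter cannot do.)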
My theory is that the problem is caused by a combination of "stupid" web crawlers and impractical forum software.
If you make a request from a new IP with an old (long-gone) sid in the URL, you receive a page in a brand-new session, with the new sid embedded in every link on the page. In effect you get a page full of "new" links (we know they are not actually new, because the sid part should be disregarded).
When the crawler takes that page back to its central register, it obtains some text and a number of "new" links to crawl. These links are put into an already very long queue and are crawled much later, by a completely different client/crawler. So they again appear "unknown", coming from a "first-timer", and therefore produce pages with yet more new sids in the links.
So it is a never-ending story: new links are invented, crawled later, and then more new links are invented ...
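The feedback loop above can be sketched with a toy crawler model (my own illustration in Python, not phpBB code; the branching factor and URL shapes are assumptions): every crawl of a URL with an unknown sid returns a page whose links all carry a fresh sid, so the queue grows faster than it drains.

```python
from collections import deque
import itertools

LINKS_PER_PAGE = 3             # assumed branching factor (real pages: 3-100+)
fresh_sid = itertools.count()  # stand-in for phpBB's random session ids

queue = deque([f"/index.php?sid={next(fresh_sid)}"])
crawled = 0
for _ in range(1000):          # crawl budget; the queue outgrows it anyway
    if not queue:
        break
    queue.popleft()
    crawled += 1
    # the server treats the stale sid as a new session and mints new links
    queue.extend(f"/viewtopic.php?t={t}&sid={next(fresh_sid)}"
                 for t in range(LINKS_PER_PAGE))

print(crawled, len(queue))  # → 1000 2001
```

Each crawl removes one URL and adds three, so after 1000 crawls the queue holds 2001 URLs: with a branching factor above one, the backlog only ever grows.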
To deal with this I have made two changes to the phpBB software:
- Requests with a sid that cannot be found in the database are redirected to a static HTML file stating that the session has expired (with a link back to the main forum page).
- Only registered users get links with sids (a sid is still needed there, otherwise you cannot, for example, switch to administrator mode).
Code:
// if session id is set
if (!empty($this->session_id))
{
    $sql = 'SELECT u.*, s.*
        FROM ' . SESSIONS_TABLE . ' s, ' . USERS_TABLE . " u
        WHERE s.session_id = '" . $db->sql_escape($this->session_id) . "'
            AND u.user_id = s.session_user_id";
    $result = $db->sql_query($sql);
    $this->data = $db->sql_fetchrow($result);
    $db->sql_freeresult($result);

    // silly-bot countermeasure: unknown sid means a stale/forged session
    if (!isset($this->data['user_id']))
    {
        redirect("/expired.htm");
    }
This change means that, at very little resource cost, the bots are redirected to a neutral page containing only static links. As a result the web server now actually has resources left to serve regular users.
The second change is in append_sid (the root cause of it all) in functions.php.
Code:
// Append session id and parameters (even if they are empty)
// If parameters are empty, the developer can still append his/her
// parameters without caring about the delimiter
global $user;

if ($session_id && $user->data['is_registered'])
{
    return $url . (($append_url) ? $url_delim . $append_url . $amp_delim : $url_delim) . $params . ((!$session_id) ? '' : $amp_delim . 'sid=' . $session_id) . $anchor;
}
else
{
    return $url . (($append_url) ? $url_delim . $append_url . $amp_delim : $url_delim) . $params . $anchor;
}
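To make the effect of the patch concrete, here is a minimal Python sketch of the behaviour the modified append_sid now has (parameter names loosely mirror the phpBB function; this is my illustration, not the PHP code itself): only registered users get a sid appended, so guests and crawlers always see stable, sid-free URLs.

```python
def append_sid(url: str, params: str = "", session_id: str = "",
               is_registered: bool = False) -> str:
    # choose '?' or '&' depending on whether the URL already has a query
    delim = "&" if "?" in url else "?"
    out = url
    if params:
        out += delim + params
        delim = "&"
    # the patched behaviour: sid only for registered users
    if session_id and is_registered:
        out += delim + "sid=" + session_id
    return out

print(append_sid("viewtopic.php", "t=42", "deadbeef", is_registered=True))
# → viewtopic.php?t=42&sid=deadbeef
print(append_sid("viewtopic.php", "t=42", "deadbeef", is_registered=False))
# → viewtopic.php?t=42
```

A crawler (never registered) therefore sees the same URL for the same topic on every visit, and the link-invention loop has nothing to feed on.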
This change means that people without a sid in a cookie cannot get a session via the URL (in my mind there should never have been an append_sid function at all). Now I will wait and see how long it takes before the "sid" requests wear off. If they don't wear off, my theory is wrong. The question is how long I will have to wait, because I believe "they" already have a "tricillion" links waiting in their register(s): each request they made in the past yielded 3-100+ new links (i.e. exponential growth with a rather high branching factor).
Statistics: Posted by Thomas Linder Puls — Wed Mar 12, 2025 11:53 am