Petr Bena wrote:
There are a number of services running on our wikis, such as English
Wikipedia, which generate information and frequently update certain
pages with that data (statistics etc.). That creates tons of revisions
that will never be needed in the future and that sit in the database
with no option to remove them.
For this reason it would be nice to have an option to create a special
kind of page on a wiki which would keep only one revision, with its
data overwritten in place, so that it wouldn't eat any extra space in
the DB. I know it doesn't look like a big issue, but there are a number
of such pages, and the bots that update them generate a lot of
megabytes every month. Bot operators could, for example, use an
external web server to display this data, but then it can't be
transcluded using templates, like here:
What do you think about this feature? Is it worth working on, or is
this not a big issue? Keep in mind that even if we have a lot of
storage, this unneeded data makes the dumps larger and slows down the
processes that parse the XML files.
I think this is the wrong approach to take. I agree that adding these
revisions to the database is a silly thing to do.
<https://commons.wikimedia.org/wiki/Template:Toolserver> is usually the
example I use (currently at 44,816 revisions).
For pages such as this, there are a few reasons you might put this on a wiki
as opposed to having a dynamic tool on an external service:
* historical data can be stored and tracked;
* users can see the updated information in their watchlists; and
* the data can be transcluded elsewhere on the wiki.
Depending on the bot and the page, points one and two can have more or less
importance. For the third point (transclusion), I've always thought it would
be neat to be able to transclude content from the Toolserver. I don't think
it's been explored too much, but imagine something like:
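Something along these lines, where the tag name, attributes, and URL are purely illustrative placeholders (no such parser tag exists today as far as I know):

```xml
<!-- hypothetical tag: name, attributes, and URL are invented for illustration -->
<external src="https://toolserver.org/~user/report.html" />
```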
MediaWiki could take this parser tag, fetch the remote content, run it
through the parser/sanitizer, and then cache the page. A purge of the
MediaWiki page would update the data.
Or nowadays, I guess you could do something similar with jQuery.
This creates a reliance on an external tool to be up and running and might
make page load a bit slower, but most of this data is non-critical. As long
as it doesn't block page loading when the external service is down, it isn't
a big deal.
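As a rough sketch of that client-side approach (plain JavaScript rather than jQuery; the function and parameter names here are invented for illustration, not any MediaWiki API), the key property is that a failed request falls back to placeholder text instead of breaking or blocking the page:

```javascript
// Hypothetical sketch: load external tool data after the page has rendered,
// so a down external service degrades gracefully rather than blocking the page.
// fetchFn is injected (e.g. the browser's fetch) to keep the sketch testable.
async function loadExternalData(fetchFn, url, fallback) {
  try {
    const res = await fetchFn(url);
    if (!res.ok) {
      return fallback; // non-200 response: show the placeholder
    }
    return await res.text();
  } catch (e) {
    return fallback; // service unreachable: show the placeholder, don't break the page
  }
}
```

In a browser you would call this after page load with `window.fetch` and insert the result into the target element; the important design point is that the page content never waits on the external service.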
I'm very hesitant to introduce page history truncation or manipulation. We
generally work hard to preserve and protect the integrity of the page
history and we use Special pages for dynamic data. For a lot of these bots
and pages, I agree that we need to re-evaluate them and make smarter
decisions about how to present and update the data they provide, but I don't
think changing MediaWiki's page history infrastructure is the best approach,
given the other options available.