Petr Bena wrote:
There are a number of services running on our wikis, such as the English Wikipedia, which generate information and frequently update certain pages with it (statistics etc.). That creates tons of revisions that will never be needed in the future and that sit in the database with no way to remove them.
For this reason it would be nice to have an option to create a special page on a wiki that keeps only one revision, with its data simply overwritten, so that it wouldn't eat any space in the DB. I know it doesn't look like a big issue, but there are a number of such pages, and the bots that update them generate lots of megabytes every month. Bot operators could use an external web server to display these data instead, but that content can't be transcluded using templates, like here: http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Articles_for_creation/Sub... ssions
What do you think about this feature? Is it worth working on, or is this not a big issue? Keep in mind that even if we have plenty of storage, this unneeded data makes the dumps larger and the processes that parse the XML files slower.
Hi.
I think this is the wrong approach to take. I agree that adding these revisions to the database is a silly thing to do. https://commons.wikimedia.org/wiki/Template:Toolserver is usually the example I use (currently at 44,816 revisions).
For pages such as this, there are a few reasons you might put this on a wiki as opposed to having a dynamic tool on an external service:
* historical data can be stored and tracked;
* users can see the updated information in their watchlists; and
* the data can be transcluded elsewhere on the wiki.
Depending on the bot and the page, points one and two can have more or less importance. For the third point (transclusion), I've always thought it would be neat to be able to transclude content from the Toolserver. I don't think it's been explored too much, but imagine something like:
{{#toolserver:~user/my_dynamic_tool.py}}
MediaWiki could take this parser tag, fetch the remote content, run it through the parser/sanitizer, and then cache the page. A purge of the MediaWiki page would update the data.
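To make the idea concrete, here's a rough sketch of the fetch -> sanitize -> cache flow such a handler could follow. This is purely illustrative (TypeScript rather than anything in MediaWiki itself); the base URL, the in-memory cache, and the escaping function are all assumptions for the sake of the example, not an existing API:

    // Illustrative only: the flow a hypothetical {{#toolserver:...}} handler could use.
    const TOOLSERVER_BASE = "https://toolserver.org/";   // assumed base URL
    const cache = new Map<string, string>();             // keyed by tool path

    // Stand-in for MediaWiki's parser/sanitizer: escape everything so no raw
    // HTML from the external tool ends up in the rendered page.
    function sanitize(raw: string): string {
        return raw.replace(/&/g, "&amp;")
                  .replace(/</g, "&lt;")
                  .replace(/>/g, "&gt;");
    }

    // Return (possibly cached) sanitized output for e.g. "~user/my_dynamic_tool.py".
    async function renderToolserverTag(toolPath: string): Promise<string> {
        if (!cache.has(toolPath)) {
            const resp = await fetch(TOOLSERVER_BASE + toolPath);
            cache.set(toolPath, sanitize(await resp.text()));
        }
        return cache.get(toolPath)!;
    }

    // Mimic a page purge: drop the cached copy so the next render re-fetches.
    function purge(toolPath: string): void {
        cache.delete(toolPath);
    }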
Or nowadays, I guess you could do something similar with jQuery.
This creates a reliance on an external tool being up and running and might make page loads a bit slower, but most of this data is non-critical. As long as it doesn't block page loading when the external service is down, it isn't a big deal (see the sketch below).
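For the client-side route, something like the following is what I have in mind (again just a sketch; the element id and URL are made up for the example). The placeholder only gets filled in if the external service answers, so a dead service just leaves the page as it was:

    // Client-side variant: fill a placeholder element after the page has loaded,
    // and silently leave it alone if the external tool is unreachable.
    async function fillToolserverPlaceholder(): Promise<void> {
        const el = document.getElementById("toolserver-stats");   // assumed element id
        if (!el) {
            return;
        }
        try {
            const resp = await fetch("https://toolserver.org/~user/my_dynamic_tool.py");
            el.textContent = await resp.text();   // textContent avoids injecting raw HTML
        } catch {
            // Service is down: keep whatever static fallback text is already on the page.
        }
    }

    fillToolserverPlaceholder();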
I'm very hesitant to introduce page history truncation or manipulation. We generally work hard to preserve and protect the integrity of the page history, and we use Special pages for dynamic data. For a lot of these bots and pages, I agree that we need to re-evaluate them and make smarter decisions about how to present and update the data they provide, but I don't think changing MediaWiki's page history infrastructure is the best approach, given the other options available.
MZMcBride