Hi,
There are a number of services running on our wikis, such as the English Wikipedia, that generate information (statistics etc.) and frequently update certain pages with that data. That creates tons of revisions that will never be needed in the future and that sit in the database with no option to remove them.
For this reason it would be nice to have an option to create a special kind of page on a wiki that would have only one revision, whose data would simply be overwritten, so that it wouldn't eat any space in the DB. I know it doesn't look like a big issue, but there are a number of such pages, and the bots that update them generate a lot of megabytes every month. Bot operators could use, for example, an external web server to display this data, but that can't be transcluded using templates, as here: http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Articles_for_creation/Sub...
What do you think about this feature? Is it worth working on, or is it not a big issue? Keep in mind that even if we have plenty of storage, this unneeded data makes the dumps larger and slows down the processes that parse the XML files.
I might be wrong, but I believe there's a legal implication to doing something like this (at least on WMF wikis) since it's removing attribution to various people who "contributed" to the article, even if it's not in the traditional sense. Also it kind of violates the basic design of MediaWiki itself, which is to keep revisions of pages.
-- Tyler Romeo, Stevens Institute of Technology, Class of 2015, Major in Computer Science. www.whizkidztech.com | tylerromeo@gmail.com
This would make an excellent extension though. I know we would use it. It's probably one of the most straightforward extensions to make that I've heard of as well.
Thank you, Derric Atzrott
I don't mean to break the current design, but rather to implement a new feature. These pages could live in a special namespace like [[NoHistory:PageName]], and it would be forbidden to move pages into or out of this namespace, so that existing articles with history could not be damaged in any way.
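Roughly, I imagine the namespace part could be set up by an extension along these lines (only a sketch; the namespace number, the constant names, the message key and the hook wiring are just my guesses at one possible implementation):

    // Hypothetical setup file, loaded from LocalSettings.php with
    // require_once( "$IP/extensions/NoHistory/NoHistory.php" );
    define( 'NS_NOHISTORY', 3000 );       // any unused even namespace index
    define( 'NS_NOHISTORY_TALK', 3001 );

    $wgExtraNamespaces[NS_NOHISTORY] = 'NoHistory';
    $wgExtraNamespaces[NS_NOHISTORY_TALK] = 'NoHistory_talk';

    // Forbid moving pages out of ('move' on the source title) or into
    // ('move-target' on the destination title) the namespace.
    $wgHooks['getUserPermissionsErrors'][] = 'efNoHistoryBlockMoves';

    function efNoHistoryBlockMoves( $title, $user, $action, &$result ) {
        if ( ( $action === 'move' || $action === 'move-target' )
            && $title->getNamespace() === NS_NOHISTORY
        ) {
            $result = array( 'nohistory-no-move' ); // made-up message key
            return false; // deny and stop further permission checks
        }
        return true;
    }

The storage side, i.e. making an edit overwrite the single existing revision instead of adding a new one, is of course the part that would need real changes.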
Since it wouldn't be used by people at all, but only by automated bots or scripts, I don't think there would be any problem with attribution.
Hey,
> It's probably one of the most straightforward extensions to make that I've heard of as well.
If you go for a special page based design, then yes, this is simple. If you want to have it properly integrated with MediaWiki, it's rather less trivial.
> these pages could live in a special namespace like [[NoHistory:PageName]]
I like that idea. Generalizing the article handling code to not assume there is a revision for every edit would probably require changes much in line with what's being done with the content handler stuff.
Cheers
-- Jeroen De Dauw, http://www.bn2vs.com. Don't panic. Don't be evil.
It's not really that simple: the page should be accessible via the API just like any other page, should be usable like any other page, etc. (including being transcludable using {{NoHistory:PageName}}). If we used the concept of a special page, it wouldn't work like this.
Hmmmm... That is true. I had been thinking of the special page route, but you make a good point that it should be properly integrated into MediaWiki.
Although for the {{NoHistory:PageName}} bit, you could use a transcludable special page.
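Something along these lines, for example (only a sketch; the class name, the special page name and the data lookup are all made up for illustration):

    // Hypothetical, minimal includable special page, transcluded as
    // {{Special:NoHistoryData/SomePage}}.
    $wgSpecialPages['NoHistoryData'] = 'SpecialNoHistoryData';

    class SpecialNoHistoryData extends SpecialPage {
        public function __construct() {
            parent::__construct( 'NoHistoryData' );
        }

        // Lets the parser transclude the page with {{Special:NoHistoryData/...}}.
        public function isIncludable() {
            return true;
        }

        public function execute( $subPage ) {
            // A real implementation would look up the bot-maintained data for
            // $subPage here (a DB table, memcached, ...); this is a stub.
            $this->getOutput()->addWikiText(
                'Data for ' . wfEscapeWikiText( $subPage ) . ' would go here.'
            );
        }
    }

It wouldn't give the exact {{NoHistory:PageName}} syntax, but {{Special:NoHistoryData/PageName}} comes close.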
Is there functionality in the core for extensions to be able to create and manage a namespace of their own?
Thank you, Derric Atzrott
Hi.
I think this is the wrong approach to take. I agree that adding these revisions to the database is a silly thing to do. https://commons.wikimedia.org/wiki/Template:Toolserver is usually the example I use (currently at 44,816 revisions).
For pages such as this, there are a few reasons you might put this on a wiki as opposed to having a dynamic tool on an external service:
* historical data can be stored and tracked;
* users can see the updated information in their watchlists; and
* the data can be transcluded elsewhere on the wiki.
Depending on the bot and the page, points one and two can have more or less importance. For the third point (transclusion), I've always thought it would be neat to be able to transclude content from the Toolserver. I don't think it's been explored too much, but imagine something like:
{{#toolserver:~user/my_dynamic_tool.py}}
MediaWiki could take this parser tag, fetch the remote content, run it through the parser/sanitizer, and then cache the page. A purge of the MediaWiki page would update the data.
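Purely as a sketch (the function name is taken from my example above, the magic word registration is omitted, and the host whitelisting, sanitisation and caching would all need real thought):

    // Hypothetical parser function backing {{#toolserver:...}}.
    $wgHooks['ParserFirstCallInit'][] = 'efToolserverParserInit';

    function efToolserverParserInit( Parser $parser ) {
        $parser->setFunctionHook( 'toolserver', 'efToolserverRender' );
        return true;
    }

    function efToolserverRender( Parser $parser, $path = '' ) {
        // Fetch the remote content from the Toolserver.
        $text = Http::get( 'http://toolserver.org/' . $path );
        if ( $text === false ) {
            // Don't break the page when the external service is down.
            return '';
        }
        // The returned wikitext goes through the normal parser, so the result
        // lands in the parser cache; purging the page re-fetches it.
        return $text;
    }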
Or nowadays, I guess you could do something similar with jQuery.
This creates a reliance on an external tool to be up and running and might make page load a bit slower, but most of this data is non-critical. As long as it doesn't block page loading when the external service is down, it isn't a big deal.
I'm very hesitant to introduce page history truncation or manipulation. We generally work hard to preserve and protect the integrity of the page history, and we use special pages for dynamic data. For a lot of these bots and pages, I agree that we need to re-evaluate them and make smarter decisions about how to present and update the data they provide, but I don't think changing MediaWiki's page history infrastructure is the best approach, given the other options available.
MZMcBride
Although for a different use case, I find myself facing a related problem with my bot on Commons.
https://commons.wikimedia.org/wiki/Commons:Auto-protected_files/wikipedia/zh
Those pages get updated periodically whenever a Commons image starts or ends being used on a Main Page. Aside from the fact that this in particular could be a native feature[1], these pages will get a lot of revisions.
Another aside is that these old revisions are quite useless. Even if someone made an edit in between, the bot would overwrite it. But I don't care so much about the wasted space, since it is a relatively small amount.
The problem comes in when these pages need maintenance. I find myself periodically checking up on my bot-generated pages so I can move them to /Archive_#, delete those, and start clean, because once the revision count reaches 2,500, a page can no longer be deleted due to the limit we implemented.
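(For reference, I believe the limit being hit here is the one set by $wgDeleteRevisionsLimit in the site configuration; the value and the group grant below are only illustrative:)

    // Pages with more revisions than this cannot be deleted through the
    // normal delete action...
    $wgDeleteRevisionsLimit = 2500;
    // ...unless the deleting user has the 'bigdelete' right.
    $wgGroupPermissions['steward']['bigdelete'] = true; // example grant only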
So to keep them movable and manageable, I always fight the limit by moving the page to a subpage (without leaving a redirect) before it reaches that limit, deleting it there, and starting a new page at the old name.
It would be nice if these kinds of bot-generated pages could avoid that. And if in doing so we save revision space, all the better.
Other examples:
* https://commons.wikimedia.org/wiki/Commons:Database_reports (and subpages)
* https://commons.wikimedia.org/wiki/Commons:Auto-protected_files (and subpages)
-- Krinkle
We still need to actually fix that issue. We've got an RFC on the topic: https://www.mediawiki.org/wiki/Requests_for_comment/Page_deletion
The idea of grouping deleted revisions, so that when you delete, recreate, and then delete a page again, the revisions from the separate deletions stay grouped separately rather than being lumped into one big pile, sounds really nice too.