On the recent changes page the change in size that the edits make is shown for each edit. Could it be possible to get a similar overview for the edits of a certain user? The reason I am asking is that over the last few months my bot has a few times unintentionally removed a large part of a long page. I still don't know what caused it (my guess is a problem with the internet connection causing only part of the data to be transferred), but if I had an overview like that, I would at least be able to check and quickly repair the damage where it has happened.
On Feb 6, 2008 9:26 PM, Andre Engels andreengels@gmail.com wrote:
On the recent changes page the change in size that the edits make is shown for each edit. Could it be possible to get a similar overview for the edits of a certain user?
The recentchanges table (used for Special:Recentchanges, watchlists etc) stores both the old and new page size, but the revision table (used for history pages and for Special:Contributions) only stores the new page size, so you can't calculate the change unless you look up the previous revision too. This would make Special:Contributions pretty expensive to run, so I doubt it's feasible.
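To make the cost concrete, here is a minimal Python sketch using plain dataclasses as hypothetical stand-ins for revision rows; the field names `parent_id` and `length` are assumptions for illustration, not MediaWiki's actual column names. Because only the new size is stored, every contribution needs one extra lookup of its parent revision before a size change can be shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Revision:
    # Hypothetical revision row: only the size AFTER the edit is stored.
    rev_id: int
    user: str
    parent_id: Optional[int]  # previous revision of the same page, if any
    length: int               # page size after this edit


def contribution_deltas(revisions: dict[int, Revision], user: str) -> tuple[list[tuple[int, int]], int]:
    """Return (rev_id, size change) for each edit by `user`, plus the number
    of extra parent lookups needed to compute them."""
    deltas = []
    lookups = 0
    for rev in revisions.values():
        if rev.user != user:
            continue
        old_length = 0
        if rev.parent_id is not None:
            lookups += 1  # one extra row fetch per contribution
            parent = revisions.get(rev.parent_id)
            # A deleted parent would have to be searched for elsewhere;
            # this sketch simply treats it as size 0.
            old_length = parent.length if parent else 0
        deltas.append((rev.rev_id, rev.length - old_length))
    return deltas, lookups
```

On a real wiki each of those lookups is a separate indexed fetch, which is what makes this expensive on a long contributions list.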
2008/2/6, Stephen Bain stephen.bain@gmail.com:
The recentchanges table (used for Special:Recentchanges, watchlists etc) stores both the old and new page size, but the revision table (used for history pages and for Special:Contributions) only stores the new page size, so you can't calculate the change unless you look up the previous revision too. This would make Special:Contributions pretty expensive to run, so I doubt it's feasible.
Just thinking out loud here: might it be possible to have a Special:Recentchanges showing only edits by a certain user, or even only bot edits, like the existing options for showing only logged-in or anonymous edits?
On 06/02/2008, Andre Engels andreengels@gmail.com wrote:
2008/2/6, Stephen Bain stephen.bain@gmail.com:
The recentchanges table (used for Special:Recentchanges, watchlists etc) stores both the old and new page size, but the revision table (used for history pages and for Special:Contributions) only stores the new page size, so you can't calculate the change unless you look up the previous revision too. This would make Special:Contributions pretty expensive to run, so I doubt it's feasible.
Just thinking out loud here: might it be possible to have a Special:Recentchanges showing only edits by a certain user, or even only bot edits, like the existing options for showing only logged-in or anonymous edits?
That could be useful. It wouldn't give much of a history, since edits would fall off the bottom of the recentchanges table reasonably soon (how soon is a configuration setting, I think, but it applies to the whole table rather than per user), but it would show very recent changes.
On Feb 7, 2008 12:50 AM, Thomas Dalton thomas.dalton@gmail.com wrote:
That could be useful. It wouldn't give much of a history, since edits would fall off the bottom of the recentchanges table reasonably soon (how soon is a configuration setting, I think, but it applies to the whole table rather than per user), but it would show very recent changes.
It defaults to a week but I think all the Wikimedia sites have it set to a month.
A short period of time, but still useful for error-checking purposes (a fast response activity).
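The retention behaviour discussed here can be pictured as a purge by age. This Python sketch assumes a single table-wide maximum age (the `max_age` parameter and the row layout are hypothetical), so a user's older edits disappear from the table no matter how active or inactive that user is.

```python
from datetime import datetime, timedelta


def purge_old_changes(rows: list[dict], now: datetime, max_age: timedelta) -> list[dict]:
    """Keep only recent-changes rows newer than `max_age`. The cutoff is the
    same for every user because it applies to the whole table."""
    cutoff = now - max_age
    return [row for row in rows if row["timestamp"] >= cutoff]
```

With a one-week window an edit from three weeks ago is already gone; with a one-month window it is still available for checking.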
Thomas Dalton wrote:
That could be useful. It wouldn't give much of a history, since edits would fall off the bottom of the recentchanges table reasonably soon (how soon is a configuration setting, I think, but it applies to the whole table rather than per user), but it would show very recent changes.
In fact, I'd have expected /index.php?title=Special:Recentchanges&hideanons=1&hideminor=1&hidebots=0&hideliu=1&hidemyself=1 to do so :P but a) bots are logged-in users, and b) Rob Church shows anons if I hide users.
I sometimes find the options a bit limited. I may want to show only minor edits, or my non-minor edits. The query is not more complex, just changing some ANDs to ORs, or skipping the wiki defaults. I think the problem lies with implicit versus explicit commands. It currently treats anything not explicitly specified as "you told me to use the default", whereas if I ask to "hide users and show myself" I mean "hide all users except me". DWIM!
On Feb 7, 2008 4:10 PM, Platonides Platonides@gmail.com wrote:
I sometimes find the options a bit limited. I may want to show only minor edits, or my non-minor edits. The query is not more complex, just changing some ANDs to ORs, or skipping the wiki defaults.
Actually, switching ANDs with ORs can have dramatic performance implications, at least in MySQL. (Before 5.0ish, ORed conditions can AFAIK never be satisfied by indexes, only by scanning values.) And too much flexibility can be a big problem too. Too much flexibility generally means you can't satisfy the conditions using B-trees, or at least MySQL can't, and so you have to scan rows. In particular, specifying a condition with fewer hits than the requested number of rows means scanning the entire table. (I guess that's a reason to not use the revision table for this!)
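The last point can be illustrated with a toy model of an index scan: rows are read newest-first, as if walking a timestamp index, and filtered until `limit` matches are found. This is a simplified Python model, not how MySQL's optimizer actually works; it only shows why a rare condition forces a walk through every row.

```python
from typing import Callable


def scan_newest_first(rows: list[dict], predicate: Callable[[dict], bool], limit: int) -> tuple[list[dict], int]:
    """Walk rows newest-first (sorting stands in for a timestamp index) and
    stop once `limit` rows match. Returns (matches, rows_examined)."""
    matches = []
    examined = 0
    for row in sorted(rows, key=lambda r: r["timestamp"], reverse=True):
        examined += 1
        if predicate(row):
            matches.append(row)
            if len(matches) == limit:
                break
    return matches, examined
```

With 1000 rows of which only 10 are bot edits, asking for 50 non-bot rows examines just 50 rows, while asking for 50 bot rows examines all 1000 and still returns only 10.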
On Feb 6, 2008 5:00 PM, Andre Engels andreengels@gmail.com wrote:
Just thinking out loud here: might it be possible to have a Special:Recentchanges showing only edits by a certain user, or even only bot edits, like the existing options for showing only logged-in or anonymous edits?
To me, this seems like a good idea. Why not file a bug for that? :)
2008/2/6, Huji huji.huji@gmail.com:
On Feb 6, 2008 5:00 PM, Andre Engels andreengels@gmail.com wrote:
Just thinking out loud here: might it be possible to have a Special:Recentchanges showing only edits by a certain user, or even only bot edits, like the existing options for showing only logged-in or anonymous edits?
To me, this seems like a good idea. Why not file a bug for that? :)
I'm afraid that that would just mean having the thing open for a few months or years. Here I had hoped to get a "We will do this" or "We won't do this" instead.
"Andre Engels" andreengels@gmail.com writes:
Just thinking out loud here: might it be possible to have a Special:Recentchanges showing only edits by a certain user, or even
[[Special:Contributions/Jusoneuser]] should do the same, if I understand you correctly.
only bot edits, like the existing options for showing only logged-in or anonymous edits?
That change should be quite easy to implement.
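Conceptually, the requested filters are just extra conditions on recent-changes rows, which already carry both the old and the new size. In this Python sketch the row fields (`user`, `bot`, `title`, `old_len`, `new_len`) and the `user`/`only_bots` parameters are hypothetical, chosen to mirror the discussion rather than MediaWiki's real column names or URL parameters.

```python
from typing import Optional


def filter_recent_changes(rows: list[dict], user: Optional[str] = None,
                          only_bots: bool = False) -> list[tuple[str, str, int]]:
    """Return (user, title, size change) for rows matching the filters."""
    result = []
    for row in rows:
        if user is not None and row["user"] != user:
            continue
        if only_bots and not row["bot"]:
            continue
        # Both sizes are already stored in the row: no extra lookup needed.
        result.append((row["user"], row["title"], row["new_len"] - row["old_len"]))
    return result
```

A large negative size change on a bot's row is exactly the signal Andre is looking for.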
On Feb 6, 2008 4:43 PM, Anders Wegge Jakobsen wegge@wegge.dk wrote:
"Andre Engels" andreengels@gmail.com writes:
Just thinking out loud here: might it be possible to have a Special:Recentchanges showing only edits by a certain user, or even
[[Special:Contributions/Jusoneuser]] should do the same, if I understand you correctly.
The issue is precisely that it does not. There is information stored in the recentchanges table, such as patrolled status and size changes, which is not stored in the revision table, and which therefore is not available to Contributions, history, etc. without some weird monkey business.
You know, exactly why do we have a separate recentchanges table at all? It seems as though the few extra tidbits that aren't in the revision table could just be added there, and then that could be unioned with the log table to produce Special:Recentchanges. (This would also make it possible to keep all edits' IP addresses forever, which is probably a much more sensible default than deleting them after a week.)
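The "union with the log table" idea amounts to merging two timestamp-ordered streams. A minimal Python sketch using the standard library's `heapq.merge`, assuming each source is already sorted newest-first (as an indexed query would return it); the (timestamp, description) tuple layout is hypothetical:

```python
import heapq
from itertools import islice


def merged_recent_changes(edits, log_entries, limit):
    """Merge two newest-first streams of (timestamp, description) tuples
    and return the `limit` most recent events overall."""
    merged = heapq.merge(edits, log_entries, key=lambda event: event[0], reverse=True)
    return list(islice(merged, limit))
```

Because the merge is lazy, only about `limit` items are pulled from the two inputs, which is what would keep such a union cheap if both sides are served from a timestamp index.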
Simetrical wrote:
You know, exactly why do we have a separate recentchanges table at all? It seems as though the few extra tidbits that aren't in the revision table could just be added there, and then that could be unioned with the log table to produce Special:Recentchanges.
Hmm... that's a good question, actually.
(This would also make it possible to keep all edits' IP addresses forever, which is probably a much more sensible default than deleting them after a week.)
There could be privacy concerns over that, though; certainly I'd assume there's a reason why we purge CheckUser records after a while.
(I'd have expected the Foundation privacy policy to say something about that, but strangely enough it doesn't seem to; the only statement about retention times I can find is that Apache access logs are "normally discarded after about two weeks.")
In fact, I wonder if we really need to record IP addresses of logged-in users in the recentchanges table at all -- after all, the CheckUser extension has its own tables for that data.
It seems to be used (only?) for retroactive autoblocks, but that only needs the last IP used by each user, which could just as well be stored in the user table (or in a separate table altogether). Besides, we already have a config option to disable it ($wgPutIPinRC), so I don't suppose it can be that critical for anything.
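The alternative suggested here could look roughly like this: keep only the most recent IP per user and consult it when a block is placed. This is a hypothetical Python sketch (the `LastIpStore` class and its methods are illustrative names, not MediaWiki code) and it ignores expiry entirely.

```python
from typing import Optional


class LastIpStore:
    """Hypothetical per-user record of the last IP address seen on an edit."""

    def __init__(self) -> None:
        self._last_ip: dict[str, str] = {}

    def record_edit(self, user: str, ip: str) -> None:
        # Overwrite rather than append: only the latest address is kept.
        self._last_ip[user] = ip

    def autoblock_target(self, user: str) -> Optional[str]:
        """IP a retroactive autoblock would cover, or None if unknown."""
        return self._last_ip.get(user)
```

One row per user, rather than one per edit, is the whole point of the design: the autoblock needs nothing more.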
On Feb 6, 2008 8:27 PM, Ilmari Karonen nospam@vyznev.net wrote:
There could be privacy concerns over that, though; certainly I'd assume there's a reason why we purge CheckUser records after a while.
That's why it would be an option, perhaps disabled on Wikimedia. The reasonable default would be to log them permanently, because that's what practically all web software does, and what all admins and users expect (to the extent they know about stuff like IP addresses).
(I'd have expected the Foundation privacy policy to say something about that, but strangely enough it doesn't seem to; the only statement about retention times I can find is that Apache access logs are "normally discarded after about two weeks.")
Which tells you how old that is. How long has it been since we kept Apache access logs? We have some Squid sampling now, I think, but last I heard only a one-tenth anonymized sample was stored anywhere.
In fact, I wonder if we really need to record IP addresses of logged-in users in the recentchanges table at all -- after all, the CheckUser extension has its own tables for that data.
Currently, a default installation of MediaWiki allows IP addresses to be checked by anyone with database access. That's intentional, I'm pretty sure. I assume it used to be the way things were done on Wikimedia, in fact, before Checkuser.
On Feb 7, 2008 3:16 AM, Simetrical Simetrical+wikilist@gmail.com wrote:
You know, exactly why do we have a separate recentchanges table at all?
The thing is, recentchanges speeds things up a lot. For example, if I edit a page and change its size from 2KB to 4KB, then you edit it to 6KB, and then an admin deletes the middle revision, the wiki can still show all of this in a matter of milliseconds, because every step (and every size) is stored in the recentchanges table. If the wiki instead had to generate recent changes from the revision and logging tables, it would have to get the rev_id of the 6KB revision and then find the previous revision (which means searching both the "revision" and "archive" tables, because the previous revision could have been deleted), which is a more complicated task with more load on the database backend. It would also have to merge the log and edit events and then sort them all by timestamp, which is again a pain in the ass.
Recentchanges speeds up all these actions by distributing the load. Instead of putting a high load on the server every time recent changes have to be calculated (when visiting Special:Recentchanges with your proposed function), a little load is put on the server whenever an event (edit/log) happens, to add a row to the recentchanges table. This distribution of the load helps the wiki work better.
Huji
Huji schreef:
On Feb 7, 2008 3:16 AM, Simetrical Simetrical+wikilist@gmail.com wrote:
You know, exactly why do we have a separate recentchanges table at all?
The thing is, recentchanges speeds things up a lot. For example, if I edit a page and change its size from 2KB to 4KB, then you edit it to 6KB, and then an admin deletes the middle revision, the wiki can still show all of this in a matter of milliseconds, because every step (and every size) is stored in the recentchanges table. If the wiki instead had to generate recent changes from the revision and logging tables, it would have to get the rev_id of the 6KB revision and then find the previous revision (which means searching both the "revision" and "archive" tables, because the previous revision could have been deleted), which is a more complicated task with more load on the database backend. It would also have to merge the log and edit events and then sort them all by timestamp, which is again a pain in the ass.
Note that:
1. Deleted revisions and revisions of deleted pages don't show up in RC.
2. The elaborate query you're talking about wouldn't be necessary, as we'd be adding rev_old_len and rev_new_len and all kinds of other fancy flags the recentchanges table currently has.
Roan Kattouw (Catrope)
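Roan's second point, sketched in Python: if the old size is recorded when a revision is saved (the `old_len` field here is hypothetical, named after the proposed rev_old_len), the size change becomes a per-row subtraction and the parent lookup disappears. The `history` list stands in for one page's revisions.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    rev_id: int
    user: str
    old_len: int  # page size before the edit, captured at save time
    new_len: int  # page size after the edit


def save_revision(history: list[Revision], rev_id: int, user: str, new_len: int) -> Revision:
    """Append a revision, copying the previous size so readers never need
    to look up the parent revision later."""
    old_len = history[-1].new_len if history else 0
    rev = Revision(rev_id, user, old_len, new_len)
    history.append(rev)
    return rev


def size_change(rev: Revision) -> int:
    # Computable from the row alone.
    return rev.new_len - rev.old_len
```

The extra work moves to save time (one cheap read of the current size), which is the same trade-off the recentchanges table makes today.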
On Feb 7, 2008 6:44 AM, Roan Kattouw roan.kattouw@home.nl wrote:
2. The elaborate query you're talking about wouldn't be necessary, as we'd be adding rev_old_len and rev_new_len and all kinds of other fancy flags the recentchanges table currently has
Precisely.
over the last few months my bot has a few times unintentionally removed a large part of a long page. I still don't know
If it's your bot, presumably you could get it to do a diff between the before and after of its change, and write that to a log somewhere...?
Steve
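For a bot written in Python, Steve's suggestion could use the standard library's `difflib` to compare the text the bot fetched with the text it is about to save, logging the diff and refusing edits that shrink a page drastically. The 50% threshold and the `check_edit` name are assumptions for illustration:

```python
import difflib
import logging


def check_edit(title: str, old_text: str, new_text: str, max_shrink: float = 0.5) -> bool:
    """Log a unified diff of the pending edit; return False if the edit
    would remove more than `max_shrink` of the page's size."""
    diff = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=f"{title} (before)", tofile=f"{title} (after)", lineterm="",
    )
    logging.info("\n".join(diff))
    if old_text and len(new_text) < (1 - max_shrink) * len(old_text):
        logging.warning("%s: size %d -> %d, skipping save", title, len(old_text), len(new_text))
        return False
    return True
```

This only helps if the bot's copy of the old text is itself correct, which is the objection Andre raises in reply.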
On 2/6/08, Stephen Bain stephen.bain@gmail.com wrote:
The recentchanges table (used for Special:Recentchanges, watchlists etc) stores both the old and new page size, but the revision table (used for history pages and for Special:Contributions) only stores the new page size, so you can't calculate the change unless you look up the previous revision too. This would make Special:Contributions pretty expensive to run, so I doubt it's feasible.
On 2/6/08, Steve Bennett stevagewp@gmail.com wrote:
If it's your bot, presumably you could get it to do a diff between the before and after of its change, and write that to a log somewhere...?
Could log it at the end of the bot's edit summary, then it would appear in Special:Contributions where Andre wants it. ;-)
—C.W.
2008/2/6, Charlotte Webb charlottethewebb@gmail.com:
Could log it at the end of the bot's edit summary, then it would appear in Special:Contributions where Andre wants it. ;-)
I don't think that that would work - it seems extremely unlikely to me that the bot would 'knowingly' (insofar as you could use this word) make such a mistake. More probably it is getting an incorrect export page or maybe mis-parsing a correct one, so it will have the wrong idea about the previous size of the page and not report removing something even though it did.
On Feb 6, 2008 5:24 PM, Andre Engels andreengels@gmail.com wrote:
2008/2/6, Charlotte Webb charlottethewebb@gmail.com:
Could log it at the end of the bot's edit summary, then it would appear in Special:Contributions where Andre wants it. ;-)
I don't think that that would work - it seems extremely unlikely to me that the bot would 'knowingly' (insofar as you could use this word) make such a mistake. More probably it is getting an incorrect export page or maybe mis-parsing a correct one, so it will have the wrong idea about the previous size of the page and not report removing something even though it did.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l
The problems that pywikipediabot caused for some weeks were due to premature connection aborts while posting a page. Because of Python's hash table implementation, the value with the key 'wpTextbox1' happened to be sent as the last item. And even though the framework sends a Content-Length header, PHP does not respect it, so if the connection is aborted during the POST, it simply continues with the malformed data. I fixed this by explicitly sending wpEditToken as the last item, but maybe it should be fixed in PHP?
Bryan
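Bryan's workaround relies on field order in the POST body: if the edit token is the last field, a body truncated partway through the page text never contains it, so the save is rejected instead of storing half a page. A Python sketch of building such a body with `urllib.parse.urlencode`, which keeps the order of a list of pairs; the field values are placeholders, `wpSummary` is included only for illustration, and `accept_post` is a simplified model of the server's token check, not MediaWiki's actual code:

```python
from urllib.parse import parse_qs, urlencode


def build_edit_body(text: str, summary: str, token: str) -> str:
    # A list of pairs keeps its order; wpEditToken is deliberately last.
    return urlencode([("wpTextbox1", text), ("wpSummary", summary), ("wpEditToken", token)])


def accept_post(body: str, expected_token: str) -> bool:
    """Simplified server check: a truncated body lacks the trailing token."""
    fields = parse_qs(body)
    return fields.get("wpEditToken") == [expected_token]
```

Relying on dict ordering instead would reproduce the original bug whenever the page text happened to end up last.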
Rather than open a bug on Bugzilla, I thought I would just mention that the interwiki list at Meta needs regenerating again, please. The old list had a few typos in it which give broken external links on Wikipedia, which ain't a good thing (let alone on all the other projects).
Thanks
BozMo