OK, now that I've looked into the code some more (looking at 1.9.3), I see that I have a somewhat different problem.
Here's what's going on. I have some wikis where I regularly update page content from an external source, using a script that generates XML like what the exporter produces. These get loaded via importDump.php and are marked in the page histories as revisions...which is working fine.
The problem is that I want to be able to see in recentchanges when a human has edited the pages. So, I want the script-generated edits marked as bot edits and let the human changes show through. I can mark the script-generated edits as being from a bot by flagging them in the recentchanges table...but the human ones still aren't there, because of the 5000-entry limit applied when rebuildrecentchanges.php runs. My scripts run daily, and they typically affect more than 5000 pages at a time.
I'm concerned that just raising or eliminating the limit will make things unacceptably slow and make the table too large. I'm wondering about hacking the script to make separate passes for bots and non-bots. Thoughts?
Jim
p.s. I decided to send this just to mediawiki-l and not wikitech-l, since presumably the Wikipedias don't have this problem.
On May 20, 2007, at 5:26 PM, Jim Hu wrote:
Yes, I know that. I should have said recentchanges instead of revisions.
But they're not flagged in the recentchanges table. The problem is that they show up whether or not one uses "hide bots" in Special:Recentchanges. This is fixable by manually updating recentchanges with:
update recentchanges set rc_bot=1 where rc_user=<bot_user_id>;
but I was hoping that importDump would do this automatically based on recognizing the username. However, upon reflection, I believe that importDump doesn't do anything directly to recentchanges - I usually have to rebuild to get changes to show up. Thus, if there is a fix needed, it should be in rebuildrecentchanges.php or somewhere else.
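For reference, the manual flagging step quoted above can be sketched against a toy table. This is only an illustration: the sqlite in-memory setup and the user id 42 are stand-ins, and the real recentchanges table has many more columns than shown here.

```python
import sqlite3

# Toy stand-in for MediaWiki's recentchanges table; this just illustrates
# the "update recentchanges set rc_bot=1 where rc_user=<bot_user_id>" fix.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recentchanges ("
    "rc_id INTEGER PRIMARY KEY, rc_user INTEGER, rc_bot INTEGER DEFAULT 0)"
)

BOT_USER_ID = 42  # hypothetical user id of the import account
conn.executemany(
    "INSERT INTO recentchanges (rc_user) VALUES (?)",
    [(BOT_USER_ID,), (BOT_USER_ID,), (7,)],  # two bot edits, one human edit
)

# The manual fix: mark every entry made by the import account as a bot edit.
conn.execute("UPDATE recentchanges SET rc_bot=1 WHERE rc_user=?", (BOT_USER_ID,))

flagged = conn.execute(
    "SELECT COUNT(*) FROM recentchanges WHERE rc_bot=1"
).fetchone()[0]
print(flagged)  # 2 -- the human edit (rc_user=7) stays unflagged
```

After this, "hide bots" in Special:Recentchanges would suppress the two import entries and leave the human edit visible.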
JH
On May 20, 2007, at 3:29 PM, Aaron Schulz wrote:
Hmm, bot edits are only observed as "bot edits" in the recentchanges table. Edits by bots in the revision table are not actually flagged as bot edits.
-Aaron Schulz
From: Jim Hu <jimhu@tamu.edu>
Reply-To: Wikimedia developers <wikitech-l@lists.wikimedia.org>
To: Wikimedia developers <wikitech-l@lists.wikimedia.org>, MediaWiki announcements and site admin list <mediawiki-l@lists.wikimedia.org>
Subject: [Wikitech-l] importDump and setting rc_bot
Date: Sun, 20 May 2007 13:22:07 -0400
As far as I can tell, importDump does not mark imported pages as coming from a bot, even when the user is a bot in the User table. Is that correct? Is there a way to indicate a bot revision in the xml, or do I need to do this in the db afterward?

=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l
Jim Hu wrote:
OK, now that I've looked into the code some more (looking at 1.9.3), I see that I have a somewhat different problem.
(Please don't post in three threads on two different lists. Stay in one place, it's easier to keep track of you. :)
Did you check if this fix fixes your issue?
http://bugzilla.wikimedia.org/show_bug.cgi?id=9860
-- brion vibber (brion @ wikimedia.org)
On May 21, 2007, at 2:05 PM, Brion Vibber wrote:
Jim Hu wrote:
OK, now that I've looked into the code some more (looking at 1.9.3), I see that I have a somewhat different problem.
(Please don't post in three threads on two different lists. Stay in one place, it's easier to keep track of you. :)
Sorry! <groveling apology> I'll try to stay on thread and post to only one in the future.
Did you check if this fix fixes your issue?
I'll look harder at how this affects the import, but I don't think it will affect the rebuildrecentchanges.php/rebuildrecentchanges.inc mechanisms, since they don't seem to call RecentChange.php. I have to rebuild a lot, since the job queue falls waaaay behind when I run my scripts.
Jim
-- brion vibber (brion @ wikimedia.org)
MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/mediawiki-l
Jim Hu wrote:
I'll look harder at how this affects the import, but I don't think it will affect the rebuildrecentchanges.php/rebuildrecentchanges.inc mechanisms, since they don't seem to call RecentChange.php. I have to rebuild a lot, since the job queue falls waaaay behind when I run my scripts.
Ahhhh, I see -- this has nothing to do with imports at all, it's the Recentchanges rebuild script.
Running the rebuild script will remove bot flags from all recent changes entries.
In theory you could probably approximate it based on bot group membership or something, but it would complicate the query or require a second pass (and of course would not be exact and may or may not be what you want anyway.)
-- brion vibber (brion @ wikimedia.org)
On May 21, 2007, at 3:55 PM, Brion Vibber wrote:
Jim Hu wrote:
I'll look harder at how this affects the import, but I don't think it will affect the rebuildrecentchanges.php/rebuildrecentchanges.inc mechanisms, since they don't seem to call RecentChange.php. I have to rebuild a lot, since the job queue falls waaaay behind when I run my scripts.
Ahhhh, I see -- this has nothing to do with imports at all, it's the Recentchanges rebuild script.
Indeed! Until I dove deeper into the code, I thought it was the import... but yes, it's the rebuild that's the problem (which is why I tried to change the name of the thread!). The first clue was when I updated rc_bot manually and it only affected 5000 rows. That's odd, I thought...
Running the rebuild script will remove bot flags from all recent changes entries.
I found this bug in Bugzilla, complete with a patch (unreviewed?):
http://bugzilla.wikimedia.org/show_bug.cgi?id=9166
but I don't think it will do what I need.
In theory you could probably approximate it based on bot group membership or something, but it would complicate the query or require a second pass (and of course would not be exact and may or may not be what you want anyway.)
I'm thinking that for my specific case I should just customize my own copy to do a second pass or filter them out. Instead of bot group membership, I'll look at the rc_comment being "Automated import of articles". Probably not useful for Wikipedia or general use, but there are some other folks I know who could use this.
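That comment-based second pass could be sketched like this. The recentchanges/rc_comment/rc_bot names follow MediaWiki's schema; the sqlite in-memory table is only an illustration of the idea, and it assumes the import script always writes that exact fixed edit summary.

```python
import sqlite3

# Toy stand-in for the recentchanges table, to illustrate restoring the
# bot flag after a rebuild by matching the import script's fixed summary.
IMPORT_COMMENT = "Automated import of articles"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recentchanges (rc_id INTEGER PRIMARY KEY,"
    " rc_comment TEXT, rc_bot INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO recentchanges (rc_comment) VALUES (?)",
    [(IMPORT_COMMENT,), ("fixed a typo",)],  # one import entry, one human edit
)

# Site-specific second pass: anything with the import summary gets rc_bot back.
conn.execute(
    "UPDATE recentchanges SET rc_bot=1 WHERE rc_comment=?", (IMPORT_COMMENT,)
)

human_visible = conn.execute(
    "SELECT COUNT(*) FROM recentchanges WHERE rc_bot=0"
).fetchone()[0]
print(human_visible)  # 1 -- only the human edit survives "hide bots"
```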
I was also going to put a bug into bugzilla about ordering, but I see in SVN that you fixed it 3 weeks and 4 days ago.
-- brion vibber (brion @ wikimedia.org)
Jim Hu wrote:
The problem is that I want to be able to see in recentchanges when a human has edited the pages. So, I want the script-generated edits marked as bot edits and let the human changes show through. I can mark the script-generated edits as being from a bot by flagging them in the recentchanges table...but the human ones still aren't there, because of the 5000-entry limit applied when rebuildrecentchanges.php runs. My scripts run daily, and they typically affect more than 5000 pages at a time.
I'm concerned that just raising or eliminating the limit will make things unacceptably slow and make the table too large. I'm wondering about hacking the script to make separate passes for bots and non-bots. Thoughts?
What about making your imports not appear in RecentChanges at all? That seems to be the problem you need to solve.