Exactly how resource-intensive would a bunch of queries be that get the edit count from every wiki listed in toolserver.wiki (821 of them)? Basically it'll be PHP code that cycles through all the rows in toolserver.wiki, connects to each server/db in turn (I'll ORDER BY server so I don't have to connect and disconnect every time), and runs SELECT user_editcount, user_registration FROM mw_user WHERE user_name = 'foo';
Thanks,
ManishEarth <http://en.wikipedia.org/wiki/User:Manishearth> Talk • Stalk <http://en.wikipedia.org/wiki/Special:Contributions/Manishearth>
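A minimal sketch of that first lookup, assuming toolserver.wiki exposes dbname and server columns (the thread doesn't spell the column names out):

    -- list every wiki, grouped by server, so one connection per server can be reused
    SELECT dbname, server
    FROM toolserver.wiki
    ORDER BY server;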
http://commons.wikimedia.org/w/index.php?title=Special%3ACentralAuth&tar... ?
Maarten
See also, http://toolserver.org/~vvv/sulutil.php?user=Manishearth
I don't know how that's implemented, but whatever the method is, it probably isn't causing problems, because it's a very widely used tool. (It's linked from the footer of contributions and other interface pages for users/IPs on several major wikis, including enwp.)
-Jeremy
It iterates over all the databases with an appropriate query.
--vvv
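Presumably the per-wiki query is something along these lines (a sketch only, assuming the replicated table is plain user in each wiki's database; the user_editcount column is confirmed further down the thread):

    -- run once against each wiki's replicated database
    SELECT user_editcount, user_registration
    FROM user
    WHERE user_name = 'Manishearth';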
Well, I would have used sulutil, except I want to bundle it with some other things, so it looks like I'll do my original query. Thanks for the help,
ManishEarth <http://en.wikipedia.org/wiki/User:Manishearth> Talk • Stalk <http://en.wikipedia.org/wiki/Special:Contributions/Manishearth>
An important key in every all-wiki iteration is, imho, the reuse of connections.
In PHP this is actually quite easy.
$connections = array();
$connections['1'] = mysql_connect( ... );
$connections['2'] = mysql_connect( ... );
// etc.
Then, in the loop over the rows you got from the toolserver.wiki table, do something like:

foreach ( $rows as $row ) {
    // reuse the connection that was opened for this row's server, if there is one
    $currentSqlCon = isset( $connections[ $row->server ] ) ? $connections[ $row->server ] : false;
    if ( $currentSqlCon ) {
        $query = mysql_query( $my_query, $currentSqlCon );
        // ... process the result
    }
}
That way you won't have to make 100s of connections.
From what I remember, Luxo's, vvv's, and my own global tools all do it like this.
Of course, you can always look at our source code in SVN and/or look at our PHP files directly from your toolserver account.
-- Krinkle
Yeah, I already reuse connections: I ORDER BY server, and only close and reopen when the server changes. It's quite fast.
What's the quick way to get user/page contribution data? I tried using the page table through PuTTY, but it hung. The revision table works, but I can't get the page title from it; for some reason, the rev_page column is populated with zeroes. I can get the username, rev ID, timestamp, and summary, but I can't fetch the page name/page ID. Any help?
ManishEarth <http://en.wikipedia.org/wiki/User:Manishearth> Talk • Stalk <http://en.wikipedia.org/wiki/Special:Contributions/Manishearth>
*bump*
ManishEarth <http://en.wikipedia.org/wiki/User:Manishearth> Talk • Stalk <http://en.wikipedia.org/wiki/Special:Contributions/Manishearth>
The quick way to get the user's edit count is to use the stored value from user.user_editcount.
If you're trying to use the page table to get edit contribution data, you're going to have trouble, because all edits (contributions) are in the revision table.
If you want to use the revision table and fetch page titles, do a join on rev_page = page_id. Then you can select page_namespace and page_title (or the namespace name instead of an integer from the toolserver.namespacename table).
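A minimal sketch of that join, assuming the 2011-era revision columns rev_user_text and rev_comment, with a placeholder user name:

    SELECT rev_id, rev_timestamp, rev_comment,
           page_namespace, page_title
    FROM revision
    JOIN page ON rev_page = page_id
    WHERE rev_user_text = 'Foo'
    ORDER BY rev_timestamp DESC
    LIMIT 50;

The LIMIT is only there to keep a test run cheap.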
More info about accessing and using the replicated databases is available here: https://wiki.toolserver.org/view/Database_access.
MZMcBride
P.S. Plaintext e-mails, please. :-)
Yeah, I see now. Previously, I'd just run "select * from revision limit 1" to see the lay of the land, and I got some rows with rev_page = 0, which confused me. Now I realize I was looking at the ancient part of the table, which followed the old schema. It works now. Thanks for the help, everyone! -ManishEarth
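For anyone who hits the same thing, a quick sanity check (a sketch; nothing beyond the revision table is assumed) is to count how many rows still carry the old rev_page = 0 value before relying on the join:

    SELECT COUNT(*)
    FROM revision
    WHERE rev_page = 0;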
If it's not a time-critical figure, then User:Emijrp does something like this at http://meta.wikimedia.org/wiki/User:Emijrp/List_of_Wikimedians_by_number_of_... .