Hello Fae,
Are you generating JSON somewhere, or thinking of creating an API for that information?
I have some ideas that I started to develop last year; I will ping you in private.
Best regards, and thanks for helping us with this important information.
Rodrigo Padula
2015-02-26 14:01 GMT-03:00 Fæ faewik@gmail.com:
Now that the run has been running a bit longer, it can be seen[1] that the real sink is the logging table from Wikidata. The other projects are pretty quick, ranging from a few minutes up to half an hour for Commons, but each Wikidata query takes 7 hours. It's a big log!
I am testing out separating Wikidata from the other queries, so the majority might get done much more quickly. Either way, all the reports should be finished within about 3 days. After that, the monthly updates should be done within an afternoon. :-)
Links
Fae
On 26 February 2015 at 06:02, MZMcBride z@mzmcbride.com wrote:
Fæ wrote:
Re: https://meta.wikimedia.org/wiki/User:Faebot/thanks
I am glad the tables are useful, hopefully stimulating more positive use of the thanks notifier by contributors.
The reports are updating *slowly*, currently at April 2014... This is partly because of the WMFlabs outage yesterday, though in general any report on the "logging" table is going to be slow (it is the largest table in the wiki database). The first run-through will take several days, as it is going back through all of 2014. Once it is only reporting on the previous month, I suspect it will finish the monthly updates within the first day.
Hi.
Hmm, I'm not sure how you're measuring largest, but I imagine on most wikis there are more rows in the revision table than there are in the logging table. For example, on the English Wikipedia, there are approximately 598,859,006 rows in the revision table and 62,731,285 rows in the logging table. I suspect on most wikis, revision, text, and maybe archive would typically be larger than logging, except in weird cases such as loginwiki[_p]. And then of course there are the *links tables.
But it depends on whether you're comparing size on disk or number of rows.
Yes I got it wrong! :-)
You probably want to use logging_userindex instead of logging. The former is typically significantly faster due to the way we use table views.
That's how it works. The key line is SELECT log_user_text, count(log_action) FROM {}wiki_p.logging_userindex
Where {} iterates over wiki short names.
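In outline, the surrounding loop looks something like this (a simplified sketch rather than Faebot's actual source; the replica host pattern, the credentials file, and the log_type/timestamp filters are illustrative assumptions, not the exact production values):

    import os
    import pymysql

    # Short names substituted into {}; the real run covers many more wikis.
    WIKIS = ['en', 'commons', 'wikidata']

    # The key line quoted above, with illustrative WHERE/GROUP BY clauses
    # added so the counts come out per user for the period being reported.
    QUERY = """
        SELECT log_user_text, COUNT(log_action) AS thanks_given
        FROM {}wiki_p.logging_userindex
        WHERE log_type = 'thanks'
          AND log_timestamp LIKE '2014%'
        GROUP BY log_user_text
        ORDER BY thanks_given DESC
    """

    for wiki in WIKIS:
        # e.g. 'enwiki.labsdb' -- the replica host naming here is assumed.
        conn = pymysql.connect(
            host='{}wiki.labsdb'.format(wiki),
            read_default_file=os.path.expanduser('~/replica.my.cnf'),
            charset='utf8')
        try:
            with conn.cursor() as cur:
                cur.execute(QUERY.format(wiki))
                for user, thanks_given in cur.fetchall():
                    print(wiki, user, thanks_given)
        finally:
            conn.close()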
I have a bit of experience with database reports. Off-hand I'd say it should be possible to query all of this information in under an hour.
With the index on logging[_userindex].log_action, even a large table such as logging shouldn't be too awful to query for this information. If you have queries that are taking a very long time, we should probably investigate.
Maybe. I have experimented a little with ways of writing the query (such as using log_user rather than log_user_text; a rough sketch of that variant is below) but have not found a magic trick yet. Remember I'm reporting all of 2014 first. After this first run, the time taken is unlikely to be noticed by anyone, so weighing volunteer time against processing time, I'll probably get on with other projects rather than spending my wiki-time making this SQL-based report more efficient. Naturally, if someone wants to offer me some funding, or bribes me by nominating me for a Commons RFA, I'll look again. ;-)
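For what it's worth, the log_user variant looks roughly like this (again only a sketch; the join to the user table and the filter values are illustrative, not the exact query the bot runs):

    # Variant of QUERY above that joins on the numeric log_user and groups
    # on user_name instead of log_user_text; filter values are illustrative.
    QUERY_BY_ID = """
        SELECT user_name, COUNT(log_action) AS thanks_given
        FROM {0}wiki_p.logging_userindex
        JOIN {0}wiki_p.user ON log_user = user_id
        WHERE log_type = 'thanks'
          AND log_timestamp LIKE '201401%'
        GROUP BY user_name
        ORDER BY thanks_given DESC
    """

So far neither form has turned out noticeably faster on the Wikidata log.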
Cheers, Fae