Hello Fae,
Are you generating JSON somewhere, or thinking of creating an API for
that information?
I have some ideas that I started to develop last year, I will ping you in
private.
Best regards, and thanks for helping us with this important information.
Rodrigo Padula
2015-02-26 14:01 GMT-03:00 Fæ <faewik(a)gmail.com>:
Now that the run has been going a bit longer, it can be seen[1] that
the real time sink is the logging table from Wikidata. The other
projects are pretty quick, ranging from a few minutes up to Commons
taking half an hour; Wikidata takes 7 hours for each query. It's a big
log!
I am testing out separating wikidata from the other queries, so this
might get the majority done much more quickly. Either way, all the
reports should be finished within about 3 days. After that the monthly
updates should be done within an afternoon. :-)
Links
1.
https://meta.wikimedia.org/wiki/Special:Contributions/Faebot
Fae
On 26 February 2015 at 06:02, MZMcBride <z(a)mzmcbride.com> wrote:
Fæ wrote:
Re:
https://meta.wikimedia.org/wiki/User:Faebot/thanks
I am glad the tables are useful, hopefully stimulating more positive
use of the thanks notifier by contributors.
The reports are updating *slowly*, currently at April 2014... This is
in part because of the WMFlabs outage yesterday, though in general any
report of the "logging" table is going to be slow (it is the largest
table on the wiki database). The first run-through will take several
days as it is going back through all of 2014. Once it is only
reporting on the previous month, I suspect it will finish monthly
updates within the first day.
Hi.
Hmm, I'm not sure how you're measuring largest, but I imagine on most
wikis there are more rows in the revision table than there are in the
logging table. For example, on the English Wikipedia, there are
approximately 598,859,006 rows in the revision table and 62,731,285 rows
in the logging table. I suspect on most wikis, revision, text, and maybe
archive would typically be larger than logging, except in weird cases
such as loginwiki[_p]. And then of course there are the *links tables.
But it depends on whether you're comparing size on disk or number of
rows.
Yes I got it wrong! :-)
You probably want to use logging_userindex instead of logging. The
former is typically significantly faster due to the way we use table
views.
That's how it works. The key line is
SELECT log_user_text, count(log_action)
FROM {}wiki_p.logging_userindex
Where {} iterates over wiki short names.
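For readers unfamiliar with the replica naming, the `{}` placeholder above is filled per wiki to address databases such as `enwiki_p` or `commonswiki_p`. A minimal sketch of how that templating might look in Python follows; the `log_type = 'thanks'` filter, the function name, and the exact query shape beyond the quoted key line are assumptions for illustration, not Faebot's actual code.

```python
# Sketch: build the per-wiki thanks-tally query by filling the {} placeholder
# with each wiki short name, as described in the message above.
# The WHERE clause and GROUP BY are illustrative assumptions.

QUERY_TEMPLATE = (
    "SELECT log_user_text, COUNT(log_action) "
    "FROM {wiki}wiki_p.logging_userindex "
    "WHERE log_type = 'thanks' "
    "GROUP BY log_user_text"
)

def build_queries(wiki_short_names):
    """Return a dict mapping each wiki short name to its filled-in query."""
    return {wiki: QUERY_TEMPLATE.format(wiki=wiki) for wiki in wiki_short_names}

queries = build_queries(["en", "commons", "wikidata"])
print(queries["commons"])
```

Running the slow wikidata query separately from this loop, as described above, lets the quick wikis finish without waiting behind the big log.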
I have a bit of experience with database reports. Off-hand I'd say it
should be possible to query all of this information in under an hour.
With the index on logging[_userindex].log_action, even a large table
such as logging shouldn't be too awful to query for this information.
If you have queries that are taking a very long time, we should
probably investigate.
Maybe. I have experimented a little with ways of writing the query
(such as using log_user rather than log_user_text) but have not found
a magic trick yet. Remember I'm reporting all of 2014 first. After
this first run, the time taken is unlikely to be noticed by anyone, so
volunteer time vs. processing time will mean I'll probably get on with
other projects, rather than spending my wiki-time making this SQL
based report more efficient. Naturally, if someone wants to offer me
some funding, or bribes me by nominating my Commons RFA, I'll look
again. ;-)
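For reference, the log_user experiment mentioned above would amount to grouping by the numeric user ID and resolving names via the user table. The sketch below uses the 2015-era column names (log_user, user_id, user_name); whether it actually beats grouping by log_user_text depends on the replica's indexes, so treat it as a hypothetical variant, not a recommendation.

```python
# Sketch of the log_user variant: group on the numeric ID, join the user
# table for display names. Column names follow the 2015-era MediaWiki
# schema; the thanks filter is an illustrative assumption.

BY_ID_TEMPLATE = (
    "SELECT user_name, COUNT(log_action) "
    "FROM {wiki}wiki_p.logging_userindex "
    "JOIN {wiki}wiki_p.user ON user_id = log_user "
    "WHERE log_type = 'thanks' "
    "GROUP BY log_user"
)

def build_by_id_query(wiki):
    """Return the ID-based variant of the thanks tally for one wiki."""
    return BY_ID_TEMPLATE.format(wiki=wiki)

print(build_by_id_query("wikidata"))
```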
Cheers,
Fae
_______________________________________________
Wikimedia-l mailing list, guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l(a)lists.wikimedia.org
Unsubscribe:
https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
<mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>