Hey all,
I just got a warning from Ops that our log table is growing extremely fast. A write-up about this is here:
https://bugzilla.wikimedia.org/show_bug.cgi?id=47415
Basically, the vast majority of edits on Wikidata are written to the log table because they are autopatrolled. And since we have a lot of edits, this makes the table grow very quickly.
We would like to:
* stop logging so many edits
* drop those logs that are already there about patrolling
We want to understand how that influences your workflows and what we can do about that. Please speak up if this change would be an issue.
Cheers, Denny
Denny Vrandečić, 22/04/2013 18:35:
We want to understand how that influences your workflows and what we can do about that. Please speak up if this change would be an issue.
What change? I don't understand from your email what you're actually going to do. This RC patrolling setup is quite common on several projects.
Nemo
On Mon, Apr 22, 2013 at 7:31 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
What change? I don't understand from your email what you're actually going to do. This RC patrolling setup is quite common on several projects.
(Denny please clarify) it sounds like:
* stop logging autopatrolling done by bots and
* delete those existing entries from the log
-Jeremy
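For illustration only (this is not code from the thread): the second of the two steps listed above, removing the existing entries, could be done in small batches so that no single statement touches the whole table. Below is a minimal Python sketch against an in-memory SQLite table. The `logging` schema, the `is_auto` column and both function names are hypothetical stand-ins, not the real MediaWiki schema or any deployed script.

```python
import sqlite3


def make_demo_log(n_auto, n_manual):
    """Build an in-memory stand-in for the log table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logging ("
        " log_id INTEGER PRIMARY KEY,"
        " log_type TEXT NOT NULL,"
        " is_auto INTEGER NOT NULL)"
    )
    rows = [("patrol", 1)] * n_auto + [("patrol", 0)] * n_manual
    conn.executemany(
        "INSERT INTO logging (log_type, is_auto) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def delete_autopatrol_entries(conn, batch_size=500):
    """Delete automatic patrol rows in batches; return how many were removed."""
    total = 0
    while True:
        # Each pass removes at most batch_size rows, then commits,
        # so every transaction stays short.
        cur = conn.execute(
            "DELETE FROM logging WHERE log_id IN ("
            " SELECT log_id FROM logging"
            " WHERE log_type = 'patrol' AND is_auto = 1"
            " ORDER BY log_id LIMIT ?)",
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


if __name__ == "__main__":
    demo = make_demo_log(n_auto=2500, n_manual=30)
    removed = delete_autopatrol_entries(demo)
    left = demo.execute("SELECT COUNT(*) FROM logging").fetchone()[0]
    print(removed, left)
```

Manual patrol entries are left untouched; only rows flagged as automatic are selected for deletion.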
A big feature for bot users would be the possibility to change everything on an item at once (as is already possible for descriptions, labels and interwiki links, but with claims, references and qualifiers too). This would decrease the number of bot edits by a factor of 2 (as most bots add a claim and then a reference).
Sk!d
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
On 22/04/13 21:59, Jeremy Baron wrote:
(Denny please clarify) it sounds like:
- stop logging autopatrolling done by bots and
- delete those existing entries from the log
-Jeremy
That a fact created by a bot was autopatrolled seems obvious, but not logging it, and thus creating a special case of "this user was a bot at the time he edited this page", looks wrong.
Please pardon the non-tech person, as I may be asking a question with obvious answers, but what, exactly, is the problem here? Storage space is cheap and logs are text, which takes up very little space...
Sven
Sven, 23/04/2013 02:02:
Please pardon the non-tech person, as I may be asking a question with obvious answers, but what, exactly, is the problem here? Storage space is cheap and logs are text, which takes up very little space...
I suppose it's related to the message below.
Nemo
-------- Original Message --------
Subject: [Xmldatadumps-l] wikidatawiki -- toooo many edits
Date: Tue, 23 Apr 2013 19:31:00 +0300
From: Ariel T. Glenn
Organization: Wikimedia Foundation
To: Wikipedia Xmldatadumps-l
Hello dumps users and developers,
You may have noticed that the wikidata pages-logging XML dump step has taken days for the last couple of runs. In fact, the most recent run did not complete properly, as the database handling the query was upgraded to MariaDB in the middle of it.
So the short version is, if you are using that file, go get a new copy: http://dumps.wikimedia.org/wikidatawiki/20130417/wikidatawiki-20130417-pages-logging.xml.gz
If I don't have a patch in by next run, I have a workaround I will run by hand that takes 2 hours or less, as opposed to 4 days.
The long version is that the pages-logging file is already about half the size of en wp's table, and that the number of edits per minute is much larger (see https://wikipulse.herokuapp.com/). There's a lot of deletion and a lot of churn too, due to the dispatch mechanism. Also, they apparently have RCPatrol enabled and a pile of bots, which means that 99% of the log consists of entries like 'bot X editing Y marked it as autopatrolled'. These things in combination turn out to be the perfect storm for my simple select query, causing it to start at normal speed and then get ever slower. I suppose in another couple of months it would take so long to run that it would never finish...
Ariel
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
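For dump users who want to check the composition of the log themselves, a rough sketch follows (illustrative only, not from the forwarded message). It streams a pages-logging file and counts entries per type and action. It assumes each entry is a `logitem` element with `type` and `action` children; the function names and the three-entry sample are made up, so verify the element names against an actual dump before relying on it.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}logitem'."""
    return tag.rsplit("}", 1)[-1]


def count_log_types(stream):
    """Count log entries per (type, action) pair while streaming the XML."""
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "logitem":
            continue
        fields = {local_name(child.tag): (child.text or "") for child in elem}
        counts[(fields.get("type", ""), fields.get("action", ""))] += 1
        elem.clear()  # drop the children of entries already counted
    return counts


def count_log_types_in_dump(path):
    """Same count for a gzip-compressed dump file on disk."""
    with gzip.open(path, "rb") as stream:
        return count_log_types(stream)


# Invented three-entry sample, only to show the shape of the result.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <logitem><id>1</id><type>patrol</type><action>patrol</action></logitem>
  <logitem><id>2</id><type>patrol</type><action>patrol</action></logitem>
  <logitem><id>3</id><type>delete</type><action>delete</action></logitem>
</mediawiki>"""

if __name__ == "__main__":
    for key, n in count_log_types(io.BytesIO(SAMPLE)).most_common():
        print(key, n)
```

Because the file is parsed as a stream, memory use stays modest even when the dump itself is very large.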
There is also no need to log that an edit, which is marked as autopatrolled in the edit table, has been patrolled.
For edits that are not autopatrolled, it makes sense to log by whom and when they were patrolled, but for an autopatrolled edit that is kinda useless.
Getting rid of this would already eliminate the vast majority of log entries.
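As a sketch of the rule described above (illustrative only, not MediaWiki code): write a log entry when a person patrols an edit, recording by whom and when, and write nothing when the edit was autopatrolled. The `PatrolLogEntry` class, the `record_patrol` function and the plain list standing in for the log table are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class PatrolLogEntry:
    revision_id: int
    patroller: str
    timestamp: datetime


def record_patrol(
    log: List[PatrolLogEntry],
    revision_id: int,
    patroller: str,
    auto: bool,
    now: Optional[datetime] = None,
) -> Optional[PatrolLogEntry]:
    """Append an entry for manual patrols only; return it, or None if skipped."""
    if auto:
        # Autopatrolled edits produce no log entry at all.
        return None
    entry = PatrolLogEntry(
        revision_id, patroller, now or datetime.now(timezone.utc)
    )
    log.append(entry)
    return entry


if __name__ == "__main__":
    log: List[PatrolLogEntry] = []
    record_patrol(log, 101, "ExampleBot", auto=True)
    record_patrol(log, 102, "ExampleUser", auto=False)
    print(len(log))
```

With a workload that is mostly bot edits, almost every call takes the early return, which is where the reduction in log entries would come from.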
Denny Vrandečić, 24/04/2013 17:05:
There is also no need to log that an edit, which is marked as autopatrolled in the edit table,
There is no such thing. :) However, we already have silent (unlogged) patrolling by rollback.
Nemo
has been patrolled.
--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.