Hi, I am Chinmay Naik, a GSoC intern and the operator of ProteinBoxBot (which runs on pywikipedia, formerly the rewrite branch). When I run the bot, there is a sleep time of around 10 s (7 s on average) between two successive writes. The bot will handle around 40,000 Wikidata items corresponding to Gene Wiki articles from http://en.wikipedia.org/wiki/Category:Human_proteins. I will also be uploading around 30 claims and sources for each Wikidata item.
If you take a look at the recent edits (https://www.wikidata.org/w/index.php?title=Special:Contributions/ProteinBoxB...), it takes around 5 minutes to write all claims and sources to each Wikidata item. This is a very large time lag and will seriously affect deployment. Is there any way to reduce this sleep time? Any pointers would be helpful.
Thanks, Chinmay
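To put the figures from the message above in perspective, here is a rough back-of-the-envelope estimate. It assumes each of the roughly 30 claims and sources costs one write, uses the 10 s upper figure for the sleep, and ignores the time taken by the requests themselves; the function name is only illustrative.

```python
# Rough estimate of the time spent sleeping, using the numbers quoted above.
ITEMS = 40_000          # Wikidata items to process
WRITES_PER_ITEM = 30    # claims + sources, assuming one write each
DELAY_SECONDS = 10      # observed sleep between two successive writes


def total_sleep_seconds(items, writes_per_item, delay_seconds):
    """Return the total sleep time, ignoring the duration of the requests."""
    return items * writes_per_item * delay_seconds


per_item_minutes = WRITES_PER_ITEM * DELAY_SECONDS / 60
total_days = total_sleep_seconds(ITEMS, WRITES_PER_ITEM, DELAY_SECONDS) / 86_400

print(f"per item: {per_item_minutes:.0f} min")
print(f"whole run: {total_days:.0f} days")
```

Under these assumptions the per-item figure matches the roughly 5 minutes reported, and the whole run would spend on the order of months just sleeping.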
The sleep time as implemented in pywikipedia is influenced by three factors:
- the minimum time between requests, minthrottle=1 (by default) in user-config.py
- the minimum time between page saves, put_throttle=10
- the database lag on the DB servers: pages will not be saved unless the lag is less than maxlag=5
(all values in seconds).
In the case of wikidata, the maximum number of saves/edits per minute is 60 (iirc), so you could reduce put_throttle to 1 by adding
put_throttle=1
in your user-config.py
Best, Merlijn
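As a rough illustration of how the three settings described above could interact, here is a minimal model. It is a sketch only and not pywikipedia's actual throttle code: the class and function names are hypothetical, and only the three setting names and their default values come from the message.

```python
from dataclasses import dataclass


@dataclass
class ThrottleConfig:
    # Defaults as quoted in the message; all values in seconds.
    minthrottle: float = 1.0    # minimum time between any two requests
    put_throttle: float = 10.0  # minimum time between two page saves
    maxlag: float = 5.0         # saves are held back unless DB lag is below this


def seconds_to_wait(cfg, since_last_request, since_last_save, is_save):
    """Seconds to sleep before the next request so that both minimums hold."""
    wait = cfg.minthrottle - since_last_request
    if is_save:
        # A save must also respect the (usually larger) put_throttle.
        wait = max(wait, cfg.put_throttle - since_last_save)
    return max(0.0, wait)


def may_save(cfg, db_lag):
    """A save goes through only while the reported lag is below maxlag."""
    return db_lag < cfg.maxlag


default_cfg = ThrottleConfig()
fast_cfg = ThrottleConfig(put_throttle=1.0)

# A save attempted 3 s after the previous save:
print(seconds_to_wait(default_cfg, 3.0, 3.0, is_save=True))
print(seconds_to_wait(fast_cfg, 3.0, 3.0, is_save=True))
```

In this model, lowering put_throttle to 1 removes the wait for any save that comes more than a second after the previous one, while the maxlag check still applies independently.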
On 20 August 2013 20:16, Chinmay Naik chin.naik26@gmail.com wrote:
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
Thanks. Now there is no sleep time between writes, which will reduce our deployment time considerably. I tried several hacks to remove the sleep time without success, and I didn't imagine such a simple fix would settle it. I believed I was completely familiar with the pywikipedia framework, but I guess there is still a lot to learn. I noticed several threads about improving the pywikipedia framework documentation. I strongly agree now. :)
Thanks, Chinmay
On Wed, Aug 21, 2013 at 12:27 AM, Merlijn van Deen valhallasw@arctus.nl wrote:
You should not change your user-config.py unless you are working only on the Wikidata site. Instead, you can change the put_throttle value for Wikidata scripts with the command-line option -pt:1 or -putthrottle:1 (see the -help option for further information).
On pywikibot core you may also use -put_throttle:1, because all numeric config variables are also available as options.
xqt
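To illustrate the idea that numeric config variables can double as command-line options, here is a small sketch. It is not pywikibot's actual argument handling: the function names and the dictionary standing in for the configuration are hypothetical, and the -pt and -putthrottle aliases are deliberately left out.

```python
def parse_number(text):
    """Parse an integer if possible, otherwise a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def apply_numeric_options(args, config):
    """Apply '-name:value' arguments to matching numeric entries of config.

    Returns the arguments that were not consumed, in their original order.
    """
    remaining = []
    for arg in args:
        name, sep, value = arg.lstrip("-").partition(":")
        current = config.get(name)
        is_numeric = isinstance(current, (int, float)) and not isinstance(current, bool)
        if arg.startswith("-") and sep and is_numeric:
            config[name] = parse_number(value)
        else:
            remaining.append(arg)
    return remaining


# Stand-in for the numeric settings discussed in this thread.
config = {"minthrottle": 1, "put_throttle": 10, "maxlag": 5}
left_over = apply_numeric_options(["-put_throttle:1", "-family:wikidata"], config)

print(config["put_throttle"])
print(left_over)
```

The point of overriding per run rather than editing user-config.py is that the lower value then applies only to the Wikidata run, and other sites keep the default.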
----- Original message ----- From: Chinmay Naik chin.naik26@gmail.com To: Pywikipedia discussion list pywikipedia-l@lists.wikimedia.org Date: 21.08.2013 08:44 Subject: Re: [Pywikipedia-l] Time lag between wikidata writes
Yes! Essentially it's like a Swiss army knife: it can do almost anything, but you have to know how!
Currently the documentation is this mailing list (sorry); just ask here in case of doubt.
Regarding the documentation of the install process: I am working on this in the context of externals, patch.exe, and automatically satisfying the required dependencies. Any help is very welcome! :)
Greetings, DrTrigon