Hi;
I run a script that downloads 1200 MB every night (from http://dammit.lt/wikistats/) to generate some stats (http://es.wikipedia.org/wiki/Wikipedia:Ranking_de_visitas). Any problem with that bandwidth usage? It comes to about 30 GB per month.
Thanks, regards
On Wed, Oct 22, 2008 at 7:03 AM, emijrp <emijrp@gmail.com> wrote:
Hi;
I run a script that downloads 1200 MB every night (from http://dammit.lt/wikistats/) to generate some stats (http://es.wikipedia.org/wiki/Wikipedia:Ranking_de_visitas). Any problem with that bandwidth usage? It comes to about 30 GB per month.
Why would you download that much every night? The data doesn't grow by anywhere near that much every day.
1200 MB isn't significant, so there isn't an issue there (perhaps on the other side), but there is no need to be wasteful.
"Gregory Maxwell" gmaxwell@gmail.com writes:
On Wed, Oct 22, 2008 at 7:03 AM, emijrp <emijrp@gmail.com> wrote:
I run a script that downloads 1200 MB every night (from http://dammit.lt/wikistats/) to generate some stats (http://es.wikipedia.org/wiki/Wikipedia:Ranking_de_visitas). Any problem with that bandwidth usage? It comes to about 30 GB per month.
Why would you download that much every night? The data doesn't grow by anywhere near that much every day.
There is roughly 50 MB worth of data each hour, which gets pretty close to 1200 MB a day.
I download 1200 MB per day because each hourly file is about 50 MB: 50 MB × 24 hours ≈ 1200 MB.
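Roughly, the whole nightly job amounts to the minimal sketch below. The pagecounts-YYYYMMDD-HH0000.gz naming and the local directory are assumptions; check them against the actual listing on dammit.lt.

    import os
    import urllib.request
    from datetime import date, timedelta

    BASE = "http://dammit.lt/wikistats"
    DEST = "wikistats"  # local target directory; adjust to taste

    # fetch yesterday's 24 hourly files, skipping any we already have
    day = (date.today() - timedelta(days=1)).strftime("%Y%m%d")
    os.makedirs(DEST, exist_ok=True)
    for hour in range(24):
        name = "pagecounts-%s-%02d0000.gz" % (day, hour)
        path = os.path.join(DEST, name)
        if not os.path.exists(path):
            urllib.request.urlretrieve("%s/%s" % (BASE, name), path)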
Are there any stats on Toolserver bandwidth usage? I suppose that replication uses many GB every day, but yes, I don't want to be wasteful.
Thanks.
Hello, On Wednesday 22 October 2008 13:03:31, emijrp wrote:
Hi;
I run a script that downloads 1200 MB every night
if you do this, please save the data at
/mnt/user-store/
(create a directory there). That way every user can use the data and it only has to be downloaded once.
Perhaps there is a better way (rsync or something) to get the data from the source.
Sincerely, DaB.
DaB. wrote:
I am following up on a discussion here in October:
I run a script that downloads 1200 MB every night
if you do this, please save the data at
/mnt/user-store/
(create a directory there). That way every user can use the data and it only has to be downloaded once.
Since I am becoming involved with statistics too, I have set up such a scheme in /mnt/user-store/stats. Data files starting from 1 October 2008 are currently available (emijrp asked if I could get older files too, which should be doable, but I haven't looked into it yet). I still have to fine-tune the update process, but basically a cron task will take care of it at least once a day (probably more often, but I have to see when the original files are actually updated).
Let me know if anyone else is interested in using this data.
Perhaps there is a better way (rsync or something) to get the data from the source.
I use wget; it will not download files twice unless they have been modified (which should not happen). Also, the files are already gzipped, so compression would not be of much use here. Even though rsync is a better solution on paper, all in all I don't think it would improve the situation much here.
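The "only if modified" behaviour wget provides can also be written out as a conditional HTTP request; here is a minimal sketch of the same idea, assuming the server honours If-Modified-Since (url and local are placeholders):

    import email.utils
    import os
    import urllib.error
    import urllib.request

    def fetch_if_modified(url, local):
        # ask the server for the file only if it is newer than our copy
        req = urllib.request.Request(url)
        if os.path.exists(local):
            mtime = os.path.getmtime(local)
            req.add_header("If-Modified-Since",
                           email.utils.formatdate(mtime, usegmt=True))
        try:
            with urllib.request.urlopen(req) as resp, open(local, "wb") as out:
                out.write(resp.read())
        except urllib.error.HTTPError as e:
            if e.code != 304:  # 304 Not Modified: keep the local copy
                raise

(wget additionally sets the local file's timestamp from the server's Last-Modified header, which this sketch leaves out.)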
Currently, the directory contains 112 GB, growing by about 1.2 GB every day. So far, this is not a problem (2.5 TB are currently available in user-store), but I'd like to know at what point it would start to be considered "too big". What do the admins think?
On the main statistics server of the WMF, Erik Zachte is developing scripts to compact these individual hourly files into daily files, reducing the size of the data by half; this could also be used here.
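The saving comes from the fact that a page title repeated across the 24 hourly files is stored only once in the daily file. A rough sketch of such a merge, assuming the usual "project title count bytes" line layout (this naive version keeps all keys in memory, so it only shows the idea):

    import glob
    import gzip
    from collections import defaultdict

    # sum the hourly per-page counts into a single daily file
    totals = defaultdict(lambda: [0, 0])
    for fname in sorted(glob.glob("pagecounts-20081001-*.gz")):
        with gzip.open(fname, "rt", encoding="utf-8", errors="replace") as f:
            for line in f:
                fields = line.split()
                if len(fields) != 4:
                    continue  # skip malformed lines
                key = (fields[0], fields[1])
                totals[key][0] += int(fields[2])
                totals[key][1] += int(fields[3])

    with gzip.open("pagecounts-20081001.daily.gz", "wt", encoding="utf-8") as out:
        for (project, title), (count, size) in sorted(totals.items()):
            out.write("%s %s %d %d\n" % (project, title, count, size))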
Frédéric