Thanks to everyone who got the enwiki dumps going again! Should we
expect more regular dumps now? What was the final solution for fixing
this? We are having to resort to crawling en.wikipedia.org while we
wait for regular dumps.
What is the minimum crawl delay we can get away with? I figure that
with a 1-second delay we'd be able to crawl the 2+ million articles
in about a month.
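For what it's worth, here is the back-of-the-envelope arithmetic behind
that estimate as a rough sketch. The function name and the page counts
are my own illustrative assumptions, and it only counts the delay
between requests, not the time each fetch takes:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_days(num_pages: int, delay_seconds: float) -> float:
    """Days needed to fetch num_pages one at a time, waiting
    delay_seconds between requests (fetch time itself is ignored)."""
    return num_pages * delay_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Page counts are illustrative; "2+ million" is the only figure given.
    for pages in (2_000_000, 2_500_000, 3_000_000):
        days = crawl_days(pages, 1.0)
        print(f"{pages:,} pages at a 1 s delay: {days:.1f} days")
```

So 2 million pages at one request per second is a bit over 23 days of
delay alone, and every extra half million pages adds roughly 6 more.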
I know crawling is discouraged, but judging by robots.txt it seems a
lot of parties still do it. I have to assume that is how Google et al.
are able to keep up to date. Are there private data feeds? I noticed a
wg_enwiki dump listed.
Christian
On Jan 28, 2009, at 10:47 AM, Christian Storm wrote:
That would be great. I second this notion wholeheartedly.
On Jan 28, 2009, at 7:34 AM, Russell Blau wrote:
"Brion Vibber" <brion(a)wikimedia.org> wrote in message
news:497F9C35.9050500@wikimedia.org...
> On 1/27/09 2:55 PM, Robert Rohde wrote:
>> On Tue, Jan 27, 2009 at 2:42 PM, Brion Vibber<brion(a)wikimedia.org>
>> wrote:
>>> On 1/27/09 2:35 PM, Thomas Dalton wrote:
>>>> The way I see it, what we need is to get a really powerful server
>>> Nope, it's a software architecture issue. We'll restart it with
>>> the new arch when it's ready to go.
>> The simplest solution is just to kill the current dump job if you
>> have faith that a new architecture can be put in place in less than
>> a year.
>
> We'll probably do that.
>
> -- brion
FWIW, I'll add my vote for aborting the current dump *now* if we don't
expect it ever to actually be finished, so we can at least get a fresh
dump of the current pages.
Russ
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l