Hi Robert,
Thanks for the detailed answer. I will use the dumps. Out of curiosity,
though, can you tell me where that explicit live mirror prohibition is
stated? I couldn't find any controlling documents on the subject. Again,
I'm referring to fetching the Wiktionary page's wikitext source through
the API, not the rendered page.
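(For context, the kind of request I mean looks roughly like the sketch
below — it just builds a standard MediaWiki action API query for a page's
wikitext; the site name and output format here are only illustrative.)

```python
import urllib.parse

def wikitext_api_url(title, site="en.wiktionary.org"):
    """Build an api.php request for a page's raw wikitext source
    (prop=revisions with rvprop=content), as opposed to the rendered HTML."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "format": "xml",
        "titles": title,
    }
    return "https://%s/w/api.php?%s" % (site, urllib.parse.urlencode(params))

print(wikitext_api_url("dictionary"))
```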
Thanks,
James
On Thu, Sep 3, 2009 at 9:07 AM, Robert Ullmann <rlullmann(a)gmail.com> wrote:
Hi,
In general a small number of requests is fine, but large numbers
(using the wikts as a live back-end database) are not so good. (Note
that "live mirrors", re-presenting WM data as part of another site are
explicitly prohibited. ;-)
What you should probably do is use the XML dumps from
http://download.wikimedia.org/backup-index.html which at the moment
(thanks to a bunch of work done after a lot of whining from us ;-) are
running on a 3-4 day cycle. It is very reasonable to download each
wiktionary's dump file as produced (not hard to automate). The English
wikt dump is running right now as I write this.
Then you can load each as it arrives into your local cache or server
as desired, and use as you will.
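Automating this is straightforward; here is a rough sketch. (The dump
URL layout and the pages-articles filename pattern below are my
assumption of the public mirror's conventions — check the backup index
for the actual paths. The parsing part just streams page titles out of
the standard MediaWiki XML export format.)

```python
import bz2
import io
import urllib.request
import xml.etree.ElementTree as ET

def dump_url(wiki, date="latest"):
    # Assumed path pattern on the dump server; verify against the
    # backup index page for the real file names.
    return ("http://download.wikimedia.org/%s/%s/%s-%s-pages-articles.xml.bz2"
            % (wiki, date, wiki, date))

def iter_titles(xml_stream):
    """Stream page titles out of a MediaWiki XML export without
    loading the whole dump into memory."""
    for _event, elem in ET.iterparse(xml_stream):
        if elem.tag == "title" or elem.tag.endswith("}title"):
            yield elem.text
            elem.clear()  # free memory as we go

# To process a real dump (needs network):
# with urllib.request.urlopen(dump_url("enwiktionary")) as resp:
#     for title in iter_titles(bz2.BZ2File(resp)):
#         ...  # load into your local cache/server here

# Demo on a tiny in-memory export instead of the full dump:
sample = b"""<mediawiki><page><title>dictionary</title>
<revision><text>==English==</text></revision></page></mediawiki>"""
print(list(iter_titles(io.BytesIO(sample))))  # -> ['dictionary']
```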
You can also get the en.wikt dumps from
http://70.79.96.121/w/dump/xmlu/ updated mid-morning UTC every day.
These are a bit smaller, as they only include the content pages. (E.g.,
you won't even find the Main page in the dump, as it is in the
Wiktionary: namespace.)
best,
Robert
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l