Hello, I am interested in using the Wiktionary API (located at en.wiktionary.org/w/api.php) and was having trouble finding any information on what is acceptable commercial use. If there are any controlling documents on the subject, can you please direct me to them? In particular, I would like to know if there are any restrictions on the number of requests allowed in a given time period, and if there are any other restrictions on volume or frequency of use that I should keep in mind.
If determining acceptable use of the API remains a subjective exercise, let me explain how I would like to use it and perhaps you can tell me if my intended use is acceptable.
I am starting a new language translation service bureau that will use online tools to make the translation process more accurate and less expensive for the end customer. We also intend to offer free access to our tools to any open source project or non-profit organization (in such a case, they would be free to use our project management, version control, and translator tools free of charge, but they would have to find their own volunteer translators to do the actual translation work).
As part of our translation tool set, we would like to provide access to monolingual and bilingual dictionaries. Wiktionary appears to be the perfect choice for this. We would like to use the Wiktionary API to fetch words that are requested by our users (translators) and then render them on our own servers for viewing in our translator tool. We would keep a local cache of fetched documents to minimize the number of API calls that we need to make. We will, of course, give proper attribution, etc., but it is possible that we will eventually be making quite a large number of requests, so we thought we should check with you first.
Is that acceptable use?
Cheers, James
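(For readers following along, the caching approach James describes can be sketched roughly as below. This is a hypothetical illustration, not code from the thread: the MediaWiki endpoint and `action=query`/`prop=revisions` parameters are real, but the function names and the in-memory cache are made up for the example; a real service would persist the cache and respect rate limits.)

```python
# Sketch: fetch a page's wikitext from the Wiktionary API, consulting a
# local cache first so repeated lookups don't hit the live servers.
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "https://en.wiktionary.org/w/api.php"

def build_api_url(title):
    """Build a query URL for the raw wikitext of one page."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "revisions",
        "rvprop": "content",
        "format": "json",
    }
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)

_cache = {}  # title -> wikitext; a real service would persist this

def fetch_wikitext(title, fetcher=None):
    """Return cached wikitext, calling `fetcher(url)` only on a cache miss."""
    if title in _cache:
        return _cache[title]
    if fetcher is None:
        def fetcher(url):
            with urllib.request.urlopen(url) as resp:
                return resp.read().decode("utf-8")
    body = json.loads(fetcher(build_api_url(title)))
    pages = body["query"]["pages"]
    # format=json keys the revision content under "*".
    text = next(iter(pages.values()))["revisions"][0]["*"]
    _cache[title] = text
    return text
```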
2009/9/1 James Richard <james.richard050@gmail.com>:
Another approach is to download the Wiktionary dump archive to parse offline: http://download.wikipedia.org/enwiktionary/latest/enwiktionary-latest-pages-...
Andrew Dunbar (hippietrail)
Cheers, James

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
You can try working with the parsed Wiktionary database (currently only part of the Russian Wiktionary; support for the English Wiktionary is planned for next year). See http://code.google.com/p/wikokit
Good luck!
On Wed, Sep 2, 2009 at 5:01 AM, Platonides <Platonides@gmail.com> wrote:
I think it would...
Hi,
In general a small number of requests is fine, but large numbers (using the wikts as a live back-end database) are not so good. (Note that "live mirrors", re-presenting WM data as part of another site, are explicitly prohibited. ;-)
What you should probably do is use the XML dumps from http://download.wikimedia.org/backup-index.html which at the moment (thanks to a bunch of work done after a lot of whining from us ;-) is running on a 3-4 day cycle. It is very reasonable to download each wiktionary's dump file as produced (not hard to automate). The English wikt dump is running right now as I write this.
Then you can load each as it arrives into your local cache or server as desired, and use as you will.
You can also get the en.wikt dumps from http://70.79.96.121/w/dump/xmlu/ updated mid-morning UTC every day. These are a bit smaller, as they only include the content pages. (E.g., you won't even find the Main Page in the dump, as it is in the Wiktionary: namespace.)
best, Robert
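(The dump-based workflow Robert describes — download a dump as it is produced, then load it into a local store — can be sketched as follows. This is a hypothetical illustration, not code from the thread: `iter_pages` and the sample structure are my own naming, assuming the standard MediaWiki XML export format with `<page>`, `<title>`, and `<revision><text>` elements. Streaming with `iterparse` keeps memory use flat even for a multi-gigabyte dump.)

```python
# Sketch: stream (title, wikitext) pairs out of a MediaWiki XML dump.
import xml.etree.ElementTree as ET

def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream."""
    def localname(tag):
        # Tags carry the export namespace, e.g. "{http://...}page".
        return tag.rsplit("}", 1)[-1]

    title, text = None, None
    for event, elem in ET.iterparse(source, events=("end",)):
        name = localname(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            elem.clear()  # free memory for already-processed pages
```

Each yielded pair can then be inserted into whatever local cache or database the translator tool reads from.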
Hi Robert,
Thanks for the detailed answer. I will use the dumps. Out of curiosity, though, can you tell me where that explicit live mirror prohibition is stated? I couldn't find any controlling documents on the subject. Again, I'm referring to fetching the wiktionary mark-up source document through the API, not the rendered page.
Thanks, James
On Thu, Sep 3, 2009 at 9:07 AM, Robert Ullmann <rlullmann@gmail.com> wrote:
Hi,
On Thu, Sep 3, 2009 at 3:27 PM, James Richard <james.richard050@gmail.com> wrote:
Thanks for the detailed answer. I will use the dumps. Out of curiosity, though, can you tell me where that explicit live mirror prohibition is stated? I couldn't find any controlling documents on the subject.
http://meta.wikimedia.org/wiki/Live_mirrors
I don't know how formal or authoritative that is. You might want to ask someone like Brion. I think the answer in practice is that nobody's going to waste time blocking you if you don't cause noticeable load, but I don't know if there's an official statement anywhere. I vaguely recall that some sites might pay Wikimedia a fee to do commercial live mirroring, but I'm not sure on that.
Again, I'm referring to fetching the wiktionary mark-up source document through the API, not the rendered page.
An API request costs the servers roughly the same as a rendered page, or perhaps more (due to worse caching). It's just in a more bot-friendly format.
On Thu, Sep 3, 2009 at 10:26 PM, Aryeh Gregor <Simetrical+wikilist@gmail.com> wrote:
AFAIK one of these is spiegel.de, which gets some kind of live feed; they arranged it with WM DE.

Marco