My initial thought would be that this is not allowed. The exception for small extracts is intended for diffs and similar. Actual definitions are basically the same as article text.
However, WM-DE will need to provide a final opinion on this.
River. (Sorry for top-posting.)
----- Reply message ----- From: "Conrad Irwin" conrad.irwin@gmail.com Date: Sat, May 15, 2010 3:39 pm Subject: [Toolserver-l] How much wiki-data is too much? To: toolserver-l@lists.wikimedia.org
I recently re-read the toolserver rules, and am concerned that publishing definitions extracted from XML dumps of en.Wiktionary may be in violation of rule 10.
# Tools may not serve significant portions of wiki page text to clients. "Significant" means distributing actual page content; for example, installing MediaWiki to serve the text of wikis would not be allowed, but showing a short extract to provide context for a tool would be okay.
Could someone please clarify whether this precludes publishing lots of short extracts combined? I had intended to (eventually) publish similar dumps of other information in Wiktionary (such as the Translations), so it would be nice to check that this is permitted on the toolserver, or whether I have to find some alternative hosting.
The current output is at http://toolserver.org/~enwikt/definitions/
Thanks Conrad
_______________________________________________ Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
On 15 May 2010 18:30, River Tarnell river.tarnell@wikimedia.de wrote:
My initial thought would be that this is not allowed. The exception for small extracts is intended for diffs and similar. Actual definitions are basically the same as article text.
However, WM-DE will need to provide a final opinion on this.
Ok.
Will I get a reply from them on this list, or should I forward my query elsewhere? In the event that I can't publish these files here, is there another Wikimedia-related place I could?
Conrad
As long as it's just parsing a data dump and posting a compressed archive of those results, I'm not sure I see a problem.
John Doe wrote:
As long as it's just parsing a data dump and posting a compressed archive of those results, I'm not sure I see a problem.
The problem is that if the content is served from a domain owned by WMDE, WMDE might be held responsible for such content. Depending on jurisdiction, the judge, and the phase of the moon.
It's a tricky question; I'll have to consult our ED on this next week. I have it on my list, and I'll post a reply here.
-- daniel
Just a quick update: to give a good answer, I'm forwarding the question to WMDE's lawyers. It may take a while to get a response, though. I'll let you know.
-- daniel
On 15 May 2010 22:14, Platonides platonides@gmail.com wrote:
Conrad Irwin wrote:
In the event that I can't publish these files here, is there another Wikimedia-related place I could?
Conrad
Maybe you could convince Tomasz to run your script on dumps.wikimedia.org
That's a pretty good idea, though I'll wait until the WMF dumps are running again before adding yet more to his plate.
Conrad
On Sat, 15 May 2010 at 23:14 +0200, Platonides wrote:
Conrad Irwin wrote:
In the event that I can't publish these files here, is there another Wikimedia-related place I could?
Conrad
Maybe you could convince Tomasz to run your script on dumps.wikimedia.org
If the datasets would be useful to more than just a couple of people, we could consider them for the "datasets" host, which is meant for image collections, piles of statistics, our standard dumps, and other datasets useful to the public.
Ariel
On 16 May 2010 01:38, Ariel T. Glenn ariel@wikimedia.org wrote:
If the datasets would be useful to more than just a couple of people, we could consider them for the "datasets" host, which is meant for image collections, piles of statistics, our standard dumps, and other datasets useful to the public.
I've so far pointed a few people to it, but got no affirmative response. Obviously I think they're wonderful, and have been using them on-site for cleanup-lists and statistics. I think Amgine was part of a strategic planning team that was looking into making Wiktionary's data accessible, but I don't know what their plan-of-action was.
Conrad
Ariel T. Glenn wrote:
On Sat, 15 May 2010 at 23:14 +0200, Platonides wrote:
Maybe you could convince Tomasz to run your script on dumps.wikimedia.org
If the datasets would be useful to more than just a couple of people, we could consider them for the "datasets" host, which is meant for image collections, piles of statistics, our standard dumps, and other datasets useful to the public.
Ariel
They probably are, given that periodically someone chimes in on mediawiki-api-l asking how to get a definition using the API. So definitions are something that people want.