I have been thinking about this for a while and have finally managed to write it down as a proposal. The details are on Meta at the following link; below is the intro to the proposal:
http://meta.wikimedia.org/wiki/A_proposal_towards_a_multilingual_Wikipedia
I tried to anticipate some possible questions and provide answers on the page. Besides that, I obviously hope that Wikimania could provide a place to start this conversation. And yes, I am aware that the proposal would lead to a very restrictive solution, but imagine what good it already could achieve! And since it is not meant to replace anything, but enrich our current projects... well, read for yourself.
Cheers, Denny
Wikipedia provides knowledge in more than 200 languages. Whereas a small number of languages are fortunate enough to have a large Wikipedia, many of the language editions are far from providing a comprehensive encyclopedia by any measure. There are several approaches to closing this gap, mostly focusing on increasing the number of contributors to the small language editions or on improving the provision of automatic or semi-automatic translations of articles. Both are viable. In the following we present a proposal for a different approach, which is based on the idea of a multilingual Wikipedia.
Imagine a small extension to the template system, where a template call like *{{F12}}* would be expanded not by Template:F12 but by Template:F12/en, i.e. the template name followed by the language code selected by the reader of the page. A template call such as *{{F12:Q64|Q5119|Q183}}* can be expanded by Template:F12/en into *“Berlin is the capital of Germany.”* and by Template:F12/de into *“Berlin ist die Hauptstadt Deutschlands.”* (in the example, the template parameters Q64, Q5119 and Q183 refer to the Wikidata items for Berlin, capital and Germany respectively, for which the templates query the label in the respective language). A simple article could be created this way, sentence by sentence.
That wiki would consist of *content*, i.e. the article pages, possibly just a simple series of template calls, and *frames*, i.e. the templates that lexicalize the parameters of a given template call into a sentence (Note that “sentence” here should not be considered literally. It could be a table, an image, anything). The implementation of the frames can be done in normal wiki template syntax, in Lua, in a novel mechanism, or a mix of these. This would be up to the communities creating them.
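As a rough illustration of the content/frame split described above, here is a minimal Python sketch. It is not how the proposal would be implemented on a wiki (that would be template syntax, Lua, or something new); the `LABELS` table merely stands in for a Wikidata label lookup, and the names `label`, `FRAMES`, `expand` and `render` are hypothetical. The German frame is deliberately naive: it hard-codes the article and forms the genitive by appending "s".

```python
# Stand-in for a Wikidata label lookup (assumption: only these three items).
LABELS = {
    "Q64": {"en": "Berlin", "de": "Berlin"},
    "Q5119": {"en": "capital", "de": "Hauptstadt"},
    "Q183": {"en": "Germany", "de": "Deutschland"},
}


def label(item_id, lang):
    """Return the label of an item in the given language."""
    return LABELS[item_id][lang]


def f12_en(subject, role, obj):
    """English frame: '<subject> is the <role> of <object>.'"""
    return f"{label(subject, 'en')} is the {label(role, 'en')} of {label(obj, 'en')}."


def f12_de(subject, role, obj):
    """German frame; naive: fixed article 'die', genitive formed by adding 's'."""
    return f"{label(subject, 'de')} ist die {label(role, 'de')} {label(obj, 'de')}s."


# Frames are selected by frame name plus the reader's language code,
# mirroring Template:F12/en and Template:F12/de.
FRAMES = {("F12", "en"): f12_en, ("F12", "de"): f12_de}


def expand(frame_name, args, lang):
    """Expand one frame call in the reader's language."""
    return FRAMES[(frame_name, lang)](*args)


def render(content, lang):
    """Render an article that is just a series of frame calls."""
    return " ".join(expand(name, args, lang) for name, args in content)


# Content: language-independent, a list of frame calls.
ARTICLE = [("F12", ("Q64", "Q5119", "Q183"))]

if __name__ == "__main__":
    print(render(ARTICLE, "en"))
    print(render(ARTICLE, "de"))
```

The content (`ARTICLE`) contains no language-specific text at all; everything language-specific lives in the frames, which is why each language community could maintain its own frames independently.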
Read the rest here: http://meta.wikimedia.org/wiki/A_proposal_towards_a_multilingual_Wikipedia
Love it!
2013/8/7, Denny Vrandečić denny.vrandecic@wikimedia.de:
-- Project director Wikidata Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin Tel. +49-30-219 158 26-0 | http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985. _______________________________________________ Wikimedia-l mailing list Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
This may work fine for short stubs about repetitive material, such as the introductions of city articles (location, population, foundation date, country, etc.). But how will it work for the remaining sections of the Berlin article (history, geography, politics...)? https://en.wikipedia.org/wiki/Berlin
I thought so myself, but then I did a bit of research on the state of natural language generation. I could not easily find an overview of the current state of the art, but I did find this list of examples on the KPML website, which is linked from the proposal. They are from 1998:
http://www.fb10.uni-bremen.de/anglistik/langpro/kpml/genbank/R3b12-English/D...
http://www.fb10.uni-bremen.de/anglistik/langpro/kpml/genbank/R3b12-English/D...
There are examples like: "Analysts say that the private position is far more sensible, because it leads to much needed capital for European computer and semiconductor companies, while giving them a toehold in the lucrative Japanese domestic market."
"Because of its importance, any reaction of the sixty people whose televisions are attached to the system is monitored closely."
Since they managed it 15 years ago, I believe we can do it too, or at least try and fail. Even if the complexity of our sentences does not rise that high, it seems to me that there is plenty of content that would be worth making available.
Cheers, Denny
2013/8/7 Emilio J. Rodríguez-Posada emijrp@gmail.com
This may work fine for short stubs about repetitive material, such as the introductions of city articles (location, population, foundation date, country, etc.). But how will it work for the remaining sections of the Berlin article (history, geography, politics...)? https://en.wikipedia.org/wiki/Berlin
Thanks for sharing your very interesting ideas. While I do not fully support your idea of the implementation, I share your basic view of the need, and I think some of the concepts you introduce have very high potential to make better use of the fact that we have many language versions.
I have put my feedback on the talk page and hope there will be a possibility to develop this concept further in some type of working group. I also see an interesting relation to the discussion of machine translation, where I believe we can do a lot very quickly if we limit the vocabulary to be included in such a tool.
Anders
Most of the time the best approach is a combination of several approaches.
Perhaps we can use Denny's system for the short introductions of articles (for example in geography or biographies) and optional automatic translation for the rest of the article.
I mean, if you follow a red link in a small Wikipedia, it loads the i18n template plus the Wikidata bits, so you get a brief summary of the topic. Then you can save that "live" generated stub and expand it (using automatic translation from another Wikipedia).
Obviously, this system should only be used as far as it carries. I don't know how far it might carry us. It might fail miserably and not get beyond the "Rome is a city. Rome is in Italy. Rome is known for The Colosseum, coffee and Vatican City (state)." stage. Or it might lead to a glorious future, where we really create an open source system that allows everyone to write in every language and express a wide range of human thought.
I am personally hesitant about automatic translations, and whether we can achieve the coverage (in language pairs) and the quality (of Wikipedia). But that is only my opinion. A hybrid approach, if we can support it and build it, would obviously be the safest bet, as both endeavors are rather risky. I see a lot of possible space for a hybrid system, as you describe it.
One advantage of my proposal is that its cost is rather small. For supporting translation, I have not yet seen a proposal sketched in enough detail to allow estimating the potential cost and potential benefit.
Cheers, Denny
On Wed, Aug 7, 2013 at 8:50 AM, Denny Vrandečić < denny.vrandecic@wikimedia.de> wrote:
[...] It might lead to a glorious future, where we really create an open source system that allows everyone to write in every language and express a wide range of human thought.
As much as I love this proposal, I have some reservations, namely: http://www.xkcd.com/191/
So if there are already complaints about skewed participation in Wikipedia, this would be just the step to skew it even further. This is not necessarily bad (maybe it could even work without patrolling). However, instead of stating "allow everyone to write in every language", I think it would be more realistic to state "allow everyone who wants to make the extra effort of participating in such a project to write in every language". Predicted demographics: 95% women from the "global south" :)
I am personally hesitant about automatic translations, and whether we can achieve the coverage (in language pairs) and the quality (of Wikipedia). But that is only my opinion. A hybrid approach, if we can support it and build it, would obviously be the safest bet, as both endeavors are rather risky. I see a lot of possible space for a hybrid system, as you describe it.
+1000
One advantage of my proposal is that its cost is rather small. For supporting translation, I have not yet seen a proposal sketched in enough detail to allow estimating the potential cost and potential benefit.
As with so many things, it will be hard to assess cost/benefits without making some effort. A safe bet could be to try with an existing pair or develop a pair with an estimated high demand. If that works, escalate, otherwise stop there.
Micru
On Tue, Aug 13, 2013 at 1:57 PM, David Cuenca dacuetu@gmail.com wrote:
Predicted demographics: 95% women from the "global south" :)
(-:
I am personally hesitant about automatic translations, and whether we can achieve the coverage (in language pairs) and the quality (of Wikipedia). But that is only my opinion. A hybrid approach, if we can support it and build it, would obviously be the safest bet, as both endeavors are rather risky. I see a lot of possible space for a hybrid system, as you describe it.
+1000
I have a lot of love for this idea in general, and a hybrid approach to this part; thank you for articulating it so clearly.
One advantage of my proposal is that its cost is rather small. For supporting translation, I have not yet seen a proposal sketched in enough detail to allow estimating the potential cost and potential benefit.
As with so many things, it will be hard to assess cost/benefits without making some effort. A safe bet could be to try with an existing pair or develop a pair with an estimated high demand.
Is there a pair where some work has already been done?
SJ
On Mon, Aug 19, 2013 at 5:31 PM, Samuel Klein meta.sj@gmail.com wrote:
As with so many things, it will be hard to assess cost/benefits without making some effort. A safe bet could be to try with an existing pair or develop a pair with an estimated high demand.
Is there a pair where some work has already been done?
For Apertium there are quite a few already done: http://wiki.apertium.org/wiki/Main_Page
Regarding new language pairs, no idea if the priorities for Wikipedia would be the same as the priorities the Apertium community has. It might be worth considering which languages to prioritize and how to measure success or lack thereof.
Cheers, Micru
Using a rather simple pair like Afrikaans - Dutch or a heavily researched one like English - Spanish would give us the wrong impression of how this will scale. We should at least add a few random pairs like Yoruba - Gujarati or Kazakh - Lombard. Most of the 67,000 language pairs that we will have to cover will fall in the latter group, not in the first two.
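The message does not say how the 67,000 figure was derived. One reading that reproduces its order of magnitude is counting directed translation pairs (source to target) among roughly 260 language editions; the edition count of 260 is an assumption made here for illustration, not a number taken from the thread. A small Python sketch of the arithmetic:

```python
def directed_pairs(n_languages):
    """Number of (source, target) pairs with source != target."""
    return n_languages * (n_languages - 1)


def undirected_pairs(n_languages):
    """Number of unordered pairs, if one system covered both directions."""
    return n_languages * (n_languages - 1) // 2


if __name__ == "__main__":
    n = 260  # assumed number of language editions, for illustration only
    print(directed_pairs(n))    # 67340
    print(undirected_pairs(n))  # 33670
```

Either way the count grows quadratically with the number of languages, which is the scaling concern raised above; a frame-based approach instead needs one set of frames per language.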
Something to take into account should be the efficiency a language pair can have: for instance, how many articles are available, how easy it is to translate them, how many bilingual speakers there are for a given pair, and perhaps also how much it can help to harmonize relationships between speakers of both languages.
There seems to be much more demand for languages that are geographically closer. While speakers of Kazakh might have little interest in reading the Lombard or Gujarati Wikipedias, they might be more inclined to visit the Tatar Wikipedia, which by the way is closely related and much easier to translate.
So no, I don't think we should base our decisions on the theoretical number of pairs that can exist, but on the ones that offer the best efficiency.
Cheers, Micru
On Fri, Aug 23, 2013 at 4:45 AM, Denny Vrandečić < denny.vrandecic@wikimedia.de> wrote:
Using a rather simple pair like Afrikaans - Dutch or a heavily researched one like English - Spanish would be giving us a wrong impression of how this will scale. We should at least add a few random pairs like Yoruba - Gujarati or Kazakh - Lombard. Most of our 67,000 language pairs that we will have to cover will fall in the latter group, not in the first two.
Thank you, Anders. Yes, I published the idea in order to garner feedback and further evolve it. It is by no means ready-perfect-finished; it is really just a first draft. So suggestions, constructive critique, and improvements are obviously extremely welcome. --~~~~
2013/8/7 Anders Wennersten mail@anderswennersten.se
Thanks for sharing your very interesting ideas. While I do not fully support your idea of implementation, I share your basic view of the need, and I think some of the concepts you introduce have very high potential to better utilize the power of our having many versions.
I have put my feedback on the talk page and hope there will be a possibility to evolve this concept further in some type of workgroup. I also see an interesting relation to the discussion of machine translation, where I believe we can do a lot very quickly if we limit the vocabulary to be included in such a tool.
Anders
Denny Vrandečić wrote on 2013-08-07 02:20:
I have been thinking about this for a while, and now finally managed to write it down as a proposal. Details are on meta on the following link, below is the intro to the proposal:
http://meta.wikimedia.org/wiki/A_proposal_towards_a_multilingual_Wikipedia
I tried to anticipate some possible questions and provide answers on the page. Besides that, I obviously hope that Wikimania could provide a place to start this conversation. And yes, I am aware that the proposal would lead to a very restrictive solution, but imagine what good it already could achieve! And since it is not meant to replace anything, but enrich our current projects... well, read for yourself.
Cheers, Denny
Wikipedia provides knowledge in more than 200 languages. Whereas a small number of languages are fortunate enough to have a large Wikipedia, many of the language editions are far away from providing a comprehensive encyclopedia by any measure. There are several approaches towards closing this gap, mostly focusing on increasing the number of contributors to the small language editions or to improve the provision of automatic or semi-automatic translations of articles. Both are viable. In the following we present a proposal for a different approach, which is based on the idea of multilingual Wikipedia.
Imagine a small extension to the template system, where a template call like *{{F12}}* would not be expanded by a call to the template Template:F12, but rather to Template:F12/en, i.e. the template name with the selected language code of the reader of the page. A template call such as *{{F12:Q64|Q5119|Q183}}* can be expanded by Template:F12/en into *“Berlin is the capital of Germany.”* and by Template:F12/de into *“Berlin ist die Hauptstadt Deutschlands.”* (in the example, the template parameters Q5119, Q64 and Q183 refer to the Wikidata items for capital, Berlin and Germany respectively, which the templates query for the label in the respective language). Sentence by sentence could be created in order to provide for a simple article.
That wiki would consist of *content*, i.e. the article pages, possibly just a simple series of template calls, and *frames*, i.e. the templates that lexicalize the parameters of a given template call into a sentence (Note that “sentence” here should not be considered literally. It could be a table, an image, anything). The implementation of the frames can be done in normal wiki template syntax, in Lua, in a novel mechanism, or a mix of these. This would be up to the communities creating them.
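The split between content and frames can be illustrated with a minimal sketch. It is not the proposed MediaWiki extension, only a Python stand-in under stated assumptions: the `LABELS` table replaces real Wikidata label lookups, the functions `f12_en` and `f12_de` play the part of Template:F12/en and Template:F12/de, and all names (`LABELS`, `label`, `FRAMES`, `expand`) are hypothetical. The call uses Q5119 for "capital", as in the parameter list given in the text.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical stand-in for Wikidata label lookups (item ID -> language -> label).
LABELS: Dict[str, Dict[str, str]] = {
    "Q64": {"en": "Berlin", "de": "Berlin"},
    "Q5119": {"en": "capital", "de": "Hauptstadt"},
    "Q183": {"en": "Germany", "de": "Deutschland"},
}


def label(item: str, lang: str) -> str:
    """Return the label of an item in the given language."""
    return LABELS[item][lang]


def f12_en(subject: str, role: str, obj: str) -> str:
    # Plays the part of Template:F12/en.
    return f"{label(subject, 'en')} is the {label(role, 'en')} of {label(obj, 'en')}."


def f12_de(subject: str, role: str, obj: str) -> str:
    # Plays the part of Template:F12/de. The article "die" and the genitive
    # "-s" are hard-coded simplifications that only fit this one example.
    return f"{label(subject, 'de')} ist die {label(role, 'de')} {label(obj, 'de')}s."


# Frames are keyed by (frame name, language code), mirroring "F12/en" and "F12/de".
FRAMES: Dict[Tuple[str, str], Callable[[str, str, str], str]] = {
    ("F12", "en"): f12_en,
    ("F12", "de"): f12_de,
}


def expand(frame: str, args: Sequence[str], lang: str) -> str:
    """Pick the frame for the reader's language and apply it to the arguments."""
    return FRAMES[(frame, lang)](*args)


# The content is language-independent: one call, rendered per reader language.
call = ("F12", ["Q64", "Q5119", "Q183"])
print(expand(*call, lang="en"))
print(expand(*call, lang="de"))
```

The German frame shows why each language needs its own frame: article and case are the frame's responsibility, and a real frame would need proper morphology rather than a fixed suffix.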
Read the rest here: http://meta.wikimedia.org/wiki/A_proposal_towards_a_multilingual_Wikipedia