Attention:
The user Apcbg is using Wikimedia's space to violate copyrights and to host his own personal essays on several Wikimedia wikis. So far he has focused on Hispanic wikis such as es:, gl:, ca:, eu:, pt:, and lad:.
This message is to alert the Wikipedian community and to ask that everyone keep an eye out for this behavior. When it is seen, the user should be warned and the undesirable content deleted.
Are there actually any Tajik native speakers working on the Tajik Wikipedia at the moment?
I'd like to discuss some software I'm making with them...
http://82.133.33.43/~spectre/tajik/tajik.php
I've had a look at tg:, but it seems to be very inactive.
Regards,
Fran
Hi Francis,
I have a suggestion to improve this software: build a corpus from the Tajik RFE website http://www.ozodi.org/
As you can pretty obviously tell, not all vowels are indicated in Farsi, so in some cases there *should* be multiple candidates for transliteration. For example, Farsi "yeh" can be transliterated in a number of different ways.
In these cases, a simple search of the corpus should reveal which alternative is an actual word, or which is most frequent, and select it.
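A minimal sketch of the corpus lookup described above, in Python. The letter table is a hypothetical, deliberately incomplete fragment (only beh, dal, noon and Farsi yeh), not an established Persian-to-Tajik standard, and the corpus counts come from a toy string rather than from ozodi.org. Each ambiguous letter expands into several Cyrillic candidates, and the candidate seen most often in the corpus is selected.

```python
from collections import Counter
from itertools import product

# Hypothetical, incomplete letter table (an assumption for illustration):
# each Persian letter maps to the Tajik Cyrillic renderings it may stand for.
LETTER_OPTIONS = {
    "\u0628": ["б"],                 # beh
    "\u062f": ["д"],                 # dal
    "\u0646": ["н"],                 # noon
    "\u06cc": ["и", "ӣ", "й", "е"],  # Farsi yeh: several possible renderings
}


def candidates(word):
    """Yield every Cyrillic spelling the letter table allows for `word`.

    Letters missing from the table are passed through unchanged.
    """
    options = [LETTER_OPTIONS.get(ch, [ch]) for ch in word]
    for combo in product(*options):
        yield "".join(combo)


def pick_by_frequency(word, corpus_counts):
    """Return the candidate that occurs most often in the corpus.

    Returns None when no candidate occurs in the corpus at all.
    """
    best = max(candidates(word), key=lambda c: corpus_counts.get(c, 0))
    return best if corpus_counts.get(best, 0) > 0 else None


# Toy corpus standing in for text downloaded from a Tajik news site.
counts = Counter("дин ва давлат дин".split())
print(pick_by_frequency("\u062f\u06cc\u0646", counts))  # dal + yeh + noon; prints: дин
```

Returning None rather than guessing when nothing matches leaves room for a fallback, such as the Russian loanwords a Farsi-derived table will never produce.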
Mark
On 28/05/06, Francis Tyers spectre@ivixor.net wrote:
Wikipedia-l mailing list Wikipedia-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikipedia-l
...having said that, a corpus won't fix the fact that Tajik uses more Russian loanwords than Farsi does.
Mark
On 28/05/06, Mark Williamson node.ue@gmail.com wrote:
-- Refije dirije lanmè yo paske nou posede pwòp bato.
Also, I noticed that you seemed to use italics for what you took to be incorrect transliterations.
In fact:
"бднео" is an incorrect transliteration of "ба дунё", and the same goes for "лҳоз" and "лиҳози"; "ҳқвақ" and "ҳуқуқ"; "боҳм" and "бо ҳам"; "бробрнд" and "баробаранд"; "ҳмҳ" and "Ҳама"; "ваҷдон" and "виҷдонанд" (except for the -and suffix); "нсбта" and "нисбат"; "бекидевор" and "ба якдигар"; and possibly even "бо рваҳ бробре" and "бародарвор".
This is because not all vowels are always explicitly indicated in Farsi. The only way to know is to be a native speaker or to use a dictionary.
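Since short vowels are usually left unwritten, one hedged way to approximate the dictionary check is to insert an optional short vowel after every letter of a consonant skeleton and keep only the expansions found in a wordlist. The vowel set (а, и, у), the helper names and the three-word dictionary below are assumptions for illustration, and the skeletons are simplified versions of the examples above. The number of expansions grows as 4^n in the skeleton length, so a real tool would need to prune.

```python
from itertools import product

# Assumed set of unwritten short vowels; "" means "no vowel here".
SHORT_VOWELS = ["", "а", "и", "у"]


def vowel_expansions(skeleton):
    """Yield every string formed by optionally adding one short vowel
    after each letter of `skeleton` (4 ** len(skeleton) combinations)."""
    slots = [[ch + v for v in SHORT_VOWELS] for ch in skeleton]
    for combo in product(*slots):
        yield "".join(combo)


def dictionary_matches(skeleton, wordlist):
    """Return the sorted, de-duplicated expansions that are real words."""
    return sorted({c for c in vowel_expansions(skeleton) if c in wordlist})


# Toy dictionary; a real one would be built from a Tajik corpus.
WORDS = {"нисбат", "ҳама", "бо"}
print(dictionary_matches("нсбт", WORDS))  # prints: ['нисбат']
print(dictionary_matches("ҳм", WORDS))    # prints: ['ҳама']
```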
Mark
Actually the opposite ;)
The italicized ones were those that seemed to me to be discernible.
I'm experimenting with the vowels. It would be helpful if there were an "official" transliteration standard, but I can't seem to find one.
Thanks for the link to RFE. I've actually already been trawling it to download as much Tajik text as I can; I then intend to produce a wordlist (possibly ranked by frequency) and experiment with comparing "transliterated" Farsi against the wordlist by edit distance.
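The wordlist-plus-edit-distance plan could look roughly like this in Python: count word tokens in downloaded text, then rank wordlist entries by Levenshtein distance to a naive transliteration, breaking ties by corpus frequency. The helper names, the `max_distance` cut-off and the sample sentence are assumptions; this is plain Levenshtein distance, not a Tajik-specific similarity measure.

```python
import re
from collections import Counter


def build_wordlist(text):
    """Count lower-cased word tokens (runs of letters) in `text`."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def closest(word, wordlist, max_distance=2):
    """Wordlist entries within `max_distance` edits of `word`,
    nearest first, then most frequent first."""
    scored = sorted((levenshtein(word, w), -n, w) for w, n in wordlist.items())
    return [w for d, _, w in scored if d <= max_distance]


# Toy text standing in for downloaded Tajik articles.
wordlist = build_wordlist("Ҳама одамон баробаранд. Ҳама бо ҳам.")
print(closest("ҳмҳ", wordlist))  # prints: ['ҳама', 'ҳам']
```

Scoring every wordlist entry is fine for a sketch; for a large wordlist, an index such as a BK-tree would avoid comparing against every word.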
It would be helpful to have a bilingual dictionary, but I don't think any currently exists in machine-tractable form.
Regards,
Fran
On Sun, 2006-05-28 at 14:37 -0700, Mark Williamson wrote:
Is there a known IP address he comes from?
----- Original Message ----- From: "Kyle Moore" wikipediano@gmail.com To: wikipedia-l@Wikimedia.org Sent: Sunday, May 28, 2006 11:55 AM Subject: [Wikipedia-l] User Apcbg using Wikimedia space to violate copyrights
wikipedia-l@lists.wikimedia.org