Kevin Scannell gave a very good presentation at the Celtic Knot Conference 2024 on how the quality of Wikipedia (WP) content affects the quality of the language models trained on it. The low-quality language in Wikiprojects created by people who claim, or once claimed, to be native speakers will inevitably be used to train language models. It is often not only the easiest text to harvest but possibly the only text available online for that language. This puts any actual language community at a major disadvantage, because they then have to clean up the text, as happened with the Scots Wikipedia. It also places an undue burden on the existing language community, a burden that would not exist if people were honest about their language skills in the first place. For example, we have had multiple people claim to be native speakers of both Nauruan and Kamassian at the same time*. If the language community does not exist, or has too few people to fix the text, this low-quality material lives on and further distorts the language.
So I think we need some process for cases where the content of a Wikiproject has turned out to be comparable to what was in the Scots Wikipedia, but there is little to no language community to "save" the project. In my opinion, the options we currently have are no longer adequate. For example, in this case I do not think that moving the text to the Incubator is the right solution; as I said on the closure proposal, the whole thing should just be nuked. If the language community exists and later wants to start a Wikiproject of its own, I don't think it should be burdened with low-to-no-quality text produced by people who don't know the language. I know of one case where the language community refused to touch a Wikipedia project in the Incubator in their language. All of its content had been created by someone who had not the slightest clue about the language or how it works, and they were not willing to clean up his mess.
Regards,
Kimberli

* For some reason, they always pick small to nonexistent and widely different language communities to pretend to belong to.
________________________________
From: Sotiale Wiki <sotiale.wm@gmail.com>
Sent: Monday, October 7, 2024 8:49 AM
To: Wikimedia Foundation Language Committee <langcom@lists.wikimedia.org>
Subject: [Langcom] Re: Closing proposal Norfolk and Pitcairn Wikipedia
In short, I support accepting this proposal.
In principle, a Wikipedia being inactive does not matter as long as it already has valid content. However, this project currently has about 400 pages, and most of them are very short, so it cannot be considered to have much valid content. It is also unlikely to see any activity in the near future. Taking these points into account, it should be closed.
If contributors were likely to appear in the near future, it might be assessed differently, but that is not the case here. Since this is already the third closure proposal for this project, closure should be seriously considered.
Sotiale
On Mon, Oct 7, 2024 at 1:20 AM, MF-Warburg <mfwarburg@googlemail.com> wrote:
https://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Pi...
I suggest accepting the proposal.
_______________________________________________
Langcom mailing list -- langcom@lists.wikimedia.org
To unsubscribe send an email to langcom-leave@lists.wikimedia.org