Hello,

Persian Wikipedia is one of the largest wikis by number of categories, but people there rarely think of adding interwiki links to categories (they assume interwikis are just for articles). As a result we had a huge number of categories without any interwiki links (30K out of 170K before I wrote my engine), which is really bad.

I wrote some code to improve this, but it wasn't enough, so I wrote an engine that takes two databases:

1- a list of categories without interwiki links
2- a list of categories with an interwiki link to a certain language (e.g. English), together with the target of that link

The bot then analyzes them, "guesses" the correct interwiki link for each category in the first list based on the naming patterns found in the second database, and writes a report.

After running this code on fa.wp we got a very large report [1] and started to sort things out (merging duplicates [2], deleting redundant categories, adding the correct interwiki links). Now fewer than 25K categories lack interwiki links, and the number keeps falling. We did the same for the template namespace [3] and interwikified more than 10K templates.
Because this engine doesn't use any language-specific analysis, it can be run on a wiki in any language and take interwiki links from any language. (We plan to run it on Persian Wikipedia again, this time using Dutch and German as the source of interwiki links.)
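For readers who want a concrete picture of pattern-based guessing, here is a minimal sketch. It is not the engine described above, whose internals are not shown in this thread; it only illustrates one language-independent way to guess links from the two lists. The function names and the Dutch/English sample titles are made up for illustration. The last helper shows how a guess that collides with an existing link can flag a possible duplicate.

```python
from collections import Counter


def learn_patterns(linked):
    """Collect (local_template, target_template) pairs from linked titles.

    A template is a linked title in which one other linked title has been
    replaced by "{}", on both the local and the target side.
    """
    patterns = Counter()
    for local, target in linked.items():
        for part, part_target in linked.items():
            if part == local:
                continue
            if part in local and part_target in target:
                key = (local.replace(part, "{}", 1),
                       target.replace(part_target, "{}", 1))
                patterns[key] += 1
    return patterns


def guess_interwikis(unlinked, linked, patterns):
    """Return {local_title: guessed_target} for titles that fit a pattern."""
    guesses = {}
    for title in unlinked:
        candidates = Counter()
        for (local_tpl, target_tpl), seen in patterns.items():
            prefix, _, suffix = local_tpl.partition("{}")
            if len(title) <= len(prefix) + len(suffix):
                continue
            if not (title.startswith(prefix) and title.endswith(suffix)):
                continue
            # The variable part must itself be a title with a known link.
            middle = title[len(prefix):len(title) - len(suffix)]
            if middle in linked:
                candidates[target_tpl.replace("{}", linked[middle], 1)] += seen
        if candidates:
            guesses[title] = candidates.most_common(1)[0][0]
    return guesses


def possible_duplicates(guesses, linked):
    """Map each guessed title to a local title already linked to that target."""
    by_target = {target: local for local, target in linked.items()}
    return {title: by_target[target]
            for title, target in guesses.items() if target in by_target}


# Hypothetical sample data: local (Dutch) titles and their English targets.
linked = {
    "Steden in Iran": "Cities in Iran",
    "Rivieren in Iran": "Rivers in Iran",
    "Iran": "Iran",
    "Irak": "Iraq",
}
unlinked = ["Steden in Irak", "Rivieren in Irak", "Onbekend"]

patterns = learn_patterns(linked)
guesses = guess_interwikis(unlinked, linked, patterns)
print(guesses)
```

Because the sketch only compares strings, it needs no knowledge of either language, which matches the language-independence claim above. Its guesses would still need human review before any link is saved.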
So here is my question: Is there a similar situation on your wiki? Would you like to run this code on your wiki too? Do you have any suggestions?

[1]: https://fa.wikipedia.org/w/index.php?title=%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:La... ردهها&oldid=10959457
[2]: One of the benefits of running this engine is that we can find duplicates.
[3]: https://fa.wikipedia.org/wiki/%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Ladsgroup/%D8%A...

Best,
Amir
Hoi, what is the problem with people NOT considering interwiki links to categories to be relevant? They are not the only ones.
In a similar way, I have asked several times what the point is of interwiki links for disambiguation pages. The only answer I got was along the lines of "because we can". Similarly, I do not understand the point of interwiki links to categories. What is achieved by it?
So technically you may do or have done a good job. When people appreciate it, good for you. But I fail to see the point.
Thanks, GerardM
That was a very good question! On Persian Wikipedia there is another engine that adds categories to articles based on how those categories are used on English Wikipedia. To avoid mistakes, that engine uses a pretty complicated algorithm whose details I don't know, but it works properly on Persian Wikipedia. So we need interwiki links on categories in order for bots to fill them.
Best
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hoi, <grin> your answer amounts to "because they make similar categories like on en.wp". It does not answer the question of whether that makes sense, and it does not answer the question of why this should be replicated in turn in Wikidata. </grin> Thanks, GerardM
In the past I was asked to do certain statistics based on categories for Wikinews. Having interwikis was rather key for that.
-bawolff
Hoi, Again this does not answer the question why it makes sense to have interwiki links for categories in Wikidata. Thanks, GerardM
One use could be looking for an article in one language when you don't know the correct phrase or the exact spelling, but know enough to use related categories to find the article on a different language project, and go from there.
Does it work outside Wikipedia too? I'd like to run it on some Wikiquotes; seeing which categories exist in one language but not in another is very useful for finding out what's going on in the project(s).
Nemo
Yes, it works. Just give me the languages.
Hi Amir -
I have a different question. Why is it in the interests of Fawiki to use the same categorization system as any other project? I ask this because I know that almost every Wikipedia has variations in the way that it categorizes articles and other pages, and there is not really a cross-wiki standard - nor would I expect one. Categorization is more or less in the same realm as defining notability, determining neutral point of view, and Manuals of Style: while philosophically we are very similar across all the Wikipedias, each project has a slightly different way of addressing these situations.
I'd suggest that the issue isn't really a technical problem, it's more a cultural one. That is, Wikipedia community cultures have developed categorization systems slightly differently, so it is unlikely that any one will be a perfect match for another.
Risker/Anne
Hi, Persian Wikipedia uses the same system of categorization but not the same categories. To give you an example, categories related to Iran are deeper and better organized than the English ones, but the system is the same. Best
FWIW, categories can also be very useful for generating offline wiki snapshots. For example, for education I might want to ensure that my wikislice has all the articles related to chemical elements. One reasonable way to do this is to grab all the articles in [[Category:Chemical_elements]]. In order to efficiently make wikislices for the many different languages OLPC targets, it uses inter-language links to automatically 'translate' the root set for the wikislice. So language links for categories help. --scott
ps. OLPC's wikislice-puller doesn't actually use categories at the present time; instead it starts from [[Chemical_element]] and hopes that all the chemical elements are linked from (the translated version of) that page. Using categories would likely be an improvement.
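As an illustration of "translating" a root set through language links, here is a minimal sketch. It is not OLPC's wikislice code. It assumes the MediaWiki API's `action=query` with `prop=langlinks` and the default JSON response layout, and the helper names and the sample response (including the German title) are made up for the example rather than fetched.

```python
from urllib.parse import urlencode

# Assumption: the English Wikipedia API endpoint is the source wiki.
API = "https://en.wikipedia.org/w/api.php"


def langlinks_url(titles, lang):
    """Build a query URL asking for the language links of the given titles."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "lllang": lang,
        "lllimit": "max",
        "titles": "|".join(titles),
        "format": "json",
    }
    return API + "?" + urlencode(params)


def translate_titles(response, lang):
    """Split a decoded API response into translated and untranslated titles."""
    translated, missing = {}, []
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        links = [link["*"] for link in page.get("langlinks", [])
                 if link["lang"] == lang]
        if links:
            translated[page["title"]] = links[0]
        else:
            missing.append(page["title"])
    return translated, missing


# Made-up sample response in the assumed layout; not real fetched data.
sample = {
    "query": {
        "pages": {
            "1": {
                "title": "Category:Chemical elements",
                "langlinks": [{"lang": "de", "*": "Kategorie:Chemisches Element"}],
            },
            "2": {"title": "Category:Example without link"},
        }
    }
}

translated, missing = translate_titles(sample, "de")
print(langlinks_url(["Category:Chemical elements"], "de"))
print(translated, missing)
```

Titles that end up in `missing` are exactly the categories without an interwiki link, so a slice built this way would silently lose them in the target language.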
wikitech-l@lists.wikimedia.org