Yes, it is an old issue, and what you say is right, but I would be more optimistic. To summarize my view (I could send you more information privately):
1. Wikidata largely reflected what Wikipedia indicated, and that was not the right way to make it grow, but that is also the past. At the moment, the referencing of content is increasing, and so is the clean-up. In some areas, Wikidata items are nowadays even created before the Wikipedia articles.
2. New tools are great and will do a lot, but it's users who do the real tricks. You have to start bringing local users to Wikidata and show them how it can be used (automatic infoboxes, fast creation of stubs, automatic lists, detecting missing images). They will start to fix the issues, curating their Wikipedia and Wikidata, and will also indirectly influence the other ones.
3. IMHO, the Wikidata ecosystem is not so bad. It could have more expert users with real knowledge of topics, but Commons, with millions of automatically imported files and tons of poorly described and uncategorized images, faces a much worse prospect. At the moment you need more tools there than on Wikidata if you want to keep a balanced workflow. What is really missing on Wikidata is mostly active projects to coordinate and catalyze the ongoing efforts. This one, https://www.wikidata.org/wiki/Wikidata:WikiProject_Ancient_Greece, worked miracles, for example. But I couldn't find one about peer-reviewed researchers or photographers, to name a few, at least in the past months. Investing in this aspect would not change the final situation on Wikidata (which, in my view, will be positive), but it would speed up the process. It would also influence the content on local wikis much more, because it would bring content-related users closer together and increase their Wikidata literacy with less effort.
4. In the end, even with a good, high-quality Wikidata platform, there will always be communities that will not integrate with Wikidata massively... but that's also a good thing for pluralism. You can't assume that a discrepancy is always a clue to a mistake (I am sure the examples from your experience are, of course); in the long term, some of them are simply effects of gray areas that need to wait to be resolved even at the level of the sources. In some fields, such as taxonomy, there is some confusion and asymmetric organization of the content, and that will never be solved easily. But in other areas they probably will be.
Alex
On Sunday, 15 July 2018 22:37, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hi, Wikidata is a reflection of all the Wikimedia projects, particularly the Wikipedias. Both Wikidata and Wikipedia are secondary sources, and when two Wikipedias have opposing information on a single fact, it is a cop-out to state both "opinions" on Wikidata and leave it at that.
Given that Wikidata largely reflects what a Wikipedia indicates, it is important to curate such differences. The first thing to consider is whether we are interested at all in knowing about "false facts", and then how we can indicate differences to our editing and reading community.
I have been editing about Africa for a long time now, and I find that the content about Africa is woefully underdeveloped. Best Wikipedia practice has it that cities and villages are linked to "administrative territorial entities" like provinces and districts, and I have added such relations from primary to secondary entities. Adding such information to villages and cities as well is too much for me. The basic principle is that I am being bold in doing so. I do link to existing items, and I have curated a lot of crap data so far. The result is that Wikidata in places differs considerably from the Wikipedias, particularly the English Wikipedia.
As topics like the ones about Africa are severely underdeveloped, just adding new data is a 100% improvement, even though adding sources is arguably a good thing. When being bold and starting from a Wikipedia as a baseline, it is important to note that not adding sources is established practice in Wikidata.
The issue I raise is that when "another" Wikipedia considers its information superior, it is all too easy to make accusations of adding "fake facts", particularly when it is not obvious that the "other" Wikipedia provides better information. To counter such insular behaviour, it becomes relevant to consider how we can indicate discrepancies between the facts stated in any Wikimedia project and those in Wikidata. Obviously it would be wonderful if the totality of our projects were considered in a visualisation.
Particularly when a subject is of little interest to our current editor community, the data in the Wikipedias, and by inference in Wikidata, is weak. Many of these subjects, Africa being just one example, are relevant to a public, both reading and editing, that we want to develop. Without tools that help us curate our differences, we will rely on insular opinions, and every project is only a part of what we aim to achieve across all our projects. We will have a hard time growing our audience.
NB this is an old, old issue and it is not going away. Thanks, GerardM
https://ultimategerardm.blogspot.com/2016/01/wikipedia-lowest-hanging-fruit-...
_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
Hi, thanks for your reply. There is one big issue that you do not address, and it is best explained using a Wikipedia "best practice". The best practice is that a town, a village, whatever, is recorded as being in its next-level "administrative territorial entity". This is done properly for the first world. Where Wikidata does not hold data, as is often the case, it cannot help in infoboxes, but what I find is that the Wikipedia data is wrong in more than 6% of cases when I add information.
It does not matter that the information is fractured, coming from many sources. The data for Egyptian subdivisions is largely in Arabic. This is not something I can curate, but it is something that can be presented.
What does matter is that differences between Wikipedias and Wikidata are not noticed. Of particular importance is where the data is biased or wrong. Particularly where the data is wrong and concerns "administrative territorial entities", I have had push-back because the English Wikipedia was said to be wrong [1]... My interpretation of the facts is that the German article was better written but out of date.
In this mail thread, I raise the issue of differences between Wikipedias, and of differences between projects and Wikidata. Particularly where the data or articles are biased or wrong, our quality suffers. When the error rate for a subject exceeds 6%, it is higher than can be expected of humans adding good-faith information to a project. The data I am adding at this time supports Wikipedia best practices. It is particularly intended for the "minority languages" [2], but the quality of all our data will improve when we are aware of the differences and curate them everywhere.
This is distinctly different from the issue with Commons; its data is good enough for its current use case, but it is what holds Commons back from becoming the resource you go to because you can "find" what you are seeking.
In a nutshell, our problem is that we work in an insular fashion. We do not have ways to find the differences, the errors, the bias between our projects. We could; suggestions for a basic mechanism have been made. Our quality suffers, and it does not need to [3]. Thanks, GerardM
[1] https://ultimategerardm.blogspot.com/2018/07/africagap-where-wikipedias-collide.html
[2] https://ultimategerardm.blogspot.com/2018/07/africagap-support-for-minority-languages.html
[3] https://ultimategerardm.blogspot.com/2016/01/wikipedia-lowest-hanging-fruit-from.html
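The kind of basic discrepancy mechanism discussed above could start very small. What follows is a minimal Python sketch under stated assumptions, not an existing Wikimedia tool: the village and district keys are made-up placeholders, each Wikipedia's stated parent entity is assumed to have already been extracted from its infobox, and the Wikidata side is assumed to be the value of property P131 ("located in the administrative territorial entity"). It only lists where a Wikipedia and Wikidata disagree and reports the share of disagreements; as noted in this thread, a flagged pair is a prompt for a human to check the sources, not proof that either side is wrong.

```python
from typing import Dict, List, Tuple

# Hypothetical sample data (placeholder keys, not real Wikidata items).
# For each village: the parent entity stated by each Wikipedia edition.
wikipedia_claims: Dict[str, Dict[str, str]] = {
    "village_A": {"en": "district_1", "de": "district_1"},
    "village_B": {"en": "district_2", "de": "district_3"},
    "village_C": {"en": "district_4"},
}
# For each village: the parent entity held on Wikidata (conceptually P131).
wikidata_p131: Dict[str, str] = {
    "village_A": "district_1",
    "village_B": "district_3",
    "village_C": "district_5",
}


def find_discrepancies(
    claims: Dict[str, Dict[str, str]], wikidata: Dict[str, str]
) -> List[Tuple[str, str, str, str]]:
    """Return (item, wiki, wiki_value, wikidata_value) for each disagreement."""
    found: List[Tuple[str, str, str, str]] = []
    for item, per_wiki in claims.items():
        wd_value = wikidata.get(item)
        if wd_value is None:
            continue  # nothing on Wikidata to compare against
        for wiki, value in sorted(per_wiki.items()):
            if value != wd_value:
                found.append((item, wiki, value, wd_value))
    return found


def discrepancy_rate(
    claims: Dict[str, Dict[str, str]], wikidata: Dict[str, str]
) -> float:
    """Share of compared (item, wiki) pairs that disagree with Wikidata."""
    compared = sum(
        len(per_wiki) for item, per_wiki in claims.items() if item in wikidata
    )
    if compared == 0:
        return 0.0
    return len(find_discrepancies(claims, wikidata)) / compared


if __name__ == "__main__":
    for row in find_discrepancies(wikipedia_claims, wikidata_p131):
        print(row)
    rate = discrepancy_rate(wikipedia_claims, wikidata_p131)
    print(f"discrepancy rate: {rate:.0%}")
```

With this sample data, two of the five compared (item, wiki) pairs disagree, so the sketch reports a 40% discrepancy rate.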
On 16 July 2018 at 05:41, Alessandro Marchetti via Wikimedia-l <wikimedia-l@lists.wikimedia.org> wrote:
Hi. You said that you found an area where there is a problem. I found another one too, taxonomy, and in this case I am quite sure it won't be solved for a while, better diagnostic tools or not. Yet I am optimistic in the long term. I have also found areas where the problems were similar to yours, and they were solved, like the example of the ancient Greece items. In that case you need enough people who know ancient Greek, possibly, and those can be rare to find as well. For every problem you notice, there are others that other people have noticed. But they also see them improving; we have examples.
As far as I can tell from my experience, the main issue, if the discrepancies were not structural (that is, in the sources), was not the lack of a super tool. In the end, it was about understanding the sources. Tools help, they are cool, and it is nice to show them, but you need human resources. For all the possible gaps I notice, my strategy is to look for people. Sometimes I ask for tools to be improved based specifically on what these people, the Wikidata newbies, want, not what the "expert users" want. I don't say these people know what is best, but they kind of sense what is necessary, especially what is needed to bring more users with specific knowledge into the workflow.
So my core advice remains the same: create a dedicated project, ask users interested in the topic, and teach them Wikidata. You can teach them without a project too, but I guess the project could help.
I gave you one example in the private mail: the situation of the Italian hamlets imported from some archive into some minor Wikipedias (to pick one theme among possibly dozens). Some of them are correct, some of them are weird. They are still there but, as I said, if you want to get rid of the trash, I can find you 30 users right now willing to clean up in a short amount of time and leave only what has real meaning. So it's not so bad. I could have written general emails, and the structural starting point would not have changed that way.
What I am trying to say is that you probably have the human resources around to tackle most of this cluster of work; you just have to find them. I see the energy inside the communities. Your mail is more centered on the issue, the guideline, the possible tool... it's not "warm". You don't seem to consider the people who should do the continuous, constant work. You describe a situation where you are alone, and I might say that when I ask for this kind of help inside the Wikidata community, I sometimes have the same feeling. That is true, since there are many small tasks that are much simpler, very generic tasks that make for a nerdy blog post, or virgin areas ready to be conquered by massively importing data from archives... and many established Wikidata users prefer to focus on these things. But when I looked for users at the level of local communities, I had far fewer problems; I got good feedback. That's it. And that is why I am basically optimistic.
When I see a situation that is not evolving inside Wikidata, my instinct remains to ask around among the people who create real content, wherever they are. About this specific problem, did you contact the users who created this content on local Wikipedias? In my experience, 50% of them should have decent working proficiency in English. Did you scroll through the page histories here and there, find the most common usernames dedicated to their creation and maintenance, and leave a message on their user talk pages? That's what I am trying to understand.
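The history-scanning step described above can be approximated with a few lines of code once the histories are at hand. This is a hypothetical Python sketch, not an existing tool: the page titles and usernames are invented, the revision histories are assumed to be already available as one username per revision (fetching them is not shown), and "main contributor" is simplified to "most edits across the pages", with bots excluded by name.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical revision histories: page title -> one username per revision.
histories: Dict[str, List[str]] = {
    "Village A": ["UserX", "UserY", "UserX", "Bot1"],
    "Village B": ["UserX", "UserZ"],
    "Village C": ["UserY", "UserX", "Bot1"],
}


def main_contributors(
    histories: Dict[str, List[str]],
    top_n: int = 3,
    exclude: Iterable[str] = (),
) -> List[Tuple[str, int]]:
    """Count edits per username across all pages; return the top_n users."""
    excluded = set(exclude)
    counts = Counter(
        user
        for revisions in histories.values()
        for user in revisions
        if user not in excluded
    )
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Candidates to contact on their user talk pages.
    for user, edits in main_contributors(histories, exclude={"Bot1"}):
        print(user, edits)
```

Counting the number of distinct pages each user touched, rather than raw edits, would be an equally reasonable choice; which works better likely depends on the topic.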
On Monday, 16 July 2018 8:13, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Gerard and Alessandro,
The taxonomy question is very important. I touched on it in the ethnicity categorization discussion:
https://lists.wikimedia.org/pipermail/wikimedia-l/2018-May/090366.html
I suggest that both the Enwiki categories and Wikidata are most deficient from a utilitarian perspective because of their poor support for the bijection between subject-matter experts and their subjects, which is one of the main reasons for the existence of encyclopedias and "Who's Who in ..." references to begin with. This issue has come up more and more in my mentoring, and these two patent applications caught my eye:
IBM: https://patentimages.storage.googleapis.com/ec/a6/fe/b47153da8a0a0d/US201002...
Siemens: https://patentimages.storage.googleapis.com/b0/7b/b1/5bdcddc6370ceb/US201601...
Those are different approaches to the general, overarching problem. Those companies are pursuing them as patent applications, even under the current pro-free (perhaps overly pro-free) software patent reexamination regime, because they recognize the centrality of the problem to be solved.
Do you think Wikidata can serve as a unified subject matter expert database?
Best regards, Jim
On Mon, Jul 16, 2018 at 4:56 AM, Alessandro Marchetti via Wikimedia-l wikimedia-l@lists.wikimedia.org wrote:
Hi. You said that you find an area where there is a problem. I found another one too, taxonomy, and in this case I am quite sure it won't be solved for a while even without better diagnostic tools. Yet I am optimistic on the long term. I have also found areas where problems were similar to yours, and they were solved. Like the examples of ancient Greece items. In that case you need enough people that knows ancient Greek, possibly, and those can be rare to find as well. For one thing you notice, there are other ones other people noticed. But they also see them improving, we have examples.
As far I can say from my experience, the main issue, if the discrepancies were not structural (that is: in the sources), was not having a super tool. In the end, it was about understanding the sources. Tools help, they are cool, is nice to show them, but you need human resources. For all these possible gaps I can notice, my strategy is to look for people. Sometimes I ask to improve tools based specifically on what these people, the newbies of wikidata, want, not what the "expert users" want. I don't say these people know what is best but they kinda feel what is necessary, especially what is necessary to integrate more users with specific necessary knowledge in the workflow.
So my core advice remain the same: create a dedicated project, ask users interested in the topic, teach them wikidata. You can teach them without a project too, but I guess the project could help.
I made you one example in the private mail, the situation of the Italian hamlets imported by some archive on some minor wikipedians (to pick a theme among possible dozens). Some of them are correct, some of them are weird . They are still there but, as I said, if you want to get rid of the trash I can find you 30 users now willing to clean up in a short amount of time and leave only what has a real meaning. So it's not so bad. I could have written general emails and the structural starting point would have not changed this way.
What I am trying to say is that you probably have around the human resources to tackle most of this cluster of work, you just have to find them. I see the energy inside the communities. Your mail is more centered on the issue, the guideline, the possible tool... it 's not "warm". You don't seem to consider the people who should do the continuous, constant work. You describe something where you are alone and I might say, if I ask this help inside the wikidata community, I have the same feeling sometimes. That is true, since there are many small tasks that are much simpler, very generic tasks that are interesting to write a nerdy post on ablog, or virgin areas ready to be conquered massively importing data from archives... and many established wikidata users prefer to focus on these things. But when I look for users at the level of local communities, I had much less problems, i had good feedback. That's it. And that is why I am basically optimistic.
When I see a situation that is not evolving inside wikidata, my instinct remains to ask around to people who create real content wherever they are. About this specific problem, did you contact the users who created these contents on local wikipedias? 50% of them should have a decent English working proficiency, in my experience. Did you scroll the history of the pages here and there, found the most common usernames dedicated to their creation and maintennace, and left the a message in their user talks? that's what I am trying to understand.
Il Lunedì 16 Luglio 2018 8:13, Gerard Meijssen <gerard.meijssen@gmail.com> ha scritto:
Hoi, Thanks for your reply. There is one big issue that you do not address, and it is best explained using a Wikipedia "best practice". The best practice is that a town, a village, whatever, is known to be in the next-level "administrative territorial entity". This is done properly for the first world. Where Wikidata does not hold data, as is often the case, it cannot help in infoboxes. But what I find when I add information is that the Wikipedia's data is wrong in more than 6% of cases.
It does not matter that the information is fractured, coming from many sources. The data for Egyptian subdivisions is largely in Arabic. This is not something I can curate, but it is something that can be presented.
What does matter is that differences between Wikipedias and Wikidata are not noticed. Of particular importance is where the data is biased or wrong. Particularly where the data is wrong and is about "administrative territorial entities", I have had pushback because English Wikipedia was said to be wrong [1]... My interpretation of the facts is that the German article was better written but out of date.
In this mail thread, I raise the issue of differences between Wikipedias, and of differences between projects and Wikidata. Particularly where the data/articles are biased or wrong, our quality suffers. When the error rate for a subject is more than 6%, it is higher than can be expected of humans adding good-faith information to a project. The data I am adding at this time supports Wikipedia best practices. It is particularly intended for the "minority languages" [2], but the quality of all our data will improve when we are aware of the differences and curate them everywhere.
This is distinctly different from the issue with Commons. Its data is good enough for its current use case, but that data is what holds Commons back from becoming the resource you go to because you can "find" what you are seeking.
In a nutshell, our problem is that we work in an insular fashion. We do not have ways to find the differences, the errors, the bias between our projects. We could; suggestions for a basic mechanism have been made. Our quality suffers, and it does not need to [3]. Thanks, GerardM
[1] https://ultimategerardm.blogspot.com/2018/07/africagap-where-wikipedias-collide.html
[2] https://ultimategerardm.blogspot.com/2018/07/africagap-support-for-minority-languages.html
[3] https://ultimategerardm.blogspot.com/2016/01/wikipedia-lowest-hanging-fruit-from.html
On 16 July 2018 at 05:41, Alessandro Marchetti via Wikimedia-l <wikimedia-l@lists.wikimedia.org> wrote:
Yes, it is an old issue. What you say is right, but I would be more optimistic. To summarize my view (I could send you more information privately):
1. Wikidata largely reflected what Wikipedia indicated, and that was not the right way to make it grow, but that was also the past. At the moment, the referencing of content is increasing, and so is the clean-up. In some areas, Wikidata items are nowadays even created before the Wikipedia articles.
2. New tools are great and will do a lot, but it's users who do the real tricks. You have to start bringing local users to Wikidata and showing them how it can be used (automatic infoboxes, fast creation of stubs, automatic lists, detecting missing images). They will start to fix the issues, curating their Wikipedia and Wikidata, and will also indirectly influence the other projects.
3. IMHO, the Wikidata ecosystem is not so bad. It could have more expert users with real knowledge of topics. But Commons, with millions of automatically imported files and tons of poorly described and uncategorized images, faces a much worse prospect. At the moment you need more tools there than on Wikidata if you want to keep a balanced workflow. What is really missing on Wikidata is mostly active projects to coordinate and catalyze the ongoing efforts. This one, https://www.wikidata.org/wiki/Wikidata:WikiProject_Ancient_Greece, worked miracles, for example. But I couldn't find one about peer-reviewed researchers or photographers, to name a few, at least in the past months. Investing in this aspect would not change the final situation on Wikidata (which will be positive, in my view), but it would speed up the process. It would also influence the content on local wikis much more, because it would bring content-related users closer together and increase their Wikidata literacy with less effort.
4. In the end, even with a good, high-quality Wikidata platform, there will always be communities that will not integrate with Wikidata on a massive scale... but that's also a good thing for pluralism. You can't assume that a discrepancy is always a sign of a mistake (I am sure the examples from your experience are, of course). In the long term, some of them are simply effects of gray areas that will have to wait to be resolved even at the level of the sources. In some fields, such as taxonomy, there is some confusion and asymmetric organization of the content, and this will never be solved easily. But in other areas these discrepancies probably will be resolved.
Alex
On Sunday 15 July 2018 22:37, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
Hoi, Wikidata is a reflection of all the Wikimedia projects, particularly the Wikipedias. Both Wikidata and Wikipedia are secondary sources. When two Wikipedias have opposing information on the same fact, it is a cop-out to state both "opinions" on Wikidata and leave it at that.
Given that Wikidata largely reflects what a Wikipedia indicates, it is important to curate such differences. The first thing to consider is whether we are interested at all in knowing about "false facts". Then we need to consider how we can indicate differences to our editing and reading community.
I have been editing about Africa for a long time now, and I find that the content about Africa is woefully underdeveloped. Best Wikipedia practice has it that cities and villages are linked to "administrative territorial entities" like provinces and districts. I have added such relations from primary to secondary entities. Adding such information to villages and cities as well is too much for me. The basic principle is that I am being bold in doing so. I do link to existing items, and I have curated a lot of crap data so far. The result is that Wikidata in places differs considerably from the Wikipedias, particularly the English Wikipedia.
As topics like the ones about Africa are severely underdeveloped, just adding new data is a 100% improvement, even though adding sources is arguably a good thing. When being bold and starting from a Wikipedia as a baseline, it is important to note that not adding sources is established practice in Wikidata.
The issue I raise is that when "another" Wikipedia considers its information superior, it is all too easy to make accusations of adding "fake facts". This is particularly so when it is not obvious that the "other" Wikipedia provides better information. To counter such insular behaviour, it becomes relevant to consider how we can indicate discrepancies between the facts stated in any Wikimedia project and Wikidata. Obviously it would be wonderful if the totality of our projects were considered in a visualisation.
Particularly when a subject is of little interest to our current editor community, the data in the Wikipedias, and by inference in Wikidata, is weak. Many of the subjects, Africa just as one example, are relevant to a public, both reading and editing, that we want to develop. Without tools that help us curate our differences, we will rely on insular opinions, while every project is only a part of what we aim to achieve across all our projects. We will have a hard time growing our audience.
NB this is an old, old issue and it is not going away. Thanks, GerardM
https://ultimategerardm.blogspot.com/2016/01/wikipedia-lowest-hanging-fruit-from.html
_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
Hoi, I have no problem adding information to Wikidata. At the moment I am working on the districts of Ghana. I am adding new items and filling in missing information, such as what each one is an instance of. I will eventually add the information to several African Wikipedias, where it will be available as Listeria lists. Consequently, this information will be easy to adopt.
I totally agree that a bigger community involved in any subject may make a difference. One great example where we make that difference, and where it is acknowledged that there is an issue, is the gender gap. I can make a difference by improving the coverage of Africa, but it only makes a difference when this data is used. Hence I am copying the Listeria lists to other Wikipedias, particularly African-language ones, and pointing out the extent to which information about a subject like Africa is problematic on ANY Wikipedia. No project is big enough that its support for Africa cannot easily be seen as insufficient.
The community that I seek is not only English, French or German. It is first and foremost Swahili, Xhosa, Igbo, Yoruba and so on... We lack coverage of any and all subjects about Africa. We lack a public in Africa, both editing and reading. We do not have infrastructure in Africa that makes Wikipedia zippy. We do not raise money in Africa, and we do not spend money in Africa. We need people like Shevon Silva (working on administrative territorial entities). That work will enable an English Wikipedia policy of always linking to the lowest level. It is really important for the African Cinema group to start so that we can share what they find. Developing Listeria lists is cheap and easily transportable, and it will bring content to wherever it is welcome. I can do that. I can add information to Wikidata, improving lists like this one [1], one at a time, and so can you.
The big elephant in the room is quality. When a subject is underdeveloped, there are situational errors, systemic errors, errors of omission and errors at entry. Check this query, compare the info from Wikidata with the data at OpenStreetMap, and be glad that we have such queries [2]. But a map like this, or a query like this, should fit in Listeria lists and articles. We should enable data exploration much more. There is so much more than just text that is too long to read: images and maps with interactive queries will make it much more of .
The big elephant in the room is quality. We do not know the differences between projects. We maintain that "our" Wikipedia is better and that the text of the other is inconsistent, EVEN THOUGH it talks about reforms that have not been properly entered into the text. You can expect a 6% error rate because of manual entry. When I add Wikipedia data to Wikidata, I will catch some errors and make some others... the error rate is at least 6%. We would catch these errors if we compared. We could do this, but we don't.
The big elephant in the room is quality. This is true for the Internet; it is why Facebook and YouTube point to us. We could and should do 6% better for their readers, and for our readers. We should do this for Africa. I do not know the capital cities of Africa, and neither do you. These are facts we want to find in Wikipedia. We do not know the capital cities of the administrative territorial entities, because I did not put them there and many of these places have no article. Thanks, GerardM
[1] https://en.wikipedia.org/wiki/User:GerardM/Subprefectures_of_Guinea [2] https://www.wikidata.org/wiki/Wikidata:Request_a_query#Third-level_administr...
On 16 July 2018 at 12:56, Alessandro Marchetti alexmar983@yahoo.it wrote:
Hi.
You said that you found an area where there is a problem. I found another one too, taxonomy, and in this case I am quite sure it won't be solved for a while, even without better diagnostic tools. Yet I am optimistic in the long term. I have also found areas where the problems were similar to yours, and they were solved, like the examples of the ancient Greece items. In that case you possibly need enough people who know ancient Greek, and those can be hard to find as well.
For each problem you notice, there are others that other people have noticed. But they also see them improving; we have examples.
As far as I can tell from my experience, when the discrepancies were not structural (that is, in the sources), the main issue was not the lack of a super tool. In the end, it was about understanding the sources. Tools help, they are cool, and it is nice to show them, but you need human resources. For all the possible gaps I notice, my strategy is to look for people.
Sometimes I ask for tools to be improved based specifically on what these people, the Wikidata newbies, want, not on what the "expert users" want. I don't say these people know what is best, but they kind of feel what is necessary. In particular, they feel what is needed to bring more users with specific, necessary knowledge into the workflow.
So my core advice remains the same: create a dedicated project, ask users interested in the topic, and teach them Wikidata. You can teach them without a project too, but I guess the project could help.
I gave you one example in the private mail: the situation of the Italian hamlets imported from some archive onto some minor Wikipedias (to pick one theme among possibly dozens). Some of them are correct, and some of them are weird. They are still there. But, as I said, if you want to get rid of the trash, I can find you 30 users right now who are willing to clean up in a short amount of time and leave only what has real meaning. So it's not so bad. Had I just written general emails instead, the structural starting point would not have changed in this way.
What I am trying to say is that the human resources to tackle most of this cluster of work are probably out there; you just have to find them. I see the energy inside the communities. Your mail is centered more on the issue, the guideline, the possible tool... it's not "warm". You don't seem to consider the people who should do the continuous, constant work. You describe something where you are alone. I might say that, if I ask for this help inside the Wikidata community, I have the same feeling sometimes. That is true, since there are many competing attractions. There are small tasks that are much simpler, and very generic tasks that are interesting to write a nerdy blog post about. There are also virgin areas ready to be conquered by massively importing data from archives... and many established Wikidata users prefer to focus on these things. But when I looked for users at the level of local communities, I had far fewer problems, and I got good feedback. That's it. And that is why I am basically optimistic.
When I see a situation that is not evolving inside Wikidata, my instinct remains to ask around among the people who create real content, wherever they are.
About this specific problem, did you contact the users who created this content on the local Wikipedias? In my experience, 50% of them should have a decent working proficiency in English. Did you scroll through the history of the pages here and there, find the most common usernames dedicated to their creation and maintenance, and leave them a message on their user talk pages? That's what I am trying to understand.