Hoi, Many of you have done research on the gender gap in Wikipedia articles. As a result, you must have associated articles with people and those people with their gender.
It would be awesome if you would do the following:
- Provide us with files that include at least that information.
- Better, add pertinent information to Wikidata: at least the fact that they are human, and their sex.
- It would be stellar if you could identify differences between what you know and what is known in Wikidata.
The point is very much that a lot of information is added to Wikidata all the time, and when your baseline information is known to Wikidata, it will cover Wikipedia that much better.
In your research you may want to look into the current difference in numbers between men and women. You can find it at any time, in near real time. Currently there are 150,801 females and 755,747 males known to Wikidata. Yes, you can change the queries to find only female painters, or females with India as their nationality, or males, obviously.
* male http://tools.wmflabs.org/wikidata-todo/autolist.html?q=CLAIM%5B31%3A5%5D%20A...]
* female http://tools.wmflabs.org/wikidata-todo/autolist.html?q=CLAIM%5B31%3A5%5D%20A...]
If you would like your own database, you can have one. Thanks, GerardM
I have huge issues with using Wikidata in this fashion. The BLP and gender guidelines on wiki.en have evolved over a very long time for some very good reasons.
I invite you to explain, for example, how culturally appropriate your approach is for non-Western cultures with more than two genders.
Or the usefulness of describing the gender of prominent transgender people using a website with no policy against attack pages.
cheers stuart
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Hoi, Let me be simple. I do not know what you are talking about.
What I do know is that at Wikidata we harvest information from all Wikipedias. That includes en.wp, but not exclusively. It includes the Russian, the Chinese, the Arabic ... all Wikipedias. As you know, the first operational task for Wikidata is to replace the old interlanguage links. The next objective is to include all the information that is currently held in infoboxes.
These are published objectives. They have been known for the last two years, and for those two years work has been underway to do just that. What I am asking for is your research data that identifies people who are associated with articles, not user profiles, so that we will be more complete and more correct sooner rather than later.
PS Wikidata has a thing against including user page information. Thanks, GerardM
On 20/04/2014 11:05 AM, "Gerard Meijssen" gerard.meijssen@gmail.com wrote:
What I do know is that at Wikidata we harvest information from all Wikipedias. That includes en.wp, but not exclusively. It includes the Russian, the Chinese, the Arabic ... all Wikipedias. As you know, the first operational task for Wikidata is to replace the old interlanguage links. The next objective is to include all the information that is currently held in infoboxes.
What process does Wikidata have when different wikis have different policies about what should appear by default in infoboxes, in particular when a policy calls for discretion or human judgement?
cheers stuart
Hoi, Again, what is done is to harvest from everywhere. Such actions are not exclusive to Wikidata; DBpedia does this as well, they have been at it longer, and at this time they are better at it. What I hope is that there is quality data in the research done in the past. Making this available will improve quality now.
When Wikipedias have different data about the same item, there will be a process to consolidate it. There is no such process at this time. If there is data that you feel is "problematic" and that cannot be easily retrieved from Wikipedia, you can opt not to make it available in Wikidata.
At this time more than 50% of all items have one or no statement. From a quality point of view, that is poor but improving. Again, if your data is problematic, fine; do consider that we are interested in every subject, not only people. So if you have data that can be shared, please do; it helps.
From where I stand, data that "cannot" be shared but can be easily harvested anyway is openly available. Making this into a big circus does not help. It just supports the unease that many have about academic issues. Thanks, GerardM
On Sun, Apr 20, 2014 at 6:38 AM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, Again, what is done is to harvest from everywhere. Such actions are not exclusive to Wikidata; DBpedia does this as well, they have been at it longer, and at this time they are better at it. What I hope is that there is quality data in the research done in the past. Making this available will improve quality now.
Gerard, I haven't seen sex/gender properties in DBpedia, yet you said you wanted to batch import sex/gender from the (Dutch?) DBpedia data.
https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/04#DBpedia_...
Can you find an example of DBpedia having gender/sex data for people with a sex/gender identification other than bog standard 'male' vs 'female', and it being accurate?
Hoi,
The problem that I want to solve is that the number of humans recorded as neither male nor female is currently 419,862 and growing.
I am not particularly interested in other sex/gender issues. I know from a conversation with DBpedia people that they have a property for "penis length". Not especially my cup of tea either. I was told that the property exists because of an application in combination with French data ...
The reason for the DBpedia heads-up is that their bots run repeatedly. This is in contrast to what bots have done so far, and they are interested in and willing to report the differences they find. This makes their work superior from a quality point of view.
To be blunt, Wikidata gains the quantitative quality I am looking for when only male and female are added where applicable. Transgender issues are, with respect, edge cases. I am sure there is a figure somewhere for the actual number that would represent them all. John indicated that the quality of that data is poor, and his example does not provide the current information.
As I indicated before, I am interested in fixing the quantitative quality problem that Wikidata has. To fix this, I am, among other things, interested in the male and female sex. It allows for studies like the ratio of male vs female painters, male vs female authors, etc. I am not interested in involving myself in the issues around the transgender sexes. As far as I am concerned, everything that can be easily queried from sources is openly available. If it makes sense not to include particular information, that needs to be raised at Wikidata. It is outside my scope to have more than an opinion.
The one reason why I raise the issue of women on Wikidata is that regularly there are edit-a-thons where people write quality articles about notable females. Wikidata should provide adequate information about what this ratio is for the particular fields that are addressed. It is trivial to query Wikidata to find the number of "profession x" for males and females and calculate a ratio. Thanks, GerardM
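A minimal sketch of the ratio calculation described above, in Python. It assumes the query results have already been exported as (occupation, sex) pairs; the function name, the record layout, and the sample rows are hypothetical and are not real Wikidata figures.

```python
from collections import Counter


def sex_ratio_by_occupation(records, occupation):
    """Return (males, females, female_share) for one occupation.

    records: iterable of (occupation, sex) pairs; sex may be None
    when an item has no male/female value recorded.
    """
    counts = Counter(
        sex
        for occ, sex in records
        if occ == occupation and sex in ("male", "female")
    )
    males, females = counts["male"], counts["female"]
    total = males + females
    # No share can be computed when nobody in the occupation is tagged.
    female_share = females / total if total else None
    return males, females, female_share


# Hypothetical sample rows for illustration only.
sample = [
    ("painter", "male"),
    ("painter", "female"),
    ("painter", "male"),
    ("author", "female"),
    ("painter", None),  # item without a male/female value is ignored
]

print(sex_ratio_by_occupation(sample, "painter"))
```

Items without a male or female value are left out of the ratio here, which is exactly why a large untagged group makes such ratios less reliable.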
Hi all,
I think there are several issues mixed up in this thread.
One is the (sometimes very emotion-laden) issue of transsexuals, a field I am not very familiar with. However, Wikidata does have several options available in this field (see https://www.wikidata.org/wiki/Property:P21 ), and I am certain the community will find ways to correctly "tag" each individual, given time.
Another issue, which is what Gerard was talking about in his original post, is the sheer number of "majority cases" (male or female) and the increasing lack of such information on Wikidata; increasing because items are tagged as "person" more quickly than they are tagged with a gender. Currently, ~31% of people on Wikidata have no gender tag.
The success of Wikidata is tightly coupled to the re-use of its wealth of data, both in Wikimedia projects and by third parties. Completeness of data is very much a factor here; for some research purposes, completeness may even be more important than 100% accuracy. As we have seen on Wikipedia, accuracy will improve over time if a "critical mass" of contributors can be achieved.
Both Wikipedia and Wikidata want to collect the world's knowledge. That means (a) importing new data and (b) checking existing data against primary sources, which is precisely what Gerard asked for.
Cheers, Magnus
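As a rough cross-check of the ~31% figure, the counts quoted earlier in this thread can be combined as in the Python sketch below. It assumes the three counts are close enough in time to be compared, and that the 419.862 figure counts items tagged as human with neither a male nor a female value; both are assumptions rather than something the thread states.

```python
# Counts as quoted earlier in the thread (April 2014).
females = 150_801
males = 755_747
untagged = 419_862  # assumed: humans with neither a male nor a female value

tagged = males + females
humans = tagged + untagged

female_share_of_tagged = females / tagged
untagged_share_of_humans = untagged / humans
males_per_female = males / females

print(f"female share of tagged humans: {female_share_of_tagged:.1%}")
print(f"humans without male/female tag: {untagged_share_of_humans:.1%}")
print(f"males per female: {males_per_female:.2f}")
```

Under these assumptions the untagged share comes out a little under one third, in line with the ~31% mentioned above.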
On Mon, Apr 21, 2014 at 7:10 AM, Magnus Manske magnusmanske@googlemail.com wrote:
The success of Wikidata is tightly coupled to the re-use of its wealth of data, both in Wikimedia projects and by third parties. Completeness of data is very much a factor here; for some research purposes, completeness may even be more important than 100% accuracy. As we have seen on Wikipedia, accuracy will improve over time, if a "critical mass" of contributors can be achieved.
I'm surprised that the WMF lawyers signed off on this. Deliberately getting the sex of living people wrong seems like the kind of thing litigation is made of. But then again, I'm not a lawyer.
cheers stuart
I don't work for the Foundation. This is my opinion, and I do not require any lawyers to sign off on it. What a sad world that would be.
On Sun, Apr 20, 2014 at 7:11 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
To be blunt, Wikidata gains the quantitative quality I am looking for when only male and female is added where applicable. Transgender issues with respect are edge cases.
Transgender issues are primarily raised because they're vitally important for people today, but they're not the only issues.
Far more numerous are the cases of people writing under other-gendered pseudonyms; that's a systemic problem, in the GND data for example. "Lord Charles Albert Florian Wellesley" and "Currer Bell" were only outed as pseudonyms of Charlotte Brontë once she achieved a certain level of fame. Modern analysis suggests that there are probably thousands if not tens of thousands of other writers who never achieved that level of fame and never had their pseudonyms revealed. GND and similar library data commonly base their gender data on nothing more than the apparent gender of the name on the cover page (librarianship practice, unlike archival practice, takes such things at face value). To take that librarianship practice out of context and assert that those thousands or tens of thousands of authors were men (rather than just publishing under male or ambiguous names) isn't going to get you sued, but that doesn't mean it's not the whitewashing of generations of women writers.
cheers stuart
Hoi, I blogged about the issue of sex ratios on Wikidata [1]. The experiment I did with Harvard alumni was meant to get some idea of the number of humans who were not yet known as human. I added a substantial number of them so that there is an item for each entry in the category on the English Wikipedia. I assume that as a group they are relatively well covered; they are Ivy League, and some of the best and brightest studied there. When you look at the sex ratio for the Harvard educated, you will find that it is worse than for the general population. I suppose it is an indication of the number of items that still need to be identified as human. Thanks, Gerard
[1] http://ultimategerardm.blogspot.nl/2014/04/wikidata-its-sex-ratio.html
Gerard, Actually, historically speaking, there will be fewer women among Harvard alumni because they graduated from Radcliffe, not Harvard, no?
Anyway, how about a trade: I will send you all of my male-female data with Wikipedia entity names, and you send me back the Q numbers? Or can you only accept data with Q numbers as a field?
Jane
Hoi, There are only 264 people identified as Radcliffe alumni. Someone had already done a job of adding this fact to Wikidata, so I started off with some 250. I completed the list. The category information on Wikidata includes a query that shows you the current number. There is a similar query on the Harvard alumni category, by the way.
http://tools.wmflabs.org/reasonator/?&q=8618565
As to your proposal to take a list and identify the Wikidata items from it: given that ToolScript does JavaScript, it should be doable. I would ask Magnus to write an example that I could copy and change. Thanks, GerardM
Gerard, That link you just sent shows the names in the category (I see there are already a few more than 264, cool). Would it be possible to have the Q numbers shown as well? Right now I see the Q number on mouse-over, but if Magnus (cc'ing him now) could let me screen-scrape those, then I can first update my data and then send you my m-f data with Q numbers. Jane
2014-04-21 9:03 GMT+02:00, Gerard Meijssen gerard.meijssen@gmail.com:
Hoi, There are only 264 people identified as Radcliffe alumni. Someone did a job on adding this fact to Wikidata so I started off with some 250 already. I completed the list. The category information on Wikidata includes a query that shows you the current number.. There is a similar query on the Harvars alumni category by the way.
http://tools.wmflabs.org/reasonator/?&q=8618565
As to your proposal to have a list and idenfity the Wikidata items from them.. Given that ToolScript does JavaScript, it should be doable. I would ask Magnus to write an example that I could copy and change.. Thanks, GerardM
On 21 April 2014 08:28, Jane Darnell jane023@gmail.com wrote:
Gerard, Actually historically speaking, there will be fewer Harvard alumni as women because they graduated from Radcliffe, not Harvard, no?
Anyway, how about a trade - I will send you all of my male-female data with Wikipedia entity names, and you send me back the Q numbers? Or can you only accept data with Q numbers as a field?
Jane
2014-04-21 7:58 GMT+02:00, Gerard Meijssen gerard.meijssen@gmail.com:
Hoi, I blogged about the issue of sex ratios on Wikidata [1]. The experiment I did with Harvard alumni was to get some idea about the number of humans
who
were not yet known as human. I added a substantial number of them to have an item for each entry in the category on the English Wikipedia. I assume that as a group they are relatively well covered; they are ivy league and some of the best and brightest studied there. When you look at the sex ratio for the Harvard educated, you will find that it is worse than for
the
general population. I suppose it is an indication of the amount of items that still need to be identified as human. Thanks, Gerard
[1]
http://ultimategerardm.blogspot.nl/2014/04/wikidata-its-sex-ratio.html
On 21 April 2014 00:53, Stuart A. Yeates syeates@gmail.com wrote:
On Sun, Apr 20, 2014 at 7:11 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
To be blunt, Wikidata gains the quantitative quality I am looking for
when only male and female
is added where applicable. Transgender issues with respect are edge
cases.
Transgender issues are primarily raised because they're vitally important for people today, but they're not the only issues.
Far more numerically superior are the issues of people writing under other-gendered pseudonyms; that's a systemic problem, in the GND data for example. "Lord Charles Albert" "Florian Wellesley" and "Currer Bell" were only outed as pseudonyms of Charlotte Brontë once she achieved a certain level of fame. Modern analysis suggests that there are probably thousands if not tens of thousands of other writers who never achieved that level of fame and never had their pseudonyms revealed. GND and similar library data commonly base their gender data on nothing more than the apparent gender of the name on the cover page (librarianship practice, unlike archival practise, takes such things at face value). To take that librarianship practise out of context and assert that that those thousands or tens of thousands of authors were men (rather than just publishing under male or ambiguous names) isn't going to get you sued, but that doesn't mean it's not the white-washing of generations of women writers.
cheers stuart
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
On Reasonator, the list of matching items has a link in "You can also browse the list here." That link will take you to AutoList. I have added a new download function (next to "Permalink" and "Embed") that lets you download all the AutoList results with labels, descriptions, and site links in the current language as a tabbed file.
Does that help?
Cheers, Magnus
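A minimal sketch of how such a tabbed download could be used to put Q numbers next to existing male-female data. The column names `item` and `sitelink`, the function names and the sample rows are assumptions made for illustration only; the actual header of the AutoList file may differ and should be checked before use.

```python
import csv
import io


def load_autolist_tsv(text, id_column="item", title_column="sitelink"):
    """Map each site-link title to its Q number.

    `text` is the content of a tab-separated file with a header row.
    The default column names are hypothetical, not the confirmed
    AutoList layout.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    mapping = {}
    for row in reader:
        title = (row.get(title_column) or "").strip()
        qid = (row.get(id_column) or "").strip()
        if title and qid:
            mapping[title] = qid
    return mapping


def attach_q_numbers(records, mapping):
    """Split (title, sex) pairs into rows with a Q number and titles without one."""
    matched, unmatched = [], []
    for title, sex in records:
        qid = mapping.get(title)
        if qid is None:
            unmatched.append(title)
        else:
            matched.append((qid, title, sex))
    return matched, unmatched


# Made-up sample rows; the Q numbers here are placeholders, not real matches.
sample = (
    "item\tlabel\tdescription\tsitelink\n"
    "Q111\tAlice Example\tplaceholder\tAlice Example\n"
    "Q222\tBob Example\t\tBob Example\n"
)
mapping = load_autolist_tsv(sample)
matched, unmatched = attach_q_numbers(
    [("Alice Example", "female"), ("Carol Example", "female")], mapping
)
```

Titles that end up in `unmatched` are the ones that would still need an item, or a site link, on Wikidata.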
On Mon, Apr 21, 2014 at 8:51 AM, Jane Darnell jane023@gmail.com wrote:
Gerard, That link you just sent shows the names in the category (I see there are already a few more than 264 - cool). Could it be possible to have the Q numbers shown as well? Now I see the Q number with mouse-over, but if Magnus (cc'ing him now) could let me screen-scrape those then I can first update my data and then send you my m-f data with Q numbers. Jane
2014-04-21 9:03 GMT+02:00, Gerard Meijssen gerard.meijssen@gmail.com:
Hoi, There are only 264 people identified as Radcliffe alumni. Someone did a job on adding this fact to Wikidata so I started off with some 250 already. I completed the list. The category information on Wikidata includes a query that shows you the current number. There is a similar query on the Harvard alumni category by the way.
http://tools.wmflabs.org/reasonator/?&q=8618565
As to your proposal to have a list and identify the Wikidata items from them: given that ToolScript does JavaScript, it should be doable. I would ask Magnus to write an example that I could copy and change. Thanks, GerardM
On 21 April 2014 08:28, Jane Darnell jane023@gmail.com wrote:
Gerard, Actually historically speaking, there will be fewer Harvard alumni as women because they graduated from Radcliffe, not Harvard, no?
Anyway, how about a trade - I will send you all of my male-female data with Wikipedia entity names, and you send me back the Q numbers? Or can you only accept data with Q numbers as a field?
Jane
2014-04-21 7:58 GMT+02:00, Gerard Meijssen gerard.meijssen@gmail.com:
Hoi, I blogged about the issue of sex ratios on Wikidata [1]. The experiment I did with Harvard alumni was to get some idea about the number of humans who were not yet known as human. I added a substantial number of them to have an item for each entry in the category on the English Wikipedia. I assume that as a group they are relatively well covered; they are Ivy League and some of the best and brightest studied there. When you look at the sex ratio for the Harvard educated, you will find that it is worse than for the general population. I suppose it is an indication of the number of items that still need to be identified as human. Thanks, Gerard
[1] http://ultimategerardm.blogspot.nl/2014/04/wikidata-its-sex-ratio.html
On 21 April 2014 00:53, Stuart A. Yeates syeates@gmail.com wrote:
On Sun, Apr 20, 2014 at 7:11 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
To be blunt, Wikidata gains the quantitative quality I am looking for when only male and female is added where applicable. Transgender issues with respect are edge cases.
Transgender issues are primarily raised because they're vitally important for people today, but they're not the only issues.
Far more numerous are the issues of people writing under other-gendered pseudonyms; that's a systemic problem, in the GND data for example. "Lord Charles Albert Florian Wellesley" and "Currer Bell" were only outed as pseudonyms of Charlotte Brontë once she achieved a certain level of fame. Modern analysis suggests that there are probably thousands if not tens of thousands of other writers who never achieved that level of fame and never had their pseudonyms revealed. GND and similar library data commonly base their gender data on nothing more than the apparent gender of the name on the cover page (librarianship practice, unlike archival practice, takes such things at face value). To take that librarianship practice out of context and assert that those thousands or tens of thousands of authors were men (rather than just publishing under male or ambiguous names) isn't going to get you sued, but that doesn't mean it's not the white-washing of generations of women writers.
cheers stuart
Stuart, we also know that there were women in the arts working in the Renaissance and I wonder how many "Master of <name>" artists were women. In fact, I once spent a long time trying to see if there was any evidence that Geertgen tot Sint Jans was a man, because certain aspects of his life seem quite confusing, but would make sense if "he" was actually a "she". (I have since learned he was recorded in his lifetime as a "he") This is what makes Wikipedia valuable though - we can improve our knowledge of history by updating such biographies as reliable sources become available. What Gerard is asking is that we bring Wikidata up to speed with the rest of the projects on the gender field for biographies. Wikidata is just a reflection of Wikipedia: it is still a wiki and it's OK to have mistakes, as long as we can keep on correcting them. I would rather have the existing data to query than no data at all, because otherwise how can I see the mistakes so I can correct them? Article tracking through Wikidata will become a whole lot easier than article tracking on Wikipedia through categories I think.
Jane
2014-04-21 0:53 GMT+02:00, Stuart A. Yeates syeates@gmail.com:
On Sun, Apr 20, 2014 at 7:11 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
To be blunt, Wikidata gains the quantitative quality I am looking for when only male and female is added where applicable. Transgender issues with respect are edge cases.
Transgender issues are primarily raised because they're vitally important for people today, but they're not the only issues.
Far more numerous are the issues of people writing under other-gendered pseudonyms; that's a systemic problem, in the GND data for example. "Lord Charles Albert Florian Wellesley" and "Currer Bell" were only outed as pseudonyms of Charlotte Brontë once she achieved a certain level of fame. Modern analysis suggests that there are probably thousands if not tens of thousands of other writers who never achieved that level of fame and never had their pseudonyms revealed. GND and similar library data commonly base their gender data on nothing more than the apparent gender of the name on the cover page (librarianship practice, unlike archival practice, takes such things at face value). To take that librarianship practice out of context and assert that those thousands or tens of thousands of authors were men (rather than just publishing under male or ambiguous names) isn't going to get you sued, but that doesn't mean it's not the white-washing of generations of women writers.
cheers stuart
On Sun, Apr 20, 2014 at 6:14 AM, Stuart A. Yeates syeates@gmail.com wrote:
On 20/04/2014 11:05 AM, "Gerard Meijssen" gerard.meijssen@gmail.com wrote:
What I do know is that at Wikidata we harvest information from all Wikipedias. It does include en.wp but it is not exclusively so. It does include the Russian, the Chinese, the Arabic ... all Wikipedias. As you know, the first operational task for Wikidata is to replace the old inter language links. A next objective is to include all the information that is currently held in info boxes.
What process does Wikidata have when different wikis have different policies about what should appear by default in infoboxes, in particular when a policy calls for discretion or human judgement?
It doesn't have good processes or good policies, and does have a lot of bots automatically importing data from every wiki.
And this causes the problem you are concerned about, Stuart.
Here is a sample query of transgender/transsexual people in Wikidata.
http://tools.wmflabs.org/wikidata-todo/autolist.html?props=P21&cat_name=...
They should either make no claims about sex and gender, or have a 'sex/gender' (property 21, or P21) that includes 'transgender' (e.g. Q1052281 = transgender woman), or English Wikipedia is wrong...
https://www.wikidata.org/wiki/Q1052281
It is trivial to find examples where the only P21 claim is female (Q6581072) or male (Q6581097).
e.g. Buck Angel
BeneBot* adds 'male animal'
https://www.wikidata.org/w/index.php?title=Q958281&diff=9027854&oldi...
legobot changes it to '[human] male'
https://www.wikidata.org/w/index.php?title=Q958281&diff=14944633&old...
A non-bot (now blocked on Korean Wikipedia) tried to change this to 'hefemale', and was reverted by Sk!d 12 hours later.
https://www.wikidata.org/w/index.php?title=Q958281&diff=49577100&old... https://www.wikidata.org/w/index.php?title=Q958281&diff=49664043&old...
Obviously 'hefemale' is not the best term, but it should have been corrected to the more appropriate and more precise 'trans man' rather than reverted.
Here are other bots (SamoaBot & VIAFbot) importing the wrong value from various datasets.
https://www.wikidata.org/w/index.php?title=Q5144952&diff=50988273&ol... https://www.wikidata.org/w/index.php?title=Q4709895&diff=52762663&ol...
Here is a human contributor doing it using Widar (a semi-automated tool).
https://www.wikidata.org/w/index.php?title=Q6118283&diff=115936683&o...
Most of these errors are over a year old and have not been corrected.
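The check described above, an item from the transgender/transsexual list whose only P21 claims are plain female (Q6581072) or male (Q6581097), can be sketched roughly as follows. This assumes the entity JSON layout that Wikidata serves (claims, then P21, then mainsnak and datavalue); the function names are made up for illustration, and the set of values counted as 'transgender' is left as a parameter because only Q1052281 is named in this thread.

```python
FEMALE = "Q6581072"             # female, as given above
MALE = "Q6581097"               # male, as given above
TRANSGENDER_WOMAN = "Q1052281"  # transgender woman, as given above


def p21_values(entity):
    """Return the set of Q numbers used as P21 (sex/gender) values on one entity."""
    values = set()
    for statement in entity.get("claims", {}).get("P21", []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "novalue" / "somevalue" snaks carry no item
        value = snak.get("datavalue", {}).get("value", {})
        qid = value.get("id")
        if qid is None and "numeric-id" in value:
            qid = "Q{}".format(value["numeric-id"])  # dumps that only store the number
        if qid:
            values.add(qid)
    return values


def needs_review(entity, transgender_values=frozenset({TRANSGENDER_WOMAN})):
    """True when P21 claims exist but none of them is in `transgender_values`.

    Intended only for items already known to belong to the list above;
    an item without any P21 claim is not flagged.
    """
    values = p21_values(entity)
    return bool(values) and not (values & set(transgender_values))


def make_entity(*numeric_ids):
    """Build a tiny stand-in entity for the examples (not real Wikidata data)."""
    return {"claims": {"P21": [
        {"mainsnak": {"snaktype": "value",
                      "datavalue": {"value": {"entity-type": "item",
                                              "numeric-id": n}}}}
        for n in numeric_ids
    ]}}


only_male = make_entity(6581097)
with_trans_value = make_entity(6581072, 1052281)
no_claims = {"claims": {}}
```

Running `needs_review` over every item in the list would give the set of items where a bot import and the English Wikipedia category disagree, which a human then has to judge.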