(I know this thread is from many months ago. I've been reading recently about related
issues, and remembered this conversation.)
Mackenzie,
That's a great question. Accusations of systemic bias in the deletion of biographies
are among the most common criticisms of Wikipedia, and it would be great to have data on
this issue.
I would suggest scraping the New Pages feed
(https://en.wikipedia.org/wiki/Special:NewPages) to get the full text and metadata of all
new articles before any of them get deleted. If you do this for a week, you'll get a sample
of thousands of articles that you can track longitudinally to see what kinds of articles
survive for, say, 30 days. If an article survives 30 days it's highly likely to stay
for good.
To get the biographies that are created via the Articles for Creation process, you would
need to scrape the New Pages feed for Draft space in addition to Article space.
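As a rough sketch of what that collection step could look like: the same listing that
Special:NewPages shows is also available through the MediaWiki API
(list=recentchanges with rctype=new), which is easier to work with than scraping HTML.
The function names below are my own invention for illustration, 118 is the Draft
namespace number on English Wikipedia, and the 30-day cutoff is just the rule of thumb
above. Fetching the full text of each page would be a separate request that is not
shown here.

```python
from datetime import datetime, timedelta, timezone

API_URL = "https://en.wikipedia.org/w/api.php"  # MediaWiki API endpoint for en.wiki
ARTICLE_NS = 0   # main (article) namespace
DRAFT_NS = 118   # Draft namespace on English Wikipedia


def new_pages_params(namespace, limit=500):
    """Build query parameters that list recently created pages in one namespace."""
    return {
        "action": "query",
        "format": "json",
        "list": "recentchanges",
        "rctype": "new",            # page creations only
        "rcnamespace": namespace,
        "rcprop": "title|ids|timestamp|user",
        "rclimit": limit,
    }


def parse_new_pages(response):
    """Turn a decoded API response into (pageid, title, created) tuples."""
    rows = []
    for rc in response.get("query", {}).get("recentchanges", []):
        created = datetime.strptime(
            rc["timestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        rows.append((rc["pageid"], rc["title"], created))
    return rows


def survived(created, deleted_at, days=30):
    """True if the page was never deleted, or was deleted `days` or more after creation."""
    return deleted_at is None or deleted_at - created >= timedelta(days=days)
```

You would send new_pages_params(ARTICLE_NS) and new_pages_params(DRAFT_NS) to API_URL
with whatever HTTP client you prefer, store the parsed rows, and later compare them
against the deletion log to fill in deleted_at.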
When analyzing this data set, it would be good to categorize why each bio was deleted, and
maybe filter out the cases in which an article was deleted for being a copyright
violation, attack page, undecipherable nonsense, etc.
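A first pass at that categorization could be automated from the deletion log comments,
which on English Wikipedia often cite a speedy deletion criterion such as G1 (patent
nonsense), G10 (attack page) or G12 (copyright infringement), or link to an Articles
for deletion discussion. The sketch below assumes the comments follow those
conventions. Log comments are free text written by admins, so treat the output as a
starting point for human review. The function names and category labels are mine.

```python
import re

# Speedy-deletion codes for deletions that say little about notability.
EXCLUDE_CODES = {
    "G1": "nonsense",
    "G10": "attack page",
    "G12": "copyright violation",
}

# Matches criterion codes such as G12 or A7 when they appear as whole tokens.
CSD_CODE = re.compile(r"\b([AGRU]\d{1,2})\b")


def deletion_category(comment):
    """Guess a coarse deletion category from a deletion log comment."""
    for code in CSD_CODE.findall(comment):
        if code in EXCLUDE_CODES:
            return EXCLUDE_CODES[code]
    if "Articles for deletion" in comment:
        return "deletion discussion"
    return "other"


def keep_for_analysis(comment):
    """False for copyright, attack-page and nonsense deletions; True otherwise."""
    return deletion_category(comment) not in EXCLUDE_CODES.values()
```

Anything that lands in "other" would still need a person to read the comment and, if
possible, the article itself.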
I'd also like to echo WereSpielChequers's second point below. Apparent biases in
deletion may reflect biases in article creation. You might be best off working with a
small sample set but analyzing the set really closely (i.e. have humans read every
article) to avoid the "maybe group X had more deleted articles because it had more
crap articles to start with" objection.
Best regards,
Su-Laine
Wikipedia volunteer
On July 10, 2020, 8:05 AM, "Wiki-research-l on behalf of WereSpielChequers"
<wiki-research-l-bounces(a)lists.wikimedia.org on behalf of
werespielchequers(a)gmail.com> wrote:
Hi Mackenzie,
You may be correct in either or both of your hypotheses, but you might also
want to check out two other related ones.
1 Some academic institutions may have an element of misogyny in their HR
policies, leading to such situations as an academic becoming notable for
their work to the point where they merit a Wikipedia article, before they
become a full professor.
2 In Wikipedia's drive to address the gender skew in our content, we may
have some editors creating articles on women who don't yet meet our
notability criteria. Such articles are of course highly likely to be
deleted.
There is another way to approach this: check primary and secondary sources
to see how Wikipedia compares against them. For example, we have articles
on every female Fellow of the Royal Society, and we achieved that almost a
decade ago. I don't know if we yet have articles on all the blokes. I
expect we have articles on every Nobel Prize Winner by now, but there will
be less well known awards and lists of people in STEM.
One problem in looking at deletion discussions is that they don't always
say what the person is known for, and so you can have confusion between
multiple people of the same name. I was once asked to restore a deleted
article so that someone could look at what was there and see if they could
make a clearer case re the notability of that eminent diplomat. After
looking at the deleted article, I told them not to start from the deleted
bit, and if it was the same person, to emphasise their subsequent career as
a diplomat, rather than their adolescent career as a "pro skateboarder".
So in order to find the deleted articles on female scientists, you either
need a list of female scientists whose articles were deleted, or to check a
lot of other articles to find which are about scientists.
Hope that's useful
WSC
On Fri, 10 Jul 2020 at 00:17, Stuart A. Yeates <syeates(a)gmail.com> wrote:
I recently completed a project writing en.wiki articles for all female
and indigenous professors in my country, .nz.
I now write pronounless biographies, because there were a significant
number whose gender wasn't apparent from their public persona. My
guess is that women and LGBTIA+ minorities are incentivised to remove
markers of their gender from their online presence to keep a lower
profile to avoid the trolls and bigots.
There were also a number who clearly appeared to be a certain
ethnicity based on their staff photo, but where there were no reliable
sources as to that ethnicity.
I also had one person ask for their article to be deleted. [If this
is of interest I can send details to you directly, but I will not post
their details to a public forum and ask that you refrain from this also.]
I look forward to reading your experimental design taking these
factors into account.
cheers
stuart
--
...let us be heard from red core to black sky
On Fri, 10 Jul 2020 at 06:43, Mackenzie Lemieux
<mackenzie.lemieux(a)gmail.com> wrote:
Dear Wiki Community,
My name is Mackenzie Lemieux and I am a neuroscience researcher at the Salk
Institute for Biological Studies and I am interested in exploring biases on
Wikipedia.
My research hypothesis is that gender or ethnicity mediate the rate of
flagging and deletion of pages for women in STEM. I hope to
retrospectively analyze Wikipedia's deletion history, harvest the
biographical articles about scientists that have been created over the past
n years and then confirm the gender and ethnicity of a large sample.
It appears that we can identify deleted pages with Wikipedia's deletion log
<https://en.wikipedia.org/wiki/Wikipedia:Deletion_log>, but to actually see
the page that was deleted we need to be members of one of these Wikipedia
user groups: Administrators
<https://en.wikipedia.org/wiki/Wikipedia:Administrators>, Oversighters
<https://en.wikipedia.org/wiki/Wikipedia:Oversight>, Researchers
<https://en.wikipedia.org/wiki/Wikipedia:Researchers>, Checkusers
<https://en.wikipedia.org/wiki/Wikipedia:CheckUser>.
Does anyone have advice on how to obtain researcher status or is there
anyone willing to collaborate who has access to the data we need?
Warmly,
Mackenzie Lemieux
--
Mackenzie Lemieux
mackenzie.lemieux(a)gmail.com
cell: 416-806-0041
220 Gilmour Avenue
Toronto, Ontario
M6P 3B4
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l