Gaps

Heather Ford

8 Feb 2018

9:03 p.m.

Having a look at the new WMF research site, I noticed that it seems that notification and recommendations mechanisms are the key strategy being focused on re. the filling of Wikipedia's content gaps. Having just finished a research project on just this problem and coming to the opposite conclusion i.e. that automated mechanisms were insufficient for solving the gaps problem, I was curious to find out more.

This latest research that I was involved in with colleagues was based on an action research project aiming to fill gaps in topics relating to South Africa. The team tried a range of different strategies discussed in the literature for filling Wikipedia's gaps without any wild success. Automated mechanisms that featured missing and incomplete articles catalysed very few edits.

When looking for related research, it seemed that others had come to a similar conclusion i.e. that automated notification/recommendations alone didn't lead to improvements in particular target areas. That makes me think that a) I just haven't come across the right research or b) that there are different types of gaps and that those different types require different solutions. For example: gaps across language versions; incomplete articles about topics for which there are few online/reliable sources, as distinct from missing articles about topics for which there are many online/reliable sources; gaps in articles about particular topics; and gaps relating to particular geographic areas.

Does anyone have any insight here? - either on research that would help practitioners decide how to go about a project of filling gaps in a particular subject area or about whether the key focus of research at the WMF is on filling gaps via automated means such as recommendation and notification mechanisms?

Many thanks!

Best, Heather.


Leila Zia

8 Feb

9:54 p.m.

Hi Heather,

Thanks for writing. Below are some of my thoughts.

* Whether automatic recommendations work depends heavily on at least a few factors: the users who interact with these recommendations and their level of expertise with editing Wikimedia projects, the quality of the recommendations, how much context is provided as part of the recommendations, incentives, and the design of the platform/tool/etc. where these recommendations get surfaced. The last point is critical. Design is key in this context.

* We've had some good success stories with recommendations. As you have seen, the work we did in 2015 shows that you can significantly increase article creation rate (factor of 3.2 without loss in quality) if you do personalized recommendations.[0] Obviously, creation of an article is a task suited more to experienced editors than to newcomers. Had we done a similar experiment with newcomers, my gut feeling is that we would have seen a very different result. We also built a recommendation API [1] that is now being used in Content Translation for editors to receive suggestions on what to edit next. We saw a spike in contributions in the tool after this feature was introduced. Somewhere between 8% and 15% of the contributions through the tool come thanks to the recommendations today.[2] There are other success stories around as well. For example, Ma Commune [3] focuses on helping French Wikipedia editors expand already existing articles (specific and limited types of articles for now). Recommendations have also worked really well in the context of Wikidata, where contributions can be made through games such as The Distributed Game [4].

* Specifically about the work we do in knowledge gaps, we're at the moment very much focused on the realm of machine in the loop (as opposed to human in the loop) [5]. By this I mean: our aim is to understand what humans are trying to do on Wikimedia projects and bring in machines/algorithms to help them do it more easily and efficiently, with the least frustration and pain. An example of this approach was when we interviewed a couple of editathon organizers in Africa as part of The Africa Destubathon and learned that they were doing a lot of manual work extracting structures of articles to create templates for newcomers to learn how to expand an already existing article. That's when we became sure that investing in section recommendations actually makes sense (later we learned we can help other projects such as Ma Commune, too, which is great).

* More recently, the Contributors team conducted a research study to understand the needs of Wikipedia editors through in-person interviews with editors. The focus areas coming out of this research [6] suggest that providing in-context help and task recommendations is important.

I hope these pointers help. I know we will talk about these more when we talk next, but if you or others have questions or comments in the meantime, I'd be happy to expand. Just be aware that it's annual planning time around here and we may be slow in responding. :)

Best, Leila

[0] https://arxiv.org/abs/1604.03235
[1] https://www.mediawiki.org/wiki/GapFinder/Developers
[2] These numbers are a few months old, I need to get updates. :)
[3] https://macommune.wikipedia.fr/
[4] http://magnusmanske.de/wordpress/?p=362
[5] Borrowing the term from Ricardo Baeza-Yates.
[6] https://www.mediawiki.org/wiki/New_Editor_Experiences#Focuses

-- Leila Zia Senior Research Scientist Wikimedia Foundation

Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Kerry Raymond

10:56 p.m.

I think there are two parts to the problem of filling gaps. Drawing attention to the gaps is half of the problem. The other half of the problem is finding the editor who wants to write that article. For example, I often check on the "missing topics" list for WikiProject Queensland (which is machine-generated by counting the number of redlinks in articles tagged on the Talk page as belonging to that project).

https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Queensland/Missing_topic...

This is not a highly sophisticated algorithm but it does result in my thinking "oh well, I am sure I could at least write a stub on that topic" and so I write an article.
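The redlink counting described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code behind the WikiProject Queensland list: the function name `most_missing`, the input shapes (a mapping from project article titles to their outgoing link targets, plus a set of titles that already exist) and the toy titles are all hypothetical.

```python
from collections import Counter


def most_missing(project_articles, existing_titles):
    """Rank missing link targets by how many project articles link to them.

    project_articles: dict mapping an article title to the list of titles it links to
                      (hypothetical input; a real list would be built from wiki data).
    existing_titles:  set of titles that already have an article.
    """
    counts = Counter()
    for links in project_articles.values():
        # Count each missing target at most once per linking article.
        for target in set(links):
            if target not in existing_titles:
                counts[target] += 1
    # Most-linked missing topics first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy data, invented for illustration only.
articles = {
    "Brisbane": ["Queensland", "Example Racing Driver", "Example Heritage Hall"],
    "Toowoomba": ["Queensland", "Example Racing Driver"],
    "Queensland": ["Brisbane", "Toowoomba"],
}
ranking = most_missing(articles, set(articles))
print(ranking)
```

A ranking like this says nothing about who would want to write each article, which is the second half of the problem.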

But if you look at the first couple of screens of those "most missing" topics, there are lots of racing car drivers. I have no interest whatsoever in racing car drivers, I have no idea what sources might exist or which might be reliable. So as I pick off other topics from the "most missing" list, it has the effect of increasing the density of racing car drivers at the top of the list. Clearly we have a content gap around racing car drivers, but I won't be doing anything about it.

This reinforces the point Leila makes about personalising the recommendations. I think it's more important to target the right people even if the list you present to them isn't overly sophisticated. The right person will be able to mentally filter a list of things vaguely associated with their topic interests. As Leila says, there's probably less benefit in targeting new users to write new articles. But I've started over 4000 articles and I bet 90% are WikiProject Queensland. Show me any list of wanted Queensland topics and I'll probably be willing to write about *many* of them (but not all). Similarly if you look at the categories of the articles I write, the category Queensland Heritage Register will come up a lot (probably 1/3 of my articles are about heritage properties). Probably another 1/3 are articles about Queensland towns/suburbs/localities. I think looking at the categories/projects of the articles people write is a very strong indicator of interest areas. And the more articles they write, the more sure you can be that they are confident about starting new articles (a lot of people are not willing to start new articles but will happily contribute to a stub -- probably had a past bad experience with article creation) and the more you can be sure about their areas of interest.

With the exception of redirects and disambiguation pages, I would think anyone who has started many articles is likely to have easily-inferred topic space interests. For that matter, a lot of people (myself included) talk about their interest areas on their user page, so key words in user pages that fuzzy-match to project names or category names may be another indicator.
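The two interest signals suggested above (the categories of articles a person has started, and key words on their user page that fuzzy-match project names) could be sketched roughly as follows. This is a hedged illustration, not an existing tool: the function names, the input shapes, the toy data and the 0.8 similarity cutoff are all assumptions. The fuzzy matching uses `difflib.get_close_matches` from the Python standard library.

```python
from collections import Counter
from difflib import get_close_matches


def interest_profile(started_articles):
    """Share of a contributor's started articles that fall in each category.

    started_articles: dict mapping article title to its list of categories
                      (hypothetical input, with redirects and disambiguation
                      pages assumed to be filtered out already).
    """
    total = len(started_articles)
    counts = Counter(cat for cats in started_articles.values() for cat in set(cats))
    return {cat: n / total for cat, n in counts.most_common()}


def match_user_page_terms(terms, project_names, cutoff=0.8):
    """Fuzzy-match user page key words against project names, ignoring case."""
    by_lower = {name.lower(): name for name in project_names}
    matches = set()
    for term in terms:
        # Keep only the single best candidate at or above the (assumed) cutoff.
        for hit in get_close_matches(term.lower(), list(by_lower), n=1, cutoff=cutoff):
            matches.add(by_lower[hit])
    return sorted(matches)


# Toy data, invented for illustration only.
started = {
    "Example Hall": ["Queensland Heritage Register"],
    "Example House": ["Queensland Heritage Register"],
    "Example Town": ["Towns in Queensland"],
    "Example Creek": ["Rivers of Queensland"],
}
profile = interest_profile(started)
projects = match_user_page_terms(
    ["queensland", "heritage", "greenlnd"],
    ["Queensland", "Greenland", "Motorsport"],
)
print(profile)
print(projects)
```

As noted above, the more articles a contributor has started, the more weight a profile like this can bear; with only a handful of articles the shares are too noisy to mean much.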

However, some of the content gaps on Wikipedia exist because we don't have contributors who are interested in the topic. Given that there is a known difference between the topics that women generally write about compared to men, it's clear that a lack of diversity in editors is likely to lead to content gaps. I would suspect the same is true about other personal characteristics. As an Australian, I am more likely to write about Australia than, say, Greenland, but I did holiday there last year, so actually I have written a little about Greenland and uploaded some photos, but that's just a "blip" in my contribution profile (and I don't think I started any new articles about Greenland). If we have a content gap about Greenland, maybe we don't have enough Greenlanders to fill it? I think we can't address content gaps unless we also address contributor gaps. This in turn may result in devolving responsibility for things like notability and verifiability down to the Project level. For example, it is often commented that Indigenous Australian topics are a content gap. The problem is a lack of sources. Indigenous Australians did not have a written language so oral sources are very important, but en.Wikipedia isn't keen on oral sources, so there's a content gap that's hard to fill. And I suspect we have very few Indigenous Australians writing for Wikipedia. Statistically 3% of our population self-identifies as Indigenous but they tend to have lower educational attainments which probably makes them less likely to be Wikipedia contributors who, based on the 2011 survey, have above average likelihood of having a university degree.

So I think we have two flavours of content gap, those for which we have active contributors in the broader topic space who may be enticed to write about the missing topics (which is the problem being principally addressed by this area of research), and those where we do not have active contributors.

Kerry

Leila Zia

11:18 p.m.

On Thu, Feb 8, 2018 at 8:56 PM, Kerry Raymond kerry.raymond@gmail.com wrote:

> I think we can't address content gaps unless we also address contributor gaps.

This is very important. We very likely have reader/consumer gaps, (for sure) content gaps, and contributor gaps, and these gaps are connected to each other in ways that we need to understand much better.

Leila

Heather Ford

9 Feb

3:44 a.m.

Thanks so much for the super helpful comments and suggestions, Leila, Kerry! I so appreciate it.

And yes, this is a great way to frame the distinction i.e. that some gaps can be filled by existing contributors (using automated techniques like recommendations) but others can only be filled by bringing in new contributors and/or by creating alternative support mechanisms or incentives (in the way that programmes like GLAM or editing competitions might do). Curious if anyone else on the list has recommendations for research in the latter category... I'm still convinced we need more academic research here :)

Best, Heather.

Dr Heather Ford Senior Lecturer, School of Arts & Media https://sam.arts.unsw.edu.au/, University of New South Wales w: hblog.org / EthnographyMatters.net http://ethnographymatters.net/ / t: @hfordsa http://www.twitter.com/hfordsa


Amir E. Aharoni

3:53 a.m.

Heather,

Thanks for starting this thread.

Where can I read your research that comes to the conclusion that automated mechanisms are insufficient for solving the gaps problem?

Sorry if this was mentioned somewhere already; I sometimes get lost on long emails, and it's possible that I missed it :)


Heather Ford

20 Feb

5:28 p.m.

Dear Amir,

I did send this via Twitter, but wanted to send here too in case anyone else is interested. Our paper summarises some of the research on notifications. A pre-print is available here:

https://makebuildplay.files.wordpress.com/2018/02/wp_primary_school_paper_acceptedv.pdf

Happy to chat more and would very much like to chat to others doing research on knowledge gaps on Wikipedia.

Best, Heather.


Jonathan Morgan

6:09 p.m.

Thanks, Heather! This looks super interesting and relevant. I look forward to reading it :)

Jonathan


-- Jonathan T. Morgan Senior Design Researcher Wikimedia Foundation User:Jmorgan (WMF) https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)
