Hi Leila,
First of all thanks for your input!
>
> > Therefore, we worked on producing sentences from the information on
> > Wikidata in the given language. We trained a neural network model, the
> > details can be found in the preprint of the NAACL paper here:
> > https://arxiv.org/abs/1803.07116
>
> It would be good to do human (both readers and editors, and perhaps
> both sets) evaluations for this research, too, to better understand
> how well the model is doing from the perspective of the experienced
> editors in some of the smaller languages as well as their readers. (I
> acknowledge that finding experienced editors when you go to small
> languages can become hard.)
>
We worked with editors in the follow-up study, to be published at ESWC.
https://2018.eswc-conferences.org/wp-content/uploads/2018/02/ESWC2018_paper…
We also asked native speakers for their input on the fluency of the
sentences. However, I agree it would be interesting to dive deeper into the
question of how the community perceives the ArticlePlaceholder in general,
and the generated summaries in particular.
>
> > Furthermore, we would love to hear your input: Do you believe, one
> sentence
> > summaries are enough, can we serve the communities needs better with more
> > than one sentence?
>
> This is a hard question to answer. :) The answer may rely on many
> factors including the language you want to implement such a system in
> and the expectation the users of the language have in terms of online
> content available to them in their language.
>
I agree. The best approach would therefore probably be to study the current
usage of the ArticlePlaceholder and the targeted communities, and draw
conclusions about real needs from there.
>
> > Is this still true if longer abstracts would be of lower
> > text quality?
>
> same as above. You are signing yourself up for more experiments. ;)
>
> I would be interested to know:
> * What is the perception of the readers of a given language about
> Wikipedia if a lot of articles that they go to in their language have
> one sentence (to a good extent accurate), a few sentences but with
> some errors, more sentences with more errors, versus not finding the
> article they're interested in at all?
> * Related to the above: what is the error threshold beyond which the
> brand perceptions will turn negative (to be defined: may be by
> measuring if the user returns in the coming week or month.)? This may
> well be different in different languages and cultures.
> * Depending on the result of the above, we may want to look at
> offering the user the option to access that information, but outside
> of Wikipedia, or inside Wikipedia but very clearly labeled as Machine
> Generated as you do to some extent in these projects.
>
These questions are very interesting, and in part formalize what we have
already discussed. The best way forward would be to actually study this with
the communities involved, as we started to do in the ESWC paper, focusing in
particular on the different interest groups: readers of Wikipedia, readers
coming from outside Wikipedia, experienced editors, and new editors.
>
> > What other interesting use cases for such a technology in the
> > Wikimedia world can you imagine?
>
> The technology itself can have a variety of use-cases, including
> providing captions or summaries of photos even without layers of image
> processing applied to them.
>
This sounds like a very interesting idea. I saw that the WMF has already
started work on image captions; I will be following this with great
curiosity :)
Best,
Lucie
>
> Best,
> Leila
>
> > [1] https://www.mediawiki.org/wiki/Extension:ArticlePlaceholder and
> > https://commons.wikimedia.org/wiki/File:Generating_Article_
> Placeholders_from_Wikidata_for_Wikipedia_-_Increasing_
> Access_to_Free_and_Open_Knowledge.pdf
> > [2]
> > https://eprints.soton.ac.uk/413433/1/Open_Sym_Short_Paper_
> Wikidata_Multilingual.pdf
> >
> > --
> > Lucie-Aimée Kaffee
> > Web and Internet Science Group
> > School of Electronics and Computer Science
> > University of Southampton
> > _______________________________________________
> > Wiki-research-l mailing list
> > Wiki-research-l(a)lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
--
Lucie-Aimée Kaffee
Web and Internet Science Group
School of Electronics and Computer Science
University of Southampton
Wikimedia as a movement has over the years given consideration to small
language Wikipedias.
I would like to point you to a recent study that I, alongside Hady Elsahar
of the Université de Lyon and Pavlos Vougiouklis of the University of
Southampton, have been pursuing, and which has recently resulted in
accepted publications.
My research interest involves mainly underserved languages on Wikidata and
Wikipedia, and how we can support them better.
One of the ways to support small Wikipedias is the ArticlePlaceholder [1].
The idea is to use the existing multilingual information in Wikidata [2]
and display it in a reader-friendly way on Wikipedia in the respective
language (if a Wikidata label exists in that language).
However, at the moment the data is shown only in tabular form, which is
not very reader-friendly and might not be the ideal way to encourage
editors to work on the articles.
Therefore, we worked on producing sentences from the information in
Wikidata in the given language. We trained a neural network model; the
details can be found in the preprint of our NAACL paper here:
https://arxiv.org/abs/1803.07116
Given the promising results of this neural network approach, we extended
the work to see how we could fit the text generation into the existing
ArticlePlaceholder, and tested it with the Esperanto and Arabic Wikipedia
communities. The preprint of the ESWC paper on this work can be found
here:
https://2018.eswc-conferences.org/wp-content/uploads/2018/02/ESWC2018_paper…
We show that our approach is feasible for generating text from Wikidata for
Wikipedia. Editors tend to reuse the generated sentences, which suggests
the summaries can be a good encouragement to create full articles.
We would like to implement the work in a test Wikipedia to see if
communities are interested in adopting the technology on a large scale in
their Wikipedias.
Furthermore, we would love to hear your input: Do you believe one-sentence
summaries are enough, or can we serve the communities' needs better with
more than one sentence? Would this still hold if longer abstracts were of
lower text quality? What other interesting use cases for such a technology
in the Wikimedia world can you imagine? And, especially if you are part of
an underserved-language Wikipedia community, what is your opinion of the
project?
[1] https://www.mediawiki.org/wiki/Extension:ArticlePlaceholder and
https://commons.wikimedia.org/wiki/File:Generating_Article_Placeholders_fro…
[2]
https://eprints.soton.ac.uk/413433/1/Open_Sym_Short_Paper_Wikidata_Multilin…
--
Lucie-Aimée Kaffee
Web and Internet Science Group
School of Electronics and Computer Science
University of Southampton
Hi everyone,
The Wikimedia Foundation is asking for your feedback in a survey. We want
to know how well we are supporting your work on and off wiki, and how we
can change or improve things in the future. The opinions you share will
affect the current and future work of the Wikimedia Foundation.
If you are a volunteer developer and have contributed code to any part of
MediaWiki, gadgets, or tools, please complete the survey. It is available
in various languages and will take between 20 and 40 minutes to complete.
*Follow this link to the Survey:*
https://wikimedia.qualtrics.com/jfe/form/SV_5ABs6WwrDHzAeLr?aud=DEV
If you have already seen a similar message on Phabricator, Mediawiki.org,
Discourse, or other platforms for volunteer developers, please don't take
the survey twice.
You can find more information about this survey on the project page
<https://meta.wikimedia.org/wiki/Community_Engagement_Insights/About_CE_>
and see how your feedback helps the Wikimedia Foundation support
contributors like you. This survey is hosted by a third-party service and
governed by this privacy statement
<https://wikimediafoundation.org/wiki/Community_Engagement_Insights_2018_Sur…>.
Please visit our frequently asked questions page
<https://meta.wikimedia.org/wiki/Community_Engagement_Insights/Frequently_as…>
to find more information about this survey.
Feel free to email me directly with any questions you may have.
Thank you!
Edward Galvez
--
Edward Galvez
Evaluation Strategist, Surveys
Learning & Evaluation
Community Engagement
Wikimedia Foundation
Dear all,
As many of you know already, this year the Web Conference
<https://www2018.thewebconf.org> will feature an alternate track on
*Journalism,
Misinformation, and Fact Checking*, jointly organized by Kristina Lerman,
Takis Metaxas, and me.
We are happy to announce that the final program is up on the website:
https://www2018.thewebconf.org/program/misinfoweb/
Aside from the twelve accepted research presentations, we are particularly
happy to announce that we have assembled an exciting panel. See below for
more information about it.
We hope to see you in Lyon, and if you have any question feel free to reach
out at misinfochairs(a)www2018.thewebconf.org
Cheers,
Giovanni, Kristina, and Takis
*The effects of “Fake News” on Journalism and Democracy*
*Online propaganda and misinformation appeared along with the first search
engines in the mid-'90s, and they became harder to detect in the last
decade with the development of social media applications. Yet in the last
few years they have spread widely in the form of the so-called “fake news”:
falsehoods formatted and circulated online in such a way that a reader
might mistake them for legitimate news articles. How big a problem is it,
how can technology and policy help us address it, and what are the
implications for Journalism and Democracy?*
- Daniel Funke <https://www.poynter.org/person/dfunke> (Poynter
Institute)
- Katherine Maher <https://meta.wikimedia.org/wiki/User:Katherine_(WMF)>
(Wikimedia Foundation)
- P. Takis Metaxas <http://cs.wellesley.edu/~pmetaxas/> (Albright
Institute for Global Affairs, Wellesley College) – Moderator
- An Xiao Mina <https://about.me/anxiaostudio> (Credibility Coalition
and Meedan)
- Soroush Vosoughi <http://soroush.mit.edu/> (MIT Media Lab)
--
Giovanni Luca Ciampaglia <glciampagl(a)gmail.com> ∙ Assistant Research
Scientist
IU Network Science Institute <http://iuni.iu.edu/> ∙ glciampaglia.com
News 🕫 *WWW 2018* ∙ Alternate track on Journalism, Misinformation, and
Fact Checking:
https://www2018.thewebconf.org/call-for-papers/misinformation-cfp/
Hi everyone,
We’re preparing for the March 2018 research newsletter and looking for contributors. Please take a look at https://etherpad.wikimedia.org/p/WRN201803 and add your name next to any paper you are interested in covering. Our target publication date is April 1 (UTC). If you can't make this deadline but would like to cover a particular paper in the subsequent issue, leave a note next to the paper's entry in the etherpad. As usual, short notes and one-paragraph reviews are most welcome.
Highlights from this month:
• A Brief History of Human Time: Exploring a database of 'notable people'
• A Comparison of the Historical Entries in Wikipedia and Baidu Baike
• A Hybrid Model for Quality Assessment of Wikipedia Articles
• Becoming an online editor: perceived roles and responsibilities of Wikipedia editors
• Capturing the influence of geopolitical ties from Wikipedia with reduced Google matrix
• Community Detection with Metadata in a Network of Biographies of Western Art Painters
• Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders
• Is Catalonia an Independent Country? Tracking Implicit Biases in Crowdsourced Knowledge Graphs
• Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata
• Linking ImageNet WordNet Synsets with Wikidata
• Mining Cross-Cultural Differences of Named Entities: A Preliminary Study
• Modeling the Wikipedia to Understand the Dynamics of Long Disputes and Biased Articles
• Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples
• Semantic labeling for quantitative data using Wikidata
• Sentiments in Wikipedia Articles for Deletion Discussions
• The Pipeline of Online Participation Inequalities: The Case of Wikipedia Editing
• "The rise and decline" in a population of peer production projects
• Towards a Question Answering System over the Semantic Web
• Using big data and network analysis to understand Wikipedia article quality
• Visualizing the Flow of Discourse with a Concept Ontology
Masssly, Tilman Bayer and Dario Taraborelli
[1] http://meta.wikimedia.org/wiki/Research:Newsletter
Dear Ms.,
I thank you for your answer. As a Programme Committee member of WikiIndaba 2018 and as the author of "WikiResearch in Africa: Situation and Challenges", also presented in the Research Showcase of WikiIndaba 2018, I was honoured to receive your report. I really applaud the efforts of the Wikimedia Foundation to grow WikiResearch in Africa and would like to contribute in this context; we can discuss that if you want. Concerning the role of the Wikimedia Foundation, and as I already said after your presentation, I think the main issue is the lack of connection between LangCom and African language regulatory institutions. Another issue may be the difficulty of reaching LangCom: messages from communities to the LangCom mailing list take days to be processed by moderators and then published, and there are also problems contacting LangCom via Phabricator and Meta. Such issues should absolutely be fixed. Finally, just for information concerning the Wikimania proposal about using Wikidata in medicine: I am Csisc, who posted it on the Wikidata talk page of Wikimania 2018.
Yours Sincerely,
Houcemeddine Turki
________________________________
From: Wiki-research-l <wiki-research-l-bounces(a)lists.wikimedia.org> on behalf of wiki-research-l-request(a)lists.wikimedia.org <wiki-research-l-request(a)lists.wikimedia.org>
Sent: Friday, 23 March 2018 13:00
To: wiki-research-l(a)lists.wikimedia.org
Subject: Wiki-research-l Digest, Vol 151, Issue 11
Send Wiki-research-l mailing list submissions to
wiki-research-l(a)lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
or, via email, send a message with subject or body 'help' to
wiki-research-l-request(a)lists.wikimedia.org
You can reach the person managing the list at
wiki-research-l-owner(a)lists.wikimedia.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Wiki-research-l digest..."
Today's Topics:
1. trip report: Wiki Indaba 2018 [partial] (Leila Zia)
2. Re: trip report: Wiki Indaba 2018 [partial] (Gerard Meijssen)
----------------------------------------------------------------------
Message: 1
Date: Thu, 22 Mar 2018 16:41:46 -0700
From: Leila Zia <leila(a)wikimedia.org>
To: Research into Wikimedia content and communities
<wiki-research-l(a)lists.wikimedia.org>
Subject: [Wiki-research-l] trip report: Wiki Indaba 2018 [partial]
Message-ID:
<CAK0Oe2ufs6Q33Y6thyGua4P3r6MejnXzybrAoNXsimueW7oMyQ(a)mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Hi all,
Here is the report of the one session I attended in Wiki Indaba over the
past weekend:
https://meta.wikimedia.org/wiki/User:LZia_(WMF)/Trip_reports#Wiki_Indaba_20…
Best,
Leila
------------------------------
Message: 2
Date: Fri, 23 Mar 2018 08:13:31 +0100
From: Gerard Meijssen <gerard.meijssen(a)gmail.com>
To: Research into Wikimedia content and communities
<wiki-research-l(a)lists.wikimedia.org>
Subject: Re: [Wiki-research-l] trip report: Wiki Indaba 2018 [partial]
Message-ID:
<CAO53wxVaE3DPkbbQfeUo1XH1HJETzAzJrzXt=tn_SMTcvLj+SQ(a)mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Hoi,
I have read your comments on the Wiki Indaba. Sad to hear that you could
not make it.
As a movement, it is not our task to serve the "2000" languages that you
mention; it is our task to serve the languages of our existing Wikipedias.
The difference is significant. When people aim to help themselves, their
culture, and their language by investing their efforts in a Wikipedia, we
have a process that recognises this and leads to the start of a Wikipedia.
Thanks to the Incubator and translatewiki.net, we provide a native
interface in all our languages. There are strong arguments for investing
more in other languages, such as the top 25 languages minus English, and in
the remaining languages. The easiest argument is that English accounts for
less than 50% of our traffic.
Where you talk about subjects that people are likely to read, many
predictive models are possible. The big issue in current approaches is
that they start with what we know from existing projects, particularly the
English Wikipedia. The English Wikipedia is biased, and consequently many
subjects that may be of higher relevance in other languages or cultures
will not be suggested when the English Wikipedia and its traffic are the
yardstick to measure by. Often there is more and better information in
other Wikipedias. Arguably, thanks to Wikidata, it becomes easier to
compose a more complete view of the subjects people may be interested in.
Anyway, thank you for reporting on your virtual presence; you made a
difference in this way.
Thanks,
GerardM
On 23 March 2018 at 00:41, Leila Zia <leila(a)wikimedia.org> wrote:
> Hi all,
>
> Here is the report of the one session I attended in Wiki Indaba over the
> past weekend:
> https://meta.wikimedia.org/wiki/User:LZia_(WMF)/Trip_
> reports#Wiki_Indaba_2018
>
> Best,
> Leila
------------------------------
------------------------------
End of Wiki-research-l Digest, Vol 151, Issue 11
************************************************
Amy Bruckman wrote:
> I was just re-reading Halavais & Lackaff’s 2008 paper on topic coverage in the English Wikipedia.
> Has anyone redone or extended that analysis more recently?
I've been keeping track of the length of the articles on
https://en.wikipedia.org/wiki/Wikipedia:Short_popular_vital_articles
every six months for the past six years. The great news is that
improvements, as measured by byte count and controlled for maintenance
templates, have been growing at a constant rate: basically four bytes
per day. I've never published anything on it and don't plan to, hoping
that someone who can use the academic publication credit will some
day. Plotting ORES scores over time is easy now, and should make it
sufficiently interesting to journal editors.
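As a rough illustration of the kind of trend estimate described above (the sample numbers below are invented for demonstration, not the actual measurements), the growth rate is just the least-squares slope of byte-count snapshots over time:

```python
# Hypothetical sketch: estimating a daily growth rate (bytes/day) from
# periodic byte-count snapshots. Sample data is invented for illustration.

def daily_growth_rate(snapshots):
    """Least-squares slope of (days_since_start, total_bytes) samples."""
    n = len(snapshots)
    mean_x = sum(d for d, _ in snapshots) / n
    mean_y = sum(b for _, b in snapshots) / n
    # Ordinary least squares: slope = cov(x, y) / var(x)
    num = sum((d - mean_x) * (b - mean_y) for d, b in snapshots)
    den = sum((d - mean_x) ** 2 for d, _ in snapshots)
    return num / den

# Six-monthly snapshots: (days since start, mean article length in bytes)
samples = [(0, 10000), (182, 10728), (365, 11460), (548, 12192), (730, 12920)]
print(round(daily_growth_rate(samples), 2))  # → 4.0
```

The same fit could be applied to ORES quality scores instead of byte counts to get a comparable per-day trend.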
My favorite article on the topic is
https://onlinelibrary.wiley.com/doi/full/10.1002/asi.23687
It has a lot of citing articles:
https://scholar.google.com/scholar?hl=en&um=1&ie=UTF-8&lr&cites=13904159020…
> Also, has anyone mapped comparative topic coverage for different languages?
Yes, e.g. https://fenix.tecnico.ulisboa.pt/downloadFile/395144380424/popculture-paper…
Best regards,
Jim
Hi Research-l folks,
Is there a chart that shows the proportions of new users that register on
Wikimedia in association with individual campaigns like Wiki Loves
Monuments, chapter GLAM activities, education programs, etc?
I am particularly interested in knowing what percentage of new users are
likely to be unaffiliated with identifiable programs, and likely need to
learn how Wikimedia works using exclusively online resources and initially
without individualized help.
If this information is available for a variety of snapshots in time, and
for a variety of individual Wikimedia sites, that would be appreciated. If
this information is available with productivity and attrition information
for each group on each site, that would be even better.
Thanks,
Pine
( https://meta.wikimedia.org/wiki/User:Pine )