Hi All,
I've been working for some time now on graphs that visualize the entire
edit activity of a wiki. I'm documenting all of it at
https://meta.wikimedia.org/wiki/Research:Editor_Behaviour_Analysis_%26_Grap…
.
The graphs can be viewed at
https://cosmiclattes.github.io/wikigraphs/data/wikis.html. Currently only
graphs for 'en' have been put up; I'll add graphs for the other wikis soon.
Methodology
- The editors are split into groups based on the month in which they
made their first edit.
- The active edit sessions (as counts, percentages, etc.) for each group
are then plotted as stacked bars or as a matrix. I've used the canonical
definition of an active edit session; the values are within ±0.1% of the
values on https://stats.wikimedia.org/
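The cohort computation can be sketched roughly as follows. This is a toy illustration only: the sample data and the ≥5 edits/month activity threshold are my assumptions, not taken from the actual pipeline, which follows the canonical active-editor definition.

```python
from collections import defaultdict

# Hypothetical sample data: (editor, month "YYYY-MM", edit_count) rows.
edits = [
    ("alice", "2006-01", 12), ("alice", "2006-02", 3),
    ("bob",   "2006-01", 7),  ("bob",   "2006-03", 9),
    ("carol", "2006-02", 5),
]

# Step 1: assign each editor to a cohort = the month of their first edit.
cohort = {}
for editor, month, _ in sorted(edits, key=lambda r: r[1]):
    cohort.setdefault(editor, month)

# Step 2: count active editors (here assumed >= 5 edits in a month)
# per (cohort, month) cell -- the quantity plotted as stacked bars.
active = defaultdict(int)
for editor, month, count in edits:
    if count >= 5:
        active[(cohort[editor], month)] += 1

print(dict(active))
```

Each `(cohort_month, month)` cell then becomes one segment of a stacked bar (or one cell of the matrix).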
Selector
- There is a selector on each graph that lets you filter the data in the
graph. Moving the cursor to the left end of the selector turns it into a
resize cursor; the selection can then be moved or redrawn.
- In graphs 1 and 2, the selector filters by percentage.
- In graphs 3, 4, and 5, the selector filters by the age of the cohort.
Preliminary Finding
- Editor longevity fell drastically starting in January 2006 and has
since stabilized at the levels reached in January 2007.
https://meta.wikimedia.org/wiki/Research:Editor_Behaviour_Analysis_%26_Grap…
I would love to hear what you think of the graphs and any ideas you have
for me.
Jeph
[Begging pardon if you read this multiple times]
Hello everyone,
I would like to draw your attention to the StrepHit IEG proposal, which
is now in its final form:
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Va…
To cut a long story short, StrepHit is a Natural Language Processing
pipeline that understands human language, extracts structured data from
raw text and produces Wikidata statements with reference URLs.
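For list members less familiar with the Wikidata data model, here is a minimal sketch of what a statement with a reference URL looks like. The item id and exact JSON shape are placeholders rather than StrepHit's actual output; P854 is Wikidata's "reference URL" property and P54 is "member of sports team" (matching the soccer use case).

```python
# Illustrative only: a Wikidata-style statement whose reference block
# carries a reference URL (property P854). Item ids are placeholders.
statement = {
    "mainsnak": {
        "snaktype": "value",
        "property": "P54",  # "member of sports team"
        "datavalue": {"value": {"entity-type": "item", "id": "Q123456"}},
    },
    "references": [{
        "snaks": {
            "P854": [{  # the reference URL backing the claim
                "snaktype": "value",
                "datavalue": {"value": "http://example.org/source-article"},
            }]
        }
    }],
}

# StrepHit's guarantee: the references list is never empty.
assert statement["references"]
```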
I have already received support and feedback, but your voice is vital
and it can be heard on the project page in multiple
ways. If you:
1. like the idea, please click on the *endorse* blue button;
2. want to get involved, please click on the *join* blue button;
3. want to share your thoughts, please click on the *give feedback* link.
Looking forward to your updates.
Cheers!
--
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j
Hi everybody,
We’re preparing for the September 2015 research newsletter and looking for contributors. Please take a look at: https://etherpad.wikimedia.org/p/WRN201509 and add your name next to any paper you are interested in covering. Our target publication date is Wednesday September 30 UTC. As usual, short notes and one-paragraph reviews are most welcome.
Highlights from this month:
Editorial Bias in Crowd-Sourced Political Information
Disease identification and concept mapping using Wikipedia
Recognizing Biographical Sections in Wikipedia
The Descent of Pluto: Interactive dynamics, specialization and reciprocity of roles in a Wikipedia debate
How will your workload look like in 6 years? Analyzing Wikimedia's workload
Gender imbalance and Wikipedia
“A Spousal Relation Begins with a Deletion of engage and Ends with an Addition of divorce": Learning State Changing Verbs from Wikipedia Revision History
How much is Wikipedia Lagging Behind News?
Measuring the Effectiveness of Wikipedia Articles: How Does Open Content Succeed?
Wikipedia entries on fiction and non-propositional knowledge representation
Students' use of Wikipedia as an academic resource — Patterns of use and perceptions of usefulness
If you have any questions about the format or process, feel free to get in touch off-list.
Masssly, Tilman Bayer and Dario Taraborelli
[1] http://meta.wikimedia.org/wiki/Research:Newsletter
cross-posting as this might be of interest to people on this list
> Begin forwarded message:
>
> From: Marco Fossati <hell.j.fox(a)gmail.com>
> Subject: [Wikidata] StrepHit IEG proposal: call for support (was Re: [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool)
> Date: September 21, 2015 at 3:32:25 AM PDT
> To: wikidata(a)lists.wikimedia.org
> Reply-To: "Discussion list for the Wikidata project." <wikidata(a)lists.wikimedia.org>
>
> Dear all,
>
> The StrepHit IEG proposal is now pretty much complete:
> https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Va…
>
> We have already received support and feedback, but you are the most relevant community and the project needs your specific help.
>
> Your voice is vital and it can be heard on the project page in multiple ways. If you:
> 1. like the idea, please click on the *endorse* blue button;
> 2. want to get involved, please click on the *join* blue button;
> 3. want to share your thoughts, please click on the *give feedback* link.
>
> Looking forward to your updates.
> Cheers!
>
> On 9/9/15 11:39, Marco Fossati wrote:
>> Hi Markus, everyone,
>>
>> The project proposal is currently in active development.
>> I would like to focus now on the dissemination of the idea and the
>> engagement of the Wikidata community.
>> Hence, I would love to gather feedback on the following question:
>>
>> Does StrepHit sound interesting and useful to you?
>>
>> It would be great if you could report your thoughts on the project talk
>> page:
>> https://meta.wikimedia.org/wiki/Grants_talk:IEG/StrepHit:_Wikidata_Statemen…
>>
>>
>> Cheers!
>>
>> On 9/8/15 2:02 PM, wikidata-request(a)lists.wikimedia.org wrote:
>>> Date: Mon, 07 Sep 2015 16:47:16 +0200
>>> From: Markus Krötzsch<markus(a)semantic-mediawiki.org>
>>> To: "Discussion list for the Wikidata project."
>>> <wikidata(a)lists.wikimedia.org>
>>> Subject: Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the
>>> primary sources tool
>>> Message-ID:<55EDA374.2090901(a)semantic-mediawiki.org>
>>> Content-Type: text/plain; charset=utf-8; format=flowed
>>>
>>> Dear Marco,
>>>
>>> Sounds interesting, but the project page still has a lot of gaps. Will
>>> you notify us again when you are done? It is a bit tricky to endorse a
>>> proposal that is not finished yet ;-)
>>>
>>> Markus
>>>
>>> On 04.09.2015 17:01, Marco Fossati wrote:
>>>> [Begging pardon if you have already read this in the Wikidata
>>>> project chat]
>>>>
>>>> Hi everyone,
>>>>
>>>> As Wikidatans, we all know how much data quality matters.
>>>> We all know what high quality stands for: statements need to be
>>>> validated via references to external, non-wiki sources.
>>>>
>>>> That's why the primary sources tool is being developed:
>>>> https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool
>>>> And that's why I am preparing the StrepHit IEG proposal:
>>>> https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Va…
>>>>
>>>> StrepHit (pronounced "strep hit", means "Statement? repherence it!") is
>>>> a Natural Language Processing pipeline that understands human language,
>>>> extracts structured data from raw text and produces Wikidata statements
>>>> with reference URLs.
>>>>
>>>> As a demonstration to support the IEG proposal, you can find the
>>>> **FBK-strephit-soccer** dataset uploaded to the primary sources tool
>>>> backend.
>>>> It's a small dataset serving the soccer domain use case.
>>>> Please follow the instructions on the project page to activate it and
>>>> start playing with the data.
>>>>
>>>> What is the biggest difference that sets StrepHit datasets apart from
>>>> the currently uploaded ones?
>>>> At least one reference URL is always guaranteed for each statement.
>>>> This means that if StrepHit finds a new statement that was not in
>>>> Wikidata before, it will always propose its external references.
>>>> We do not want to manually reject all the new statements with no
>>>> reference, right?
>>>>
>>>> If you like the idea, please endorse the StrepHit IEG proposal!
>>
>
> --
> Marco Fossati
> http://about.me/marco.fossati
> Twitter: @hjfocs
> Skype: hell_j
>
> _______________________________________________
> Wikidata mailing list
> Wikidata(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
Dario Taraborelli
Head of Research, Wikimedia Foundation
wikimediafoundation.org <http://wikimediafoundation.org/> • nitens.org <http://nitens.org/> • @readermeter <http://twitter.com/readermeter>
Hello, lists.
You may have heard of projects <https://www.mediawiki.org/wiki/Wikimedia_Research#Highlights> such as revision scoring or article recommendations. We’re now looking for a full-stack software engineer to join Wikimedia Research and support and scale up these and similar projects.
Job description is below; please help us find the best possible candidates.
Dario
Software Engineer - Research <http://grnh.se/b12qur>
Summary
Help us create a world in which every single human being can freely share in the sum of all knowledge.
We are a team of scientists and UX researchers at the Wikimedia Foundation using data to understand and empower millions of users – readers, contributors, and donors – who interact with Wikipedia and its sister projects on a daily basis. We turn research questions into publicly shared knowledge, we design and test new technology, we produce data-driven insights to support product and engineering decisions and we publish research informing the organization’s and the movement’s strategy. We are strongly committed to principles of transparency, privacy and collaboration, we use free and open source technology and we collaborate with researchers in the industry and academia. As a full member of the Wikimedia Research department, you will help us build and scale the infrastructure our team needs for research and experimentation, implementing new technology and data-intensive applications.
Description
Collaborate with researchers to expose algorithms and machine learning systems through APIs and web applications
Design, develop, test, and deploy new features, improvements and upgrades to the infrastructure that supports research and powers data-intensive applications.
Support our data science team in optimizing computationally intensive data processing.
Support our UX research capacity by growing, expanding and maintaining our user testing platform and instrumentation stack.
Work in coordination with other infrastructure teams such as Services and Analytics Engineering as well as Product teams to grow and scale research-driven services and applications.
Requirements
Real-world experience writing applications using both scripting (e.g. Python, JavaScript, PHP) and compiled languages (e.g. Java, Scala, C, C#)
Experience with MySQL/Postgres or similar database technology
Experience developing APIs for data retrieval
Understanding of basic statistical concepts
BS, MS, or PhD in Computer Science, Mathematics, or equivalent work experience
Pluses
Experience with high-traffic web architectures and operations
Production experience with Hadoop and ecosystem technology (Pig, Hive, streaming)
Experience with web UI design (JavaScript, HTML, CSS)
Familiarity with scientific computing libraries in Python and R
Experience working with volunteers
Big ups if you are a contributor to Wikipedia or other open collaboration projects
Show us your stuff! Please provide us with information you feel would be useful to us in gaining a better understanding of your technical background and accomplishments. Links to GitHub, your technical blogs, publications, personal projects, etc. are exceptionally useful. We especially appreciate pointers to your best contributions to open source projects.
About the Wikimedia Foundation
The Wikimedia Foundation is the non-profit organization that operates Wikipedia, the free encyclopedia. Wikipedia and the other projects operated by the Wikimedia Foundation receive more than 431 million unique visitors per month, making them the 5th most popular web property worldwide. Available in more than 287 languages, Wikipedia contains more than 32 million articles contributed by a global volunteer community of more than 100,000 people. Based in San Francisco, California, the Wikimedia Foundation is an audited, 501(c)(3) charity that is funded primarily through donations and grants. The Wikimedia Foundation was created in 2003 to manage the operation of Wikipedia and its sister projects. It currently employs over 208 staff members. Wikimedia is supported by local chapter organizations in 40 countries or regions.
The Wikimedia Foundation offers competitive benefits. Fully paid medical, dental, and vision coverage for employees and their eligible families (yes, fully paid premiums!). A Wellness Program which provides reimbursement for mind, body and soul activities such as fitness memberships, massages, cooking classes and much more. 401(k) retirement plan with matched contributions of 4% of annual salary.
More Information
http://wikimediafoundation.org <http://wikimediafoundation.org/>
http://blog.wikimedia.org <http://blog.wikimedia.org/>
http://wikimediafoundation.org/wiki/Vision <http://wikimediafoundation.org/wiki/Vision>
About Wikimedia Research
https://www.mediawiki.org/wiki/Wikimedia_Research <https://www.mediawiki.org/wiki/Wikimedia_Research>
Examples of code
https://github.com/wiki-ai/revscoring <https://github.com/wiki-ai/revscoring>
https://github.com/wiki-ai/ores <https://github.com/wiki-ai/ores>
https://github.com/halfak/MediaWiki-Utilities <https://github.com/halfak/MediaWiki-Utilities>
https://github.com/halfak/mwstreaming <https://github.com/halfak/mwstreaming>
Dario Taraborelli
Head of Research, Wikimedia Foundation
wikimediafoundation.org <http://wikimediafoundation.org/> • nitens.org <http://nitens.org/> • @readermeter <http://twitter.com/readermeter>
Hi researchers,
I could use a little help with understanding these dumps:
https://dumps.wikimedia.org/enwikisource/latest/
https://dumps.wikimedia.org/enwiki/20150901/
I'm trying to verify the claim that ENWP is the world's largest open text
project, and to do that I need to verify that ENWP is larger than English
Wikisource. Which files should I be comparing?
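In case it helps frame the comparison: one rough measure is the total bytes of wikitext in each project's pages-articles dump. A minimal sketch, assuming the standard dump file naming and that the files have been downloaded locally (I have not run this against the full dumps):

```python
import bz2
import xml.etree.ElementTree as ET

def total_text_bytes(xml_stream):
    """Sum the UTF-8 bytes of wikitext in a MediaWiki pages-articles XML stream."""
    total = 0
    for _, elem in ET.iterparse(xml_stream):
        # Revision text lives in <text> elements (namespace-qualified in dumps).
        if elem.tag == "text" or elem.tag.endswith("}text"):
            total += len((elem.text or "").encode("utf-8"))
        elem.clear()  # free memory as we stream through the dump
    return total

# Usage sketch (filenames assumed from the standard naming convention):
# enwiki = total_text_bytes(bz2.open("enwiki-20150901-pages-articles.xml.bz2"))
# enwikisource = total_text_bytes(bz2.open("enwikisource-latest-pages-articles.xml.bz2"))
# print(enwiki > enwikisource)
```

Comparing compressed file sizes alone would be misleading, since the two projects may compress differently and the enwiki dump includes non-article pages unless you pick the pages-articles variant.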
Are there any other projects that could make a claim to be a larger open
text project than ENWP? Perhaps there's a library somewhere that has such a
huge volume of out-of-copyright materials that the combined bytes of
published text are larger than ENWP?
Thanks!
Pine
Hi, everyone.
Thought this might interest the Wiki-EDU and Wiki-Research communities as
well.
Cheers,
Shani.
---------- Forwarded message ----------
From: Jake Orlowitz <jorlowitz(a)gmail.com>
Date: Fri, Sep 18, 2015 at 3:47 AM
Subject: [GLAM] New Free Research Access via the Wikipedia Library
To: "wikimedia-l(a)lists.wikimedia.org" <wikimedia-l(a)lists.wikimedia.org>
Cc: wikipedialibrary-l(a)wikimedia.org, wikipedia-l(a)lists.wikimedia.org,
wikimediareference-l(a)lists.wikimedia.org, "
wikimediaannounce-l(a)lists.wikimedia.org" <
wikimediaannounce-l(a)lists.wikimedia.org>, "Wikimedia & GLAM collaboration
[Public]" <glam(a)lists.wikimedia.org>, Wikimedia & Libraries <
libraries(a)lists.wikimedia.org>
Hi!
The Wikipedia Library has several new, free research donations available:
* EBSCO (very expansive access to academic, newspaper, and magazine
sources):
<https://en.wikipedia.org/wiki/WP:EBSCO>
* Newspaperarchive.com (historical newspapers from the United States,
Canada, UK and 20 other countries--includes Open Access "clippings"
feature):
<https://en.wikipedia.org/wiki/WP:Newspaperarchive.com>
* IMF eLibrary (archival collection of IMF reports, studies and research on
global economics and development):
<https://en.wikipedia.org/wiki/WP:IMF>
* Sabinet (large African digital publisher with a wide range of content in
English and other European and African languages):
<https://en.wikipedia.org/wiki/WP:Sabinet>
* Numérique Premium (French language social science and humanities ebook
database with many topical collections):
<https://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Num%C3%A9rique_Premium>
* Al Manhal (Arabic and English database with a variety of sources mainly
focused on or published in the Middle East):
<
https://ar.wikipedia.org/wiki/%D9%88%D9%8A%D9%83%D9%8A%D8%A8%D9%8A%D8%AF%D9…
>
* Jamalon (Arabic book distributor with physical book delivery to
volunteers):
<
https://ar.wikipedia.org/wiki/%D9%88%D9%8A%D9%83%D9%8A%D8%A8%D9%8A%D8%AF%D9…
>
Many other resources are available, including Elsevier ScienceDirect,
British Medical Journal, Dynamed, Project MUSE, DeGruyter, Newspapers.com,
Highbeam, and HeinOnline.
Do better research and help expand the use of high-quality references
across Wikipedia projects. Sign up today!
The Wikipedia Library Team
<http://meta.wikimedia.org/wiki/The_Wikipedia_Library>
p.s. We engaged in a thoughtful debate this week about our access donation
partnerships. You can read about our view, and we welcome your comments.
<https://blog.wikimedia.org/2015/09/16/open-access-in-a-closed-world/>
_______________________________________________
GLAM mailing list
GLAM(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/glam
Hi everyone,
The next Research showcase will be live-streamed this Wednesday (tomorrow),
September 16 at 11:30 PST. The streaming link is:
http://www.youtube.com/watch?v=eJk6mxJZhH8
As usual, you can join the conversation on IRC at #wikimedia-research.
We look forward to seeing you!
Leila
This month:
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#September_2015>
Morten Warncke-Wang will talk about the misalignment between production and
consumption of quality content on Wikipedia, and Besnik Fetahu proposes a
news-article suggestion task to improve news coverage in Wikipedia.
*Fun or Functional? The Misalignment Between Content Quality and Popularity
in Wikipedia*
By Morten Warncke-Wang
In peer production communities like Wikipedia, individual community members
typically decide for themselves where to make contributions, often driven
by factors such as “fun” or a belief that “information should be free”.
However, the extent to which this bottom-up, interest-driven content
production paradigm meets the need of consumers of this content is unclear.
In this talk, I analyse four large Wikipedia language editions, finding
extensive misalignment between production and consumption of quality
content in all of them, and I show how this greatly impacts Wikipedia’s
readers. I also examine misalignment in more detail by studying how it
relates to specific topics, and to what extent high popularity is related
to sudden changes in demand (i.e. “breaking news”). Finally, I discuss
technologies and community practices that can help reduce misalignment in
Wikipedia.
*Automated News Suggestions for Populating Wikipedia Entity Pages*
By Besnik Fetahu
Wikipedia entity pages are a valuable source of information for direct
consumption and for knowledge-base construction, update and maintenance.
Facts in these entity pages are typically supported by references. Recent
studies show that as much as 20% of the references are from online news
sources. However, many entity pages are incomplete even if relevant
information is already available in existing news articles. Even for the
already present references, there is often a delay between the news article
publication time and the reference time. In this work, we therefore look at
Wikipedia through the lens of news and propose a novel news-article
suggestion task to improve news coverage in Wikipedia, and reduce the lag
of newsworthy references. Our work finds direct application, as a
precursor, to Wikipedia page generation and knowledge-base acceleration
tasks that rely on relevant and high quality input sources. We propose a
two-stage supervised approach for suggesting news articles to entity pages
for a given state of Wikipedia. First, we suggest news articles to Wikipedia
entities (article-entity placement) relying on a rich set of features which
take into account the salience and relative authority of entities, and the
novelty of news articles to entity pages. Second, we determine the exact
section in the entity page for the input article (article-section
placement) guided by class-based section templates. We perform an extensive
evaluation of our approach based on ground-truth data that is extracted
from external references in Wikipedia. We achieve a high precision value of
up to 93% in the article-entity suggestion stage and up to 84% for the
article-section placement. Finally, we compare our approach against
competitive baselines and show significant improvements.
See https://phabricator.wikimedia.org/T85984
The user_daily_contribs table (and associated API) is sometimes used for
* JavaScript (e.g. CentralNotice) targeting users based on activity in a
certain timeframe,
* simplification of SQL queries (e.g. [1]),
* other?
If you use this data/feature or plan to use it, or if you have replaced it
with something else, please comment on the task; that will help us assess
whether to keep it.
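For anyone unfamiliar with the table, here is a toy sketch of the kind of SQL simplification it enables. The column names below are my assumptions for illustration, not the extension's actual schema (check the task for that); the point is that per-day aggregates make "activity in a timeframe" a one-line query instead of a scan over the revision table.

```python
import sqlite3

# Build a tiny in-memory stand-in for a per-day contributions table.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE user_daily_contribs (user_id INTEGER, day TEXT, contribs INTEGER)"
)
db.executemany(
    "INSERT INTO user_daily_contribs VALUES (?, ?, ?)",
    [(1, "2015-09-20", 4), (1, "2015-09-21", 2), (2, "2015-09-21", 7)],
)

# Total contributions per user since a cutoff date: one small aggregate,
# no join against the full revision history.
rows = db.execute(
    "SELECT user_id, SUM(contribs) FROM user_daily_contribs "
    "WHERE day >= '2015-09-21' GROUP BY user_id ORDER BY user_id"
).fetchall()
print(rows)  # [(1, 2), (2, 7)]
```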
Nemo
[1]
https://phabricator.wikimedia.org/diffusion/TLST/browse/master/scripts/user…