Who here recalls a published report of research that determined Wikipedia
was the source of most digital knowledge bases? If my memory serves me
correctly, it was some amazingly huge number, like 90%+ of them use
Wikipedia as their source for content.
Can anyone help with the URL of the report? Thanks!
Stella Yu | STELLARESULTS | 415 690 7827
"Chronicling heritage brands and legendary people."
2nd Call for Posters & Demos
SEMANTiCS 2017 - The Linked Data Conference
13th International Conference on Semantic Systems
September 11–14, 2017
For details please go to: https://2017.semantics.cc/calls
Important Dates (Posters & Demos Track):
*Submission Deadline: extended: July 25, 2017 (11:59 pm, Hawaii time)
*Notification of Acceptance: August 10, 2017 (11:59 pm, Hawaii time)
*Camera-Ready Paper: August 18, 2017 (11:59 pm, Hawaii time)
As in previous years, SEMANTiCS’17 proceedings will be published by
ACM ICPS (pending) and in the CEUR-WS proceedings.
This year, SEMANTiCS features a special Data Science track, which is an
opportunity to bring together researchers and practitioners interested
in data science and its intersection with Linked Data to present their
ideas and discuss the most important scientific, technical and
socio-economical challenges of this emerging field.
SEMANTiCS 2017 will especially welcome submissions on the following hot
topics:
*Metadata, Versioning and Data Quality Management
*Semantics for Safety, Security & Privacy
*Web Semantics, Linked (Open) Data & schema.org
*Corporate Knowledge Graphs
*Knowledge Integration and Language Technologies
*Economics of Data, Data Services and Data Ecosystems
Special Track (please check appropriate topic in submission system)
Following the success of previous years, we welcome any submissions
related but not limited to the following ‘horizontal’ (research) and
‘vertical’ (industries) topics:
*Enterprise Linked Data & Data Integration
*Knowledge Discovery & Intelligent Search
*Business Models, Governance & Data Strategies
*Semantics in Big Data
*Data Portals & Knowledge Visualization
*Semantic Information Management
*Document Management & Content Management
*Terminology, Thesaurus & Ontology Management
*Smart Connectivity, Networking & Interlinking
*Smart Data & Semantics in IoT
*Semantics for IT Safety & Security
*Semantic Rules, Policies & Licensing
*Community, Social & Societal Aspects
Data Science Special Track Horizontals:
*Large-Scale Data Processing (stream processing, handling large-scale data)
*Data Analytics (Machine Learning, Predictive Analytics, Network Analytics)
*Communicating Data (Data Visualization, UX & Interaction Design)
*Cross-cutting Issues (Ethics, Privacy, Security, Provenance)
*Industry & Engineering
*Life Sciences & Health Care
*Galleries, Libraries, Archives & Museums (GLAM)
*Education & eLearning
*Media & Data Journalism
*Publishing, Marketing & Advertising
*Tourism & Recreation
*Financial & Insurance Industry
*Telecommunication & Mobile Services
*Sustainable Development: Climate, Water, Air, Ecology
*Energy, Smart Homes & Smart Grids
*Food, Agriculture & Farming
*Safety, Security & Privacy
*Transport, Environment & Geospatial
Posters & Demos Track
The Posters & Demonstrations Track invites innovative work in progress,
late-breaking research and innovation results, and smaller contributions
in all fields related to the broadly understood Semantic Web. These
include submissions on innovative applications with impact on end users
such as demos of solutions that users may test or that are yet in the
conceptual phase, but are worth discussing, and also applications, use
cases or pieces of code that may attract developers and potential
research or business partners. This also concerns new data sets made
publicly available.
The informal setting of the Posters & Demonstrations Track encourages
participants to present innovations to the research community and business
users, find new partners or clients, and engage in discussions about
the presented work. Such discussions can be invaluable inputs for the
future work of the presenters, while offering conference participants an
effective way to broaden their knowledge of the emerging research trends
and to network with other researchers.
Poster and demo submissions should consist of a paper that describes the
work and its contribution to the field or novelty aspects. Submissions must
be original and must not have been submitted for publication elsewhere.
Accepted papers will be published in HTML (RASH) in CEUR and, as such,
the camera-ready version of the papers will be required in HTML,
following the poster and demo guidelines (https://goo.gl/3BEpV7). Papers
should be submitted through EasyChair
(https://easychair.org/conferences/?conf=semantics2017) and should be
less than 2200 words in length (equivalent to 4 pages), including the
whole content of the paper.
For the initial reviewing phase, authors may submit a PDF version of the
paper following any layout. After acceptance, authors are required to
submit the camera-ready in HTML (RASH).
Submissions will be reviewed by experienced and knowledgeable
researchers and practitioners; each submission will receive detailed
feedback. For demos, we encourage authors to include links enabling the
reviewers to test the application or review the component.
For details please go to: https://2017.semantics.cc/calls
Thank you, Leila, Stuart, and Pine.
We will follow up on these comments and pointers.
A few additional words about this research -
Our narrow definition of formal expertise focuses on those with academic
qualifications who have published a scholarly work (i.e., one that appears
in Google Scholar) on the topic of the specific Wikipedia articles where one
was active.
We acknowledge that many experts do not have academic qualifications.
The choice of "formal" (i.e. academic in this context) expertise enabled a
concrete operationalization and measurement.
We welcome any ideas for pinpointing informal experts.
We are currently in the first phase of research, where we try to identify
these formal experts. We've spent a considerable amount of time identifying
500 such experts, and now we use machine learning techniques to
automatically spot them (preliminary results are quite good).
Once this is done, we can start asking interesting questions, such as:
- What is the relative role of these formal experts to overall content
contributed to Wikipedia?
- Are formal experts' contributions "better"? (e.g., do they survive longer
or result in an increased quality score, per ORES?)
- Who are those formal experts? Anonymous contributors? Registered users?
Do they take additional roles within the community?
- Formal experts' motivation
Any other ideas for taking this research forward are more than welcome.
Ofer, Einat and Alex
There are only two weeks left to benefit from the reduced registration
fee for the SEMANTICS2017 <https://2017.semantics.cc/> conference in
*To get the discount, please register
<https://2017.semantics.cc/prices> before July 1st, 2017.*
Looking forward to meeting you at the conference!
Semantics Organizing Team
I've been working on taxonomy learning from Wikipedia categories in my
Here's a recap of the approach I proposed to address the pruning problem
you faced. It's a pipeline with a bottom-up direction, i.e., from the
leaves up to the root.
Stage 1: leaf nodes
INPUT = category + category links SQL dumps, like you do
1.1. extract the full set of article pages;
1.2. extract categories that are linked to article pages only, by
looking at the outgoing links for each article;
1.3. identify the set of categories with no sub-categories.
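As a sketch, stage 1 could be implemented over the parsed dumps like this. The data structures and names here are illustrative assumptions: a set of article page IDs, a set of category page IDs, and the (cl_from, cl_to) pairs from categorylinks:

```python
def leaf_categories(article_ids, category_ids, category_links):
    """Stage 1: categories that are linked from article pages only and
    have no sub-categories. A link (source, target) means page `source`
    belongs to category `target`; a category acquires a sub-category
    exactly when a category-namespace page links to it."""
    from_articles = set()
    from_categories = set()
    for source, target in category_links:
        if source in article_ids:
            from_articles.add(target)
        elif source in category_ids:
            from_categories.add(target)  # `target` has a sub-category
    # Linked from articles only, hence also no sub-categories:
    return from_articles - from_categories
```

Note that steps 1.2 and 1.3 collapse into one set difference here, because a category gains a sub-category precisely when a category page links to it.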
Stage 2: prominent nodes
INPUT = stage 1 output
2.1. traverse the leaf graph (see the algorithm below);
2.2. NLP to identify categories that hold is-a relations, i.e., *noun
phrases* with *plural head*, inspired by the YAGO approach [2, 3];
2.3. (optional) set a usage weight based on the number of category
interlanguage links (more links = more usage across language chapters).
These 2 stages should output the clean dataset you're looking for.
Based on that, you can then build the taxonomy.
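Step 2.2 could be roughly approximated without a full parser; the sketch below uses a naive suffix heuristic to decide whether the head noun of a category title is plural. The YAGO papers [2, 3] use proper noun-phrase parsing, so treat this only as a placeholder:

```python
PREPOSITIONS = {"in", "of", "by", "from", "at", "on", "for", "with"}

def is_class_category(title):
    """Heuristic for step 2.2: a category such as 'Airports_in_Germany'
    tends to hold is-a relations when its head noun is plural. We take
    the head to be the last word before a preposition, and call it
    plural if it ends in 's' but not 'ss' (naive: 'Physics' or 'News'
    would be misclassified)."""
    words = title.replace("_", " ").split()
    head_words = []
    for word in words:
        if word.lower() in PREPOSITIONS:
            break
        head_words.append(word)
    if not head_words:
        return False
    head = head_words[-1].lower()
    return head.endswith("s") and not head.endswith("ss")
```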
Feel free to ping me if you need more information.
Input: L (leaf nodes set)
Output: PN (prominent nodes set)

for all l in L do
    isProminent = true;
    P = getTransitiveParents(l);
    for all p in P do
        C = getChildren(p);
        areAllLeaves = true;
        for all c in C do
            if c not in L then
                areAllLeaves = false;
        if areAllLeaves then
            isProminent = false;
    if isProminent then
        add l to PN;
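In Python, the same traversal could be sketched as follows, representing the category graph as a child-to-parents dict; the helper names mirror the pseudocode and are assumptions, not existing code:

```python
def get_transitive_parents(graph, node):
    """Collect all ancestors of `node` in a child -> parents mapping."""
    seen = set()
    stack = [node]
    while stack:
        for parent in graph.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def get_children(graph, node):
    """All direct children of `node` in a child -> parents mapping."""
    return {child for child, parents in graph.items() if node in parents}

def prominent_nodes(graph, leaves):
    """A leaf stays prominent unless some ancestor has only leaf children."""
    prominent = set()
    for leaf in leaves:
        is_prominent = True
        for p in get_transitive_parents(graph, leaf):
            children = get_children(graph, p)
            if children and all(c in leaves for c in children):
                is_prominent = False
                break
        if is_prominent:
            prominent.add(leaf)
    return prominent
```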
[2] F. M. Suchanek, G. Kasneci, and G. Weikum. YAGO: a core of semantic
knowledge. In Proceedings of the 16th International Conference on World
Wide Web, pages 697–706. ACM, 2007.
[3] J. Hoffart, F. M. Suchanek, K. Berberich, and G. Weikum. YAGO2: a
spatially and temporally enhanced knowledge base from Wikipedia.
Artificial Intelligence, 194:28–61, 2013.
On 7/11/17 03:21, wiki-research-l-request(a)lists.wikimedia.org wrote:
> Date: Mon, 10 Jul 2017 18:20:47 -0700
> From: Leila Zia<leila(a)wikimedia.org>
> To: Research into Wikimedia content and communities
> Subject: [Wiki-research-l] category extraction question
> Content-Type: text/plain; charset="UTF-8"
> Hi all,
> [If you are not interested in discussions related to the category system
> (on English Wikipedia)
> , you can stop here. :)]
> We have run into a problem that some of you may have thought about or
> addressed before. We are trying to clean up the category system on English
> Wikipedia by turning the category structure to an IS-A hierarchy. (The
> output of this work can be useful for the research on template
> recommendation, for example, but the use-cases won't stop there). One
> issue that we are facing is the following:
> We are currently using the SQL dumps to extract categories associated
> with every article on English Wikipedia (main namespace).
> Using this approach, we get 5 categories associated with the Flow
> cytometry bioinformatics article:
> The problem is that only the first two categories are the ones we are
> interested in. We have one cleaning step through which we only keep
> categories that belong to category Article and that step removes the last
> category above, but the other two Wikipedia_... remain there. We need to
> somehow prune the data and clean it from those two categories.
> One way we could do the above would be to parse wikitext instead of the SQL
> dumps and focus on extracting categories marked by pattern [[Category:XX]],
> but in that case, we would lose a good category such as
> because that's generated by a template.
> Any ideas on how we can start with a "cleaner" dataset of categories
> related to the topic of the articles as opposed to maintenance related or
> other types of categories?
> The exact code we use is:
> SELECT p.page_id id, p.page_title title, cl.cl_to category
> FROM categorylinks cl
> JOIN page p
> on cl.cl_from = p.page_id
> where cl_type = 'page'
> and page_namespace = 0
> and page_is_redirect = 0
> and the edges of the category graph are extracted with
> SELECT p.page_title category, cl.cl_to parent
> FROM categorylinks cl
> JOIN page p
> ON p.page_id = cl.cl_from
> where p.page_namespace = 14
---------- Forwarded message ----------
From: Melody Kramer <mkramer(a)wikimedia.org>
Date: Mon, Jul 10, 2017 at 2:26 PM
Subject: [Wikimedia-l] [fellowship] Opportunity for people working on "open
projects that support a healthy Internet."
I wanted to pass along an opportunity that I saw earlier today via Twitter:
It sets up people working on "open projects that support a healthy
Internet" with a mentor, a cohort of like-minded people from all over the
world, and a trip to Mozfest, which is a London-based open Internet
conference I've attended/presented at in past years and found really
mind-expanding due to the cross-disciplinary conversations that take place.
You can see previous projects here:
https://mozilla.github.io/leadership-training/round-3/projects/ — it
looks like there's quite a
broad cross-section and many of the projects across the movement might be
applicable. The post notes participants will learn about "best practices
for project setup and communication, tools for collaboration, community
building, and running events."
Thank you to Leila for suggesting I pass this along to this listserv. Feel
free to share it broadly.
Melody Kramer <https://www.mediawiki.org/wiki/User:MKramer_(WMF)>
Senior Audience Development Manager
Read a random featured article from Wikipedia!
Wikimedia-l mailing list, guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
https://meta.wikimedia.org/
New messages to: Wikimedia-l(a)lists.wikimedia.org
This is just a friendly reminder that we plan to turn off the RCStream
service after July 7th.
We’re tracking as best we can the progress of porting clients over at
https://phabricator.wikimedia.org/T156919. But, we can only help with what
we know about. If you’ve got something still running on RCStream that
hasn’t yet ported, let us know, and/or switch soon!
On Wed, Feb 8, 2017 at 9:28 AM, Andrew Otto <otto(a)wikimedia.org> wrote:
> Hi everyone!
> Wikimedia is releasing a new service today: EventStreams
> <https://wikitech.wikimedia.org/wiki/EventStreams>. This service allows
> us to publish arbitrary streams of JSON event data to the public.
> Initially, the only stream available will be good ol’ RecentChanges
> <https://www.mediawiki.org/wiki/Manual:RCFeed>. This event stream
> overlaps functionality already provided by irc.wikimedia.org and RCStream
> <https://wikitech.wikimedia.org/wiki/RCStream>. However, this new
> service has advantages over these (now deprecated) services.
> We can expose more than just RecentChanges.
> Events are delivered over streaming HTTP (chunked transfer) instead of
> IRC or socket.io. This requires less client side code and fewer
> special routing cases on the server side.
> Streams can be resumed from the past. By using EventSource, a
> disconnected client will automatically resume the stream from where it left
> off, as long as it resumes within one week. In the future, we would like
> to allow users to specify historical timestamps from which they would like
> to begin consuming, if this proves safe and tractable.
> I did say deprecated! Okay okay, we may never be able to fully deprecate
> irc.wikimedia.org. It’s used by too many (probably sentient by now) bots
> out there. We do plan to obsolete RCStream, and to turn it off in a
> reasonable amount of time. The deadline iiiiiis July 7th, 2017. All
> services that rely on RCStream should migrate to the HTTP based
> EventStreams service by this date. We are committed to assisting you in
> this transition, so let us know how we can help.
> Unfortunately, unlike RCStream, EventStreams does not have server side
> event filtering (e.g. by wiki) quite yet. How and if this should be done
> is still under discussion <https://phabricator.wikimedia.org/T152731>.
> The RecentChanges data you are used to remains the same, and is available
> at https://stream.wikimedia.org/v2/stream/recentchange. However, we may
> have something different for you, if you find it useful. We have been
> internally producing new Mediawiki specific events
> for a while now, and could expose these via EventStreams as well.
> Take a look at these events, and tell us what you think. Would you find
> them useful? How would you like to subscribe to them? Individually as
> separate streams, or would you like to be able to compose multiple event
> types into a single stream via an API? These things are all possible.
> I asked for a lot of feedback in the above paragraphs. Let’s try and
> centralize this discussion over on the mediawiki.org EventStreams talk
> page <https://www.mediawiki.org/wiki/Talk:EventStreams>. In summary,
> the questions are:
> What RCStream clients do you maintain, and how can we help you migrate
> to EventStreams?
> Is server side filtering, by wiki or arbitrary event field, useful to
> you? <https://www.mediawiki.org/wiki/Topic:Tkjkabtyakpm967t>
> Would you like to consume streams other than RecentChanges?
> <https://www.mediawiki.org/wiki/Topic:Tkjk4ezxb4u01a61> (Currently
> available events are described here
> - Andrew Otto
I'm a Master's student working under the supervision of Drs. Arazy and
Minkov.
My research explores the extent to which "recognized domain experts"
contribute to Wikipedia.
(I use a narrow definition for "recognized domain experts" to include those
with academic qualifications in the relevant topic).
I manually tracked these experts using a variety of sources, and then used
machine learning methods to automatically identify domain experts among
Wikipedia editors.
I'm writing to explore whether this research is of interest to the
community and to learn if other people have already tackled this research
question.
Thank you in advance for pointing me to relevant research projects.
Forwarding this to more research people. In case anyone needs to do
research on MoodBar, get in touch with us; those tables will be deleted
otherwise.
---------- Forwarded message ----------
From: Nuria Ruiz <nuria(a)wikimedia.org>
Date: Fri, Jul 7, 2017 at 4:08 PM
Subject: [Analytics] Dropping MoodBar extension tables from all wikis
To: "A mailing list for the Analytics Team at WMF and everybody who has an
interest in Wikipedia and analytics." <analytics(a)lists.wikimedia.org>
Cc: Manuel Arostegui <marostegui(a)wikimedia.org>
This is an FYI that the MoodBar extension has been undeployed and, as such,
its tables will be removed from all wikis. See https://phabricator.wikimedia.
It looks like this extension sparked some interest in the past and there
were some research projects about it. Please let us know (before August
7th) whether we should keep the tables for any reason.