Wiki-research-l March 2016

wiki-research-l@lists.wikimedia.org

25 participants
31 discussions

by song＠cs.umn.edu

Pursuant to prior discussions about the need for a research policy on Wikipedia, WikiProject Research is drafting a policy regarding the recruitment of Wikipedia users to participate in studies.

At this time, we have a proposed policy, and an accompanying group that would facilitate recruitment of subjects in much the same way that the Bot Approvals Group approves bots.

The policy proposal can be found at:
http://en.wikipedia.org/wiki/Wikipedia:Research

The Subject Recruitment Approvals Group mentioned in the proposal is being described at:
http://en.wikipedia.org/wiki/Wikipedia:Subject_Recruitment_Approvals_Group

Before we move forward with seeking approval from the Wikipedia community, we would like additional input about the proposal, and would welcome additional help improving it.

Also, please consider participating in WikiProject Research at:
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Research

--
Bryan Song
GroupLens Research
University of Minnesota

9 months, 2 weeks

Wikipedia aggregate clickstream data released

by Dario Taraborelli

We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia:

http://dx.doi.org/10.6084/m9.figshare.1305770

This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.

This data can be used for various purposes:

• determining the most frequent links people click on for a given article
• determining the most common links people followed to an article
• determining how much of the total traffic to an article clicked on a link in that article
• generating a Markov chain over English Wikipedia

We created a page on Meta for feedback and discussion about this release:

https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream

Ellery and Dario
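
As a rough illustration of the uses listed in the announcement, the sketch below turns (referer, article, count) rows into per-referer click probabilities, which is one simple way to read "a Markov chain over English Wikipedia", and also ranks the most-clicked links from a given page. The tuple layout, the function names and the sample rows are assumptions made for this example; the released files may use different columns, and referers that are not articles (search engines, for instance) would probably need to be filtered out first.

```python
from collections import defaultdict


def transition_probabilities(rows):
    """Turn (referer, article, count) rows into per-referer click probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for referer, article, n in rows:
        counts[referer][article] += n
    chain = {}
    for referer, targets in counts.items():
        total = sum(targets.values())
        # Each row of the chain sums to 1 over the observed targets.
        chain[referer] = {article: n / total for article, n in targets.items()}
    return chain


def top_links(rows, referer, k=3):
    """Most frequently clicked targets from one referer, highest count first."""
    totals = defaultdict(int)
    for ref, article, n in rows:
        if ref == referer:
            totals[article] += n
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical sample rows, not taken from the released dataset.
sample_rows = [
    ("London", "England", 30),
    ("London", "River_Thames", 10),
    ("England", "London", 5),
]

chain = transition_probabilities(sample_rows)
print(chain["London"])
print(top_links(sample_rows, "London", k=1))
```

The probabilities here are conditional on a click having been recorded; estimating how much of an article's total traffic clicked any link would additionally need that article's overall view count.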

6 years, 3 months

Fwd: [Wikitech-l] statistics about frequent section titles

by Jonathan Morgan

Cross-posting this request to wiki-research-l. Anyone have data on frequently used section titles in articles (any language), or know of datasets/publications that examined this?

I'm not aware of any off the top of my head, Amir.

- Jonathan

---------- Forwarded message ----------
From: Amir E. Aharoni <amir.aharoni(a)mail.huji.ac.il>
Date: Sat, Jul 11, 2015 at 3:29 AM
Subject: [Wikitech-l] statistics about frequent section titles
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>

Hi,

Did anybody ever try to collect statistics about frequent section titles in Wikimedia projects?

For Wikipedia, for example, titles such as "Biography", "Early life", "Bibliography", "External links", "References", "History", etc., appear in a lot of articles, and their counterparts appear in a lot of languages. There are probably similar things in Wikivoyage, Wiktionary and possibly other projects.

Did anybody ever try to collect statistics of the most frequent section titles in each language and project?

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces, I want to live in peace.” – T. Moore

_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
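
One possible way to gather the statistics asked about in this thread, sketched under the assumption that article wikitext is already available as plain strings (for example, extracted from a dump by some other tool): match heading lines of the form `== Title ==` and tally the titles. The regular expression, the function name and the sample texts are illustrative only, and headings produced by templates would not be seen by this approach.

```python
import re
from collections import Counter

# A heading line: 2 to 6 "=" signs, the title, then the same run of "=" signs.
HEADING_RE = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def count_section_titles(wikitexts):
    """Count how often each section title occurs across a collection of wikitext strings."""
    counter = Counter()
    for text in wikitexts:
        counter.update(match.group(2) for match in HEADING_RE.finditer(text))
    return counter


# Hypothetical sample articles, written for this example.
sample_articles = [
    "Intro text\n== History ==\nSome text\n=== Early life ===\nMore text\n== References ==\n",
    "== History ==\nText\n== External links ==\n",
]

title_counts = count_section_titles(sample_articles)
print(title_counts.most_common(3))
```

Running the same count separately per language and per project would give the breakdown the original question describes.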

8 years

Sections of Code of Conduct resolved and Code of Conduct approval process

by Matthew Flaschen

We’ve gotten good participation as we’ve worked on sections of the Code of Conduct over the past few months, and have made considerable improvements to the draft based on your feedback.

Given that, and the community approval through the discussions on each section, the best approach is to proceed by approving section-by-section until the last section is done.

So, please continue to improve the Code of Conduct by participating now and as future sections are discussed. When the last section is completed and approved on the talk page, the Code of Conduct will become policy and no longer be marked as a draft.

Also, two more discussions regarding the Code of Conduct have been resolved and incorporated into the draft.

* "Enforcement issues" addressed the reporting process and clarified that Committee decisions could not be circumvented
* "Marginalized and underrepresented groups" forbids discrimination

Thanks,
Matt Flaschen

8 years

Applications open for WikiCite (Berlin, May 25-26, 2016)

by Dario Taraborelli

Citations and references are the building blocks of Wikimedia projects. However, as of today, they are still treated as second-class citizens. Structured databases such as Wikidata offer a unique opportunity <https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData> to turn into reality over a decade of endeavors to build the sum of all citations and bibliographic metadata into a centralized repository. To coordinate upcoming work in this space, we're organizing a technical event in late May and opening up applications for prospective participants.

*WikiCite 2016 <https://meta.wikimedia.org/wiki/WikiCite_2016>* is a hands-on event focused on designing data models and technology to *improve the coverage, quality, standards-compliance and machine-readability of citations and source metadata in Wikipedia, Wikidata and other Wikimedia projects*. Our goal, in particular, is to define a technical roadmap for building a repository of all Wikimedia references in Wikidata.

We are bringing together Wikidatans, Wikipedians, software engineers, data modelers, and information and library science experts from organizations including *Crossref*, *Zotero*, *CSL*, *ContentMine*, *Google*, *Datacite*, *NISO*, *OCLC* and the *NIH*. We are also inviting academic researchers with experience working with Wikipedia's citations and bibliographic data.

WikiCite will be hosted in *Berlin* on *May 25-26, 2016*. Participation in the event is capped at about 50 participants and we expect to have a number of open slots for applicants:

- if you were pre-invited and have already filled in a form, you will receive a separate note from the organizers
- if you have not been invited but you would like to participate, please fill in this application form <http://goo.gl/forms/Yv6rve2wCt> to give us some information about you and your interest and expected contribution to the event.

Please help us pass this on to anyone who has done important technical work on Wikimedia references and citations.

*Important dates*

- *March 29, 2016*: applications open
- *April 11, 2016*: applications close
- *April 15, 2016*: notifications of acceptance are issued (if you applied for a travel grant, we'll be able to confirm by this date if we can cover the costs of your trip)

For any questions, you can contact the organizing committee: wikicite(a)wikimedia.org

The organizers,

Dario Taraborelli
Jonathan Dugan
Lydia Pintscher
Daniel Mietchen
Cameron Neylon

*Dario Taraborelli*
Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter <http://twitter.com/readermeter>

8 years

Fwd: Please provide feedback on new discrimination and enforcement sections of Code of Conduct

by Matthew Flaschen

I usually send these to multiple lists, but I realized I forgot to send this to the ones besides wikitech-l.

The "Marginalized and underrepresented groups" discussion (https://www.mediawiki.org/wiki/Talk:Code_of_Conduct/Draft#New_proposed_word…) is still open. I'll probably give it two weeks total, which means closing it late tomorrow.

Matt Flaschen

-------- Forwarded Message --------
Subject: Please provide feedback on new discrimination and enforcement sections of Code of Conduct
Date: Wed, 16 Mar 2016 20:23:24 -0400
From: Matthew Flaschen <mflaschen(a)wikimedia.org>
To: Wikitech List <wikitech-l(a)lists.wikimedia.org>

Thanks for your participation in the recent Code of Conduct discussions.

The "Marginalized and underrepresented groups" discussion had a lot of feedback. There was not consensus to use the exact original wording, but many people expressed willingness to support a modified text. I've proposed such a new text, based on Neil P. Quinn's text, with a small modification to account for discrimination required by law (e.g. age of people who can sign certain contracts). Please participate at https://www.mediawiki.org/wiki/Talk:Code_of_Conduct/Draft#New_proposed_word… .

The "Enforcement issues" section received general support, but some of that was conditional, or expressed preference for wording that developed during the discussion. The original wording also did not address the appeals body, which was raised in the discussion. Please participate at https://www.mediawiki.org/wiki/Talk:Code_of_Conduct/Draft#Circumvention_tex…

Update regarding completed discussions:

The "Clarification of legitimate reasons for publication of private communications and identity protection" and "Definitions - trolling, bad-faith reports" discussions have been closed. They both had support, and I've incorporated the text into the draft.

Thanks,
Matt

8 years

Upcoming research newsletter (March 2016): new papers open for review

by Mohammed Sadat

Hi everybody,

We’re preparing for the March 2016 research newsletter and looking for contributors. Please take a look at: https://etherpad.wikimedia.org/p/WRN201603 and add your name next to any paper you are interested in covering. Our target publication date is Wednesday March 30 UTC although actual publication might happen several days later. As usual, short notes and one-paragraph reviews are most welcome.

Highlights from this month:

• Advances in Information Retrieval
• Candidate Searching and Key Coreference Resolution for Wikification
• Developing an annotator for Latin texts using Wikipedia
• "Did i say something wrong?" A word-level analysis of Wikipedia articles for deletion discussions
• Gender Biases in Cyberspace: A Two-Stage Model, the New Arena of Wikipedia and Other Websites
• Improving Information Literacy Skills through Learning To Use and Edit Wikipedia: A Chemistry Perspective
• Motivational determinants of participation trajectories in Wikipedia
• Open Content, Linus’ Law, and Neutral Point of View
• Teaching with Wikipedia in a 21st-century classroom: Perceptions of Wikipedia and its educational benefits
• Wikidata as a semantic framework for the Gene Wiki initiative
• Wikipedia in the anti-SOPA protests as a case study of direct, deliberative democracy in cyberspace
• *CSCW 2016 Conference proceedings
• *Wiki Workshop 2016 proceedings

If you have any question about the format or process feel free to get in touch off-list.

Masssly, Tilman Bayer and Dario Taraborelli

[1] http://meta.wikimedia.org/wiki/Research:Newsletter

8 years, 1 month

SEMANTiCS 2016, Leipzig, Sep 12-15, 2nd Call for Research & Innovation Papers

by Sebastian Hellmann

Apologies for cross-posting

2nd Call for Research & Innovation Papers

SEMANTiCS 2016 - The Linked Data Conference
Transfer // Engineering // Community

12th International Conference on Semantic Systems
Leipzig, Germany
September 12-15, 2016
http://2016.semantics.cc

Important Dates (Research & Innovation)

* Abstract Submission Deadline: April 14, 2016 (11:59 pm, Hawaii time)
* Paper Submission Deadline: April 21, 2016 (11:59 pm, Hawaii time)
* Notification of Acceptance: May 26, 2016 (11:59 pm, Hawaii time)
* Camera-Ready Paper: June 16, 2016 (11:59 pm, Hawaii time)

Submissions via Easychair: https://easychair.org/conferences/?conf=semantics2016research

As in the previous years, SEMANTiCS’16 proceedings are expected to be published by ACM ICP.

The annual SEMANTiCS conference is the meeting place for professionals who make semantic computing work, who understand its benefits and encounter its limitations. Every year, SEMANTiCS attracts information managers, IT-architects, software engineers and researchers from organisations ranging from NPOs, through public administrations to the largest companies in the world. Attendees learn from industry experts and top researchers about emerging trends and topics in the fields of semantic software, enterprise data, linked data & open data strategies, methodologies in knowledge modelling and text & data analytics. The SEMANTiCS community is highly diverse; attendees have responsibilities in interlinking areas like knowledge management, technical documentation, e-commerce, big data analytics, enterprise search, document management, business intelligence and enterprise vocabulary management.

The success of last year’s conference in Vienna with more than 280 attendees from 22 countries proves that SEMANTiCS 2016 will continue a long tradition of bringing together colleagues from around the world. There will be presentations on industry implementations, use case prototypes, best practices, panels, papers and posters to discuss semantic systems in birds-of-a-feather sessions as well as informal settings. SEMANTiCS addresses problems common among information managers, software engineers, IT-architects and various specialist departments working to develop, implement and/or evaluate semantic software systems.

The SEMANTiCS program is a rich mix of technical talks, panel discussions of important topics and presentations by people who make things work - just like you. In addition, attendees can network with experts in a variety of fields. These relationships provide great value to organisations as they encounter subtle technical issues in any stage of implementation. The expertise gained by SEMANTiCS attendees has a long-term impact on their careers and organisations. These factors make SEMANTiCS for our community the major industry related event across Europe.

#SEMANTiCS 2016 will especially welcome submissions for the following hot topics:

* Data Quality Management
* Data Science (Data Mining, Machine Learning, Network Analytics)
* Semantics on the Web, Linked (Open) Data & schema.org
* Corporate Knowledge Graphs
* Knowledge Integration and Language Technologies
* Economics of Data, Data Services and Data Ecosystems

Following the success of previous years, the ‘horizontals’ (research) and ‘verticals’ (industries) below are of interest for the conference:

Horizontals

* Enterprise Linked Data & Data Integration
* Knowledge Discovery & Intelligent Search
* Business Models, Governance & Data Strategies
* Big Data & Text Analytics
* Data Portals & Knowledge Visualization
* Semantic Information Management
* Document Management & Content Management
* Terminology, Thesaurus & Ontology Management
* Smart Connectivity, Networking & Interlinking
* Smart Data & Semantics in IoT
* Semantics for IT Safety & Security
* Semantic Rules, Policies & Licensing
* Community, Social & Societal Aspects

Verticals

* Industry & Engineering
* Life Sciences & Health Care
* Public Administration
* Galleries, Libraries, Archives & Museums (GLAM)
* Education & eLearning
* Media & Data Journalism
* Publishing, Marketing & Advertising
* Tourism & Recreation
* Financial & Insurance Industry
* Telecommunication & Mobile Services
* Sustainable Development: Climate, Water, Air, Ecology
* Energy, Smart Homes & Smart Grids
* Food, Agriculture & Farming
* Safety, Security & Privacy
* Transport, Environment & Geospatial

#Research / Innovation Papers

The Research & Innovation track at SEMANTiCS welcomes the submission of papers on novel scientific research and/or innovations relevant to the topics of the conference. Submissions must be original and must not have been submitted for publication elsewhere. The Research & Innovation track at SEMANTiCS is a single-blind review process (author names are visible to reviewers, reviewers stay anonymous). The submitted abstract and the topics are leveraged to find adequate reviewers for submitted papers. Please write an email to semantics2016researchtrack(a)easychair.org, if you have any questions.

Papers should follow the ACM ICPS guidelines for formatting and must not exceed 8 pages in length for full papers and 4 pages for short papers, including references and optional appendices. The layout templates can be found here: http://www.acm.org/sigs/publications/proceedings-templates

All accepted full papers and short papers will be published in the digital library of the ACM ICP Series.

Research & Innovation papers should be submitted through EasyChair at: https://easychair.org/conferences/?conf=semantics2016research. Papers must be submitted in PDF (Adobe's Portable Document Format) format. Other formats will not be accepted. For the camera-ready version, the source files (LaTeX, WordPerfect, Word) will also be needed.

Important Dates (Research & Innovation)

* Abstract Submission Deadline: April 14, 2016 (11:59 pm, Hawaii time)
* Paper Submission Deadline: April 21, 2016 (11:59 pm, Hawaii time)
* Notification of Acceptance: May 26, 2016 (11:59 pm, Hawaii time)
* Camera-Ready Paper: June 16, 2016 (11:59 pm, Hawaii time)

Research and Innovation Chairs:

* Anna Fensel, University of Innsbruck
* Amrapali Zaveri, Stanford University

Contact email address: semantics2016researchtrack(a)easychair.org

Research and Innovation Deputy Chairs:

* Bernhard Haslhofer, Austrian Institute of Technology
* Artem Revenko, Semantic Web Company

Conference Chairs:

* Sebastian Hellmann, AKSW/KILT, InfAI, Leipzig University
* Tassilo Pellegrini, UAS St. Pölten

Senior Program Committee:

* Paul Buitelaar, Insight - National University of Ireland, Galway
* Oscar Corcho, Universidad Politécnica de Madrid
* Claudia D'Amato, University of Bari
* Brian Davis, DERI NUIG
* Victor de Boer, VU Amsterdam
* Christian Dirschl, Wolters Kluwer Germany
* Michel Dumontier, Stanford University
* Agata Filipowska, Department of Information Systems, Poznan University of Economics
* Bernhard Haslhofer, AIT-Austrian Institute of Technology
* Sebastian Hellmann, AKSW/KILT, InfAI, Leipzig University
* Andreas Hotho, University of Wuerzburg
* Jose Emilio Labra Gayo, Universidad de Oviedo
* Peter Mika, Yahoo! Research
* Axel-Cyrille Ngonga Ngomo, University of Leipzig
* Josiane Xavier Parreira, Siemens AG Österreich
* Heiko Paulheim, University of Mannheim
* Tassilo Pellegrini, University of Applied Sciences St. Pölten
* Marta Sabou, Vienna University of Technology
* Harald Sack, Hasso-Plattner-Institute for IT Systems Engineering, University of Potsdam
* Pierre-Yves Vandenbussche, Fujitsu
* Ruben Verborgh, Ghent University - iMinds
* Maria Esther Vidal, Universidad Simon Bolivar, Dept. Computer Science

8 years, 1 month

Re: [Wiki-research-l] [Analytics] [Data Release] [Data Deprecation] [Analytics Dumps]

by Dan Andreescu

On Wed, Mar 23, 2016 at 1:06 PM, Federico Leva (Nemo) <nemowiki(a)gmail.com> wrote:

> Dan Andreescu, 23/03/2016 15:58:
>
>> *Clean-up:* Analytics data on dumps was crammed into /other with
>> unrelated datasets. We made a new page to receive current and future
>> datasets [3] and linked to it from /other and /. Please let us know if
>> anything there looks confusing or opaque and I'll be happy to clarify.
>
> I assume the old URLs will redirect to the new ones, right?

Good question, we didn't change any old URLs actually, so if you're trying to get to other/pagecounts-ez, other/pagecounts-raw and all that, they're all still there, just linked-to from /analytics. We did it this way because we figured people had scripts that depended on those URLs. We thought about moving and symlinking but it's probably unlikely that we'll ever be able to delete the other/** location. So mainly we just have a new page where we can do a better job of focusing on the analytics datasets.

8 years, 1 month

Re: [Wiki-research-l] [Data Release] [Data Deprecation] [Analytics Dumps]

by Dan Andreescu

cc-ing our friends in research and wikitech (sorry I forgot initially)

> We're happy to announce a few improvements to Analytics data releases on dumps.wikimedia.org:
>
> * We are releasing a new dataset, an estimate of Unique Devices accessing our projects [1]
> * We are officially making available a better Pageviews dataset [2]
> * We are deprecating two older pageview statistics datasets
> * We moved Analytics data from /other to /analytics [3]
>
> Details follow:
>
> *Unique Devices:* Since 2009, the Wikimedia Foundation used comScore to report data about unique web visitors. In January 2016, however, we decided to stop reporting comScore numbers [4] because of certain limitations in the methodology that translated into misreported mobile usage. We are now ready to replace comScore numbers with the Unique Devices Dataset [5][1]. While unique devices does not equal unique visitors, it is a good proxy for that metric, meaning that a major increase in the number of unique devices is likely to come from an increase in distinct users. We understand that counting uniques raises fairly big privacy concerns, and we use a very privacy-conscious way to count unique devices: it does not include any cookie by which your browser history can be tracked [6].
>
> We invite you to explore this new dataset and hope it’s helpful for the Wikimedia community in better understanding our projects. This data can help measure the reach of Wikimedia projects on the web.
>
> *Pageviews:* This [2] is the best quality data available for counting the number of pageviews our projects receive at the article and project level. We've upgraded from pagecounts-raw to pagecounts-all-sites, and now to pageviews, in order to filter out more spider traffic and measure something closer to what we think is a real user viewing content. A short history might be useful:
>
> * pagecounts-raw: was maintained by Domas Mituzas originally and taken over by the analytics team. It was and still is the most used dataset, though it has some major problems. It does not count access to the mobile site, it does not filter out spider or bot traffic, and it suffers from unknown loss due to logging infrastructure limitations.
> * pagecounts-all-sites: uses the same pageview definition as pagecounts-raw, and so also does not filter out spider or bot traffic. But it does include access to mobile and zero sites, and is built on a more reliable logging infrastructure.
> * pagecounts-ez: is derived from the best data available at the time. So until December 2015, it was based on pagecounts-raw and pagecounts-all-sites, and now it's based on pageviews. This dataset is great because it compresses very large files without losing any information, still providing hourly page and project level statistics.
>
> So the new dataset, pageviews, is what's behind our pageview API and is now available in static files for bulk download back to May 2015. But having multiple ways to download pageview data is confusing for consumers, so we're keeping only pageviews and pagecounts-ez and deprecating the other two. If you'd like to read more about the current pageview definition, details are on the research page [7].
>
> *Deprecating:* We are deprecating the pagecounts-raw and pagecounts-all-sites datasets in May 2016 (discussion here: https://phabricator.wikimedia.org/T130656 ). This data suffers from many artifacts, lack of mobile data, and/or infrastructure problems, and so is not comparable to the new way we track pageviews. It will remain here because we have historical data that may be useful, but it will not be maintained or updated beyond May 2016.
>
> *Clean-up:* Analytics data on dumps was crammed into /other with unrelated datasets. We made a new page to receive current and future datasets [3] and linked to it from /other and /. Please let us know if anything there looks confusing or opaque and I'll be happy to clarify.
>
> [1] http://dumps.wikimedia.org/other/unique_devices
> [2] http://dumps.wikimedia.org/other/pageviews
> [3] http://dumps.wikimedia.org/analytics/
> [4] https://meta.wikimedia.org/wiki/ComScore/Announcement
> [5] https://meta.wikimedia.org/wiki/Research:Unique_Devices
> [6] https://meta.wikimedia.org/wiki/Research:Unique_Devices#How_do_we_count_uni…
> [7] https://meta.wikimedia.org/wiki/Research:Page_view

8 years, 1 month
