Wiki-research-l February 2017

wiki-research-l@lists.wikimedia.org

31 participants
25 discussions

Wikipedia Research policy
by song＠cs.umn.edu 14 Jul '23

14 Jul '23

Pursuant to prior discussions about the need for a research policy on Wikipedia, WikiProject Research is drafting a policy regarding the recruitment of Wikipedia users to participate in studies. At this time, we have a proposed policy, and an accompanying group that would facilitate recruitment of subjects in much the same way that the Bot Approvals Group approves bots. The policy proposal can be found at: http://en.wikipedia.org/wiki/Wikipedia:Research The Subject Recruitment Approvals Group mentioned in the proposal is being described at: http://en.wikipedia.org/wiki/Wikipedia:Subject_Recruitment_Approvals_Group Before we move forward with seeking approval from the Wikipedia community, we would like additional input about the proposal, and would welcome additional help improving it. Also, please consider participating in WikiProject Research at: http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Research -- Bryan Song GroupLens Research University of Minnesota

8 10

[Analytics] Beeline as Hive client
by Madhumitha Viswanathan 03 Oct '18

03 Oct '18

Hi all, For all Hive users using stat1002/1004, you might have seen a deprecation warning when you launch the hive client - that claims it's being replaced with Beeline. The Beeline shell has always been available to use, but it required supplying a database connection string every time, which was pretty annoying. We now have a wrapper <https://github.com/wikimedia/operations-puppet/blob/production/modules/role…> script setup to make this easier. The old Hive CLI will continue to exist, but we encourage moving over to Beeline. You can use it by logging into the stat1002/1004 boxes as usual, and launching `beeline`. There is some documentation on this here: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Beeline. If you run into any issues using this interface, please ping us on the Analytics list or #wikimedia-analytics or file a bug on Phabricator <http://phabricator.wikimedia.org/tag/analytics>. (If you are wondering stat1004 whaaat - there should be an announcement coming up about it soon!) Best, --Madhu :)

2 2

Wikipedia aggregate clickstream data released
by Dario Taraborelli 17 Jan '18

17 Jan '18

We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia http://dx.doi.org/10.6084/m9.figshare.1305770 <http://dx.doi.org/10.6084/m9.figshare.1305770> This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015. This data can be used for various purposes: • determining the most frequent links people click on for a given article • determining the most common links people followed to an article • determining how much of the total traffic to an article clicked on a link in that article • generating a Markov chain over English Wikipedia We created a page on Meta for feedback and discussion about this release: https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream <https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream> Ellery and Dario

5 5

Wikipedia Detox: Scaling up our understanding of harassment on Wikipedia
by Ellery Wulczyn 27 Jun '17

27 Jun '17

Today we are announcing <https://blog.wikimedia.org/2017/02/07/scaling-understanding-of-harassment/> the first results of the collaboration between Wikimedia Research and Jigsaw on modeling personal attacks and other forms of harassment on English Wikipedia. We have released <https://figshare.com/projects/Wikipedia_Talk/16731> a corpus of 95M user and article talk page comments as well as over 1M human labels produced by 4000 crowd-workers for a set of 100k comments. Documentation on our methodology and future work can be found in our paper Ex Machina: Personal Attacks Seen at Scale <https://arxiv.org/abs/1610.08914> (to appear at WWW2017) and on our project page on meta <https://meta.wikimedia.org/wiki/Research:Detox>. If you are interested in contributing to the project, please get in touch via the project talk page <https://meta.wikimedia.org/wiki/Research_talk:Detox>. Another great way to get involved is to label a set of comment in the Wikilabels discussion quality campaign <http://labels.wmflabs.org/ui/enwiki/>.

7 10

Finding the number of links between two wikipedia pages
by Mara Sorella 24 Feb '17

24 Feb '17

Hi everybody, I'm new to the list and have been referred here by a comment from a SO user as per my question [1], that I'm quoting next: I * have been successfully able to use the Wikipedia pagelinks SQL dump to obtain hyperlinks between Wikipedia pages for a specific revision time.However, there are cases where multiple instances of such links exist, e.g. the very same https://en.wikipedia.org/wiki/Wikipedia <https://en.wikipedia.org/wiki/Wikipedia> page and https://en.wikipedia.org/wiki/Wikimedia_Foundation <https://en.wikipedia.org/wiki/Wikimedia_Foundation>. I'm interested to find number of links between pairs of pages for a specific revision. Ideal solutions would involve dump files other than pagelinks (which I'm not aware of), or using the MediaWiki API.* To elaborate, I need this information to weight (almost) every hyperlink between article pages (that is, in NS0), that was present in a specific wikipedia revision (end of 2015), therefore, I would prefer not to follow the solution suggested by the SO user, that would be rather impractical. Indeed, my final aim is to use this weight in a thresholding fashion to sparsify the wikipedia graph (that due to the short diameter is more or less a giant connected component), in a way that should reflect the "relatedness" of the linked pages (where relatedness is not intended as strictly semantic, but at a higher "concept" level, if I may say so). For this reason, other suggestions on how determine such weights (possibly using other data sources -- ontologies?) are more than welcome. The graph will be used as dataset to test an event tracking algorithm I am doing research on. Thanks, Mara [1] http://stackoverflow.com/questions/42277773/number-of-links-between-two-wik…

4 5

Fwd: EventStreams launch and RCStream deprecation
by Andrew Otto 23 Feb '17

23 Feb '17

Hi everyone! Wikimedia is releasing a new service: EventStreams <https://wikitech.wikimedia.org/wiki/EventStreams>[1]. This service allows us to publish arbitrary streams of JSON event data to the public. (Psssst: we’re looking for cool new uses <https://www.mediawiki.org/wiki/EventStreams/Blog_-_Call_For_Entries>[2] to put on an upcoming blog post.) Initially, the only stream available will be good ol’ RecentChanges <https://www.mediawiki.org/wiki/Manual:RCFeed>. This event stream overlaps functionality already provided by irc.wikimedia.org and RCStream <https://wikitech.wikimedia.org/wiki/RCStream>. However, this new service has advantages over these (now deprecated) services. 1. We can expose more than just RecentChanges. 2. Events are delivered over streaming HTTP (chunked transfer) instead of IRC or socket.io. This requires less client side code and fewer special routing cases on the server side. 3. Streams can be resumed from the past. By using EventSource, a disconnected client will automatically resume the stream from where it left off, as long as it resumes within one week. In the future, we would like to allow users to specify historical timestamps from which they would like to begin consuming, if this proves safe and tractable. I did say deprecated! Okay okay, we may never be able to fully deprecate irc.wikimedia.org. It’s used by too many (probably sentient by now) bots out there. We do plan to obsolete RCStream, and to turn it off in a reasonable amount of time. The deadline iiiiiis July 7th, 2017. All services that rely on RCStream should migrate to the HTTP based EventStreams service by this date. We are committed to assisting you in this transition, so let us know how we can help. Unfortunately, unlike RCStream, EventStreams does not have server side event filtering (e.g. by wiki) quite yet. How and if this should be done is still under discussion <https://phabricator.wikimedia.org/T152731>. The RecentChanges data you are used to remains the same, and is available at https://stream.wikimedia.org/v2/stream/recentchange. However, we may have something different for you, if you find it useful. We have been internally producing new Mediawiki specific events <https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema…> [3] for a while now, and could expose these via EventStreams as well. Take a look at these events, and tell us what you think. Would you find them useful? How would you like to subscribe to them? Individually as separate streams, or would you like to be able to compose multiple event types into a single stream via an API? These things are all possible. I asked for a lot of feedback in the above paragraphs. Let’s try and centralize this discussion over on the mediawiki.org EventStreams talk page <https://www.mediawiki.org/wiki/Talk:EventStreams>[4]. In summary, the questions are: - What RCStream clients do you maintain, and how can we help you migrate to EventStreams? <https://www.mediawiki.org/wiki/Topic:Tkjkee2j684hkwc9> - Is server side filtering, by wiki or arbitrary event field, useful to you? <https://www.mediawiki.org/wiki/Topic:Tkjkabtyakpm967t> - Would you like to consume streams other than RecentChanges? <https://www.mediawiki.org/wiki/Topic:Tkjk4ezxb4u01a61> (Currently available events are described the event-schemas repository <https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema…> .) Thanks! - Andrew Otto [1] https://wikitech.wikimedia.org/wiki/EventStreams [2] https://www.mediawiki.org/wiki/EventStreams/Blog_-_Call_For_Entries [3] https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema… [4] https://www.mediawiki.org/wiki/Talk:EventStreams

1 0

Applications for WikiCite 2017 (Vienna, May 23-25, 2017) close in 5 days
by Dario Taraborelli 23 Feb '17

23 Feb '17

A reminder that applications to attend WikiCite 2017 <https://meta.wikimedia.org/wiki/WikiCite_2017> close on *February 27, 2017* . Please consider applying <https://docs.google.com/forms/d/e/1FAIpQLScWnCLfAt88cUWKSu_E-lU8m3te_r4P3ng…> if you work on sources and citations (or related tools) in Wikipedia, Wikidata, Wikisource or other Wikimedia projects. If there are other people in your network we should consider inviting to the event, please let us know. You can contact the organizing committee at: wikicite(a)wikimedia.org. Best, Dario -- on behalf of the organizers On Thu, Feb 9, 2017 at 3:44 PM, Dario Taraborelli < dtaraborelli(a)wikimedia.org> wrote: > Dear all, > > I am happy to announce that applications to attend WikiCite ‘17 officially open > today <https://goo.gl/forms/Kb9Wl6Xfw2EmFqEr2>. > > About the event > > WikiCite 2017 <https://meta.wikimedia.org/wiki/WikiCite_2017> is a 3-day > conference, summit and hack day to be hosted in Vienna, Austria, on May > 23-25, 2017. It expands on efforts started last year at WikiCite 2016 > <https://meta.wikimedia.org/wiki/WikiCite_2016/Report> to design a > central bibliographic repository, as well as tools and strategies to > improve information quality and verifiability in Wikimedia projects. > > Our goal is to bring together Wikimedia contributors, data modelers, > information and library science experts, software engineers, designers and > academic researchers who have experience working with Wikipedia's citations > and bibliographic data. > > WikiCite 2017 will be a venue to: > > - > > Day 1. (Conference) – present progress on existing work and > initiatives for citations and bibliographic data across Wikimedia projects > - > > Day 2. (Summit) – discuss technical, social, outreach and policy > directions > - > > Day 3. (Hack) – get together to build, based on new ideas and > applications > > > > More information on the event can be found here > <https://meta.wikimedia.org/wiki/WikiCite_2017>: > > How to apply > > Participation for this year's event is limited to 100 individuals. In > order to be considered for participation, please fill out the following > form <https://goo.gl/forms/Kb9Wl6Xfw2EmFqEr2> and provide us with some > information about yourself, your interests, and expected contribution. > PLEASE NOTE THIS IS NOT THE FINAL REGISTRATION FORM. Your application will > be reviewed and the organizing committee will extend an invitation by March > 10, 2017. This application form is to determine the best mix of > attendees. Not everyone who applies will receive an invitation, but there > will be a waitlist. > > Important dates > > > - > > February 9, 2017: applications open > - > > February 27, 2017: applications close, waitlist opens > - > > March 10, 2017: all final notifications of acceptance are issued, > waitlist processing begins > - > > March 31, 2017: attendee list is finalized > > > Travel support > > > Like last year, limited funding to cover travel costs of prospective > participants will be available. Requests for travel support should be > submitted via the application form > <https://goo.gl/forms/Kb9Wl6Xfw2EmFqEr2>. We will confirm by March 10, if > we can provide you with travel support. > > Contact > > For any question, you can contact the organizing committee via: > wikicite(a)wikimedia.org > > We look forward to seeing you in Vienna! > > The WikiCite 2017 organizing committee > > Dario Taraborelli > > Jonathan Dugan > > Lydia Pintscher > > Daniel Mietchen > > Cameron Neylon > > > > *Dario Taraborelli *Director, Head of Research, Wikimedia Foundation > wikimediafoundation.org • nitens.org • @readermeter > <http://twitter.com/readermeter> > -- *Dario Taraborelli *Director, Head of Research, Wikimedia Foundation wikimediafoundation.org • nitens.org • @readermeter <http://twitter.com/readermeter>

1 0

Retention of Wikimedians for the long term
by Pine W 23 Feb '17

23 Feb '17

Hi Research-l, A human resources problem that I am experiencing is a shortage of human resources of community members who are willing, available, and have the skills to work on a variety of useful initiatives. Is anyone on this list aware of research that talks about motivations of long-term contributors? In particular, I'd be interested in research that suggests ways to convert productive, relatively new editors (say, 50-500 edits) into long-term community members who are likely to develop into long-term, productive Wikimedians. Thanks, Pine

13 25

Journal of Wiki Studies
by Piotr Konieczny 22 Feb '17

22 Feb '17

I just got an email soliciting submissions to new, open source "Journal of Wiki Studies". I didn't see anything about this on this list yet, so... See http://wikistudies.org/index.php?journal=wikistudies&page=index -- Piotr Konieczny, PhD http://hanyang.academia.edu/PiotrKonieczny http://scholar.google.com/citations?user=gdV8_AEAAAAJ http://en.wikipedia.org/wiki/User:Piotrus

4 3

SEMANTiCS 2017, Amsterdam, Sep 11-14, Open Calls
by Sebastian Hellmann 22 Feb '17

22 Feb '17

Apologies for cross-posting Call for Papers, Posters & Workshops SEMANTiCS 2017 - The Linked Data Conference 13th International Conference on Semantic Systems Amsterdam, Netherlands September 11 -14, 2017 http://2017.semantics.cc For details please go to: https://2017.semantics.cc/calls Important Dates (Research & Innovation): *Abstract Submission Deadline: May 17, 2017 (11:59 pm, Hawaii time) *Paper Submission Deadline: May 24, 2017 (11:59 pm, Hawaii time) *Notification of Acceptance: July 3, 2017 (11:59 pm, Hawaii time) *Camera-Ready Paper: August 14, 2017 (11:59 pm, Hawaii time) Important Dates (Workshops & Tutorials): *Submission of Proposals for Workshops with Call for Papers: March 31, 2017 (23:59 Hawaii Time) *Submission of Proposals for Tutorials and Workshops without Call for Papers: June 30, 2017 (23:59 Hawaii Time) *Workshop Proposals Notification of Acceptance: April 13, 2017 (23:59 Hawaii Time) *Workshop Website/Call for Papers Online: April 30, 2017 (23:59 Hawaii Time) *Workshop Camera-Ready Proceedings: September 4, 2017 (23:59 Hawaii Time) *SEMANTiCS 2017 Workshop & Tutorial Days: September 11 and 14, 2017 As in the previous years, SEMANTiCS’17 proceedings will be published by ACM ICPS (pending) and CEUR WS proceedings. SEMANTiCS 2017 will especially welcome submissions for the following hot topics: *Data Science (special track, see below) *Web Semantics, Linked (Open) Data & schema.org *Corporate Knowledge Graphs *Knowledge Integration and Language Technologies *Data Quality Management *Economics of Data, Data Services and Data Ecosystems Following the success of previous years, the ‘horizontals’ (research) and ‘verticals’ (industries) below are of interest for the conference: Horizontals: *Enterprise Linked Data & Data Integration *Knowledge Discovery & Intelligent Search *Business Models, Governance & Data Strategies *Semantics in Big Data *Text Analytics *Data Portals & Knowledge Visualization *Semantic Information Management *Document Management & Content Management *Terminology, Thesaurus & Ontology Management *Smart Connectivity, Networking & Interlinking *Smart Data & Semantics in IoT *Semantics for IT Safety & Security *Semantic Rules, Policies & Licensing *Community, Social & Societal Aspects Data Science Special Track Horizontals: *Large-Scale Data Processing (stream processing, handling large-scale graphs) *Data Analytics (Machine Learning, Predictive Analytics, Network Analytics) *Communicating Data (Data Visualization, UX & Interaction Design, Crowdsourcing) *Cross-cutting Issues (Ethics, Privacy, Security, Provenance) Verticals: *Industry & Engineering *Life Sciences & Health Care *Public Administration *e-Science *Digital Humanities *Galleries, Libraries, Archives & Museums (GLAM) *Education & eLearning *Media & Data Journalism *Publishing, Marketing & Advertising *Tourism & Recreation *Financial & Insurance Industry *Telecommunication & Mobile Services *Sustainable Development: Climate, Water, Air, Ecology *Energy, Smart Homes & Smart Grids *Food, Agriculture & Farming *Safety, Security & Privacy *Transport, Environment & Geospatial For call details please go to: https://2017.semantics.cc/calls

1 0

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Wiki-research-l February 2017