Wiki-research-l October 2015

wiki-research-l@lists.wikimedia.org

30 participants
23 discussions

by song＠cs.umn.edu

Pursuant to prior discussions about the need for a research policy on Wikipedia, WikiProject Research is drafting a policy regarding the recruitment of Wikipedia users to participate in studies. At this time, we have a proposed policy, and an accompanying group that would facilitate recruitment of subjects in much the same way that the Bot Approvals Group approves bots.

The policy proposal can be found at:
http://en.wikipedia.org/wiki/Wikipedia:Research

The Subject Recruitment Approvals Group mentioned in the proposal is being described at:
http://en.wikipedia.org/wiki/Wikipedia:Subject_Recruitment_Approvals_Group

Before we move forward with seeking approval from the Wikipedia community, we would like additional input about the proposal, and would welcome additional help improving it.

Also, please consider participating in WikiProject Research at:
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Research

--
Bryan Song
GroupLens Research
University of Minnesota

9 months, 2 weeks

Wikipedia aggregate clickstream data released

by Dario Taraborelli

We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia:
http://dx.doi.org/10.6084/m9.figshare.1305770

This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.

This data can be used for various purposes:
• determining the most frequent links people click on for a given article
• determining the most common links people followed to an article
• determining how much of the total traffic to an article clicked on a link in that article
• generating a Markov chain over English Wikipedia

We created a page on Meta for feedback and discussion about this release:
https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream

Ellery and Dario
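As a rough illustration of the first and last uses listed above, here is a minimal Python sketch. It assumes the counts have already been loaded as in-memory (referer, article, count) tuples; the sample rows, their numbers, and the function names `build_chain` and `top_links` are hypothetical, and the file layout of the actual release is not described in this message.

```python
from collections import defaultdict

# Hypothetical sample rows in the form (referer, article, count).
# These values are invented for illustration and do not come from the dataset.
SAMPLE_ROWS = [
    ("London", "England", 60),
    ("London", "River_Thames", 30),
    ("London", "Big_Ben", 10),
    ("England", "London", 50),
    ("England", "United_Kingdom", 30),
]


def build_chain(rows):
    """Estimate first-order Markov transition probabilities.

    For each referer, the probability of moving to an article is its pair
    count divided by the total count of all pairs with that referer.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for referer, article, n in rows:
        counts[referer][article] += n

    chain = {}
    for referer, targets in counts.items():
        total = sum(targets.values())
        chain[referer] = {article: n / total for article, n in targets.items()}
    return chain


def top_links(rows, referer, k=3):
    """Return the k most frequently followed (article, count) pairs for a referer."""
    targets = defaultdict(int)
    for src, article, n in rows:
        if src == referer:
            targets[article] += n
    return sorted(targets.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    chain = build_chain(SAMPLE_ROWS)
    print(chain["London"])
    print(top_links(SAMPLE_ROWS, "London", k=2))
```

Note that a chain built this way only describes requests that arrived via a recorded referer; visits that ended without a click are not represented in the pair counts, so the probabilities are conditional on a link having been followed.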

6 years, 3 months

Fwd: [Wikitech-l] statistics about frequent section titles

by Jonathan Morgan

Cross-posting this request to wiki-research-l. Anyone have data on frequently used section titles in articles (any language), or know of datasets/publications that examined this?

I'm not aware of any off the top of my head, Amir.

- Jonathan

---------- Forwarded message ----------
From: Amir E. Aharoni <amir.aharoni(a)mail.huji.ac.il>
Date: Sat, Jul 11, 2015 at 3:29 AM
Subject: [Wikitech-l] statistics about frequent section titles
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>

Hi,

Did anybody ever try to collect statistics about frequent section titles in Wikimedia projects?

For Wikipedia, for example, titles such as "Biography", "Early life", "Bibliography", "External links", "References", "History", etc., appear in a lot of articles, and their counterparts appear in a lot of languages. There are probably similar things in Wikivoyage, Wiktionary and possibly other projects.

Did anybody ever try to collect statistics of the most frequent section titles in each language and project?

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces, I want to live in peace.” – T. Moore

_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>

8 years

Reinforcing or incentivizing desired user behavior

by Pine W

Hi all, Some of us plan to have a conversation at the WCONUSA unconference sessions about ENWP culture. Are there any recommended readings that you could suggest as preparation, particularly on the subject of how to reinforce or incentivize desirable user behavior? I think that Jonathan may have done some research on this topic for the Teahouse, and Ocassi may have done research for TWA. I'm interested in applicable research as preparation both for the unconference discussion and for my planned video series that intends to inform and inspire new editors. Thanks, Pine

8 years, 5 months

Please help finish "Report a problem" section of CoC (+updates)

by Matthew Flaschen

We are now working on the "Report a problem" section of the draft Code of conduct:

* Section: https://www.mediawiki.org/wiki/Code_of_Conduct/Draft#Report_a_problem
* Talk: https://www.mediawiki.org/wiki/Talk:Code_of_Conduct/Draft#Finishing_the_.22…
* Alternatively, you can provide anonymous feedback to conduct-discussion(a)wikimedia.org .

This is the best time to make any necessary changes to this section (and explain why, in edit summaries and/or talk) and discuss it on the talk page.

Your participation is also encouraged re the "Project administrators and maintainers have the right" line. See https://www.mediawiki.org/wiki/Talk:Code_of_Conduct/Draft#Rewording_proposal.

Other updates:

* The text of the intro, "Principles" and "Unacceptable behavior" sections has been frozen. Thanks to everyone who helped discuss and edit these sections. See https://www.mediawiki.org/wiki/Talk:Code_of_Conduct/Draft#Fine_tuning_the_n… for details.
* The "Expected behavior" section has been moved, to https://www.mediawiki.org/wiki/Expected_behavior (a guideline) and (for one sentence) to the "Report a problem" section. See https://www.mediawiki.org/wiki/Talk:Code_of_Conduct/Draft#Move_.22Expected_… and https://www.mediawiki.org/wiki/Talk:Code_of_Conduct/Draft#New_proposal.2C_w…
* The text at the end of the "Unacceptable behavior" section has been rewritten: https://www.mediawiki.org/wiki/Talk:Code_of_Conduct/Draft#Move_.22Our_open_… .

Thanks,
Matt Flaschen

8 years, 5 months

Upcoming research newsletter (October 2015): new papers open for review

by masssly＠ymail.com

Hi everybody,

We’re preparing for the October 2015 research newsletter and looking for contributors. Please take a look at: https://etherpad.wikimedia.org/p/WRN201510 and add your name next to any paper you are interested in covering. Our target publication date is Wednesday October 28 UTC. As usual, short notes and one-paragraph reviews are most welcome.

Highlights from this month:

* Use and awareness of Wikipedia among the M.C.A students of C. D. Jain college of commerce, Shrirampur : A Study
* Understanding Editing Behaviors in Multilingual Wikipedia
* The Impact and Evolution of Group Diversity in Online Open Collaboration
* Teaching Wikipedia: The Pedagogy and Politics of an Open Access Writing Community
* “An Encyclopedia, Not an Experiment in Democracy”: Wikipedia Biographies, Authorship, and the Wikipedia Subject
* “You get what you need” : A study of students’ attitudes towards using Wikipedia when doing school assignments
* Machine Learning and the Detection of Anomalies in Wikipedia
* "Collective remembering of organizations: Co-construction of organizational pasts in Wikipedia"
* Top 100 historical figures of Wikipedia
* Towards a Class-Based Model of Information Organization in Wikipedia
* Transparency, Control, and Content Generation on Wikipedia: Editorial Strategies and Technical Affordances
* Influence of Wikipedia and other web resources on acute and critical care decisions. A web-based survey
* Utilising Wikipedia for text mining applications
* Automatic Identification and Disambiguation of Concepts and Named Entities in the Multilingual Wikipedia
* Beyond Friendships and Followers: The Wikipedia Social Network
* Intellectual Interchanges in the History of Massive Online Open-editing Encyclopedia, Wikipedia
* Exploration of Online Culture Through Network Analysis of Wikipedia Cyberpsychology, Behavior, and Social Networking
* Measuring Article Quality in Wikipedia using the Collaboration Network
* How do Twitter, Wikipedia, and Harrison's principles of medicine describe heart attacks?
* Sociotechnical interaction at work: an ethnographic study of the Wikipedia community
* Wikipedia and history: a worthwhile partnership in the digital era?

If you have any question about the format or process feel free to get in touch off-list.

Masssly, Tilman Bayer and Dario Taraborelli

[1] http://meta.wikimedia.org/wiki/Research:Newsletter

8 years, 6 months

[CFP - Extended Deadline 20/11] Semantic Web Journal - Special Issue on Quality Management of Semantic Web Assets (Data, Services and Systems)

by Sebastian Hellmann

*Extended Deadline November 20, 2015*

*CFP: Semantic Web Journal - Special Issue on Quality Management of Semantic Web Assets (Data, Services and Systems):*
http://www.semantic-web-journal.net/blog/call-papers-special-issue-quality-…

Submission guidelines

*Deadline: November 20, 2015* (extended from October 31, 2015)

Submissions shall be made through the Semantic Web journal website at http://www.semantic-web-journal.net. Prospective authors must take notice of the submission guidelines posted at http://www.semantic-web-journal.net/authors. Note that you need to request an account on the website for submitting a paper. Please indicate in the cover letter that it is for the Special Issue on Quality Management of Semantic Web Assets (Data, Services and Systems). Submissions are possible in the following categories: full research papers, application reports, reports on tools and systems, and case studies. While there is no upper limit, paper length must be justified by content.

Guest editors

* Amrapali Zaveri, University of Leipzig, AKSW Group, Germany
* Dimitris Kontokostas, University of Leipzig, AKSW Group, Germany
* Sebastian Hellmann, University of Leipzig, AKSW Group, Germany
* Jürgen Umbrich, Vienna University of Economics and Business, Austria

*Overview and Topics*

The standardization and adoption of Semantic Web technologies has resulted in a variety of assets, including an unprecedented volume of data being semantically enriched and systems and services, which consume or publish this data. Although gathering, processing and publishing data is a step towards further adoption of Semantic Web, quality does not yet play a central role in these assets (e.g., data lifecycle, system/service development). Quality management essentially refers to the activities and tasks needed to guarantee a certain level of consistency and to meet the quality requirements for the assets.
In general, quality management consists of the following four phases and components: (i) quality planning, (ii) quality control, (iii) quality assurance and (iv) quality improvement. The quality planning phase in the Semantic Web typically involves the design of procedures, strategies and policies to support the management of the assets. The quality control and assurance components aim primarily to prevent errors and to meet the quality requirements pertaining to the Semantic Web standards. A core part of both components is the set of quality assessment methods, which provide the necessary input for the controlling and assurance tasks.

Quality assessment of Semantic Web Assets (data, services and systems), in particular, presents new challenges that were not handled before in other research areas. Thus, adopting existing approaches for data quality assessment is not a straightforward solution. These challenges are related to the openness of the Semantic Web, the diversity of the information and the unbounded, dynamic set of autonomous data sources, publishers and consumers (legal and software agents). Additionally, detecting the quality of available data sources and making the information explicit is yet another challenge. Moreover, noise in one data set, or missing links between different data sets, propagates throughout the Web of Data, and imposes great challenges on the data value chain.

In the case of systems and services, different implementations follow the specifications for RDF and SPARQL to varying extents, or even propose and offer new, non-standardized extensions. This causes strong incompatibilities between systems, e.g., between the SPARQL features used in the query engines and the features supported in RDF stores. The potential heterogeneity and incompatibility pose several challenges for the quality assessments in and for such systems and services.

Finally, quality improvement methods are used to further enhance the value of the Semantic Web Assets.
One important step to improve the quality of data is identifying the root cause of the problem and then designing corresponding data improvement solutions. These solutions select the most effective and efficient strategies and related set of techniques and tools to improve quality. Quality improvement metrics for products and services entail understanding and improving operational processes and establishing valid and reliable service performance measures.

This Special Issue is addressed to those members of the community interested in providing novel methodologies or frameworks for managing, assessing, monitoring, maintaining and improving the quality of Semantic Web data, services and systems, and in introducing tools and user interfaces which can effectively assist in this management.

Topics of Interest

We welcome original, high-quality submissions on topics including (but not restricted to) the following:

* Methodologies and frameworks to plan, control, assure or improve the quality of Semantic Web Assets
* Quality exploration and analysis interfaces
* Quality monitoring
* Developing, deploying and managing quality service ecosystems
* Assessing the quality evolution of Semantic Web Assets
* Large-scale quality assessment of structured datasets
* Crowdsourcing data quality assessment
* Quality assessment leveraging background knowledge
* Use-case driven quality management
* Evaluation of trustworthiness of data
* Web Data and LOD quality benchmarks
* Data Quality improvement methods and frameworks, e.g., linkage, alignment, cleaning, enrichment, correctness
* Service/system quality improvement methods and frameworks
* Managing sustainability issues in services
* Guarantee of service (availability, performance)
* Systems for transparent management of open data

8 years, 6 months

Any Norwegian academics writing about Wikipedia?

by Laura Hale

Hey, I was wondering if anyone on the list had any contacts with Norwegian academics doing research on Wikipedia, particularly from a gender gap perspective? Sincerely, Laura Hale -- twitter: purplepopple

8 years, 6 months

Code of Conduct and publication of private non-harassing communication

by Matthew Flaschen

Quim has proposed an alternative wording for the text about republication of private communication. You can comment at https://www.mediawiki.org/wiki/Talk:Code_of_Conduct/Draft#New_proposal.2C_w… or to conduct-discussion(a)wikimedia.org . Thanks as always, Matt Flaschen

8 years, 6 months

October 2015 Research Showcase

by Leila Zia

Hi everyone,

The next Research showcase is completely dedicated to Teahouse. :-) It will be live-streamed this Wednesday, October 21 at 18:30 (UTC). The streaming link is: http://www.youtube.com/watch?v=T73vRiNsRxo

As usual, you can join the conversation on IRC at #wikimedia-research. And, you can watch our past research showcases here <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#Archive>.

We look forward to seeing you!

Leila

This month

The impact of the Wikipedia Teahouse on new editor retention
By *Jonathan Morgan, Aaron Halfaker*

New Wikipedia editors face a variety of social and technical barriers to participation. These barriers have been shown to cause even promising, highly-motivated newcomers to give up and leave Wikipedia shortly after joining. The Wikipedia Teahouse was launched in 2012 to provide new editors with a space on Wikipedia where they could ask questions, introduce themselves, and learn the ropes of editing in a friendly and supportive environment, with the goal of increasing the percentage of good-faith newcomers who go on to become productive Wikipedians. Research has shown that the Teahouse provided a positive experience for participants, and suggested that participating in the Teahouse led to more editing activity and longer survival for new editors who participated. The current study examines the impact of Teahouse invitations on new editors' survival over a longer period of time (2-6 months), and presents findings related to contextual factors within editors' first few sessions that are associated with overall survival rate and editing patterns associated with increased likelihood of visiting the Teahouse.

8 years, 6 months
