Pursuant to prior discussions about the need for a research
policy on Wikipedia, WikiProject Research is drafting a
policy regarding the recruitment of Wikipedia users to
participate in studies.
At this time, we have a proposed policy, and an accompanying
group that would facilitate recruitment of subjects in much
the same way that the Bot Approvals Group approves bots.
The policy proposal can be found at:
http://en.wikipedia.org/wiki/Wikipedia:Research
The Subject Recruitment Approvals Group mentioned in the proposal
is being described at:
http://en.wikipedia.org/wiki/Wikipedia:Subject_Recruitment_Approvals_Group
Before we move forward with seeking approval from the Wikipedia
community, we would like additional input about the proposal,
and would welcome additional help improving it.
Also, please consider participating in WikiProject Research at:
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Research
--
Bryan Song
GroupLens Research
University of Minnesota
We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia:
http://dx.doi.org/10.6084/m9.figshare.1305770
This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.
This data can be used for various purposes:
• determining the most frequent links people click on for a given article
• determining the most common links people followed to an article
• determining how much of the total traffic to an article clicked on a link in that article
• generating a Markov chain over English Wikipedia
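As a rough illustration of the uses listed above, here is a minimal Python sketch. It assumes the data has already been loaded as (referer, article, count) tuples; the rows shown are invented toy values, not figures from the dataset, and the function names are hypothetical helpers rather than part of any released tooling.

```python
from collections import defaultdict

# Toy (referer, article, count) rows. These values are made up for
# illustration and are NOT taken from the released dataset.
rows = [
    ("Main_Page", "London", 60),
    ("Main_Page", "Paris", 40),
    ("London", "Paris", 30),
    ("London", "England", 70),
    ("Paris", "London", 50),
]


def top_links_from(rows, referer, n=3):
    """Most frequently clicked links out of `referer`, highest count first."""
    out = [(article, count) for r, article, count in rows if r == referer]
    return sorted(out, key=lambda pair: pair[1], reverse=True)[:n]


def top_referers_to(rows, article, n=3):
    """Most common pages people came from to reach `article`."""
    out = [(r, count) for r, a, count in rows if a == article]
    return sorted(out, key=lambda pair: pair[1], reverse=True)[:n]


def transition_probabilities(rows):
    """Normalize counts per referer into first-order Markov transition probabilities."""
    totals = defaultdict(int)
    for referer, _, count in rows:
        totals[referer] += count
    chain = defaultdict(dict)
    for referer, article, count in rows:
        # Accumulate in case the same pair appears on more than one row.
        previous = chain[referer].get(article, 0.0)
        chain[referer][article] = previous + count / totals[referer]
    return dict(chain)


chain = transition_probabilities(rows)
print(top_links_from(rows, "London"))
print(top_referers_to(rows, "Paris"))
print(chain["London"])
```

The chain here only covers transitions that appear as pairs in the input, so any pair absent from the data is simply missing rather than assigned a probability of zero.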
We created a page on Meta for feedback and discussion about this release: https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream
Ellery and Dario
Cross-posting this request to wiki-research-l. Anyone have data on
frequently used section titles in articles (any language), or know of
datasets/publications that examined this?
I'm not aware of any off the top of my head, Amir.
- Jonathan
---------- Forwarded message ----------
From: Amir E. Aharoni <amir.aharoni(a)mail.huji.ac.il>
Date: Sat, Jul 11, 2015 at 3:29 AM
Subject: [Wikitech-l] statistics about frequent section titles
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Hi,
Did anybody ever try to collect statistics about frequent section titles in
Wikimedia projects?
For Wikipedia, for example, titles such as "Biography", "Early life",
"Bibliography", "External links", "References", "History", etc., appear in
a lot of articles, and their counterparts appear in a lot of languages.
There are probably similar things in Wikivoyage, Wiktionary and possibly
other projects.
Did anybody ever try to collect statistics of the most frequent section
titles in each language and project?
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
Hi all,
Some of us plan to have a conversation at the WCONUSA unconference sessions
about ENWP culture. Are there any recommended readings that you could
suggest as preparation, particularly on the subject of how to reinforce or
incentivize desirable user behavior? I think that Jonathan may have done
some research on this topic for the Teahouse, and Ocassi may have done
research for TWA. I'm interested in applicable research as preparation both
for the unconference discussion and for my planned video series that
intends to inform and inspire new editors.
Thanks,
Pine
Hi everybody,
We’re preparing for the October 2015 research newsletter and looking for contributors. Please take a look at: https://etherpad.wikimedia.org/p/WRN201510 and add your name next to any paper you are interested in covering. Our target publication date is Wednesday October 28 UTC. As usual, short notes and one-paragraph reviews are most welcome.
Highlights from this month:
Use and awareness of Wikipedia among the M.C.A students of C. D. Jain college of commerce, Shrirampur : A Study
Understanding Editing Behaviors in Multilingual Wikipedia
The Impact and Evolution of Group Diversity in Online Open Collaboration
Teaching Wikipedia: The Pedagogy and Politics of an Open Access Writing Community
“An Encyclopedia, Not an Experiment in Democracy”: Wikipedia Biographies, Authorship, and the Wikipedia Subject
"You get what you need” : A study of students’ attitudes towards using Wikipedia when doing school assignments
Machine Learning and the Detection of Anomalies in Wikipedia
"Collective remembering of organizations: Co-construction of organizational pasts in Wikipedia"
Top 100 historical figures of Wikipedia
Towards a Class-Based Model of Information Organization in Wikipedia
Transparency, Control, and Content Generation on Wikipedia: Editorial Strategies and Technical Affordances
Influence of Wikipedia and other web resources on acute and critical care decisions. A web-based survey
Utilising Wikipedia for text mining applications
Automatic Identification and Disambiguation of Concepts and Named Entities in the Multilingual Wikipedia
Beyond Friendships and Followers: The Wikipedia Social Network
Intellectual Interchanges in the History of Massive Online Open-editing Encyclopedia, Wikipedia
Exploration of Online Culture Through Network Analysis of Wikipedia
Cyberpsychology, Behavior, and Social Networking
Measuring Article Quality in Wikipedia using the Collaboration Network
How do Twitter, Wikipedia, and Harrison's principles of medicine describe heart attacks?
Sociotechnical interaction at work: an ethnographic study of the Wikipedia community
Wikipedia and history: a worthwhile partnership in the digital era?
If you have any questions about the format or process, feel free to get in touch off-list.
Masssly, Tilman Bayer and Dario Taraborelli
[1] http://meta.wikimedia.org/wiki/Research:Newsletter
*Extended Deadline November 20, 2015
CFP: Semantic Web Journal - Special Issue on Quality Management of
Semantic Web Assets (Data, Services and Systems):*
http://www.semantic-web-journal.net/blog/call-papers-special-issue-quality-…
Submission guidelines
*Deadline: November 20, 2015* (extended from October 31, 2015)
Submissions shall be made through the Semantic Web journal website at
http://www.semantic-web-journal.net. Prospective authors must take
notice of the submission guidelines posted at
http://www.semantic-web-journal.net/authors. Note that you need to
request an account on the website for submitting a paper. Please
indicate in the cover letter that it is for the Special Issue on Quality
Management of Semantic Web Assets (Data, Services and Systems).
Submissions are possible in the following categories: full research
papers, application reports, reports on tools and systems, and case
studies. While there is no upper limit, paper length must be justified
by content.
Guest editors
* Amrapali Zaveri, University of Leipzig, AKSW Group, Germany
* Dimitris Kontokostas, University of Leipzig, AKSW Group, Germany
* Sebastian Hellmann, University of Leipzig, AKSW Group, Germany
* Jürgen Umbrich, Vienna University of Economics and Business, Austria
*Overview and Topics*
The standardization and adoption of Semantic Web technologies have
resulted in a variety of assets, including an unprecedented volume of
semantically enriched data as well as systems and services that consume
or publish this data. Although gathering, processing and publishing data
is a step towards further adoption of the Semantic Web, quality does not
yet play a central role in these assets (e.g., data lifecycle,
system/service development).
Quality management essentially refers to the activities and tasks
involved in guaranteeing a certain level of consistency and meeting the
quality requirements for the assets. In general, quality management
consists of the following four phases and components: (i) quality
planning, (ii) quality control, (iii) quality assurance and (iv) quality
improvement. The quality planning phase in the Semantic Web typically
involves the design of procedures, strategies and policies to support
the management of the assets. The quality control and assurance
components aim primarily to prevent errors and to meet quality
requirements pertaining to the Semantic Web standards. A core part of
both components is quality assessment methods, which provide the
necessary input for the control and assurance tasks.
Quality assessment of Semantic Web Assets (data, services and systems),
in particular, presents new challenges that were not handled before in
other research areas. Thus, adopting existing approaches for data
quality assessment is not a straightforward solution. These challenges
are related to the openness of the Semantic Web, the diversity of the
information and the unbounded, dynamic set of autonomous data sources,
publishers and consumers (legal and software agents). Additionally,
detecting the quality of available data sources and making the
information explicit is yet another challenge. Moreover, noise in one
data set, or missing links between different data sets, propagates
throughout the Web of Data and poses great challenges for the data
value chain.
In the case of systems and services, different implementations follow
the specifications for RDF and SPARQL to varying extents, or even
propose and offer new, non-standardized extensions. This causes strong
incompatibilities between systems, e.g., between the SPARQL features
used in query engines and the features supported in RDF stores. The
potential heterogeneity and incompatibility pose several challenges for
quality assessment in and for such systems and services.
Finally, quality improvement methods are used to further enhance the
value of the Semantic Web Assets. One important step to improve the
quality of data is identifying the root cause of the problem and then
designing corresponding data improvement solutions. These solutions
select the most effective and efficient strategies and the related set
of techniques and tools to improve quality. Quality improvement metrics
for products and services entail understanding and improving operational
processes and establishing valid and reliable service performance measures.
This Special Issue is addressed to those members of the community
interested in providing novel methodologies or frameworks for managing,
assessing, monitoring, maintaining and improving the quality of
Semantic Web data, services and systems, and in introducing tools and
user interfaces that can effectively assist in this management.
Topics of Interest
We welcome original, high-quality submissions on (but not restricted
to) the following topics:
* Methodologies and frameworks to plan, control, assure or improve the
quality of Semantic Web Assets
* Quality exploration and analysis interfaces
* Quality monitoring
* Developing, deploying and managing quality service ecosystems
* Assessing the quality evolution of Semantic Web Assets
* Large-scale quality assessment of structured datasets
* Crowdsourcing data quality assessment
* Quality assessment leveraging background knowledge
* Use-case driven quality management
* Evaluation of trustworthiness of data
* Web Data and LOD quality benchmarks
* Data Quality improvement methods and frameworks, e.g., linkage,
alignment, cleaning, enrichment, correctness
* Service/system quality improvement methods and frameworks
* Managing sustainability issues in services
* Guarantee of service (availability, performance)
* Systems for transparent management of open data
Hey,
I was wondering if anyone on the list had any contacts with Norwegian
academics doing research on Wikipedia, particularly from a gender gap
perspective?
Sincerely,
Laura Hale
--
twitter: purplepopple
Hi everyone,
The next Research showcase is completely dedicated to the Teahouse. :-) It will
be live-streamed this Wednesday, October 21 at 18:30 (UTC). The streaming
link is:
http://www.youtube.com/watch?v=T73vRiNsRxo
As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#Archive>.
We look forward to seeing you!
Leila
This month
The impact of the Wikipedia Teahouse on new editor retention
By *Jonathan Morgan, Aaron Halfaker*
New Wikipedia editors face a variety of social and technical barriers to
participation. These barriers have been shown to cause even promising,
highly-motivated newcomers to give up and leave Wikipedia shortly after
joining. The Wikipedia Teahouse was launched in 2012 to provide new editors
with a space on Wikipedia where they could ask questions, introduce
themselves, and learn the ropes of editing in a friendly and supportive
environment, with the goal of increasing the percentage of good-faith
newcomers who go on to become productive Wikipedians. Research has shown
that the Teahouse provided a positive experience for participants, and
suggested that participating in the Teahouse led to more editing activity
and longer survival for new editors who participated. The current study
examines the impact of Teahouse invitations on new editors' survival over a
longer period of time (2-6 months), and presents findings related to
contextual factors within editors' first few sessions that are associated
with overall survival rate and editing patterns associated with increased
likelihood of visiting the Teahouse.