Pursuant to prior discussions about the need for a research
policy on Wikipedia, WikiProject Research is drafting a
policy regarding the recruitment of Wikipedia users to
participate in studies.
At this time, we have a proposed policy, and an accompanying
group that would facilitate recruitment of subjects in much
the same way that the Bot Approvals Group approves bots.
The policy proposal can be found at:
http://en.wikipedia.org/wiki/Wikipedia:Research
The Subject Recruitment Approvals Group mentioned in the proposal
is being described at:
http://en.wikipedia.org/wiki/Wikipedia:Subject_Recruitment_Approvals_Group
Before we seek approval from the Wikipedia
community, we would like more input on the proposal,
and would welcome help improving it.
Also, please consider participating in WikiProject Research at:
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Research
--
Bryan Song
GroupLens Research
University of Minnesota
We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia:
http://dx.doi.org/10.6084/m9.figshare.1305770
This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.
This data can be used for various purposes:
• determining the most frequent links people click on for a given article
• determining the most common links people followed to an article
• determining what fraction of an article's total traffic went on to click a link in that article
• generating a Markov chain over English Wikipedia
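As an illustration of these uses, here is a minimal Python sketch. It assumes the data has already been loaded into memory as hypothetical (referer, article, count) tuples; the released files' actual column names and layout may differ, and the function names below are illustrative only. The click-out ratio is a rough proxy, since one reader can follow several links in a single visit.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical in-memory form of the data: one (referer, article, count) row per pair.
# The released files may use different columns; adapt the loader accordingly.
Row = Tuple[str, str, int]


def top_outgoing(rows: List[Row], article: str, k: int = 3) -> List[Tuple[str, int]]:
    """Links most often clicked *from* `article` (article appears as referer)."""
    outgoing: Counter = Counter()
    for ref, dst, n in rows:
        if ref == article:
            outgoing[dst] += n
    return outgoing.most_common(k)


def top_incoming(rows: List[Row], article: str, k: int = 3) -> List[Tuple[str, int]]:
    """Referers that most often led readers *to* `article`."""
    incoming: Counter = Counter()
    for ref, dst, n in rows:
        if dst == article:
            incoming[ref] += n
    return incoming.most_common(k)


def click_out_ratio(rows: List[Row], article: str) -> float:
    """Clicks leaving `article` divided by all recorded requests for `article`."""
    total_in = sum(n for _, dst, n in rows if dst == article)
    total_out = sum(n for ref, _, n in rows if ref == article)
    return total_out / total_in if total_in else 0.0


def markov_chain(rows: List[Row]) -> Dict[str, Dict[str, float]]:
    """Row-normalised transition probabilities P(next article | current page)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for ref, dst, n in rows:
        counts[ref][dst] += n
    chain = {}
    for ref, nxt in counts.items():
        total = sum(nxt.values())
        chain[ref] = {dst: n / total for dst, n in nxt.items()}
    return chain


# Made-up sample rows; "external-search" stands in for a non-article referer.
rows = [
    ("external-search", "London", 100),
    ("Paris", "London", 20),
    ("London", "Paris", 30),
    ("London", "Big_Ben", 10),
]
print(top_outgoing(rows, "London"))  # [('Paris', 30), ('Big_Ben', 10)]
print(top_incoming(rows, "London"))  # [('external-search', 100), ('Paris', 20)]
print(markov_chain(rows)["London"])  # {'Paris': 0.75, 'Big_Ben': 0.25}
```

Non-article referers (such as search engines) end up as source states in the chain; whether to keep them depends on the analysis.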
We created a page on Meta for feedback and discussion about this release: https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream
Ellery and Dario
Cross-posting this request to wiki-research-l. Anyone have data on
frequently used section titles in articles (any language), or know of
datasets/publications that examined this?
I'm not aware of any off the top of my head, Amir.
- Jonathan
---------- Forwarded message ----------
From: Amir E. Aharoni <amir.aharoni(a)mail.huji.ac.il>
Date: Sat, Jul 11, 2015 at 3:29 AM
Subject: [Wikitech-l] statistics about frequent section titles
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Hi,
Did anybody ever try to collect statistics about frequent section titles in
Wikimedia projects?
For Wikipedia, for example, titles such as "Biography", "Early life",
"Bibliography", "External links", "References", "History", etc., appear in
a lot of articles, and their counterparts appear in a lot of languages.
There are probably similar things in Wikivoyage, Wiktionary and possibly
other projects.
Did anybody ever try to collect statistics of the most frequent section
titles in each language and project?
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
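One way to approach Amir's question is to count headings in each page's wikitext, for example from a dump. The Python sketch below assumes page wikitext is already available as strings. Its regex handles only plain `== Title ==` headings (levels 2 to 6) and ignores markup such as links or comments inside titles. The helper names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List

# Matches MediaWiki headings such as "== History ==" or "===Early life===".
# The closing run of "=" must match the opening run (backreference \1).
HEADING_RE = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def section_titles(wikitext: str) -> List[str]:
    """Return heading titles in the order they appear in one page."""
    return [m.group(2) for m in HEADING_RE.finditer(wikitext)]


def count_titles(pages: Iterable[str]) -> Counter:
    """Count in how many pages each section title occurs."""
    counts: Counter = Counter()
    for text in pages:
        # Count each title at most once per page so long pages don't dominate.
        counts.update(set(section_titles(text)))
    return counts


pages = [
    "Intro\n== Early life ==\ntext\n== References ==\n",
    "== History ==\n=== Origins ===\n== References ==\n== External links ==\n",
]
print(count_titles(pages).most_common(1))  # [('References', 2)]
```

Comparing languages would mean running this per wiki and then mapping titles across languages, which the sketch does not attempt.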
I've lurked on this list for about 6 months now. Basically I've been
looking for sources to address questions that I run across as an
editor, not as an academic.
I have to comment on Corneli's question and Darnell's answer: the
"category system is hopelessly muddled." I can only agree - in 10
years as a pretty active editor, I've never figured out what can be
done using the present categorization system. It's not because I'm
not interested in categories.
Please see (and comment on if you'd like) my informal investigation
on "What's in Wikipedia?" at
https://en.wikipedia.org/wiki/User:Smallbones/1000_random_results . If
nothing else, you might be interested in the graphic
https://commons.wikimedia.org/wiki/File:Size_of_English_Wikipedia_(1000_vol…
Re: the gender gap, please take a look at the bottom of the write-up
on how biographies (Women vs. Men) improve over time. It's got a new
(AFAIK) use of the ORES output.
All comments welcome - here, on the discussion page, or, if you really
want to lay into me, via e-mail.
Thanks,
Pete
User:Smallbones
Call for Industry & Transfer Tutorials and Workshops
SEMANTiCS 2016 - The Linked Data Conference
Transfer // Engineering // Community
12th International Conference on Semantic Systems
Leipzig, Germany
September 12 - 15, 2016
http://www.semantics.cc
Important Dates
* Submission open on first-come, first-served basis until all slots are
filled
* SEMANTiCS 2016 Workshop Day: September 12, 2016
* SEMANTiCS 2016 Tutorial Day: September 15, 2016
Submissions via email: semantics2016workshopchairs(a)gmail.com
See also the Call for Scientific Workshops: http://semantics.cc/open-calls
#SEMANTiCS Industry & Transfer Tutorials and Workshops
SEMANTiCS 2016 is a major venue for industrial innovation and features
an industry & transfer tutorial and workshop program addressing the
diverse practical interests of its audience. This program is intended to
offer a rich diversity of high quality information valuable to
conference attendees and local participants seeking to pick up new
skills and stay up-to-date regarding the latest developments in the
community. We encourage submissions of proposals on all topics in the
general areas of SEMANTiCS 2016 and proposals bridging or introducing
new perspectives in these areas. Tutorials and workshops may incorporate
panel discussions, lightning talks, meetings, networking or hands-on
sessions, hackathons, and other practical formats where applicable.
Rooms for business or project meetings are available upon request as well.
#SEMANTiCS 2016 Tutorials and Workshop Scope & Goals
Satellite events at SEMANTiCS 2016 allow your organisation or project to
push your topics and gain increased visibility. Workshops and
tutorials will be announced on the SEMANTiCS website (tutorials also on
the DBpedia website) and will be seen by all participants. SEMANTiCS 2016
industry & transfer tutorials and workshops are incubators for
industrial and scientific communities that form and share a particular
research and development agenda. They provide a forum for presenting
widely recognized contributions and findings to a diverse and
knowledgeable community. Furthermore, the event can be used as a
dissemination activity in the scope of large research projects or as a
closed format for research/commercial project consortia meetings.
#Setup and requirements of SEMANTiCS Workshops and Tutorials
SEMANTiCS 2016 workshops and tutorials may be either half or full day
long. Workshops will be held on the day before the main SEMANTiCS 2016
conference (September 12, 2016); tutorials will be held after the
conference (September 15, 2016) in parallel to the DBpedia Day.
Participation in SEMANTiCS 2016 workshops and tutorials is typically
free. Full service will be provided (Coffee, Lunch, Room, etc.). In
order to cover costs, organisers can choose to pay a cover charge per
participant or buy one of the sponsorship packages that include a
workshop slot (http://semantics.cc/sponsorship-packages). In exceptional
cases, we can also levy a ticket fee from participants.
#Tutorial and Workshop Proposal Submissions
Tutorials and workshop proposals must include the following information:
* full contact information of all organizers of the event and main
contact person
* proposed duration of the event (i.e., half or full day), different
sessions if applicable, together with justification that a high-quality
presentation will be achieved within the chosen time period
* outline of the themes and goals of the event, including a brief
abstract (less than 200 words) intended for the SEMANTiCS 2016 website
* a statement addressing why the event is important, why the event is
timely, how it is relevant to SEMANTiCS 2016, and why the presenters are
qualified for a high-quality introduction of the topic
* a description of the intended audience and the expected learning outcomes
* desired prerequisite knowledge of the audience
* desired minimum and maximum number of event participants, expected
number of participants, and (in case of previously held events)
number of registered attendees and web site for previous editions of the
event
* any equipment, room capacity, or other logistic constraints
* a brief description of each organizer's background, including relevant
past experience in organizing events
Tutorial and workshop proposals must be submitted electronically via
semantics2016workshopchairs(a)gmail.com
Submissions are open on a first-come, first-served basis until all
slots are filled.
#Review and Evaluation Criteria
Tutorial and workshop proposals will be reviewed by the SEMANTiCS 2016
Workshop Chairs, as well as by the SEMANTiCS 2016 organizing committee,
according to the following criteria:
* The potential to advance the state of semantic web research and practice
* The organizers' commitment to stimulate discussion at the event
* The organizers' experience and ability to lead a successful event
* Timeliness and expected interest in the event topics
* The balance and synergy between all SEMANTiCS 2016 events
The following ‘horizontals’ (topics) and ‘verticals’ (industries) are of
interest:
Horizontals
* Enterprise Linked Data & Data Integration
* Corporate Knowledge Graphs
* Semantics on the Web & schema.org
* Business Models, Governance & Data Strategies
* Knowledge Discovery & Intelligent Search
* Smart Connectivity & Interlinking
* Data Quality Management
* Big Data & Text Analytics
* Data Portals & Knowledge Visualization
* Semantic Information Management
* Document Management & Content Management
* Terminology, Thesaurus & Ontology Management
* Language Technologies
* Data Science (Data Mining, Machine Learning, Network Analytics)
* Economics of data, data services and data ecosystems
* Community, Social & Societal Aspects
Verticals
* Industry & Engineering
* Life Sciences & Health Care
* Public Administration
* Galleries, Libraries, Archives & Museums (GLAM)
* Education & eLearning
* Media & Data Journalism
* Publishing, Marketing & Advertising
* Tourism & Recreation
* Financial & Insurance Industry
* Telecommunication & Mobile Services
* Energy, Smart Homes & Smart Grids
* Transport, Environment & Geospatial
* Agriculture & Farming
In case you have additional questions concerning the submission process,
please do not hesitate to contact us at
semantics2016workshopchairs(a)gmail.com
We are looking forward to your contribution!
Workshop & Tutorial Chair: Thomas Moser (St. Pölten University of
Applied Sciences)
Deputy Workshop & Tutorial Chair: Kay Müller (AKSW/KILT, Leipzig University)
#About SEMANTiCS
The annual SEMANTiCS conference is the meeting place for researchers and
professionals who push the boundaries of semantic computing and who
understand its benefits and encounter its limitations. Every year,
SEMANTiCS attracts professionals and researchers alike ranging from
NPOs, through public administrations to the largest companies in the world.
The success of last year’s conference in Vienna with more than 280
attendees from 22 countries proves that SEMANTiCS 2016 will continue a
long tradition of bringing together colleagues from around the world.
There will be presentations on industry implementations, use case
prototypes, best practices, panels, papers and posters to discuss
semantic systems in birds-of-a-feather sessions as well as informal
settings. SEMANTiCS addresses problems common among information
managers, software engineers, IT-architects and various specialist
departments working to develop, implement and/or evaluate semantic
software systems.
The SEMANTiCS program is a rich mix of technical talks, panel
discussions of important topics and presentations by people who make
things work - just like you. In addition, attendees can network with
experts in a variety of fields. These relationships provide great value
to organisations as they encounter subtle technical issues in any stage
of implementation. The expertise gained by SEMANTiCS attendees has a
long-term impact on their careers and organisations. These factors make
SEMANTiCS the major industry-related event across Europe for our community.
Call for Scientific Workshops
SEMANTiCS 2016 - The Linked Data Conference
Transfer // Engineering // Community
12th International Conference on Semantic Systems
Leipzig, Germany
September 12 - 15, 2016
http://www.semantics.cc
Important Dates (11:59 pm, Hawaii time)
* Workshop Proposal Submission Deadline: April 3, 2016
* Workshop Proposal Notification of Acceptance: April 10, 2016
* Workshop Website/Call for Papers Online: April 15, 2016
* Workshop Paper Submission Deadline: June 9, 2016
* Workshop Paper Camera-Ready Deadline: July 26, 2016
* SEMANTiCS 2016 Workshop Day: September 12, 2016
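The deadlines above are 11:59 pm Hawaii time. Hawaii does not observe daylight saving time, so Hawaii Standard Time is a fixed UTC-10 offset and converting a deadline to UTC is simple arithmetic. A small Python sketch (the helper name `deadline_utc` is illustrative):

```python
from datetime import datetime, timedelta, timezone

# Hawaii Standard Time: fixed UTC-10 offset, no daylight saving time.
HST = timezone(timedelta(hours=-10), "HST")


def deadline_utc(year: int, month: int, day: int) -> datetime:
    """Convert an 11:59 pm Hawaii-time deadline on the given date to UTC."""
    local = datetime(year, month, day, 23, 59, tzinfo=HST)
    return local.astimezone(timezone.utc)


print(deadline_utc(2016, 4, 3))  # 2016-04-04 09:59:00+00:00
```

So a Hawaii-time deadline falls on the following calendar day in UTC and in most of Europe.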
Submissions via Easychair:
https://easychair.org/conferences/?conf=semantics2016research
See also the Call for Industry & Transfer Tutorials and Workshops:
http://semantics.cc/open-calls
#SEMANTiCS 2016 Scientific Workshops Scope & Goals
SEMANTiCS 2016 scientific workshops provide a forum for groups of
researchers and practitioners to discuss topics in semantic web research
and industrial applications. They provide opportunities for researchers
and practitioners to exchange and discuss scientific and engineering
ideas before these ideas have matured to warrant conference or journal
publication. SEMANTiCS 2016 scientific workshops also serve as
incubators for scientific communities that form and share a particular
research agenda.
The workshops may be either half or full day long and will be held on
the day before the main SEMANTiCS 2016 conference (September 12th,
2016). Participation in SEMANTiCS 2016 scientific workshops is typically
free. The workshops can produce workshop proceedings to be published in
CEUR Workshop Proceedings (http://ceur-ws.org/). The best accepted
workshop papers will be considered for publication in the SEMANTiCS 2016
conference proceedings. Furthermore, a selection of workshop papers will
be invited to present a poster at the SEMANTiCS 2016 conference.
#Scientific Workshop Proposal Submissions
Submissions via Easychair:
https://easychair.org/conferences/?conf=semantics2016research
Workshop proposals must include the following information (please be
brief and concise, max. 5 pages):
* full contact information of all organizers of the workshop and main
contact person
* desired length of the workshop (i.e., half or full day)
* outline of the themes and goals of the workshop, including a brief
abstract (less than 200 words) intended for the SEMANTiCS 2016 website
* concise motivation of the workshop's relevance to the field of
semantic web
* participant solicitation and selection process
* desired minimum and maximum number of workshop participants, expected
number of participants, and (in case of previously held workshops)
number of registered attendees and web site for previous editions of the
workshop
* structure of the workshop and plans for generating and stimulating
discussion
* any equipment, room capacity, or other logistic constraints
* a brief description of each organizer's background, including relevant
past experience in organizing conferences and workshops
* if applicable, a draft version of the call for papers
#Review and Evaluation Criteria
Scientific workshop proposals will be reviewed by the SEMANTiCS 2016
Workshop Chairs, as well as by the SEMANTiCS 2016 organizing committee,
according to the following criteria:
* The potential to advance the state of semantic web research and practice
* The organizers' commitment to stimulate discussion at the workshop
* The organizers' experience and ability to lead a successful workshop
* Timeliness and expected interest in the workshop topics
* The balance and synergy between all SEMANTiCS 2016 events
The following ‘horizontals’ (topics) and ‘verticals’ (industries) are of
interest:
Horizontals
* Enterprise Linked Data & Data Integration
* Corporate Knowledge Graphs
* Semantics on the Web & schema.org
* Business Models, Governance & Data Strategies
* Knowledge Discovery & Intelligent Search
* Smart Connectivity & Interlinking
* Data Quality Management
* Big Data & Text Analytics
* Data Portals & Knowledge Visualization
* Semantic Information Management
* Document Management & Content Management
* Terminology, Thesaurus & Ontology Management
* Language Technologies
* Data Science (Data Mining, Machine Learning, Network Analytics)
* Economics of data, data services and data ecosystems
* Community, Social & Societal Aspects
Verticals
* Industry & Engineering
* Life Sciences & Health Care
* Public Administration
* Galleries, Libraries, Archives & Museums (GLAM)
* Education & eLearning
* Media & Data Journalism
* Publishing, Marketing & Advertising
* Tourism & Recreation
* Financial & Insurance Industry
* Telecommunication & Mobile Services
* Energy, Smart Homes & Smart Grids
* Transport, Environment & Geospatial
* Agriculture & Farming
In case you have additional questions concerning the submission process,
please do not hesitate to contact us at
semantics2016workshopchairs(a)gmail.com
We are looking forward to your contribution!
Workshop & Tutorial Chair: Thomas Moser (St. Pölten University of
Applied Sciences)
Deputy Workshop & Tutorial Chair: Kay Müller (AKSW/KILT, Leipzig University)
#About SEMANTiCS
The annual SEMANTiCS conference is the meeting place for researchers and
professionals who push the boundaries of semantic computing and who
understand its benefits and encounter its limitations. Every year,
SEMANTiCS attracts professionals and researchers alike ranging from
NPOs, through public administrations to the largest companies in the
world. SEMANTiCS workshop attendees learn from industry experts and top
researchers about emerging trends and hot topics in the fields of
semantic software, enterprise data, linked data & open data strategies,
methodologies in knowledge modelling and text & data analytics. Since
the SEMANTiCS community is highly diverse, both workshop participants
and organisers will benefit from the experience.
The success of last year’s conference in Vienna with more than 280
attendees from 22 countries proves that SEMANTiCS 2016 will continue a
long tradition of bringing together colleagues from around the world.
There will be presentations on industry implementations, use case
prototypes, best practices, panels, papers and posters to discuss
semantic systems in birds-of-a-feather sessions as well as informal
settings. SEMANTiCS addresses problems common among information
managers, software engineers, IT-architects and various specialist
departments working to develop, implement and/or evaluate semantic
software systems.
The SEMANTiCS program is a rich mix of technical talks, panel
discussions of important topics and presentations by people who make
things work - just like you. In addition, attendees can network with
experts in a variety of fields. These relationships provide great value
to organisations as they encounter subtle technical issues in any stage
of implementation. The expertise gained by SEMANTiCS attendees has a
long-term impact on their careers and organisations. These factors make
SEMANTiCS the major industry-related event across Europe for our community.
Hi all – heads up that we extended the submission deadline for the Wiki
Workshop at ICWSM '16 to *Wednesday, March 3, 2016*. (The second deadline
remains unchanged: March 11, 2016).
You can check the workshop's website
<http://snap.stanford.edu/wikiworkshop2016/> for submission instructions or
follow us at @wikiworkshop16 <https://twitter.com/wikiworkshop16> for live
updates.
Looking forward to your contributions.
Dario
Hi Bruno,
I have been using the WikiExtractor for this task:
https://github.com/attardi/wikiextractor
Hope this helps.
Cheers,
Marco
On 2/22/16 23:32, wiki-research-l-request(a)lists.wikimedia.org wrote:
> Date: Mon, 22 Feb 2016 23:12:08 +0100
> From: "Federico Leva (Nemo)"<nemowiki(a)gmail.com>
> To: Research into Wikimedia content and communities
> <wiki-research-l(a)lists.wikimedia.org>
> Subject: Re: [Wiki-research-l] "Quick" request
>
> Bruno Goncalves, 22/02/2016 22:58:
>> There used to be official HTML dumps
>> https://dumps.wikimedia.org/other/static_html_dumps/ but they haven't
>> been updated in almost a decade :)
> The job is effectively done by Kiwix now.
> http://download.kiwix.org/zim/wikipedia/
> For instance:
> wikipedia_en_all_nopic_2015-05.zim 17-May-2015 10:27 15G
>
> There are several tools to extract the HTML from a ZIM file:
> http://www.openzim.org/wiki/Readers
>
> Nemo