Hi all,
For all Hive users using stat1002/1004, you might have seen a deprecation
warning when you launch the hive client - that claims it's being replaced
with Beeline. The Beeline shell has always been available to use, but it
required supplying a database connection string every time, which was
pretty annoying. We now have a wrapper script
<https://github.com/wikimedia/operations-puppet/blob/production/modules/role…>
set up to make this easier. The old Hive CLI will continue to exist, but we
encourage moving over to Beeline. You can use it by logging into the
stat1002/1004 boxes as usual, and launching `beeline`.
There is some documentation on this here:
https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Beeline.
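For context, here is a minimal sketch of what supplying the connection string by hand looks like. It only assembles the command line and does not reproduce the wrapper script. The host name is a placeholder, port 10000 is HiveServer2's usual default rather than a confirmed cluster setting, and the helper name `beeline_command` is made up for illustration. The `-u`, `-n` and `-e` options are standard Beeline flags for the JDBC URL, the user name and an inline query.

```python
import shlex


def beeline_command(host, port=10000, database="default", user=None, query=None):
    """Build the argument list for a direct (unwrapped) Beeline invocation."""
    # HiveServer2 is addressed through a JDBC URL of this general shape.
    jdbc_url = f"jdbc:hive2://{host}:{port}/{database}"
    args = ["beeline", "-u", jdbc_url]
    if user is not None:
        args += ["-n", user]   # user name to connect as
    if query is not None:
        args += ["-e", query]  # run one query and exit
    return args


if __name__ == "__main__":
    # "hive-server.example.org" is a placeholder, not the real cluster host.
    cmd = beeline_command("hive-server.example.org", user="alice",
                          query="SHOW DATABASES;")
    print(shlex.join(cmd))
```

With the wrapper in place, none of this needs to be typed: launching `beeline` on the stat boxes is enough.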
If you run into any issues using this interface, please ping us on the
Analytics list or #wikimedia-analytics or file a bug on Phabricator
<http://phabricator.wikimedia.org/tag/analytics>.
(If you are wondering stat1004 whaaat - there should be an announcement
coming up about it soon!)
Best,
--Madhu :)
Curious, what percentage of digital assistants (Alexa, Siri, Cortana,
Google) cite Wikipedia when a person asks a question?
Does the current Wikipedia mobile app support voice search?
Are there any reports on this? Thanks in advance!
Sincere regards,
Stella
--
Stella Yu | STELLARESULTS | 415 690 7827
"Chronicling heritage brands and legendary people."
Hi everyone,
We are excited to announce that the 5th annual Wiki Workshop [1] will
take place in Lyon on April 24, 2018, as part of The Web Conference
2018 (a.k.a. WWW2018) [2].
You can access the call for papers at
http://wikiworkshop.org/2018/#call . Please submit your ongoing or
completed research related to Wikimedia projects to the workshop. Note
that 2018-01-28 is the submission deadline if you want your paper to
appear in the proceedings, and 2018-03-11 is the deadline for all other
papers [3].
Following the past year's model, the workshop will have a set of
invited talks (Jon Kleinberg and Markus Kroetzsch have already
accepted our invitation [4] \o/), a poster session, and more.
Questions and comments are welcome. Otherwise, we're looking forward
to receiving your submissions and seeing you in Lyon in April. :)
Best,
Leila, on behalf of the organizers [5]
[1] http://wikiworkshop.org/2018/
[2] https://www2018.thewebconf.org/
[3] http://wikiworkshop.org/2018/#dates
[4] http://wikiworkshop.org/2018/#speakers
[5] http://wikiworkshop.org/2018/#organization
--
Leila Zia
Senior Research Scientist
Wikimedia Foundation
Hi everyone,
we [1] would like to announce a research project with the goal of studying
whether user interactions recorded at the time of editing are suitable to
predict vandalism in real time.
Should vandal editing behavior be sufficiently different from normal
editing behavior, this would allow for a number of interesting real-time
prevention techniques. For example:
- withholding confidently suspicious edits for review before publishing
them,
- a popup asking "I am not a vandal" (as in Google's "I am not a robot") to
analyze vandal reactions,
- a popup with a chat box to personally engage vandals, e.g., to help them
find other ways of stress relief or to understand them better,
- or at the very least: a new signal to improve traditional vandalism
detectors.
We have set up a laboratory environment to study editor behavior in a
realistic setting using a private mirror of Wikipedia. No editing
whatsoever is conducted on the real Wikipedia as part of our experiments,
and all test subjects of our user studies are made aware of the
experimental nature of their editing. We plan on making use of
crowdsourcing as a means to attain scale and diversity.
If you wish to participate in this study as a test subject yourself, please
get in touch. The more diversity, the more insightful the results will be.
We are also happy to collaborate and to answer all questions that may arise
in relation to the project. For example, our setup and tooling may turn out
to be useful to study other user behavior-related things without having to
actually deploy experiments within the live MediaWiki.
Best,
Martin
PS: The AICaptcha project seems most closely related. @Vinitha and Gergő:
If you wish, we can set up a Skype meeting to talk about avenues for
collaboration.
[1] A group of students and researchers from Bauhaus-Universität Weimar (
www.webis.de) and Leipzig University (www.temir.org); project PI: Martin
Potthast.
*TL;DR*: The Analytics Hadoop cluster will be completely down for max 2h
on *Feb 6th* (EU/CET morning) to upgrade all the daemons to Java 8.
Hi everybody,
we are planning to upgrade the Analytics Hadoop cluster to Java 8 on
*Feb 6th* (EU/CET morning) for https://phabricator.wikimedia.org/T166248.
Sadly we can't do a rolling upgrade of all the JVM-based Hadoop daemons,
since the distribution that we use (Cloudera) recommends performing the
upgrade only after a complete cluster shutdown. This means that for a
couple of hours (hopefully a lot less) all the Hadoop-based services will
be unavailable (Hive, Oozie, HDFS, etc.).
We have tested the new configuration in labs and all the regular Analytics
jobs seem to work correctly, so we don't expect major issues after the
upgrade, but if you have any questions or concerns, please follow up in the
task.
Thanks!
Luca and Andrew (on behalf of the Analytics team)
> you can't have volunteer resources run the video
> equipment for you. You need a professional crew.
Is there a more specific description of the specific authority,
wording, and legislative intent for this, please?
Dear all,
it's my pleasure to inform you that the call for papers for OpenSym 2018
is available.
Conference Website and call for papers: http://opensym.org
Papers are due by March 15, 23h59 (any time on Earth).
Submission: https://easychair.org/conferences/?conf=opensym2018
(acceptance rate in 2017: 45%)
Topics: The conference provides peer-reviewed research tracks on
subjects related to open collaboration including:
- Open Collaboration Research, esp. Wikis and Social Media
- Free/Libre and Open Source Software (FLOSS)
- Open Data, Open Access, and Open Science
- Open Education
- IT-Driven Open Innovation
- Open Policy/Open Government/Open Law
- Wikipedia and Wikimedia Research
Looking forward to seeing your paper's presentation in Paris
Nicolas Jullien, general chair of OpenSym 2018
About the Conference
--------------------
OpenSym is the only conference that brings together the different
strands of open collaboration research and practice, seeking to create
synergies and inspire new collaborations between people from computer
science, information science, social science, humanities, and everyone
interested in understanding open collaboration and how it is changing
our society.
This year’s conference will be held in Paris, France on August 22-24,
2018. A Doctoral Symposium will take place on August 21, 2018.
OpenSym is held in-cooperation with ACM SIGWEB and ACM SIGSOFT and the
conference proceedings will be archived in the ACM digital library like
all prior editions.
Submission Information and Instructions
---------------------------------------
Paper Presentation: OpenSym 2018 will be organized as a one track
conference in order to emphasize the interdisciplinary character of this
conference and to encourage discussion.
Submission Deadline: The research paper submission deadline is March
15th 2018. Submitted papers should present integrative reviews or
original reports of substantive new work: theoretical, empirical, and/or
in the design, development and/or deployment of novel concepts, systems,
and mechanisms. Research papers will be reviewed to meet rigorous
academic standards of publication. Papers will be reviewed for
relevance, conceptual quality, innovation and clarity of presentation.
All the submissions are done via the EasyChair platform, here:
https://easychair.org/conferences/?conf=opensym2018
Paper Length: There is no minimum or maximum length for submitted
papers. Rather, reviewers will be instructed to weigh the contribution
of a paper relative to its length. Papers should report research
thoroughly but succinctly: brevity is a virtue. A typical length of a
“long research paper” is 10 pages (formerly the maximum length limit and
the limit on OpenSym tracks), but may be shorter if the contribution can
be described and supported in fewer pages—shorter, more focused papers
(called “short research papers” previously) are encouraged and will be
reviewed like any other paper. While we will review papers longer than
10 pages, the contribution must warrant the extra length. Reviewers will
be instructed to reject papers whose length is incommensurate with the
size of their contribution. Papers should be formatted in ACM SIGCHI
paper format. Reviewing is not double-blind so manuscripts do not need
to be anonymized.
Posters: As in previous years, OpenSym will also be hosting a poster
session at the conference. To propose a poster, authors should submit an
extended abstract (not more than 4 pages) describing the content of the
poster which will be published in a non-archival companion proceedings
to the conference. Posters should use the ACM SIGCHI templates for
extended abstracts. An example of a poster abstract can be found here.
Reviewing is not double-blind so abstracts do not need to be anonymized.
Paper Proceedings: OpenSym is held in-cooperation with ACM SIGWEB and
ACM SIGSOFT and the conference proceedings will be archived in the ACM
digital library like all prior editions. OpenSym seeks to accommodate
the needs of the different research disciplines it draws on including
disciplines with archival conference proceedings and disciplines where
authors usually present at conferences and publish later. Authors whose
submitted papers have been accepted for presentation at the conference
have a choice of:
- having their paper become part of the official proceedings, archived in
the ACM Digital Library,
- having their paper published on the conference website only, with no
transfer of copyright from the authors,
- having no publication record at all but only the presentation at the
conference.
Response from authors: For the second time at OpenSym, authors will be
given the opportunity to write a response to their reviews before final
decisions are made. This should be treated as an opportunity to correct
any mistakes or misconceptions in the reviews as well as to propose
minor changes that the authors can make during the two weeks between
notification and the camera-ready deadline.
Important Dates
---------------
Submission deadline: March 15, 2018
Reviews sent to authors: May 11, 2018
Response to reviews from authors due: May 20, 2018
Final decision notification: June 15, 2018
Camera-ready papers due: June 22, 2018
Papers available online: July 13, 2018
Conference Organization
-----------------------
The general chairs of the conference are Nicolas Jullien and Olivier
Berger, IMT, France. Feel free to contact us with any questions you
might have at info(a)opensym.org.
--
Maître de Conférences (HDR) / Associate Professor.
https://nicolasjullien.wp.mines-telecom.fr/
Directeur de M@rsouin http://www.marsouin.org
Membre du LEGO http://labo-lego.fr
Responsable du M2 management innovation
parcours Mgt du SI et des données @ischool IMT Atlantique
https://innovationmanagement.wp.imt.fr/
Resending the email below, as it does not seem to have made it to the
inboxes of several recipients - including myself - even though it is recorded
in the list's archives
<https://lists.wikimedia.org/pipermail/wiki-research-l/2018-January/date.html>.
---
From: masssly at ymail.com
Subject: [Wiki-research-l] Upcoming research newsletter: new papers open
for review
Date: Fri Jan 19 20:12:18 UTC 2018
Hi everyone,
We’re preparing for the January 2018 research newsletter [1] and looking for
contributors. Please take a look at
https://etherpad.wikimedia.org/p/WRN201801 and add your name next to any
paper you are interested in covering. Our target publication date is
January 26 UTC. As usual, short notes and one-paragraph reviews are most
welcome.
Highlights from this month:
• Can conference papers have information value through Wikipedia? An
investigation of four engineering fields
• Collaborative Approach to Developing a Multilingual Ontology: A Case
Study of Wikidata
• Determining Quality of Articles in Polish Wikipedia Based on Linguistic
Features
• Emo, Love, and God: Making Sense of Urban Dictionary, a Crowd-Sourced
Online Dictionary
• Fostering Public Good Contributions with Symbolic Awards: A Large-Scale
Natural Field Experiment at Wikipedia
• Knowledge categorization affects popularity and quality of Wikipedia
articles
• The Conceptual Correspondence between the Encyclopaedia and Wikipedia
• The Wisdom of Polarized Crowds
• Use of Louisiana's Digital Cultural Heritage by Wikipedians
• What Makes Wikipedia's Volunteer Editors Volunteer?
• Wikipedia-integrated publishing: a comparison of successful models
If you have any questions about the format or process, feel free to get in
touch off-list.
Masssly, Tilman Bayer and Dario Taraborelli
[1] http://meta.wikimedia.org/wiki/Research:Newsletter
--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB
Cross-post.
---------- Forwarded message ----------
From: Adam Baso <abaso(a)wikimedia.org>
Date: Thu, Jan 18, 2018 at 6:38 AM
Subject: [Input requested] Knowledge as a Service at the Wikimedia
Developer Summit 2018
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Howdy Wikitechnorati,
(And thank you for your patience with my cross-posting if you're on other lists.)
I'm writing to invite your input on the following Phabricator task ahead of
next week's Wikimedia Developer Summit 2018 [1] session.
Knowledge as a Service
https://phabricator.wikimedia.org/T183315
The purpose [2] of the Wikimedia Developer Summit 2018 sessions is to
provide guidance for Phase 2 of the Movement Strategic Direction [3] on
buildout of technology capabilities. We'd really love your thoughts to help
set context for our session next week, as Knowledge as a Service is a
primary consideration in the Movement Strategic Direction.
What is Knowledge as a Service? Its essence is about information
architecture approaches and the necessary software that will ultimately
allow content consumption and creation to radiate to new and different
types of interfaces and devices in addition to browser-based approaches. As
you review position papers from attendees [4] you'll notice that the way
they (myself included) think about best solving this is through a heavy
emphasis on technology that makes it easier to better structure information
and its metadata for re-use, remixing, and querying.
What might this mean? Does it mean we should build Wikimedia software in an
API- and metadata-first manner following industry standards compatible with
content structuration? Does it mean weaving our existing structured and
semi-structured data technologies together? How do we build technology that
can ensure successful collaboration between communities on increasingly
structured and interdependent information sources? And how can we ensure
the tech will bolster growth of multilingual and multimedia content
creation and consumption?
I've copied some of the essential material from the Movement Strategic
Direction concerning Knowledge as a Service so you have it here. We would
appreciate your input and hope you will subscribe to the Phabricator task
to contribute and follow along as we explore this topic.
https://phabricator.wikimedia.org/T183315
The following content is copied from
https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction :
Knowledge as a service: To serve our users, we will become a platform that
serves open knowledge to the world across interfaces and communities. We
will build tools for allies and partners to organize and exchange free
knowledge beyond Wikimedia. Our infrastructure will enable us and others to
collect and use different forms of free, trusted knowledge.
...
As technology spreads through every aspect of our lives, Wikimedia's
infrastructure needs to be able to communicate easily with other connected
systems.
...
As a platform, we need to transform our structures to support new formats,
new interfaces, and new types of knowledge. We have a strategic opportunity
to go further and offer this platform as a service to other institutions,
beyond Wikimedia. In a world that is becoming more and more connected,
building the infrastructure for knowledge gives others a vested interest in
our success. It is how we ensure our place in the larger network of
knowledge, and become an essential part of it. As a service to users, we
need to build the platform for knowledge or, in jargon, provide knowledge
as a service.
...
Knowledge as a service: A platform that serves open knowledge to the world
across interfaces and communities
Our openness will ensure that our decisions are fair, that we are
accountable to one another, and that we act in the public interest. Our
systems will follow the evolution of technology. We will transform our
platform to work across digital formats, devices, and interfaces. The
distributed structure of our network will help us adapt to local contexts.
...
We will build tools for allies and partners to organize and exchange free
knowledge beyond Wikimedia.
We will continue to build the infrastructure for free knowledge for our
communities. We will go further by offering it as a service to others in
the network of knowledge. We will continue to build the partnerships that
enable us to develop knowledge we can't create ourselves.
...
Our infrastructure will enable us and others to collect and use different
forms of free, trusted knowledge.
We will build the technical infrastructures that enable us to collect free
knowledge in all forms and languages. We will use our position as a leader
in the ecosystem of knowledge to advance our ideals of freedom and
fairness. We will build the technical structures and the social agreements
that enable us to trust the new knowledge we compile. We will focus on
highly structured information to facilitate its exchange and reuse in
multiple contexts.
Thank you.
-Adam
[1] https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2018
[2] https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2018/Purpose_and_Results
[3] https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Direction
[4] https://wikifarm.wmflabs.org/devsummit/index.php/Session:10
Forwarding a reply from Joseph that somehow didn't go through.
---------- Forwarded message ----------
From: Joseph Allemandou <jallemandou(a)wikimedia.org>
To: Research into Wikimedia content and communities
<wiki-research-l@lists.wikimedia.org>, gerard.meijssen(a)gmail.com
Hi Gerard,
Here are my two cents on your questions.
About redlinks: you are correct in saying that the 3% of links of type
"other" are jumps from one page to another (detected via the HTTP referer)
for which no hyperlink from the origin to the target exists in the origin
page at the moment of computation.
From my exploration of the dataset, such "other" links happen with the
"manually-edited-with-error" url class (the "-" article has a lot of such
incoming links, for instance), as well as with links that I think have been
edited in the origin page (for instance, in the November 2017 dataset there
are "other" links from page "Kevin Spacey" to "Dan Savage", "hebephilia",
"pedophilia" or "Harvey_Weinstein". Those links are confirmed as existing at
some point in the page in November, but no longer at the beginning of
December, when the snapshot of the pages' hyperlinks is taken).
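To make the link-type breakdown concrete, here is a minimal sketch of how the share of each type could be computed from a clickstream file. It assumes the published layout of four tab-separated fields (previous page, current page, link type, count) with no header row. The sample rows are invented for illustration, and the function name `link_type_shares` is hypothetical. Whether the 3% figure quoted above is weighted by click counts in this way is also an assumption.

```python
import csv
import io
from collections import Counter


def link_type_shares(lines):
    """Return each link type's share of the total click count."""
    totals = Counter()
    # QUOTE_NONE: page titles may contain quote characters that are not CSV quoting.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for prev, curr, link_type, n in reader:
        totals[link_type] += int(n)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {t: count / grand_total for t, count in totals.items()}


if __name__ == "__main__":
    # Invented rows: prev, curr, type, n.
    sample = (
        "other-search\tPage_B\texternal\t60\n"
        "Page_A\tPage_B\tlink\t37\n"
        "Page_A\tPage_C\tother\t3\n"
    )
    for link_type, share in link_type_shares(io.StringIO(sample)).items():
        print(f"{link_type}: {share:.1%}")
```

The same function accepts an open file handle for a decompressed monthly dump, since it only iterates over lines.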
As for your question about what people are looking for and don't find, the
one way I can think of to get ideas is to use detailed session analysis
correlated with search results, in order to try to get a signal of pages
reached from search and not being visited for long. Even if I think we have
data we could use in that respect on the cluster, we can't publish such
details externally for privacy concerns, obviously.
Please let me know if what I say makes sense :)
Many thanks
Joseph Allemandou
> Hoi,
> Do I understand well that the 3% of "other" links are the ones that have
> articles at *this* time but they did not exist at the time of the dump. So
> in effect they are not red links?
>
> Is there any way to find the articles people were seeking but could not
> find??
> Thanks,
> GerardM
>
> On 16 January 2018 at 20:21, Leila Zia <leila(a)wikimedia.org> wrote:
>
> > Hi all,
> >
> > For archive happiness:
> >
> > Clickstream dataset is now being generated on a monthly basis for 5
> > Wikipedia languages (English, Russian, German, Spanish, and Japanese). You
> > can access the data at https://dumps.wikimedia.org/other/clickstream/ and
> > read more about the release and those who contributed to it at
> > https://blog.wikimedia.org/2018/01/16/wikipedia-rabbit-hole-clickstream/
> >
> > Best,
> > Leila
> >
>