Hi all,
If you use Hive on stat1002/1004, you might have seen a deprecation
warning when you launch the hive client, claiming that it's being replaced
with Beeline. The Beeline shell has always been available to use, but it
required supplying a database connection string every time, which was
pretty annoying. We now have a wrapper script
<https://github.com/wikimedia/operations-puppet/blob/production/modules/role…>
set up to make this easier. The old Hive CLI will continue to exist, but we
encourage moving over to Beeline. You can use it by logging into the
stat1002/1004 boxes as usual, and launching `beeline`.
There is some documentation on this here:
https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Beeline.
If you run into any issues using this interface, please ping us on the
Analytics list or #wikimedia-analytics or file a bug on Phabricator
<http://phabricator.wikimedia.org/tag/analytics>.
(If you are wondering stat1004 whaaat - there should be an announcement
coming up about it soon!)
Best,
--Madhu :)
Curious, what percentage of digital assistants (Alexa, Siri, Cortana,
Google) cite Wikipedia when a person asks a question?
Does the current Wikipedia mobile app support voice search?
Are there any reports on this? Thanks in advance!
Sincere regards,
Stella
--
Stella Yu | STELLARESULTS | 415 690 7827
"Chronicling heritage brands and legendary people."
We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia:
http://dx.doi.org/10.6084/m9.figshare.1305770
This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.
This data can be used for various purposes:
• determining the most frequent links people click on for a given article
• determining the most common links people followed to an article
• determining how much of the total traffic to an article clicked on a link in that article
• generating a Markov chain over English Wikipedia
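As a rough illustration of the uses listed above, here is a minimal Python sketch. It assumes the data has already been loaded as (referer, article, count) tuples. The sample rows, the "external" referer label, and all function names are made up for illustration and are not taken from the dataset or its documentation.

```python
from collections import defaultdict

# Hypothetical sample rows in the form (referer, article, count).
# These are invented values, not real clickstream data.
ROWS = [
    ("A", "B", 30),
    ("A", "C", 10),
    ("B", "C", 5),
    ("external", "A", 100),
]


def top_links_from(rows, referer, n=3):
    """Links most often followed out of `referer`, highest count first."""
    out = [(article, count) for ref, article, count in rows if ref == referer]
    return sorted(out, key=lambda pair: pair[1], reverse=True)[:n]


def top_referers_to(rows, article, n=3):
    """Referers that most often led to `article`, highest count first."""
    inc = [(ref, count) for ref, art, count in rows if art == article]
    return sorted(inc, key=lambda pair: pair[1], reverse=True)[:n]


def click_through_rate(rows, article):
    """Share of the traffic arriving at `article` that clicked on to another article."""
    incoming = sum(count for _, art, count in rows if art == article)
    outgoing = sum(count for ref, _, count in rows if ref == article)
    if incoming == 0:
        return None  # no recorded traffic to this article
    return outgoing / incoming


def transition_probabilities(rows):
    """First-order Markov chain: P(article | referer) from normalized counts."""
    totals = defaultdict(int)
    for ref, _, count in rows:
        totals[ref] += count
    probs = defaultdict(dict)
    for ref, article, count in rows:
        probs[ref][article] = count / totals[ref]
    return dict(probs)


if __name__ == "__main__":
    print(top_links_from(ROWS, "A"))
    print(top_referers_to(ROWS, "C"))
    print(click_through_rate(ROWS, "A"))
    print(transition_probabilities(ROWS)["A"])
```

For the full dataset (22 million pairs), a dataframe library or a database would likely be a better fit than plain lists; the logic stays the same.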
We created a page on Meta for feedback and discussion about this release: https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream
Ellery and Dario
Here's an update on a design research project we recently did at the
Wikimedia Foundation.
The discussion sections are geared toward a general rather than a scholarly
audience, but you are all more than welcome to attend. We hope to do a
research showcase presentation in the future as well.
---------- Forwarded message ----------
From: Neil Patel Quinn <nquinn(a)wikimedia.org>
Date: 27 September 2017 at 03:39
Subject: The secret lives of new editors: research report and discussion
sessions
To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
Hello everyone!
In May and June of this year, a team of researchers from the Wikimedia
Foundation and Reboot [1] traveled to South Korea and the Czech
Republic to learn more about the experiences new editors have on the Czech
and Korean Wikipedias.
We interviewed 47 new editors and 17 experienced editors and (with an
intermediate stop on several thousand sticky notes) summarized what we
learned in 11 findings. You can learn more about the project and see our
full report on our wiki page, mw:New Editor Experiences [2].
Of the 11 findings we identified, some may be surprising to you, while
others may reinforce what you already knew. Either way, we'd love to know
what you think. We're holding two public discussion sessions next week to
talk briefly about our findings and then take questions and comments.
We hope you'll come! The two sessions will be at:
1. Wednesday, October 4, 09:30–11:00 PDT (16:30–18:00 UTC)
2. Thursday, October 5, 21:00–22:30 PDT (Friday, October 6, 04:00–05:30 UTC)
Full details and instructions on how to join are at mw:New Editor
Experiences/October 2017 discussions [3].
[1]: https://reboot.org/
[2]: https://www.mediawiki.org/wiki/New_Editor_Experiences
[3]: https://www.mediawiki.org/wiki/New_Editor_Experiences/October_2017_discussions
--
Neil Patel Quinn <https://meta.wikimedia.org/wiki/User:Neil_P._Quinn-WMF>,
product analyst
Wikimedia Foundation
Chris Koerner, 25/09/2017 23:32:
> * Mikhail created a dashboard to track the prevalence of sister
> project search results on fulltext search result pages on desktop,
> broken up by language. For example, it turns out that nearly 80% of
> fulltext searches show sister projects on enwiki. [30]
> [30] https://discovery.wmflabs.org/metrics/#sister_search_prevalence
Interesting. There's probably some underlying pattern to analyse that
would tell us something about the relative development of various
Wikimedia projects in several languages.
Nemo
[Begging pardon if you have already read this in the Wikidata mailing list]
Hi everyone,
Remember the StrepHit research project?
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Va…
And the Wikidata primary sources tool?
https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool
While the StrepHit team is building its next version, I'd like to
invite you to have a look at a new project proposal.
The main goal is to add a high volume of identifiers to Wikidata,
ensuring live maintenance of links.
Do you think that Wikidata should become the central linking hub of
open knowledge?
If so, I'd be really grateful if you could endorse the *soweego* project:
https://meta.wikimedia.org/wiki/Grants:Project/Hjfocs/soweego
Of course, any comment is more than welcome on the discussion page.
Looking forward to your valuable feedback.
Best,
Marco
Dear Friends,
Yesterday I looked at the research output of African institutions on Wikipedia and other Wikimedia projects in the Web of Science Core Collection. The query was:
TOPIC:(Wikipedia OR Wikipédia) OR TOPIC:(Wiktionary OR Wiktionnaire) OR TOPIC: (Wikisource) OR TOPIC:(Wikinews) OR TOPIC: (Wikiversity OR Wikiversité) OR TOPIC:(Wikimedia Labs) OR TOPIC:(Wikimedia) OR TOPIC: (Wikimedia Commons) OR TOPIC: (Wikispecies) OR TOPIC: (Wiki Loves) OR TOPIC:(Wikidata) OR TOPIC: (Wikimedia Incubator) OR TOPIC: (Mediawiki) OR TOPIC: (Meta-wiki) OR TOPIC:(Wikiquote) OR TOPIC: (Wikivoyage OR Wikitravel) OR TOPIC: (Wikibooks OR Wikilivres)
Timespan: All years. Indexes: SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH, ESCI.
I found that only 53 of the 3832 publications dealing with Wikimedia projects were from African countries (about 1.4%) and that only 12 African countries produce Wikimedia research. This situation could be improved through collaboration with the African scientists, countries and institutions that do Wikimedia research. Here are the lists:
Countries: Tunisia (14), South Africa (10), Egypt (10), Algeria (8), Nigeria (6)
Institutions: Université de Sfax, Tunisia (9), Federal Teaching Hospital Abakaliki, Nigeria (5), Université de Tunis El Manar, Tunisia (3), Université de Sousse, Tunisia (3), Université de M'hamed Bougara Boumerdès, Algeria (3), University of Science and Technology Houari Boumédiene, Algeria (2), University of South Africa, South Africa (2), Université de Carthage, Tunisia (2), Nile University, Egypt (2), Cairo University, Egypt (2), Benha University, Egypt (2)
Authors: Mohamed Ali Hadj Taieb, Faculty of Sciences of Sfax, University of Sfax, Tunisia (5), Stanley C. Igwe, Department of Neuro-psychiatry, Federal Teaching Hospital Abakaliki, Nigeria (5), Mohamed Ben Aouicha, Faculty of Sciences of Sfax, University of Sfax, Tunisia (5), Mhamed Mataoui, École Militaire Polytechnique, Algeria (3), Abdelmajid Ben Hamadou, ISIMS, University of Sfax, Tunisia (3), Meriem Amina Zingla, Faculty of Sciences of Tunis, University of Tunis El Manar, Tunisia (2), Yahya Slimani, Faculty of Sciences of Tunis, University of Tunis El Manar, Tunisia (2), Hoda M O Mokhtar, Faculty of Computers and Information, Cairo University, Egypt (2), Mohamed Mezghiche, Université de M'hamed Bougara Boumerdès, Algeria (2), Ghada Feki, ENIS, University of Sfax, Sfax, Tunisia (2), Rim Fakhfakh, ENIS, University of Sfax, Sfax, Tunisia (2), Anis Ben Ammar, ENIS, University of Sfax, Sfax, Tunisia (2), Chokri Ben Amar, ENIS, University of Sfax, Sfax, Tunisia (2), Samhaa R. El-Beltagy, Center of Informatics Science, Nile University, Egypt (2), Chiraz Latiri, Faculty of Sciences of Tunis, University of Tunis El Manar, Tunisia (2), Eslam Amer, Faculty of Computer Science and Information, Benha University, Egypt (2).
I would like to ask how the Wikimedia Research team could collaborate with these productive Wikimedia researchers from Africa to increase African Wikimedia research output, and whether it would be a good idea to invite them to WikiIndaba 2018.
Yours Sincerely,
Houcemeddine Turki
Hi Everyone,
The next Research Showcase will be live-streamed this Wednesday, September
20, 2017 at 11:30 AM (PDT), 18:30 UTC.
YouTube stream: https://www.youtube.com/watch?v=VR5JwqyVGSk
As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#September_2017>.
This month's presentation:
A Glimpse into Babel: An Analysis of Multilinguality in Wikidata
By *Lucie-Aimée Kaffee*
Multilinguality is an important topic for knowledge bases,
especially Wikidata, which was built to serve the multilingual requirements
of an international community. Its labels are the way for humans to
interact with the data. In this talk, we explore the state of languages in
Wikidata as of now, especially in regard to its ontology, and the
relationship to Wikipedia. Furthermore, we set the multilinguality of
Wikidata in the context of the real world by comparing it to the
distribution of native speakers. We find an existing language
maldistribution, which is less urgent in the ontology, and promising
results for future improvements. An outlook on how users interact with
languages on Wikidata will be given.
Science is Shaped by Wikipedia: Evidence from a Randomized Control Trial
By *Neil C. Thompson and Douglas Hanley*
As the largest encyclopedia in the world, it
is not surprising that Wikipedia reflects the state of scientific
knowledge. However, Wikipedia is also one of the most accessed websites in
the world, including by scientists, which suggests that it also has the
potential to shape science. This paper shows that it does. Incorporating
ideas into a Wikipedia article leads to those ideas being used more in the
scientific literature. This paper documents this in two ways:
correlationally across thousands of articles in Wikipedia and causally
through a randomized experiment where we added new scientific content to
Wikipedia. We find that fully a third of the correlational relationship is
causal, implying that Wikipedia has a strong shaping effect on science. Our
findings speak not only to the influence of Wikipedia, but more broadly to
the influence of repositories of scientific knowledge. The results suggest
that increased provision of information in accessible repositories is a
very cost-effective way to advance science. We also find that such gains
are equity-improving, disproportionately benefitting those without
traditional access to scientific information.
Many kind regards,
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
srodlund(a)wikimedia.org
Hi everyone,
We’re preparing for the July 2017 research newsletter and looking for contributors. Please take a look at:
https://etherpad.wikimedia.org/p/WRN201707 and add your name next to any paper you are interested in covering. Our target publication date is Friday September 22 UTC. As usual, short notes and one-paragraph reviews are most welcome.
Highlights from this month:
• Modeling Dynamics of Wikipedia: An Empirical Analysis Using a Vector Error Correction Model
• Implementation and Evaluation of a Framework to calculate Impact Measures for Wikipedia Authors
• The Russian-language Wikipedia as a Measure of Society Political Mythologization
• An end-to-end learning solution for assessing the quality of Wikipedia articles
• What Do Wikidata and Wikipedia Have in Common?: An Analysis of Their Use of External References
• A Glimpse into Babel: An Analysis of Multilinguality in Wikidata
• Interpolating Quality Dynamics in Wikipedia and Demonstrating the Keilana Effect
• Before the Sense of ‘We’: Identity Work as a Bridge from Mass Collaboration to Group Emergence
If you have any questions about the format or process, feel free to get in touch off-list.
Masssly, Tilman Bayer and Dario Taraborelli
[1] http://meta.wikimedia.org/wiki/Research:Newsletter