Hi all,
If you use Hive on stat1002/1004, you might have seen a deprecation
warning when you launch the hive client, saying that it is being replaced
by Beeline. The Beeline shell has always been available, but it required
supplying a database connection string every time, which was pretty
annoying. We now have a wrapper script
<https://github.com/wikimedia/operations-puppet/blob/production/modules/role…>
set up to make this easier. The old Hive CLI will continue to exist, but we
encourage moving over to Beeline. You can use it by logging into the
stat1002/1004 boxes as usual and launching `beeline`.
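(For scripted use, here is a rough sketch of invoking Beeline from Python
with an explicit connection string. The JDBC URL below is a placeholder,
not the real cluster address; the wrapper script supplies the correct one
for you:)

    import subprocess

    # Placeholder JDBC URL - check the wrapper script for the actual
    # host and port used on the analytics cluster.
    JDBC_URL = "jdbc:hive2://hive-server.example.org:10000/default"

    # Run a single HiveQL statement through Beeline.
    subprocess.run(
        ["beeline", "-u", JDBC_URL, "-e", "SHOW DATABASES;"],
        check=True,
    )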
There is some documentation on this here:
https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Beeline.
If you run into any issues using this interface, please ping us on the
Analytics list or in #wikimedia-analytics, or file a bug on Phabricator
<http://phabricator.wikimedia.org/tag/analytics>.
(If you are wondering "stat1004, whaaat?" - there should be an announcement
about it coming up soon!)
Best,
--Madhu :)
We’re glad to announce the release of an aggregate clickstream dataset extracted from English Wikipedia:
http://dx.doi.org/10.6084/m9.figshare.1305770
This dataset contains counts of (referer, article) pairs aggregated from the HTTP request logs of English Wikipedia. This snapshot captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.
This data can be used for various purposes, for example (a small worked sketch follows this list):
• determining the most frequent links people click on for a given article
• determining the most common links people followed to an article
• determining how much of the total traffic to an article clicked on a link in that article
• generating a Markov chain over English Wikipedia
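As a minimal illustration of the first two use cases, here is a pandas
sketch. The file name and column names (prev_title, curr_title, n) are
assumptions based on the dataset description, so check them against the
actual release before relying on them:

    import pandas as pd

    # Column names assumed from the dataset description; verify against
    # the file's header. Titles may use underscores instead of spaces.
    df = pd.read_csv("2015_01_clickstream.tsv", sep="\t")

    # Most frequent links people clicked on from a given article:
    top_out = (df[df["prev_title"] == "London"]
               .sort_values("n", ascending=False)
               .head(10))
    print(top_out[["curr_title", "n"]])

    # Most common referers leading to that article:
    top_in = (df[df["curr_title"] == "London"]
              .sort_values("n", ascending=False)
              .head(10))
    print(top_in[["prev_title", "n"]])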
We created a page on Meta for feedback and discussion about this release: https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream <https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream>
Ellery and Dario
Hi everybody, I'm new to the list and was referred here by a comment from
a Stack Overflow user on my question [1], which I quote next:
I have been able to use the Wikipedia pagelinks SQL dump to obtain
hyperlinks between Wikipedia pages for a specific revision time. However,
there are cases where multiple instances of such links exist, e.g. between
the very same https://en.wikipedia.org/wiki/Wikipedia page and
https://en.wikipedia.org/wiki/Wikimedia_Foundation. I'm interested in
finding the number of links between pairs of pages for a specific revision.
Ideal solutions would involve dump files other than pagelinks (which I'm
not aware of), or using the MediaWiki API.
To elaborate, I need this information to weight (almost) every hyperlink
between article pages (that is, in NS0) that was present in a specific
Wikipedia revision (end of 2015). I would therefore prefer not to follow
the solution suggested by the SO user, which would be rather impractical
at that scale. (A minimal sketch of the per-article approach follows
below.)
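For concreteness, this is the kind of per-article counting I mean - a
sketch using the MediaWiki API and mwparserfromhell, where the API
parameters are my best reading of the docs and the title normalization is
deliberately naive. It works for single pages, but running it over all of
NS0 is exactly what seems impractical:

    import collections

    import mwparserfromhell
    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def link_counts(title, as_of="2015-12-31T23:59:59Z"):
        """Count wikilinks per target in the newest revision of `title`
        made at or before `as_of`. Error handling omitted."""
        params = {
            "action": "query",
            "prop": "revisions",
            "titles": title,
            "rvlimit": 1,
            "rvstart": as_of,   # newest revision at or before this time
            "rvdir": "older",
            "rvprop": "content",
            "format": "json",
            "formatversion": 2,
        }
        page = requests.get(API, params=params).json()["query"]["pages"][0]
        wikitext = page["revisions"][0]["content"]
        counts = collections.Counter()
        for link in mwparserfromhell.parse(wikitext).filter_wikilinks():
            # Naive normalization: drop section anchors only; case,
            # underscore/space variants and redirects are not resolved.
            counts[str(link.title).split("#")[0].strip()] += 1
        return counts

    print(link_counts("Wikipedia")["Wikimedia Foundation"])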
Indeed, my final aim is to use this weight in a thresholding fashion to
sparsify the Wikipedia graph (which, due to its short diameter, is more or
less one giant connected component), in a way that should reflect the
"relatedness" of the linked pages (where relatedness is not intended as
strictly semantic, but at a higher "concept" level, if I may say so).
For this reason, other suggestions on how to determine such weights
(possibly using other data sources -- ontologies?) are more than welcome.
The graph will be used as a dataset to test an event-tracking algorithm I
am doing research on.
Thanks,
Mara
[1]
http://stackoverflow.com/questions/42277773/number-of-links-between-two-wik…
A reminder that applications to attend WikiCite 2017
<https://meta.wikimedia.org/wiki/WikiCite_2017> close on *February 27,
2017*.
Please consider applying
<https://docs.google.com/forms/d/e/1FAIpQLScWnCLfAt88cUWKSu_E-lU8m3te_r4P3ng…>
if you work on sources and citations (or related tools) in Wikipedia,
Wikidata, Wikisource or other Wikimedia projects. If there are other people
in your network we should consider inviting to the event, please let us
know. You can contact the organizing committee at: wikicite(a)wikimedia.org.
Best,
Dario
-- on behalf of the organizers
On Thu, Feb 9, 2017 at 3:44 PM, Dario Taraborelli <
dtaraborelli(a)wikimedia.org> wrote:
> Dear all,
>
> I am happy to announce that applications to attend WikiCite ’17 officially open
> today <https://goo.gl/forms/Kb9Wl6Xfw2EmFqEr2>.
>
> About the event
>
> WikiCite 2017 <https://meta.wikimedia.org/wiki/WikiCite_2017> is a 3-day
> conference, summit and hack day to be hosted in Vienna, Austria, on May
> 23-25, 2017. It expands on efforts started last year at WikiCite 2016
> <https://meta.wikimedia.org/wiki/WikiCite_2016/Report> to design a
> central bibliographic repository, as well as tools and strategies to
> improve information quality and verifiability in Wikimedia projects.
>
> Our goal is to bring together Wikimedia contributors, data modelers,
> information and library science experts, software engineers, designers and
> academic researchers who have experience working with Wikipedia's citations
> and bibliographic data.
>
> WikiCite 2017 will be a venue to:
>
> - Day 1 (Conference) – present progress on existing work and
> initiatives for citations and bibliographic data across Wikimedia projects
> - Day 2 (Summit) – discuss technical, social, outreach and policy
> directions
> - Day 3 (Hack) – get together to build, based on new ideas and
> applications
>
> More information on the event can be found here
> <https://meta.wikimedia.org/wiki/WikiCite_2017>.
>
> How to apply
>
> Participation for this year's event is limited to 100 individuals. In
> order to be considered for participation, please fill out the following
> form <https://goo.gl/forms/Kb9Wl6Xfw2EmFqEr2> and provide us with some
> information about yourself, your interests, and your expected contribution.
> PLEASE NOTE THIS IS NOT THE FINAL REGISTRATION FORM. Your application will
> be reviewed and the organizing committee will extend an invitation by March
> 10, 2017. This application form is to determine the best mix of
> attendees. Not everyone who applies will receive an invitation, but there
> will be a waitlist.
>
> Important dates
>
>
> - February 9, 2017: applications open
> - February 27, 2017: applications close, waitlist opens
> - March 10, 2017: all final notifications of acceptance are issued,
> waitlist processing begins
> - March 31, 2017: attendee list is finalized
>
>
> Travel support
>
>
> Like last year, limited funding to cover travel costs of prospective
> participants will be available. Requests for travel support should be
> submitted via the application form
> <https://goo.gl/forms/Kb9Wl6Xfw2EmFqEr2>. We will confirm by March 10
> whether we can provide you with travel support.
>
> Contact
>
> For any questions, you can contact the organizing committee via:
> wikicite(a)wikimedia.org
>
> We look forward to seeing you in Vienna!
>
> The WikiCite 2017 organizing committee
>
> Dario Taraborelli
>
> Jonathan Dugan
>
> Lydia Pintscher
>
> Daniel Mietchen
>
> Cameron Neylon
>
>
>
> *Dario Taraborelli *Director, Head of Research, Wikimedia Foundation
> wikimediafoundation.org • nitens.org • @readermeter
> <http://twitter.com/readermeter>
>
--
*Dario Taraborelli *Director, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>
Hi Research-l,
A problem that I am experiencing is a shortage of community members who
are willing, available, and have the skills to work on a variety of useful
initiatives. Is anyone on this list
aware of research that talks about motivations of long-term contributors?
In particular, I'd be interested in research that suggests ways to convert
productive, relatively new editors (say, 50-500 edits) into community
members who are likely to develop into long-term, productive Wikimedians.
Thanks,
Pine
Apologies for cross-posting
Call for Papers, Posters & Workshops
SEMANTiCS 2017 - The Linked Data Conference
13th International Conference on Semantic Systems
Amsterdam, Netherlands
September 11-14, 2017
http://2017.semantics.cc
For details please go to: https://2017.semantics.cc/calls
Important Dates (Research & Innovation):
*Abstract Submission Deadline: May 17, 2017 (11:59 pm, Hawaii time)
*Paper Submission Deadline: May 24, 2017 (11:59 pm, Hawaii time)
*Notification of Acceptance: July 3, 2017 (11:59 pm, Hawaii time)
*Camera-Ready Paper: August 14, 2017 (11:59 pm, Hawaii time)
Important Dates (Workshops & Tutorials):
*Submission of Proposals for Workshops with Call for Papers: March 31,
2017 (11:59 pm, Hawaii time)
*Submission of Proposals for Tutorials and Workshops without Call for
Papers: June 30, 2017 (11:59 pm, Hawaii time)
*Workshop Proposals Notification of Acceptance: April 13, 2017 (11:59 pm,
Hawaii time)
*Workshop Website/Call for Papers Online: April 30, 2017 (11:59 pm, Hawaii
time)
*Workshop Camera-Ready Proceedings: September 4, 2017 (11:59 pm, Hawaii time)
*SEMANTiCS 2017 Workshop & Tutorial Days: September 11 and 14, 2017
As in previous years, the SEMANTiCS’17 proceedings will be published in
the ACM ICPS series (pending) and as CEUR-WS proceedings.
SEMANTiCS 2017 will especially welcome submissions for the following hot
topics:
*Data Science (special track, see below)
*Web Semantics, Linked (Open) Data & schema.org
*Corporate Knowledge Graphs
*Knowledge Integration and Language Technologies
*Data Quality Management
*Economics of Data, Data Services and Data Ecosystems
Following the success of previous years, the ‘horizontals’ (research)
and ‘verticals’ (industries) below are of interest for the conference:
Horizontals:
*Enterprise Linked Data & Data Integration
*Knowledge Discovery & Intelligent Search
*Business Models, Governance & Data Strategies
*Semantics in Big Data
*Text Analytics
*Data Portals & Knowledge Visualization
*Semantic Information Management
*Document Management & Content Management
*Terminology, Thesaurus & Ontology Management
*Smart Connectivity, Networking & Interlinking
*Smart Data & Semantics in IoT
*Semantics for IT Safety & Security
*Semantic Rules, Policies & Licensing
*Community, Social & Societal Aspects
Data Science Special Track Horizontals:
*Large-Scale Data Processing (stream processing, handling large-scale
graphs)
*Data Analytics (Machine Learning, Predictive Analytics, Network Analytics)
*Communicating Data (Data Visualization, UX & Interaction Design,
Crowdsourcing)
*Cross-cutting Issues (Ethics, Privacy, Security, Provenance)
Verticals:
*Industry & Engineering
*Life Sciences & Health Care
*Public Administration
*e-Science
*Digital Humanities
*Galleries, Libraries, Archives & Museums (GLAM)
*Education & eLearning
*Media & Data Journalism
*Publishing, Marketing & Advertising
*Tourism & Recreation
*Financial & Insurance Industry
*Telecommunication & Mobile Services
*Sustainable Development: Climate, Water, Air, Ecology
*Energy, Smart Homes & Smart Grids
*Food, Agriculture & Farming
*Safety, Security & Privacy
*Transport, Environment & Geospatial
For call details please go to: https://2017.semantics.cc/calls