Hello volunteer developers & technical contributors!
The Wikimedia Foundation is asking for your feedback in a survey. We want
to know how well we are supporting your contributions on and off wiki, and
how we can change or improve things in the future.[1] The opinions you
share will directly affect the current and future work of the Wikimedia
Foundation. To say thank you for your time, we are giving away 20 Wikimedia
T-shirts to randomly selected people who take the survey.[2] The survey is
available in various languages and will take between 20 and 40 minutes.
Use this link to take the survey now:
https://wikimedia.qualtrics.com/SE/?SID=SV_6mTVlPf6O06r3mt&Aud=DEV&Src=DEV
You can find more information about this project here[3]. This survey is
hosted by a third-party service and governed by this privacy statement[4].
Please visit our frequently asked questions page to find more information
about this survey[5]. If you need additional help or have questions about
this survey, send an email to surveys(a)wikimedia.org.
Thank you!
Edward Galvez
Survey Specialist, Community Engagement
Wikimedia Foundation
[1] This survey is primarily meant to get feedback on the Wikimedia
Foundation's current work, not long-term strategy.
[2] Legal stuff: No purchase necessary. Must be the age of majority to
participate. Sponsored by the Wikimedia Foundation located at 149 New
Montgomery, San Francisco, CA, USA, 94105. Ends January 31, 2017. Void
where prohibited. Click here for contest rules.
[3] About this survey:
https://meta.wikimedia.org/wiki/Community_Engagement_Insights/About_CE_Insights
[4] Privacy statement: https://wikimediafoundation.org/wiki/Community_Engagement_Insights_2016_Survey_Privacy_Statement
[5] FAQ:
https://meta.wikimedia.org/wiki/Community_Engagement_Insights/Frequently_asked_questions
> I hope comms figures out a way to counter-act the public
> opinion that Wikipedia traffic is monitored by the government.
Wikipedia is the very first example given by NSA training materials
for how to add sites to the XKEYSCORE GUI:
https://assets.documentcloud.org/documents/2116354/pages/xks-for-counter-cn…
Hi!
> 1. Is there a unique key for the query log? The log I am referring to
> is the *wdqs_extract* table from the hive database wmf. We would like
> to be able to permanently link our own computed data with the log
> entry we computed it from.
I think you can use hostname+sequence (from
https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest, assuming
those are preserved in wdqs_extract) as a key.
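If hostname and sequence really are carried over into wdqs_extract, a join key could be built along these lines. This is only a sketch in HiveQL: the separator and the exact column and partition names are assumptions based on the webrequest schema linked above, not a confirmed layout of wdqs_extract.

```sql
-- Hypothetical sketch: derive a stable per-request key from
-- hostname + sequence, assuming both columns survive unchanged
-- into wmf.wdqs_extract.
SELECT
  CONCAT(hostname, '-', CAST(sequence AS STRING)) AS request_key,
  dt,
  uri_query
FROM wmf.wdqs_extract
WHERE year = 2017 AND month = 1 AND day = 1
LIMIT 10;
```

The same CONCAT expression could then be stored alongside your computed data to link back to the log entry later.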
> 2. Is it possible to find out if a query in a given log entry was
> accepted by the sparql endpoint as valid?
If it wasn't, the result code should be 400.
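Assuming the webrequest-style http_status field is also present in wdqs_extract, invalid queries could be separated out like this (a sketch only; the column and partition names are assumptions):

```sql
-- Hypothetical sketch: count accepted vs. rejected SPARQL queries,
-- assuming the endpoint returns HTTP 400 for queries it rejects as invalid.
SELECT
  IF(http_status = '400', 'rejected', 'accepted') AS outcome,
  COUNT(*) AS requests
FROM wmf.wdqs_extract
WHERE year = 2017 AND month = 1
GROUP BY IF(http_status = '400', 'rejected', 'accepted');
```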
> 3. Is there any other database system besides hive installed on the
> server?
I think the currently recommended interface is beeline; I'm not sure
about other DB systems.
> And finally a question on conventions for this mailing list: Am I
> correct in sending one mail for multiple questions or should I send
> separate mails for each question?
I think it's ok. For the questions regarding data and other WDQS
specifics you may also CC me or discovery(a)lists.wikimedia.org.
--
Stas Malyshev
smalyshev(a)wikimedia.org
Hello!
Do check the latest blog post by Andrew. The question of how to import plain JSON into Hadoop from Kafka comes up frequently, and he explains how to do it step by step:
https://blog.wikimedia.org/2017/01/13/json-hadoop-kafka/
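For readers who want the gist before reading the post: one common pattern (not necessarily the exact one Andrew describes) is to land the raw JSON files in HDFS and expose them to Hive through a JSON SerDe. The table name, fields, and path below are illustrative assumptions:

```sql
-- Sketch only: expose JSON files imported from Kafka as a Hive table.
-- Field names and the HDFS location are made up for illustration;
-- the SerDe class is the one shipped with Hive's hcatalog.
CREATE EXTERNAL TABLE IF NOT EXISTS my_kafka_events (
  `timestamp` STRING,
  event_type  STRING,
  payload     STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/user/example/kafka_json_import';
```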
Yours truly should not be listed as an author, as I just proofread it. Just saying.
Thanks,
Nuria
Hi,
While looking at
https://stats.wikimedia.org/archive/squid_reports/2016-10/SquidReportPageVi…
I noticed that France received traffic from a large number of language
Wikipedias that it usually doesn't. Most notable was over 21 million
pageviews to the Avar Wikipedia, which accounted for almost all of the
traffic to that Wikipedia that month and about two orders of magnitude more
than that Wikipedia receives in most months (see
https://stats.wikimedia.org/EN/TablesPageViewsMonthlyOriginal.htm).
It looks like a bot that happened to run in France but wasn't
classified as a bot by the existing algorithms.
Does anybody have any other ideas about what might have happened here?
Thank you!
Vipul
analytics-store was brought down at 6am, and then again at 9am UTC on
25 Dec, due to multiple executions of long-running queries (some of
them running for 2 days), such as:
SELECT LEFT(timestamp, 8) AS yearmonthday, timestamp, userAgent, clientIp,
webHost, COUNT(*) AS copies FROM log.PageContentSaveComplete ...
SELECT COUNT(*) AS count, term_entity_type, term_type, term_language FROM
wikidatawiki.wb_terms ...
select date('20161218000000') as day, actions, count(*) as repeated from
(select group_concat(event_action order by timestamp, action_order.ord
separator '-') as actions from (select ...
I would urge you to set up per-user/per-service query resource limits;
otherwise poorly performing queries will affect all users (and, in
cases like this one, cause downtime). I have set up temporary query
limits for all research/analytics users until 3rd January.
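For reference, per-account limits of the kind described above can be expressed in MySQL/MariaDB roughly like this. The account name and thresholds are made up for illustration, and the exact mechanism used on analytics-store may differ:

```sql
-- Hypothetical example: cap an account's query volume and concurrent
-- connections using MySQL's standard account resource limits.
GRANT USAGE ON *.* TO 'research_user'@'%'
  WITH MAX_QUERIES_PER_HOUR 1000
       MAX_USER_CONNECTIONS 5;

-- On MariaDB 10.1+, long-running statements can also be aborted
-- automatically after a timeout (value in seconds):
SET GLOBAL max_statement_time = 3600;
```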
--
Jaime Crespo
<http://wikimedia.org>