- Is there a unique key for the query log? The log I am referring to is
the *wdqs_extract* table from the hive database wmf. We would like to be able to permanently link our own computed data with the log entry we computed it from.
No, there is not, other than one you can calculate from the data available.
- Is there any other database system besides hive installed on the server?
Ahem... hive is not a database, but I imagine you are asking whether you need to write hive-friendly SQL to access the data. The answer is yes, you have to. You are talking to hadoop with SQL that gets translated into java code, which runs and returns the data you are interested in.
Beeline or hive should work.
On Tue, Jan 3, 2017 at 9:30 AM, Stas Malyshev smalyshev@wikimedia.org wrote:
Hi!
1. Is there a unique key for the query log? The log I am referring to is the *wdqs_extract* table from the hive database wmf. We would like to be able to permanently link our own computed data with the log entry we computed it from.
I think you can use hostname+sequence (from https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest, assuming those are preserved in wdqs_extract) as a key.
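A composite key of this kind could be built roughly as follows. This is only a minimal Python sketch, not something taken from the thread: the function names and the sample hostname are hypothetical, and it assumes that the hostname and sequence fields are present in wdqs_extract and that the pair is unique for the entries being linked.

```python
import hashlib


def log_entry_key(hostname: str, sequence: int) -> str:
    """Combine hostname and sequence into one readable key."""
    # Assumption: both fields survive into wdqs_extract unchanged.
    return f"{hostname}:{sequence}"


def log_entry_key_digest(hostname: str, sequence: int) -> str:
    """Fixed-length variant of the same key (SHA-256 hex digest)."""
    raw = log_entry_key(hostname, sequence).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


# Hypothetical values, for illustration only.
key = log_entry_key("host1.example.org", 123456)
digest = log_entry_key_digest("host1.example.org", 123456)
```

The readable form is easier to trace back to a log entry by eye; the digest form is convenient when a fixed-width column is preferred.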
2. Is it possible to find out if a query in a given log entry was accepted by the sparql endpoint as valid?
If it wasn't, the result code should be 400.
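Checking this on rows already pulled out of the table might look like the Python sketch below. The field name http_status is an assumption (please check the actual wdqs_extract schema), and the rows are modeled as plain dicts with made-up values purely for illustration.

```python
def was_rejected(row: dict) -> bool:
    """Return True if the row carries result code 400."""
    # Assumption: the result code is stored under "http_status".
    # str() covers the code being stored as either text or a number.
    return str(row.get("http_status")) == "400"


# Made-up rows, for illustration only.
rows = [
    {"http_status": "200"},
    {"http_status": "400"},
    {"http_status": 400},
]
rejected = [row for row in rows if was_rejected(row)]
```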
3. Is there any other database system besides hive installed on the server?
I think the currently recommended interface is beeline, not sure about other DB systems.
And finally a question on conventions for this mailing list: Am I correct in sending one mail for multiple questions or should I send separate mails for each question?
I think it's ok. For the questions regarding data and other WDQS specifics you may also CC me or discovery@lists.wikimedia.org.
-- Stas Malyshev smalyshev@wikimedia.org
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics