Hey all!
wmfdata-python <https://github.com/wikimedia/wmfdata-python> (a package
that streamlines access to private analytics data) has been updated to
version 1.1. Here's what's new:
- The new presto module supports querying the Data Lake using Presto
<https://wikitech.wikimedia.org/wiki/Analytics/Systems/Presto>.
- The spark module has been refactored to support local and custom
sessions.
- A new utils.get_dblist function provides easy access to wiki database
lists, which is particularly useful with mariadb.run.
- The hive.run_cli function now creates its temp files in a standard
location, to avoid creating distracting new entries in the current working
directory.
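For anyone curious about the temp file change, here is a minimal sketch of the general technique using Python's standard tempfile module. It is an illustration only and not the actual wmfdata code; the helper name write_query_to_temp_file and the .hql suffix are assumptions made up for this example.

```python
import os
import tempfile


def write_query_to_temp_file(query):
    """Write a query to a file in the system temp directory; return its path.

    Hypothetical helper for illustration only; not part of wmfdata.
    """
    # delete=False keeps the file on disk after it is closed, so that a
    # separate process (for example a command-line client) can read it.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".hql", delete=False
    ) as f:
        f.write(query)
        return f.name


path = write_query_to_temp_file("SELECT 1")
try:
    # The file is created under the system temp directory rather than
    # the current working directory.
    print("temp file:", path)
    print("system temp directory:", tempfile.gettempdir())
finally:
    # Remove the file once it is no longer needed.
    os.remove(path)
```

Because the caller gets the path back, it stays responsible for removing the file, which is why the usage above wraps the work in try/finally.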
Many thanks to:
- Andrew Otto and Adam Roses Wight for writing significant new code
- Mikhail Popov, Andrew Otto, and Luca Toscano for careful code review
As always, if you have questions or feedback about wmfdata-python, please
email Product Analytics at product-analytics@wikimedia.org.
--
Neil Shah-Quinn
senior data scientist, Product Analytics
<https://www.mediawiki.org/wiki/Product_Analytics>
Wikimedia Foundation <https://wikimediafoundation.org/>