I don't see any reason you can't have access to the Quarry DB. Does anyone else?
On Mon, Jun 5, 2023 at 2:43 PM Hal Triedman <htriedman@wikimedia.org> wrote:
Hi cloud admins!
My name is Hal Triedman. I'm a Privacy Engineer at WMF, but in my spare time I do a lot of work on machine learning. One of the things we've been looking into is creating label-query datasets for MediaWiki database queries, with the goal of fine-tuning an AI model that helps users write queries more easily, or of creating embeddings that make past queries easier to search.
Quarry is particularly interesting for this project because it has the following qualities:
- it runs entirely against MediaWiki databases
- it has been used to make hundreds of thousands of queries
- many of those queries have relatively descriptive titles about what is happening in the SQL
Is there an easy way to assemble a database of existing public title-query pairs (e.g. by running a database query that excludes things like "Untitled query", or by just pulling published queries)? Please let me know, and thanks.
Hal
_______________________________________________
Cloud-admin mailing list -- cloud-admin@lists.wikimedia.org
List information: https://lists.wikimedia.org/postorius/lists/cloud-admin.lists.wikimedia.org/