Hi! I'm trying to get articletopic predictions for a bunch of Wikipedia articles.[1] This value is cached in Elasticsearch indices,[2] under the WeightedTags field.[3]
Because using CirrusSearch through the Action API would return at most 500 results,[4] I was thinking of querying the CirrusSearch database directly.
I've seen that there is a CloudElastic replica,[5] but I can't use it from PAWS. Is it only available from Cloud VPS and Toolforge?
Otherwise, can you suggest an alternative for what I'm trying to accomplish? Thank you!
[1] https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Get_articletopic_outl...
[2] https://wikitech.wikimedia.org/wiki/Search/articletopic
[3] https://wikitech.wikimedia.org/wiki/Search/WeightedTags
[4] https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bsearch
[5] https://wikitech.wikimedia.org/wiki/Help:CirrusSearch_OpenSearch_replicas
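For reference, here is a minimal sketch of how entries from the WeightedTags field mentioned above could be turned into a topic-to-score mapping once they have been retrieved. It assumes each entry is a string shaped like `<prefix>/<name>|<score>`; that shape, the `parse_weighted_tags` helper, and the `classification.example.articletopic` prefix are illustrative assumptions, so please check the real format against [3] before relying on it.

```python
def parse_weighted_tags(tags, prefix):
    """Return {name: score} for the entries of `tags` stored under `prefix`.

    Assumed entry shape: "<prefix>/<name>|<score>". Entries without a
    numeric score are kept with a score of None.
    """
    result = {}
    for tag in tags:
        head, slash, rest = tag.partition("/")
        if not slash or head != prefix:
            continue  # belongs to another tag family, or is malformed
        name, bar, score = rest.rpartition("|")
        if bar and score.isdigit():
            result[name] = int(score)
        else:
            result[rest] = None
    return result


# Hypothetical prefix and sample data, for illustration only.
ARTICLETOPIC_PREFIX = "classification.example.articletopic"
sample_tags = [
    "classification.example.articletopic/Culture.Media.Films|850",
    "classification.example.articletopic/Geography.Regions.Europe|412",
    "other.prefix/something|10",
]
topics = parse_weighted_tags(sample_tags, ARTICLETOPIC_PREFIX)
```

With the sample data, `topics` contains only the two entries under the hypothetical articletopic prefix, keyed by topic name.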
Hello! Could you explain the access pattern in a little more detail? Is this going to be a bulk operation across, for example, all articles on a wiki? Would you mind posting your reply both on this list and, after signing up if you haven't already, on the discovery list ( https://lists.wikimedia.org/postorius/lists/discovery.lists.wikimedia.org/ )?
Thanks! -Adam
On Mon, Oct 13, 2025 at 9:24 AM delahera@gmail.com wrote:
Hi! I'm trying to get articletopic predictions for a bunch of Wikipedia articles.[1] This value is cached in Elasticsearch indices,[2] under the WeightedTags field.[3]
Because using CirrusSearch through the Action API would return at most 500 results,[4] I was thinking of querying the CirrusSearch database directly.
I've seen that there is a CloudElastic replica,[5] but I can't use it from PAWS. Is it only available from Cloud VPS and Toolforge?
Otherwise, can you suggest an alternative for what I'm trying to accomplish? Thank you!
[1] https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Get_articletopic_outl...
[2] https://wikitech.wikimedia.org/wiki/Search/articletopic
[3] https://wikitech.wikimedia.org/wiki/Search/WeightedTags
[4] https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bsearch
[5] https://wikitech.wikimedia.org/wiki/Help:CirrusSearch_OpenSearch_replicas
_______________________________________________
Cloud mailing list -- cloud@lists.wikimedia.org
List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
This seems like a bug, so I filed https://phabricator.wikimedia.org/T407216 to track it.
Taavi
Thank you Adam and Taavi for your messages.
@Taavi, thanks for filing that bug. I've already subscribed to it.
@Adam, what I'm trying to do is part of a prototype project to describe the nature of the wikilinks in a list of articles of interest. That is, given a list of around 1,000 articles, I get the outgoing wikilinks from those articles (around 30,000) and try to describe them in different ways: the categories they belong to, the Wikidata items they are linked to and, hopefully, their `articletopic` predictions.
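To give a rough idea of the request volume, here is a minimal sketch of how the link titles could be batched for the Action API. It assumes the usual cap of 50 titles per `action=query` request for clients without higher limits. The helper names `chunked` and `build_query_params` are made up for this example, and the `categories|pageprops` props are only one possible choice for the category and Wikidata item lookups. CirrusSearch also has a `cirrusdoc` prop module, but whether its output carries the articletopic tags for these pages is an assumption that would need checking.

```python
from itertools import islice


def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def build_query_params(titles, prop):
    """Build Action API parameters for one batch of page titles."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": prop,
        "titles": "|".join(titles),  # multiple values are pipe-separated
    }


# Placeholder titles standing in for the real list of linked articles.
link_titles = [f"Article {i}" for i in range(120)]
batches = list(chunked(link_titles, 50))
first_params = build_query_params(batches[0], "categories|pageprops")
```

At 50 titles per request, around 30,000 links would take roughly 600 requests per wiki, before counting any continuation requests for pages with many categories.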
For now, I'm doing this in a personal PAWS notebook and I plan to run it on around 800 enwiki and 200 eswiki articles. But I would like to share the notebook with others in the future, so they can use it for their own list of articles, and I may try to make it into a Toolforge tool eventually.
I hope this clarifies things. Otherwise, please let me know!
Thanks,
Diego
On Tue, 14 Oct 2025 at 09:34, Taavi Väänänen taavi@wikimedia.org wrote:
This seems like a bug, so I filed https://phabricator.wikimedia.org/T407216 to track it.
Taavi
--
Taavi Väänänen (he/him)
Site Reliability Engineer, Cloud Services
Wikimedia Foundation