Thank you very much, Laurence.

From what I have read online, I understand that this limit comes from Elasticsearch and, as you mentioned, exists to prevent long execution times and memory issues.
Given this, we have decided to query our data with a SPARQL query instead. It seems to work better: we get results almost immediately, even with an offset of 80000.

Below is the SPARQL query we send:
PREFIX entity: <http://pfcnoemigration-wiki.bnf.fr/entity/>
PREFIX prop: <http://pfcnoemigration-wiki.bnf.fr/prop/direct/>
SELECT ?ent
WHERE {
   ?ent prop:P1 entity:Q58.
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],fr". }
}
LIMIT 50 OFFSET 80000
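For illustration, the same paging can be scripted. The Python sketch below (standard library only) rebuilds the query above page by page and sends it to a SPARQL endpoint. The endpoint URL is a hypothetical placeholder, and the `ORDER BY ?ent` clause is an addition that is not in the query above: without an explicit ordering, SPARQL does not guarantee that successive LIMIT/OFFSET pages are consistent, so items could be skipped or repeated. Sorting may make each request somewhat slower.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with the URL of your own query service.
SPARQL_ENDPOINT = "https://query.example.org/sparql"

# Same query as above, plus ORDER BY (an assumption) so pages are stable.
QUERY_TEMPLATE = """PREFIX entity: <http://pfcnoemigration-wiki.bnf.fr/entity/>
PREFIX prop: <http://pfcnoemigration-wiki.bnf.fr/prop/direct/>
SELECT ?ent
WHERE {{
   ?ent prop:P1 entity:Q58.
   SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],fr". }}
}}
ORDER BY ?ent
LIMIT {limit} OFFSET {offset}"""


def build_query(limit: int, offset: int) -> str:
    """Return the query text for one page of results."""
    return QUERY_TEMPLATE.format(limit=limit, offset=offset)


def fetch_page(limit: int, offset: int) -> list:
    """Send one page to the endpoint and return the entity IRIs it contains."""
    params = urllib.parse.urlencode({"query": build_query(limit, offset)})
    request = urllib.request.Request(
        SPARQL_ENDPOINT + "?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [row["ent"]["value"] for row in data["results"]["bindings"]]


def fetch_all(limit: int = 50):
    """Yield every entity IRI, page by page, until an empty page comes back."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        if not page:
            return
        yield from page
        offset += limit
```

Calling `build_query(50, 80000)` reproduces the request above with the extra ordering clause; `fetch_all()` walks the whole result set.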

Best regards,

Pascal Lefeuvre
Scrum master for the French National Library




From:        "Laurence Parry" <greenreaper@hotmail.com>
To:        "Wikibase Community User Group" <wikibaseug@lists.wikimedia.org>
Date:        29/06/2021 15:13
Subject:        [Wikibase] Re: : Querying a wikibase with api.php - problem when offset greater than 10000




Hello Pascal,

Unfortunately the search offset limit does not seem to be modifiable via standard configuration settings (I'd be glad to be corrected here).

But I think you may be able to adjust it by editing
extensions/CirrusSearch/includes/Searcher.php

Specifically, there is a constant there, MAX_OFFSET_LIMIT, which is set to 10000:
https://github.com/wbstack/mediawiki/blob/15862d7af0c6b32e288a76c77aeae8e994f8de39/extensions/CirrusSearch/includes/Searcher.php#L71
(Code may differ slightly depending on version.)

You could try setting it to a different value and see whether that helps. After editing, do whatever your setup requires to make sure you are not still running a cached version of the old code.

Bear in mind that an offset-based query might be slow, especially at 50 results per request, since the backend may have to process all of the earlier entries again on each request.
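To put a rough number on this: assume, as a worst case rather than a measured figure, that a request with offset o and page size k makes the backend read o + k entries. Paging through N entries in P = N / k pages then costs about k * P * (P + 1) / 2 reads in total, instead of N for a single pass. A small Python helper that sums this directly:

```python
def rows_read_with_offset_paging(total: int, page_size: int) -> int:
    """Rows read in total if every page re-reads all entries before it.

    Assumption: a request with offset o and limit k reads o + k rows,
    except the last page, which stops at `total`.
    """
    rows = 0
    for offset in range(0, total, page_size):
        rows += min(offset + page_size, total)
    return rows
```

For example, 80,000 entries at 50 per page means 1,600 requests and, under this assumption, 64,040,000 row reads, compared with 80,000 for one pass.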

Let us know whether that is doable and works for you. If not, perhaps others have ideas (or maybe there would be interest in making the limit configurable). I imagine the advice might be to use WDQS (the query service), but it might not provide the snippets you're looking for.

Best regards,
--
Laurence "GreenReaper" Parry - Curator, WikiFur


From: pascal.lefeuvre@bnf.fr <pascal.lefeuvre@bnf.fr>
Sent: Tuesday, June 29, 2021 1:10:15 PM
To: wikibaseug@lists.wikimedia.org <wikibaseug@lists.wikimedia.org>
Subject: [Wikibase] : Querying a wikibase with api.php - problem when offset greater than 10000

Dear Wikibase users,

I am writing because I have just encountered a problem with my Wikibase.
I am querying my Wikibase using api.php. The query returns more than 10000 items, and I process the results page by page using the srlimit and sroffset parameters.
The problem appears when sroffset reaches 10000. At that point I get this error:



{"batchcomplete":"","warnings":{"search":{"*":"Could not retrieve results. Up to 10000 search results are supported, but results starting at 10000 were requested."}},"query":{"searchinfo":{"totalhits":0},"search":[]}}


The request is <My Wikibase URL>/w/api.php?action=query&format=json&list=search&srsearch=haswbstatement:%22P338=Q58%22&srprop=snippet|titlesnippet|redirecttitle&srlimit=50&sroffset=10000
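A minimal Python sketch of the paging loop described above, using only the standard library. The api.php URL is a placeholder and the helper names are hypothetical; the parameters are copied from the request above. Rather than computing sroffset by hand, the loop feeds back the `continue` block the API returns with each page. It will still stop at the 10000 cap; it only makes the failure explicit by reporting the search warning shown above.

```python
import json
import urllib.parse
import urllib.request

# Placeholder: replace with your own Wikibase api.php URL.
API_URL = "https://wikibase.example.org/w/api.php"

# Fixed parameters, copied from the request above.
BASE_PARAMS = {
    "action": "query",
    "format": "json",
    "list": "search",
    "srsearch": 'haswbstatement:"P338=Q58"',
    "srprop": "snippet|titlesnippet|redirecttitle",
    "srlimit": 50,
}


def build_url(extra: dict) -> str:
    """Combine the fixed search parameters with continuation parameters."""
    return API_URL + "?" + urllib.parse.urlencode({**BASE_PARAMS, **extra})


def search_warning(response: dict):
    """Return the search warning text (e.g. the 10000-result cap), or None."""
    return response.get("warnings", {}).get("search", {}).get("*")


def search_all():
    """Yield search hits page by page, following the API's 'continue' block."""
    extra = {}
    while True:
        with urllib.request.urlopen(build_url(extra)) as resp:
            data = json.load(resp)
        hits = data.get("query", {}).get("search", [])
        if not hits:
            warning = search_warning(data)
            if warning:
                print("Stopped:", warning)
            return
        yield from hits
        if "continue" not in data:
            return
        # Carries sroffset (and the generic continue token) for the next page.
        extra = data["continue"]
```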


Is there a way to get past this limit?


Thank you for your answers.


Pascal Lefeuvre
Scrum master for the French National Library

Visit the exhibitions at the François-Mitterrand site and find this June's cultural events on site or remotely.

The general-public library is open Tuesday to Saturday from 10:00 to 19:00.
The research libraries at the François-Mitterrand site are open on Mondays from 14:00 to 19:00 and Tuesday to Saturday from 10:00 to 19:00.
The Richelieu, Arsenal and Opéra sites are back to their usual opening hours. See the access arrangements.

Please consider the environment before printing.
_______________________________________________
Wikibaseug mailing list -- wikibaseug@lists.wikimedia.org
To unsubscribe send an email to wikibaseug-leave@lists.wikimedia.org