jayvdb added a comment.
Re -random, and even the pagegenerator: I doubt anyone using those cares about randomness.
I was looking for a MediaWiki API bug about allowing continuation. I didn't find one.
TASK DETAIL
https://phabricator.wikimedia.org/T84944
REPLY HANDLER ACTIONS
Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: jayvdb
Cc: gerritbot, valhallasw, jayvdb, Aklapper, Mpaa, pywikipedia-bugs
valhallasw added a comment.
The underlying randomness algorithm is as follows:
- each page is stored with a random number, `page_random`, between 0 and 1
- generator=random runs `SELECT * FROM page WHERE page_random > {value} LIMIT {limit}`, where `{value}` is a random number between 0 and 1 and `{limit}` is the number of pages to retrieve
I suppose the API could actually expose page_random as an opaque 'continue' parameter, which would then allow actual continuation, and hence provide full random-without-replacement?
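A rough sketch of that idea, as an in-memory model rather than the real MediaWiki code: `make_pages`, `select_random` and `random_without_replacement` are hypothetical names, and the model assumes rows come back ordered by `page_random`, which the query as quoted above does not state. It also simply stops when it runs past the highest `page_random` instead of wrapping around.

```python
import random


def make_pages(n, rng):
    """Build n hypothetical pages, each stored with a page_random in [0, 1)."""
    return sorted((rng.random(), "Page%d" % i) for i in range(n))


def select_random(pages, value, limit):
    """Model of: SELECT * FROM page WHERE page_random > value LIMIT limit.

    Assumption: rows are returned in ascending page_random order.
    """
    return [row for row in pages if row[0] > value][:limit]


def random_without_replacement(pages, start, batch, total):
    """Fetch up to `total` titles in batches, continuing from page_random."""
    results = []
    cont = start
    while len(results) < total:
        rows = select_random(pages, cont, batch)
        if not rows:
            break  # ran past the last page; this sketch does not wrap around
        rows = rows[:total - len(results)]
        results.extend(rows)
        # The page_random of the last row acts as the opaque 'continue' value.
        cont = rows[-1][0]
    return [title for _, title in results]


rng = random.Random()
pages = make_pages(50, rng)
print(random_without_replacement(pages, rng.random(), batch=10, total=20))
```

Because each batch starts strictly after the last `page_random` already returned, no title can appear twice, which is what makes the result random-without-replacement in this model.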
As for //our// users: they would typically use -random from the command line, and iirc generators from the command line are always filtered for uniqueness.
TASK DETAIL
https://phabricator.wikimedia.org/T84944
To: valhallasw
Cc: gerritbot, valhallasw, jayvdb, Aklapper, Mpaa, pywikipedia-bugs
jayvdb added a comment.
To me, 'step' feels like it breaks a batch into non-overlapping subsets, which isn't strictly true if each 'step' is a new random sequence, especially if each batch contains only unique items (which means the server algorithm is slightly reducing the randomness when a duplicate appears).
If we look at a very small wiki, the underlying generator doesn't repeat if the limit isn't reached.
https://www.molnac.unisa.it/BioTools/mdcons/api.php?action=query&generator=…
https://www.molnac.unisa.it/BioTools/mdcons/index.php/Special:ListFiles
IMO, in site.randompages, we are trying to expose the underlying MediaWiki API, and it doesn't have continuation. A caller can't know whether limit 20 is two batches of 10 from the server algorithm, or a single batch of 20 from the server algorithm (which is unique?). The only way to have any chance of knowing that is to obtain the API limit from paraminfo, and use that.
However, I don't know the underlying randomness algorithm well enough to speak with much authority about that, or how many of our users want to 'see' the underlying randomness vs are happy with any randomness that has slight oddities introduced by multiple disjoint batches.
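To make the batching question concrete, here is a small sketch of how a requested total would split into server-side batches once the API limit is known. `batch_sizes` is a hypothetical helper, and `api_limit` stands in for whatever value paraminfo reports; the numbers below are illustrative only.

```python
def batch_sizes(total, api_limit):
    """Split a requested total into per-request batch sizes.

    `api_limit` is assumed to be the per-request maximum obtained from
    paraminfo; each batch would be a separate random draw on the server.
    """
    if total < 0 or api_limit <= 0:
        raise ValueError("total must be >= 0 and api_limit must be > 0")
    full, rest = divmod(total, api_limit)
    return [api_limit] * full + ([rest] if rest else [])


print(batch_sizes(20, 10))   # two disjoint batches
print(batch_sizes(20, 500))  # a single batch
```

Only in the single-batch case does the caller see one uninterrupted draw from the server algorithm.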
Whatever we do, we need to update the docstring to explain what we are doing in case the caller cares.
TASK DETAIL
https://phabricator.wikimedia.org/T84944
To: jayvdb
Cc: gerritbot, valhallasw, jayvdb, Aklapper, Mpaa, pywikipedia-bugs
cpa199 added a comment.
Many thanks for that, I can see that it has been updated indeed.
TASK DETAIL
https://phabricator.wikimedia.org/T87248
To: Chad, cpa199
Cc: Krinkle, XZise, valhallasw, JanZerebecki, Nikerabbit, siebrand, cpa199, zhaofengli, llbraughler, adrianheine, Krenair, Xqt, jayvdb, fbstj, greg, Legoktm, Chad, MarkTraceur, matmarex, UltrasonicNXT, Aklapper, QChris, pywikipedia-bugs
valhallasw added a comment.
I don't see why having step is related to having a continuation mechanism. It's a parameter that's passed to the api, which has a well-defined meaning.
I also don't see why returning duplicates is an issue. Random sampling is typically with replacement, unless specified otherwise, so the caller should not be surprised to see duplicates, and should filter them manually.
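Filtering manually on the caller's side could look roughly like this. It is a generic sketch, not pywikibot's own filter; `unique_pages` and its `key` parameter are hypothetical names, and plain strings stand in for page objects.

```python
def unique_pages(pages, key=lambda page: page):
    """Yield each page once, in first-seen order, dropping later duplicates."""
    seen = set()
    for page in pages:
        k = key(page)
        if k not in seen:
            seen.add(k)
            yield page


sampled = ["A", "B", "A", "C", "B"]  # a sample drawn with replacement
print(list(unique_pages(sampled)))
```

Note that after filtering, the caller may end up with fewer items than were requested.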
TASK DETAIL
https://phabricator.wikimedia.org/T84944
To: valhallasw
Cc: gerritbot, valhallasw, jayvdb, Aklapper, Mpaa, pywikipedia-bugs