It turns out this is a little more complicated than it appeared at first; usercontribs and list=users have different concepts of "invalid".  If you ask for usercontribs on "1.2.3.4", it's valid.  If you pass in "1.2.3.0/24", you get baduser.  But list=users, given "1.2.3.4", returns:

{
    "batchcomplete": "",
    "query": {
        "users": [
            {
                "name": "1.2.3.4",
                "invalid": ""
            }
        ]
    }
}

which I guess makes sense in that context since it can't map it to a userid.  I can work around this, but mentioning it for the sake of some poor developer searching the archives N years from now trying to figure it out :-)
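
For anyone who does land here from the archives, here is a rough Python sketch of sorting a list=users response into the three cases seen in this thread. The function name classify_users is made up for illustration, and the sketch assumes a flag counts as set whenever the key is present and not false (formatversion=1 sends "", formatversion=2 sends true).

```python
def _flag_set(user, key):
    """True if the API marked this entry with `key` ("" or True both count)."""
    return key in user and user[key] is not False


def classify_users(response):
    """Split a list=users response into (valid, missing, invalid) name lists.

    Hypothetical helper: "invalid" is what list=users reports for names it
    cannot treat as an account (such as the bare IP shown above), "missing"
    is a well-formed name with no account behind it.
    """
    valid, missing, invalid = [], [], []
    for user in response.get("query", {}).get("users", []):
        name = user.get("name")
        if _flag_set(user, "invalid"):
            invalid.append(name)
        elif _flag_set(user, "missing"):
            missing.append(name)
        else:
            valid.append(name)
    return valid, missing, invalid


# The formatversion=1 response quoted above.
sample = {
    "batchcomplete": "",
    "query": {"users": [{"name": "1.2.3.4", "invalid": ""}]},
}
valid, missing, invalid = classify_users(sample)
```

Note that this would flag a bare IP as invalid even though usercontribs accepts it, so a tool that wants the usercontribs behaviour has to special-case IP addresses itself.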


On Aug 19, 2021, at 6:21 PM, Bryan Davis <bd808@wikimedia.org> wrote:

On Thu, Aug 19, 2021 at 4:04 PM Roy Smith <roy@panix.com> wrote:

I've got a tool which parses sockpuppet investigation (SPI) pages and does some analysis.  One of the steps is I need to validate that all of the usernames found in the SPI report are valid.  I do that by sequentially calling usercontribs on each name with uclimit=1 and seeing if I get a baduser error.

This works, but it's slow because I need to make 1 API call for each user.  For a big SPI case, the time to do this swamps everything else.  Is there a more efficient way to do this?  Some API call where I can give it a bunch of usernames in a batch and have it tell me which ones are invalid?  Alternatively, is there a regex I could apply on the client side to test if a username is valid?

The most common type of invalid name I see is when somebody puts down an IP range (e.g. 1.2.4.0/24) as a username.  Testing for that client-side would be trivial, but it might miss some others.
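
A minimal sketch of that client-side IP range test, using Python's standard ipaddress module. The function name looks_like_ip_range is hypothetical, and this only catches CIDR ranges, not any other kind of invalid name.

```python
import ipaddress


def looks_like_ip_range(name):
    """Return True if `name` parses as an IPv4 or IPv6 CIDR range."""
    candidate = name.strip()
    # A bare address such as "1.2.3.4" is not a range, so require the slash.
    if "/" not in candidate:
        return False
    try:
        # strict=False accepts ranges whose host bits are set, e.g. 1.2.3.4/24.
        ipaddress.ip_network(candidate, strict=False)
    except ValueError:
        return False
    return True


names = ["1.2.4.0/24", "1.2.3.4", "Example user", "2001:db8::/32"]
ranges = [n for n in names if looks_like_ip_range(n)]
```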

You can do lookups in batches of 50 (500 if you have the
"apihighlimits" right which is commonly granted by the "Bots" group on
movement wikis) with
<https://en.wikipedia.org/w/api.php?action=help&modules=query%2Busers>.

Here's a quick example:
<https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&list=users&format=json&utf8=1&formatversion=2&ususers=Bryan%20Davis%7CBryanDavis%7CBDavis%20(WMF)%7Cbd808>

The results will look something like:
```
{
   "batchcomplete": true,
   "query": {
       "users": [
           {
               "name": "Bryan Davis",
               "missing": true
           },
           {
               "userid": 2619078,
               "name": "BryanDavis"
           },
           {
               "userid": 19474624,
               "name": "BDavis (WMF)"
           },
           {
               "userid": 24257381,
               "name": "Bd808"
           }
       ]
   }
}
```
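
As a sketch only, the batching could look something like this in Python with just the standard library. The helper names, the User-Agent string, and the choice to POST the parameters are all assumptions made for illustration; the batch size of 50 is the limit mentioned above for accounts without "apihighlimits".

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def chunked(items, size=50):
    """Yield successive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_params(batch):
    """Build the list=users query parameters for one batch of names."""
    return {
        "action": "query",
        "list": "users",
        "format": "json",
        "formatversion": "2",
        # Names are pipe-separated, so a name containing "|" would be split.
        "ususers": "|".join(batch),
    }


def fetch_users(names, batch_size=50):
    """Look up all `names`, one request per batch, and return the user entries."""
    results = []
    for batch in chunked(list(names), batch_size):
        data = urllib.parse.urlencode(build_params(batch)).encode("utf-8")
        request = urllib.request.Request(
            API_URL,
            data=data,  # sending data makes this a POST, which avoids long URLs
            headers={"User-Agent": "username-check-sketch/0.1 (example only)"},
        )
        with urllib.request.urlopen(request) as reply:
            payload = json.load(reply)
        results.extend(payload["query"]["users"])
    return results
```

Calling fetch_users(["Bryan Davis", "BryanDavis", "Bd808"]) should then cost one request instead of three, and entries can be checked for the "missing" key as in the output above.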

Bryan
--
Bryan Davis              Technical Engagement      Wikimedia Foundation
Principal Software Engineer                               Boise, ID USA
[[m:User:BDavis_(WMF)]]                                      irc: bd808
_______________________________________________
Cloud mailing list -- cloud@lists.wikimedia.org
List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/