On Mar 28, 2023, at 9:09 PM, Kunal Mehta <legoktm@debian.org> wrote:
I suppose it's also worth asking what you're using expand_text() for in the first place, to see if there's a better way to do whatever it is you want to :)

That's a fair question.

What I'm doing is looking at DYK nominations to evaluate if they've been approved.  Like so many wiki things, there's no formal definition, but the simple version is that I'm looking for "File:Symbol confirmed.svg".  The problem is that it may not appear in the raw wikitext.  An example is Bismarck Kuyon.  Looking at the page, it's easy to see the green checkmark indicating approval.  But looking at the wikitext source, there's no such thing.  What there is, is a {{DYK checklist}} template which invokes some Lua code that generates the checkmark based on the values in the other fields.  The expand_text() forces that to get run on the server side.
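
For the record, the check itself boils down to something like the sketch below.  It assumes the expander is passed in as a callable (in pywikibot that would be the site's expand_text() method).  The names APPROVAL_MARKER, is_approved, and fake_expand are made up for illustration, and fake_expand is only an offline stand-in that pretends a bare {{DYK checklist}} renders to the checkmark; the real template takes parameters and decides in Lua.

```python
from typing import Callable

# The image whose presence I treat as "approved" (the simple version of the rule).
APPROVAL_MARKER = "File:Symbol confirmed.svg"


def is_approved(wikitext: str, expand: Callable[[str], str]) -> bool:
    """Return True if the approval image appears once templates are expanded."""
    return APPROVAL_MARKER in expand(wikitext)


def fake_expand(wikitext: str) -> str:
    # Hypothetical stand-in for server-side expansion, so this runs offline:
    # pretend that a bare {{DYK checklist}} renders to the checkmark image.
    return wikitext.replace(
        "{{DYK checklist}}", "[[File:Symbol confirmed.svg|16px]]"
    )


raw = "{{DYK checklist}} Looks good to me."
print(APPROVAL_MARKER in raw)         # the marker is absent from the raw wikitext
print(is_approved(raw, fake_expand))  # but present after expansion
```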

From a machine-parsability point of view, it's insane.  But I gotta work with what I've been given.

Ultimately, this is going to run as a bot.  The fact that it takes a couple of minutes to evaluate all the nominations of interest isn't critical.  I was doing an interactive web-based version for review purposes, and for that, waiting 2 minutes for the page to load sucked.  But I don't really need to do that, so I'll probably just go back to the serialized version and leave it at that.

One optimization I can see is that I only really need to do the expand_text() on the subset of nominations which use {{DYK checklist}}, and not even all of those (sometimes it's possible to determine the approval state entirely from the text following the {{DYK checklist}}).  That will add a bit more complexity, which I was trying to avoid.
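
Roughly, that optimization would look like the sketch below: skip the server round trip unless the nomination actually uses the template.  Again, the expander is assumed to be passed in as a callable, and CHECKLIST_RE and is_approved_lazy are hypothetical names.  It does not cover the "text following the {{DYK checklist}}" shortcut, which would need more parsing.

```python
import re
from typing import Callable

APPROVAL_MARKER = "File:Symbol confirmed.svg"

# Matches the opening of a {{DYK checklist ...}} call, ignoring case and
# tolerating whitespace after the braces or an underscore in the name.
CHECKLIST_RE = re.compile(r"\{\{\s*DYK[ _]checklist\b", re.IGNORECASE)


def is_approved_lazy(wikitext: str, expand: Callable[[str], str]) -> bool:
    """Check for approval, calling expand() only when it could matter."""
    if APPROVAL_MARKER in wikitext:
        # The checkmark is already in the raw wikitext; no expansion needed.
        return True
    if not CHECKLIST_RE.search(wikitext):
        # No checklist template, so expansion is assumed not to add the marker.
        return False
    # Only now pay for the expensive server-side expansion.
    return APPROVAL_MARKER in expand(wikitext)
```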

Even deeper down the complexity rathole, I could re-implement the Lua logic on the client side and avoid the expand_text() completely.  I believe that's what some existing bots, such as WugBot, do.  But I really didn't want to go there.

I did a little reading about your mwbot-rs project.  At one point, I was actually kind of excited about Rust and might have joined you just for the excuse to learn it.  Maybe some day.  I am totally on board with your goal of "sustainable development of bots and tools".  We've got so many tools (some of which important processes like DYK are totally dependent on) which are, frankly, a mess of single-purpose code which can't be easily reused for anything else.  What I've been trying to do with dyk-tools is create a toolkit of reusable components which other people can build upon.  But I seem to be spending most of my time working around silly things like the {{DYK checklist}} stuff.

Anyway, I hope that answers your question :-)

BTW, I've mentioned this before, but I really can't recommend viztracer highly enough as a performance analysis tool.  At one level, it's just cProfile on steroids, but with a snazzy graphical front end.  It's what let me figure out that it was expand(), not get(), which was the most expensive.  I uploaded a screenshot to Commons.
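
If you just want the flavor of that comparison without installing anything, here is a sketch using plain cProfile from the standard library rather than viztracer.  The functions fake_get and fake_expand are invented stand-ins that just sleep, with made-up durations; they are not the real pywikibot calls.

```python
import cProfile
import pstats
import time


def fake_get() -> None:
    # Hypothetical stand-in for fetching a page; the duration is invented.
    time.sleep(0.01)


def fake_expand() -> None:
    # Hypothetical stand-in for server-side template expansion.
    time.sleep(0.05)


def evaluate_nominations(count: int = 3) -> None:
    # Serialized loop: one fetch and one expansion per nomination.
    for _ in range(count):
        fake_get()
        fake_expand()


profiler = cProfile.Profile()
profiler.enable()
evaluate_nominations()
profiler.disable()

# Sort by cumulative time so the most expensive call paths come first.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(5)
```

The cumulative-time column is the one to look at; viztracer shows the same information as a timeline instead of a table.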