I noticed that category_tests.py was failing for me. The breakage was due to this edit on enwiki https://en.wikipedia.org/w/index.php?title=Category:2021_establishments_in_Orissa&diff=prev&oldid=1160600714, and was fixed in ce301a2 https://github.com/wikimedia/pywikibot/commit/ce301a2f1de4a2dca4e8e93cc1465c7963a2ebe2, which updates the test to look for the current contents of the page on enwiki.
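The fragile pattern looks roughly like this (a hypothetical sketch, not the actual code in category_tests.py; the expected member title is made up):

```python
# Hypothetical sketch of the fragile pattern, not the actual test in
# category_tests.py: the assertion encodes the *current* contents of a live
# enwiki category, so any editor who recategorises a page breaks the suite.
import unittest

import pywikibot


class CategoryMembersTest(unittest.TestCase):

    def test_category_members(self):
        site = pywikibot.Site('en', 'wikipedia')
        cat = pywikibot.Category(site,
                                 'Category:2021 establishments in Orissa')
        titles = {page.title() for page in cat.members()}
        # Only valid until the next edit on the live wiki:
        self.assertIn('Some article title', titles)


if __name__ == '__main__':
    unittest.main()
```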
I'm concerned that we're writing tests which depend on knowing the current contents of a live production wiki.
Hi folks,
Some thoughts on the following from someone who worked on Pywikibot tests a /very/ long time (>10 years...) ago:
On 26/06/2023 00:47, Roy Smith wrote:
I'm concerned that we're writing tests which depend on knowing the current contents of a live production wiki.
On 15/05/2023 21:11, YiFei Zhu wrote:
This way, if a unit test fails, you know it's Pywikibot's code that is to blame and not something like a MediaWiki change or the environment setup.
First of all: tests that randomly fail, and tests that require some not-so-well documented set of steps to run at all are /a pain/. So there's certainly room for improvement there.
One thing to keep in mind is that Pywikibot is a bit weird, as there is only a limited set of things it does /on its own/. The value of Pywikibot lies in the interaction with MediaWiki, and by and large /with the specific version of MediaWiki deployed on Wikimedia wikis (including all the weird extensions)/. And because of that, there is a massive value in tests that test /the integration with Wikimedia wikis/: it's great if mocked API calls work, but if it breaks against the real API, Pywikibot is not doing its job for the user. Unfortunately that does mean the /contents/ of the wiki now are a dependency for the test as well.
The alternative is mocking bits of Pywikibot: either the API requests or the Site object itself. The former is something I've played with in the past by recording (caching) the requests sent to the actual site. Although that works, there are two main issues: you regularly need to update the 'logged' requests (as some may be missing, etc.), and you run the risk of leaking private data. The second method sounds more appealing, but in practice you'll be mocking /every/ method. In effect, at that point you're re-implementing the MediaWiki API in Python. Might as well talk to the real one then?
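To illustrate the record/replay idea with a rough sketch (the cassette file name and helper functions here are made up; a real recorder would need per-request keys and scrubbing of tokens/cookies, which is exactly where the maintenance pain comes from):

```python
# Rough sketch of record/replay using only unittest.mock and a made-up
# cassette file.  A real recorder has to key responses per request and strip
# private data before storing them.
import json
from pathlib import Path
from unittest import mock

import pywikibot
from pywikibot.data import api

CASSETTE = Path('siteinfo_cassette.json')  # hypothetical location


def record_siteinfo():
    """Hit the live API once and store the raw response for later replay."""
    site = pywikibot.Site('en', 'wikipedia')
    request = api.Request(site=site,
                          parameters={'action': 'query', 'meta': 'siteinfo'})
    CASSETTE.write_text(json.dumps(request.submit()))


def replayed_submit(self):
    """Drop-in replacement for Request.submit() that never touches the network."""
    return json.loads(CASSETTE.read_text())


def test_against_recording():
    # Code exercised inside this block gets the canned response instead of
    # live data; the flip side is that the cassette must be re-recorded
    # whenever Pywikibot changes which requests it sends.
    with mock.patch.object(api.Request, 'submit', replayed_submit):
        ...  # run the code under test here
```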
With Docker containers being a fairly standard thing (although still a pain under Windows), spinning up a test container with a well-known MW version and well-known content might be the best of both worlds: a stable development environment, while still being able to run the tests against both a 'latest' MW container and a live site (but that can then just be done on the CI server, rather than trying to run all local tests against a live site?).
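For example, assuming a MediaWiki container (e.g. from the official `mediawiki` Docker image) is already installed and exposed on localhost:8080, the Python side could be wired up roughly like this (a hedged sketch; the AutoFamily usage is approximate, and a hand-written family file plus user config is the more traditional route):

```python
# Hedged sketch: point Pywikibot at a throwaway MediaWiki instance running in
# a local container.  The URL and account name below are assumptions about
# the container setup, not anything Pywikibot ships.
import pywikibot
from pywikibot.family import AutoFamily

LOCAL_API = 'http://localhost:8080/api.php'   # assumed container endpoint


def local_test_site():
    # AutoFamily builds a one-off family object from the api.php URL.
    family = AutoFamily('localtest', LOCAL_API)
    return pywikibot.Site(family.name, fam=family, user='TestAdmin')
```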
In the end, all approaches have their own share of pain. Live testing requires a fiddly live-site setup, mocking is brittle, and Docker... has its own steep learning curve. The only true advice I can give here is 'choose your battles' and 'document everything you can, and then some' :-)
Cheers, Merlijn
On Jun 26, 2023, at 4:17 PM, Merlijn van Deen (valhallasw) <valhallasw@arctus.nl> wrote:
... there is a massive value in tests that test the integration with Wikimedia wikis: it's great if mocked API calls work, but if it breaks against the real API, Pywikibot is not doing its job for the user. Unfortunately that does mean the contents of the wiki now are a dependency for the test as well.
I certainly agree that integration tests are useful, even essential. But there are two different things here. One is talking to another piece of software over a network connection vs using a mocked test fixture. Mocks are great for some things, but utterly fail at demonstrating that you actually understand the API you're talking to (which happens all too often with a complex API like MediaWiki).
The orthogonal problem is depending on whatever data happens to be in the database of the thing you're talking to vs setting up the test conditions when you run the test. That's the bigger issue here. It doesn't matter if you're talking unit tests, integration tests, or anything else: if the outcome of your test depends on whatever some random person on the internet happened to do yesterday, it's hopeless.
Overall, I suspect the right path forward is spinning up a live server in a Docker container, combined with having each test create the required content as part of its setup (and clean it up during teardown). Unfortunately, my understanding of Docker is rudimentary at best, so engineering such an environment is beyond me.
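The test-side half, at least, is easy to sketch (hypothetical code with made-up page and category titles, run against whatever local test wiki the configuration points at; the teardown assumes the test account is allowed to delete pages):

```python
# Hypothetical per-test setup/teardown against a throwaway wiki (e.g. the
# Docker container discussed above).  Page and category names are made up;
# delete() in teardown assumes the test account has the necessary rights.
import unittest

import pywikibot


class CategoryRoundTripTest(unittest.TestCase):

    def setUp(self):
        self.site = pywikibot.Site()   # configured to point at the test wiki
        self.page = pywikibot.Page(self.site, 'PywikibotTest/Sandbox page')
        self.page.text = ('Test content.\n\n'
                          '[[Category:Pywikibot test category]]')
        self.page.save(summary='Test setup')

    def tearDown(self):
        self.page.delete(reason='Test teardown', prompt=False)

    def test_page_is_in_category(self):
        cat = pywikibot.Category(self.site,
                                 'Category:Pywikibot test category')
        titles = {p.title() for p in cat.members()}
        self.assertIn(self.page.title(), titles)


if __name__ == '__main__':
    unittest.main()
```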