Anders, do you have a citation for “use Wikipedia content considerably”?
Lots of early-ish ML work was heavily dependent on Wikipedia, but state-of-the-art Large Language Models are trained on vast quantities of text, of which Wikipedia is only a small part. OpenAI does not disclose ChatGPT's training data sources (as far as I know), but the EleutherAI project released its Pile a few years back, and even there Wikipedia was less than 5% of the text data; I think it is safe to assume the share is smaller still for newer models:
https://arxiv.org/abs/2101.00027
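To put rough numbers on that: as I remember the Pile paper's Table 1 (figures quoted from memory, so check the paper for exact values), Wikipedia (en) is about 6.38 GiB of an ~825 GiB raw corpus, and even after the paper's ~3-epoch upweighting of it, that comes to only around 1.5% of the effective data:

```python
# Back-of-the-envelope check against the Pile paper's Table 1.
# All figures are quoted from memory and may be slightly off;
# see https://arxiv.org/abs/2101.00027 for the exact numbers.
wikipedia_raw_gib = 6.38         # Wikipedia (en) raw size
total_raw_gib = 825.18           # total raw size of the Pile
wikipedia_effective_gib = 19.13  # after ~3 epochs of upweighting
total_effective_gib = 1254.20    # total effective dataset size

print(f"raw share:       {wikipedia_raw_gib / total_raw_gib:.1%}")              # ~0.8%
print(f"effective share: {wikipedia_effective_gib / total_effective_gib:.1%}")  # ~1.5%
```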
Retrieval-augmented techniques that aim to improve the reliability of LLM output may lean more heavily on Wikipedia. For example, Meta (Facebook) uses Wikipedia rather heavily in this *research paper*:
https://arxiv.org/abs/2208.03299
But I have seen no evidence that techniques like that are in use at OpenAI, or that OpenAI's models are specifically trained on Wikipedia. If you've seen discussion of that, or evidence from output suggesting it, that would be interesting and important!
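For concreteness, here is a minimal sketch of the general retrieval-augmented pattern that paper explores: retrieve relevant Wikipedia passages, then condition generation on them. Everything below is a toy of my own making, not Atlas's actual components; a real system would use dense encoders and an LLM where this uses bag-of-words similarity and a formatted prompt:

```python
# Illustrative sketch of retrieval-augmented generation over Wikipedia
# passages. NOT Atlas's implementation; embedding and generation are
# stand-in placeholders.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real systems use dense encoders.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Stand-in corpus; a real system would index millions of Wikipedia passages.
passages = [
    "The Pile is an 825 GiB English text corpus released by EleutherAI.",
    "Wikipedia is a free online encyclopedia edited by volunteers.",
    "Retrieval-augmented models condition generation on retrieved text.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    # A real system would call an LLM here; we just show the prompt
    # the model would be conditioned on.
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(answer("What corpus did EleutherAI release?"))
```

The structural point is that in this setup the output depends directly on what the retriever pulls from Wikipedia, which is why such systems can lean on it far more than a pretraining mix does.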