I'm trying to parse DYK prep area templates, for example Template:Did you
know/Preparation area 3
<https://en.wikipedia.org/wiki/Template:Did_you_know/Preparation_area_3>.
Unfortunately, these are more like flat text files than any kind of nicely structured
data. The stuff of interest is everything between two HTML comments:
<!--Hooks-->
{{main page image/DYK|image=Melissa Ong.webp|caption=Selfie of Ong, commonly replicated
by the Step Chickens<!--the caption length is intentional, it highlights that this
image is there for a specific purpose and isn't just any image of Ong – please
don't shorten it! Same for the ''(shown)'' –leek -->}}
* ... that "Step Chickens" on TikTok replace their profile pictures with an
image ''(shown)'' of '''[[Melissa Ong]]''', whom
they call "Mother Hen"?
* ... that '''[[interfaith greetings in Indonesia]]''' include
phrases from Islam, Christianity, Hinduism, Buddhism, and Confucianism?
* ... that '''[[Kimmo Leinonen]]''' helped establish both the
[[Finnish Hockey Hall of Fame]] and the [[IIHF Hall of Fame]]?
* ... that the [[Pulitzer Prize for Fiction|Pulitzer Prize]]-winning novel
'''''[[All the Light We Cannot See]]''''' contains
a sympathetic [[Nazism|Nazi]]?
* ... that a {{Convert|10|ft|m|adj=mid|-tall|0}} '''[[Lady Rainier|statue of
a woman]]''' in [[Seattle]] was commissioned by a local brewery in 1903?
* ... that ...
* ... that prior to entering politics, '''[[Herbert
Salvatierra]]''' led a troupe of [[carnival]]
''[[comparsa]]s''?
* ... that [[Winston Churchill]] published '''[[Are There Men on the Moon?|an
essay on extraterrestrial life]]''' during the Second World War?
<!--HooksEnd-->
I can find the comments with mwparserfromhell's Wikicode.filter_comments(). But once I've found the two
delimiting comments, how do I grab the text between them? Or is the parser the wrong
tool? Would I do better to treat the content of the page as flat text and just iterate
over it line by line, teasing it apart with regexes?
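
For comparison, here is a minimal sketch of the flat-text route, using only the standard library. It assumes the delimiters are spelled exactly <!--Hooks--> and <!--HooksEnd--> as in the sample above, and that each hook starts a line with "* ...". The names extract_hook_block and split_hooks are made up for illustration, and joining continuation lines is only there to cope with text that has been hard-wrapped like the sample in this message.

```python
import re

# Assumed delimiters, copied from the prep-area sample above.
HOOKS_RE = re.compile(r"<!--Hooks-->(.*?)<!--HooksEnd-->", re.DOTALL)


def extract_hook_block(wikitext: str) -> str:
    """Return the raw text between the two delimiting comments."""
    match = HOOKS_RE.search(wikitext)
    if match is None:
        raise ValueError("hook delimiters not found")
    return match.group(1).strip()


def split_hooks(block: str) -> list:
    """Split the block into hooks, one string per '* ...' bullet."""
    hooks = []
    for line in block.splitlines():
        if line.startswith("* ..."):
            # New hook: drop the bullet, keep the leading ellipsis.
            hooks.append(line[1:].strip())
        elif hooks and line.strip():
            # Continuation of a wrapped hook; lines before the first
            # bullet (such as the image template) are skipped.
            hooks[-1] += " " + line.strip()
    return hooks
```

The non-greedy match stops at the first <!--HooksEnd-->, so the comment embedded in the image caption does not end the block early. Each hook returned by split_hooks() is still raw wikitext, so it could be handed to mwparserfromhell.parse() to pick out the bolded links and templates.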