Hmmm, I see one way to do this. I can use tag.span to get the start and end index of each tag, then check that the text between the end of the previous tag and the start of this one is empty, or perhaps contains only whitespace.
It also looks like you don’t get the tags in any useful order, but it’s easy enough to re-sort them by tag.span[0].
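A minimal sketch of that idea, working on plain (start, end) pairs. The helper name group_adjacent is made up for illustration, and the regex is only a stand-in for the parser so the example runs on its own; with wikitextparser the pairs would come from tag.span instead.

```python
import re


def group_adjacent(text, spans):
    """Group (start, end) spans that are separated only by whitespace.

    `spans` may arrive in any order; they are sorted by start index first.
    """
    groups = []
    prev_end = None
    for start, end in sorted(spans):
        # Text between the end of the previous tag and the start of this one.
        gap = text[prev_end:start] if prev_end is not None else None
        if gap is not None and gap.strip() == "":
            groups[-1].append((start, end))
        else:
            groups.append([(start, end)])
        prev_end = end
    return groups


# Stand-in for the parser (an assumption): only matches self-closing <ref .../> tags.
REF_RE = re.compile(r"<ref\b[^>]*/>")

text = 'The sky is blue.<ref name="x"/> <ref name="y"/> The grass is green.<ref name="y"/>'
spans = [m.span() for m in REF_RE.finditer(text)]

# Pass the spans in reverse to show that the input order does not matter.
groups = group_adjacent(text, spans[::-1])
for group in groups:
    print([text[start:end] for start, end in group])
```

Treating a whitespace-only gap as "adjacent" is a judgment call; passing a stricter test (gap == "") would be an easy change.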
It’s obvious that wtp.parse() is building some kind of node tree, but I don’t see any way to walk the tree in a general way. You can get the parent of a node, or all the ancestors of a node, but I don’t see a way to get the children of a node.
On Feb 10, 2018, at 3:28 PM, Roy Smith roy@panix.com wrote:
I want to write a tool which checks that you are consistent about ordering of <ref> tags. For example, if I did:
The sky is blue.<ref name="x"/><ref name="y"/> The grass is green.<ref name="y"/><ref name="x"/>
This would get rendered as:
The sky is blue.[1][2] The grass is green.[2][1]
When what you want is:
The sky is blue.[1][2] The grass is green.[1][2]
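To make the numbering concrete, here is a rough simulation of how the footnote numbers come out, assuming each named ref gets the next number the first time its name appears. The function name render_numbers and the regex are illustrative only, and the regex handles nothing but self-closing named refs.

```python
import re

# Assumption: only self-closing refs of the form <ref name="..."/> are handled.
REF_NAME_RE = re.compile(r'<ref\s+name\s*=\s*"([^"]+)"\s*/>')


def render_numbers(wikitext):
    """Replace each named ref with [n], numbering names by first appearance."""
    numbers = {}

    def replace(match):
        name = match.group(1)
        if name not in numbers:
            numbers[name] = len(numbers) + 1
        return f"[{numbers[name]}]"

    return REF_NAME_RE.sub(replace, wikitext)


example = (
    'The sky is blue.<ref name="x"/><ref name="y"/> '
    'The grass is green.<ref name="y"/><ref name="x"/>'
)
print(render_numbers(example))
# prints: The sky is blue.[1][2] The grass is green.[2][1]
```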
Checking for out-of-order citations seems pretty straightforward. In Python, I’m using mwclient to get the wikitext, and wikitextparser to extract the tags from that. It should be trivial to check that they always show up in the same order.
The problem is, when I do:
    parsed = wtp.parse(wikitext)
    tags = parsed.tags()
I get a list of tags with no information about which ones are adjacent to each other. So, while the above example is an error, doing:
The sky is blue.<ref name="x"/><ref name="y"/> The grass is green.<ref name="y"/> The dirt is brown.<ref name="x"/>
will render as:
The sky is blue.[1][2] The grass is green.[2] The dirt is brown.[1]
which is perfectly fine. But wikitextparser’s tags() doesn’t distinguish between those two cases.
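One way to tell the two cases apart, sketched here with a regex standing in for wikitextparser so the example is self-contained: number the refs by first appearance, group refs that have only whitespace between them, and report any group whose numbers are not ascending. The function name out_of_order_groups is hypothetical, and only self-closing named refs are handled.

```python
import re

# Assumption: only self-closing refs of the form <ref name="..."/> are handled.
REF_RE = re.compile(r'<ref\s+name\s*=\s*"([^"]+)"\s*/>')


def out_of_order_groups(wikitext):
    """Return the citation-number groups that would render out of order.

    A group is a run of refs separated only by whitespace. Numbers are
    assigned in order of first appearance of each ref name.
    """
    numbers = {}
    groups = []
    prev_end = None
    # finditer yields matches in document order; tags taken from a parser
    # would need sorting by start index first.
    for match in REF_RE.finditer(wikitext):
        number = numbers.setdefault(match.group(1), len(numbers) + 1)
        adjacent = (
            prev_end is not None
            and wikitext[prev_end:match.start()].strip() == ""
        )
        if adjacent:
            groups[-1].append(number)
        else:
            groups.append([number])
        prev_end = match.end()
    return [group for group in groups if group != sorted(group)]


error_case = (
    'The sky is blue.<ref name="x"/><ref name="y"/> '
    'The grass is green.<ref name="y"/><ref name="x"/>'
)
fine_case = (
    'The sky is blue.<ref name="x"/><ref name="y"/> '
    'The grass is green.<ref name="y"/> The dirt is brown.<ref name="x"/>'
)

print(out_of_order_groups(error_case))  # prints: [[2, 1]]
print(out_of_order_groups(fine_case))   # prints: []
```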
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud