Hi all,

I'm trying to scrape data from en.wiki about the outlinks in the body of articles. However, the API also returns outlinks that come from templates. I could write a routine to list all the templates on a page, identify the article links inside them, and remove those from the outlinks, but this breaks down when a link appears in both the body and a template. For example, if article X links to Y in the body and to Y and Z in templates, I want to capture Y but not Z; simply subtracting the template links would drop Y as well.

Ideally, I'd like to either (1) count the number of times an article links to another article (e.g., X links to Y twice) and then decrement that count for each appearance in a template, or (2) count only the links occurring in the body, without parsing the links in templates.
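
To make option (2) concrete, here is a rough sketch of what I have in mind. It assumes I already have the raw wikitext of the article as a string (for instance from the API's prop=revisions with rvprop=content), and the function names strip_templates and count_body_links are my own placeholders, not anything the API provides. It drops everything inside {{...}} and counts the [[...]] links that remain, so it is only an approximation: it would still count File: and Category: links, and it does nothing special for parser functions or <ref> tags.

```python
import re
from collections import Counter

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]];
# group 1 is the link target without section or label.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def strip_templates(wikitext):
    """Remove {{...}} spans, including nested ones, by tracking brace depth."""
    out = []
    depth = 0
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            # Keep characters only when we are outside every template.
            if depth == 0:
                out.append(wikitext[i])
            i += 1
    return "".join(out)


def count_body_links(wikitext):
    """Count link targets that appear outside of templates."""
    counts = Counter()
    for match in LINK_RE.finditer(strip_templates(wikitext)):
        target = match.group(1).strip()
        if target:
            counts[target] += 1
    return counts


# Hypothetical wikitext for article X: Y is linked twice in the body,
# while Y and Z are also linked from a navbox template.
sample = "X links to [[Y]] and again to [[Y|why]]. {{Navbox|list=[[Y]] [[Z]]}}"
body_links = count_body_links(sample)
```

With the sample above, body_links should contain Y with a count of 2 and no entry for Z. Option (1) could be built from the same pieces by counting links in the full text and in the template-only text, then subtracting one Counter from the other.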

Thank you in advance for your suggestions!

Best,

Brian