Hi all,
I'm trying to scrape some data from en.wiki about the outlinks from the
body of articles. However, the API also returns outlinks contained
within templates. While I can write a routine to get a list of all the
templates, identify the article links inside them, and remove those
from the outlinks, this breaks when a link appears in both the body
and a template, because the body link gets removed too. Thus if article
X has a link to Y in the body as well as links to Y and Z in templates,
I want to capture Y but not Z.
Ideally, I'd like to either (1) count the number of times an article
links out to another article (e.g. X links to Y twice) and then
decrement this count for each appearance in a template, or (2) count only
the links occurring in the body and not parse the links in templates.
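
To make option (2) concrete, here is a rough sketch of what I have in mind,
assuming I already have the raw wikitext of an article as a string. The
function names (strip_templates, count_links, body_link_counts) are my own
placeholders, not part of the MediaWiki API, and the regex only handles simple
[[Target]] and [[Target|label]] links, so File: and Category: links would still
need filtering.

```python
import re
from collections import Counter

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]];
# only the Target part is captured.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def strip_templates(wikitext):
    """Return wikitext with every {{...}} span removed, nested ones included."""
    out = []
    depth = 0  # how many unclosed "{{" we are currently inside
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(wikitext[i])
            i += 1
    return "".join(out)


def count_links(wikitext):
    """Count how many times each link target appears in the given text."""
    return Counter(m.group(1).strip() for m in LINK_RE.finditer(wikitext))


def body_link_counts(wikitext):
    """Count link targets that appear outside of any template."""
    return count_links(strip_templates(wikitext))


# Made-up article X: Y is linked twice in the body, Y and Z once in a template.
sample = "X cites [[Y]] and again [[Y|why]]. {{Navbox|list=[[Y]] [[Z]]}}"
body = body_link_counts(sample)
in_templates = count_links(sample) - body
```

On this made-up sample, body keeps Y with a count of 2 and drops Z, while
in_templates holds the per-template counts I would decrement under option (1).
This only sees links written literally in the wikitext, so anything a template
generates on its own never shows up, which is what I want here.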
Thank you in advance for your suggestions!
Best,
Brian