Here's a big-data dataset from Google Research and UMass IESL: 40 million "links to Wikipedia pages where the anchor text of the link closely matches the title of the target Wikipedia page," drawn from 10 million web pages, for the purposes of contextualized disambiguation:
Learning from Big Data: 40 Million Entities in Context http://googleresearch.blogspot.co.uk/2013/03/learning-from-big-data-40-milli...
In hopes that the work might be interesting or useful to the folks here,
Pete