[WikiEN-l] extracting protein target infobox information via page export

Fri Jan 21 10:30:02 UTC 2011

On Fri, Jan 21, 2011 at 3:58 AM, Rajarshi Guha <rajarshi.guha at gmail.com> wrote:
>
> On Jan 19, 2011, at 10:19 AM, Carcharoth wrote:
>
>> On Wed, Jan 19, 2011 at 3:10 PM, Andrew Gray <andrew.gray at dunelm.org.uk> wrote:
>>
>> I'm curious as well. I'm also curious as to why the user wants to
>> extract this information, given that they should (going by their
>> signature) have access to databases that already have this sort of
>> information (the sort of databases that should be supplying the
>> information in the Wikipedia infoboxes). There probably is a reason,
>> but I can't immediately think of one.
>
> Partly because Wikipedia has done an aggregation on the multiple data
> sources

It might be better to extract links to the sources, rather than the
actual data itself, which could be in a vandalised state at the time
of extraction. What I guess I'm saying is that the data is better
obtained from the sources than from Wikipedia. Or, at the least, the
data needs to be cross-checked against the sources, depending on what
it will be used for.
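
As a rough illustration of that kind of cross-check, here is a minimal
Python sketch. Everything in it is an assumption made for the example:
the wikitext sample, the field names (EntrezGene, UniProt), the values,
and the helper names are all made up, and the parser only handles
simple one-line "| name = value" parameters rather than real template
syntax.

```python
import re

# Illustrative wikitext only; the template and field names are assumptions.
SAMPLE_WIKITEXT = """{{Infobox protein
| Name = Example protein
| EntrezGene = 1234
| UniProt = P00000
}}"""

# Matches simple one-line parameters of the form "| name = value".
PARAM_RE = re.compile(
    r"^\|[ \t]*([^=|\n]+?)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE
)


def infobox_params(wikitext):
    """Return {name: value} for one-line template parameters."""
    return {name: value for name, value in PARAM_RE.findall(wikitext)}


def cross_check(params, source_record):
    """Return {field: (article_value, source_value)} where the two disagree."""
    return {
        field: (params.get(field), expected)
        for field, expected in source_record.items()
        if params.get(field) != expected
    }


params = infobox_params(SAMPLE_WIKITEXT)

# Hypothetical record, standing in for what the upstream database would return.
source_record = {"EntrezGene": "1234", "UniProt": "P99999"}

mismatches = cross_check(params, source_record)
print(mismatches)
```

Any field that shows up in the mismatches would then be looked up in
the source itself rather than taken from the article.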

Carcharoth