On Thu, Jan 21, 2016 at 3:12 PM Marco Fossati <fossati(a)spaziodati.eu> wrote:
> Thanks Magnus for the pointers, the mix'n'match data are like jewels for
> StrepHit.
> Does the database you mentioned also contain the **body** of the
> catalog items, i.e., the raw text of biographies?
> If so, I can avoid scraping all those sources, and it would be just
> perfect.

Sorry, no text body. These brief one-line descriptions alone are already
touching a legally grey area (especially in Europe)...

> As a side note, I'm currently reaching out to GLAM people to ask for more
> biographical sources: links are coming, and I'll definitely import them
> into mix'n'match as well.

Excellent!

> I was aware of the Sourcerer tool: I'm concerned with those references
> coming from Wikipedia articles though, since they stem from inside a
> Wikimedia project, and I want to make sure that everything comes from
> the outside.

The Sourcerer references do NOT come from Wikipedia! I am using third-party
sites for which we already have IDs (e.g. GND) to auto-validate values, and
add the appropriate reference if identical. Basically, what you want to do,
on the cheap ;-)

> I'm open to discussion with the community about this.
> What do you think?
> Cheers,
> Marco
On 1/21/16 13:00, wikidata-request(a)lists.wikimedia.org wrote:
Date: Wed, 20 Jan 2016 16:44:46 +0000
From: Magnus Manske<magnusmanske(a)googlemail.com>
To: "Discussion list for the Wikidata project."
<wikidata(a)lists.wikimedia.org>
Subject: Re: [Wikidata] Mix'n'match tool catalogues data
Message-ID: <CAGHUEtaLAz3Ofx93oLV45FY4SGFfSt+OsLZ9FYT5x1Yvbbbm9w(a)mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
I also have a bot that can add references from various web sources:
https://bitbucket.org/magnusmanske/wikidata-todo/src/f56dfdaaaee053abaadefb…
Magnus Manske <magnusmanske(a)googlemail.com> wrote:
> >Hi Marco,
> >
> >I run this tool. Quick answers:
> >
> >1. Yes. If you have a Labs account, you can see everything in database
> >s51434__mixnmatch_p . You can also get most of the data via the API
> >(undocumented; ask me for specifics, check out the requests of the
> >interface in the browser, or try the source code at
> >https://bitbucket.org/magnusmanske/mixnmatch/src/63c9ba58dd236e0aeb5a7ad123… )
> >
> >2. Anyone can match entries to Wikidata items. I added most of the
> >catalogs, but you can also do that yourself at
> >https://tools.wmflabs.org/mix-n-match/import.php .
> >
> >Cheers,
> >Magnus
> >
> >On Wed, Jan 20, 2016 at 4:31 PM Marco Fossati<fossati(a)spaziodati.eu>
> >wrote:
> >
>> >>Hi everyone,
>> >>
>> >>The mix'n'match tool [1] provides a list of catalogues from different
>> >>sources with lots of biographical data.
>> >>
>> >>The list seems like a great starting point for the selection of reliable
>> >>sources that would feed the StrepHit pipeline [2].
>> >>
>> >>I was wondering 2 things:
>> >>1. Is it possible to directly access those datasets?
>> >>2. Who are the contributors that maintain the list?
>> >>
>> >>If you are involved in this effort, please get in touch with me.
>> >>Cheers,
>> >>
>> >>Marco
>> >>
>> >>[1]https://tools.wmflabs.org/mix-n-match/
>> >>[2]https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Va…
>> >>
>> >>_______________________________________________
>> >>Wikidata mailing list
>> >>Wikidata(a)lists.wikimedia.org
>> >>https://lists.wikimedia.org/mailman/listinfo/wikidata
>> >>
> >