On Thu, Jan 21, 2016 at 3:12 PM Marco Fossati <fossati@spaziodati.eu> wrote:

> Thanks, Magnus, for the pointers: the mix'n'match data are like jewels for
> StrepHit.
> Does the database you mentioned also contain the **body** of the
> catalog items, i.e., the raw text of biographies?
> If so, I can avoid scraping all those sources, and it would be just perfect.

Sorry, no text body. Even these brief one-line descriptions alone touch a legally grey area (especially in Europe)...

> As a side note, I'm currently reaching out to GLAM people to ask for more
> biographical sources: links are coming, and I'll definitely import them
> into mix'n'match as well.

Excellent!

> I was aware of the Sourcerer tool. I'm concerned about those references
> coming from Wikipedia articles, though: they stem from inside a
> Wikimedia project, and I want to make sure that everything comes from
> the outside.

The Sourcerer references do NOT come from Wikipedia! I am using third-party sites for which we already have IDs (e.g. GND) to auto-validate values, and add the appropriate reference if identical. Basically, what you want to do, on the cheap ;-)
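
As a rough illustration of the auto-validation idea described above (not SourcererBot's actual code, which is the PHP script linked further down in this thread), here is a minimal Python sketch. The `Statement` class, the `auto_source` function, the layout of the reference dictionary, the placeholder ID, and the sample record are all hypothetical. Only the rule itself comes from the message: add a reference when the third-party value is identical.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """Hypothetical stand-in for one statement on a Wikidata item."""
    prop: str                      # property ID, e.g. "P569" (date of birth)
    value: str                     # value currently stored on Wikidata
    references: list = field(default_factory=list)


def auto_source(statement, external_record, catalog, external_id):
    """Attach a reference only if the external site gives the identical value.

    `external_record` is assumed to be a dict mapping property IDs to the
    values found on the third-party site (illustrative format only).
    Returns True if the statement is backed by the external record.
    """
    external_value = external_record.get(statement.prop)
    if external_value is None or external_value != statement.value:
        return False  # missing or different: do not add a reference
    reference = {"catalog": catalog, "external_id": external_id}
    if reference not in statement.references:
        statement.references.append(reference)  # avoid duplicate references
    return True


# Made-up sample data; "EXAMPLE-ID" is a placeholder, not a real identifier.
born = Statement(prop="P569", value="1749-08-28")
gnd_record = {"P569": "1749-08-28", "P570": "1832-03-22"}
added = auto_source(born, gnd_record, catalog="GND", external_id="EXAMPLE-ID")
print(added, born.references)
```

The comparison is deliberately strict equality, so a mismatching or missing external value leaves the statement untouched rather than guessing.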

> I'm open to discussion with the community about this.
>
> What do you think?
>
> Cheers,
>
> Marco

On 1/21/16 13:00, wikidata-request@lists.wikimedia.org wrote:
> Date: Wed, 20 Jan 2016 16:44:46 +0000
> From: Magnus Manske<magnusmanske@googlemail.com>
> To: "Discussion list for the Wikidata project."
> <wikidata@lists.wikimedia.org>
> Subject: Re: [Wikidata] Mix'n'match tool catalogues data
> Message-ID:
> <CAGHUEtaLAz3Ofx93oLV45FY4SGFfSt+OsLZ9FYT5x1Yvbbbm9w@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I also have a bot that can add references from various web sources:
> https://bitbucket.org/magnusmanske/wikidata-todo/src/f56dfdaaaee053abaadefb584fcb4f714bc82545/scripts/autosource/botsource.php?at=master&fileviewer=file-view-default
>
> Edits so far:
> https://www.wikidata.org/wiki/Special:Contributions/SourcererBot
>
>
> On Wed, Jan 20, 2016 at 4:42 PM Magnus Manske<magnusmanske@googlemail.com>
> wrote:
>
>> >Hi Marco,
>> >
>> >I run this tool. Quick answers:
>> >
>> >1. Yes. If you have a Labs account, you can see everything in database
>> >s51434__mixnmatch_p . You can also get most of the data via the API
>> >(undocumented; ask me for specifics, watch the requests the interface
>> >makes in the browser, or read the source code at
>> >https://bitbucket.org/magnusmanske/mixnmatch/src/63c9ba58dd236e0aeb5a7ad12315047d787530f0/public_html/api.php?at=master&fileviewer=file-view-default )
>> >
>> >2. Anyone can match entries to Wikidata items. I added most of the
>> >catalogs, but you can also do that yourself at
>> >https://tools.wmflabs.org/mix-n-match/import.php .
>> >
>> >Cheers,
>> >Magnus
>> >
>> >On Wed, Jan 20, 2016 at 4:31 PM Marco Fossati<fossati@spaziodati.eu>
>> >wrote:
>> >
>>> >>Hi everyone,
>>> >>
>>> >>The mix'n'match tool [1] provides a list of catalogues from different
>>> >>sources with lots of biographical data.
>>> >>
>>> >>The list seems like a great starting point for the selection of reliable
>>> >>sources that would feed the StrepHit pipeline [2].
>>> >>
>>> >>I was wondering 2 things:
>>> >>1. Is it possible to directly access those datasets?
>>> >>2. Who are the contributors that maintain the list?
>>> >>
>>> >>If you are involved in this effort, please get in touch with me.
>>> >>Cheers,
>>> >>
>>> >>Marco
>>> >>
>>> >>[1] https://tools.wmflabs.org/mix-n-match/
>>> >>[2] https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
>>> >>
>>> >>_______________________________________________
>>> >>Wikidata mailing list
>>> >>Wikidata@lists.wikimedia.org
>>> >>https://lists.wikimedia.org/mailman/listinfo/wikidata
>>> >>
>> >
