Thanks for the pointers, Magnus: the mix'n'match data are like jewels for StrepHit. Does the database you mentioned also contain the **body** of the catalog items, i.e., the raw text of the biographies? If so, I can avoid scraping all those sources, which would be just perfect.
As a side note, I'm currently reaching out to GLAM people to ask for more biographical sources: links are coming, and I'll definitely import them into mix'n'match as well.
I was aware of the Sourcerer tool. I'm concerned about those references coming from Wikipedia articles, though, since they stem from inside a Wikimedia project, and I want to make sure that everything comes from the outside. I'm open to discussion with the community about this.
What do you think?
Cheers,
Marco
On 1/21/16 13:00, wikidata-request@lists.wikimedia.org wrote:
Date: Wed, 20 Jan 2016 16:44:46 +0000
From: Magnus Manske <magnusmanske@googlemail.com>
To: "Discussion list for the Wikidata project." <wikidata@lists.wikimedia.org>
Subject: Re: [Wikidata] Mix'n'match tool catalogues data
Message-ID: <CAGHUEtaLAz3Ofx93oLV45FY4SGFfSt+OsLZ9FYT5x1Yvbbbm9w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
I also have a bot that can add references from various web sources: https://bitbucket.org/magnusmanske/wikidata-todo/src/f56dfdaaaee053abaadefb5...
Edits so far: https://www.wikidata.org/wiki/Special:Contributions/SourcererBot
On Wed, Jan 20, 2016 at 4:42 PM Magnus Manske <magnusmanske@googlemail.com> wrote:
Hi Marco,
I run this tool. Quick answers:
- Yes. If you have a Labs account, you can see everything in database s51434__mixnmatch_p. You can also get most of the data via the API (undocumented; ask me for specifics, check out the requests of the interface in the browser, or try the source code at https://bitbucket.org/magnusmanske/mixnmatch/src/63c9ba58dd236e0aeb5a7ad1231... )
- Anyone can match entries to Wikidata items. I added most of the catalogs, but you can also do that yourself at https://tools.wmflabs.org/mix-n-match/import.php .
Cheers, Magnus
On Wed, Jan 20, 2016 at 4:31 PM Marco Fossati <fossati@spaziodati.eu> wrote:
Hi everyone,
The mix'n'match tool [1] provides a list of catalogues from different sources with lots of biographical data.
The list seems like a great starting point for the selection of reliable sources that would feed the StrepHit pipeline [2].
I was wondering 2 things:
- Is it possible to directly access those datasets?
- Who are the contributors that maintain the list?
If you are involved in this effort, please get in touch with me.
Cheers,
Marco
[1] https://tools.wmflabs.org/mix-n-match/
[2] https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Val...
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
On Thu, Jan 21, 2016 at 3:12 PM Marco Fossati fossati@spaziodati.eu wrote:
Thanks for the pointers, Magnus: the mix'n'match data are like jewels for StrepHit. Does the database you mentioned also contain the **body** of the catalog items, i.e., the raw text of the biographies? If so, I can avoid scraping all those sources, which would be just perfect.
Sorry, no text body. Even these brief one-line descriptions are already touching a legally grey area (especially in Europe)...
As a side note, I'm currently reaching out to GLAM people to ask for more biographical sources: links are coming, and I'll definitely import them into mix'n'match as well.
Excellent!
I was aware of the Sourcerer tool. I'm concerned about those references coming from Wikipedia articles, though, since they stem from inside a Wikimedia project, and I want to make sure that everything comes from the outside.
The Sourcerer references do NOT come from Wikipedia! I am using third-party sites for which we already have IDs (e.g. GND) to auto-validate values, and add the appropriate reference if identical. Basically, what you want to do, on the cheap ;-)
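To make the idea concrete, here is a minimal sketch of that kind of auto-validation. It is not Sourcerer's actual code: the `Statement` class, the `auto_validate` function, the reference layout, and the external ID are all hypothetical, and the external value is assumed to have been fetched already from the third-party site.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """Simplified, hypothetical stand-in for a Wikidata statement."""
    prop: str    # property ID, e.g. "P569" (date of birth)
    value: str   # value currently stored on the item
    references: list = field(default_factory=list)


def auto_validate(statement, external_value, source_name, external_id):
    """Add a reference only if the external source states the identical value.

    Returns True when the values match, False otherwise.
    """
    if external_value is None or statement.value != external_value:
        return False  # value missing or different: leave the statement alone
    ref = {"stated_in": source_name, "external_id": external_id}
    if ref not in statement.references:  # avoid adding the same reference twice
        statement.references.append(ref)
    return True


# Illustrative values only; "example-id-123" is a placeholder, not a real GND ID.
born = Statement(prop="P569", value="1879-03-14")
matched = auto_validate(born, "1879-03-14", "GND", "example-id-123")
mismatched = auto_validate(born, "1880-01-01", "GND", "example-id-123")
```

The point of the exact-equality check is that a reference is only ever attached to a value the outside source independently confirms, so nothing is sourced from within a Wikimedia project.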
I'm open to discussion with the community about this.
What do you think?
Cheers,
Marco