Is there anyone who knows more about this project? It looks like the Mix'n'match tool.
Mateusz Malinowski
On Tue, 25 May 2021 at 14:04, wikidata-request@lists.wikimedia.org wrote:
Today's Topics:
- Re: Weekly Summary #469 (Thad Guidry)
Message: 1
Date: Mon, 24 May 2021 11:35:00 -0500
From: Thad Guidry thadguidry@gmail.com
Subject: [Wikidata] Re: Weekly Summary #469
To: Discussion list for the Wikidata project wikidata@lists.wikimedia.org
Development
- Designing and planning for the first version of a tool to compare
Wikidata's data against other databases and find mismatches that might need fixing
NICE !
Thad https://www.linkedin.com/in/thadguidry/ https://calendly.com/thadguidry/
On Wed, 26 May 2021 at 17:52, Mateusz Malinowski bohopicasso@gmail.com wrote:
Is there anyone who knows more about this project? It looks like the Mix'n'match tool.
Mateusz is referring to:
Development
- Designing and planning for the first version of a tool to compare
Wikidata's data against other databases and find mismatches that might need fixing
Sorry, I'm doing this because at first I didn't understand which project he was referring to, so I thought this might also be useful for other people :)
L.
On Wed, May 26, 2021 at 6:52 PM Luca Martinelli [Sannita] martinelliluca@gmail.com wrote:
Thanks Luca!
The tool we've been digging into for quite a while now is different from MixnMatch. MixnMatch is meant to help you connect an entry in a catalog with its corresponding Wikidata Item. Once you have found the corresponding Item, a new external identifier statement can be added to the Item to store this connection permanently. Having these connections is a very useful base for what we're looking into now.

The tool we are envisioning is meant to find the cases where the actual data in an Item does not match the data in another database that also has an entry on it, and then help editors resolve those mismatches. A concrete example would be a person with a date of birth on their Item in Wikidata. They also have an entry in the German National Library. Now we can potentially look at the date of birth in the person's entry in the German National Library and compare it to the date of birth Wikidata has. If they don't match, someone should probably have a look at it, see where the problem is and potentially fix it.

This is however not a trivial thing and we're still figuring out how exactly it will work, so things might shift a bit in the coming weeks.
Cheers Lydia
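To make the date-of-birth example above concrete, here is a minimal Python sketch of that kind of comparison. It is only an illustration under stated assumptions, not how the planned tool works: the record layout, the `find_mismatches` function, the `Mismatch` type and all IDs and dates are made up, and the `links` mapping stands in for the external identifier statements that connect an Item to an entry in the other database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mismatch:
    """One field whose value differs between the two databases (hypothetical type)."""
    item_id: str
    field: str
    wikidata_value: str
    external_value: str


def find_mismatches(wikidata, external, links, field):
    """Compare one field across linked records.

    wikidata: dict mapping an Item ID to a dict of field values
    external: dict mapping an external ID to a dict of field values
    links:    dict mapping an Item ID to its external ID
    """
    mismatches = []
    for item_id, external_id in sorted(links.items()):
        ours = wikidata.get(item_id, {}).get(field)
        theirs = external.get(external_id, {}).get(field)
        # A missing value on either side is a gap, not a mismatch, so skip it.
        if ours is None or theirs is None:
            continue
        if ours != theirs:
            mismatches.append(Mismatch(item_id, field, ours, theirs))
    return mismatches


# Made-up sample data; the IDs and dates are placeholders.
wikidata = {
    "item-1": {"date_of_birth": "1900-01-01"},
    "item-2": {"date_of_birth": "1950-06-15"},
    "item-3": {},
}
external = {
    "ext-1": {"date_of_birth": "1900-01-01"},
    "ext-2": {"date_of_birth": "1950-06-16"},
    "ext-3": {"date_of_birth": "1970-03-03"},
}
links = {"item-1": "ext-1", "item-2": "ext-2", "item-3": "ext-3"}

for m in find_mismatches(wikidata, external, links, "date_of_birth"):
    print(f"{m.item_id}: Wikidata has {m.wikidata_value}, "
          f"the other database has {m.external_value}")
```

The sketch only flags the pair for a human to look at; it does not decide which side is right. Real dates would also need handling of differing precision and calendars, which is part of why the comparison is not trivial.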
FYI Lydia
The people at WikiTree (https://www.wikidata.org/wiki/Property_talk:P2949) have connected 200 000 people to Wikidata, and every week they run checks on 22 million profiles. I guess they have done this since 2016.
* checks between WikiTree and Wikidata
* WikiTree is an old wiki version with a lot of customizations...
* uploads the diff to Wikidata
* the dev (cc: Aleš) is then also doing some 300 checks of the 22 million profiles
* sanity check with > 300 rules: https://www.wikitree.com/index.php?title=Category:DD_Suggestions_Help&pageUntil=DBE_734&limit=200#Pages
* checking other sites like Find a Grave
* creates statistics like people connected to Wikidata: https://wikitree.sdms.si/default.htm?report=stat1&dataID=501&Year=0
* people have produced videos of the different rules: https://www.wikitree.com/wiki/Space:Data_Doctors_Project_Video_Collection
* on every profile in WikiTree you can check the sanity and see if it has a diff with e.g. Wikidata and/or if Wikidata suggests a relation that is not in WikiTree
* they fix errors in Wikidata, but I am not 100% sure whether we in Wikidata correct them
* my guess is that we see some errors on Database reports/Constraint violations: https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations/P2949
* all changes done
Here are some links:
* example of how a new run is announced: https://www.wikitree.com/g2g/1243471/suggestions-news-and-updates-may-23rd-2021
* the space for the 23 May run: https://www.wikitree.com/wiki/Space:Data_Doctors_Report_2021-05-23
* the Wikidata section of errors
* total errors with Wikidata: 79956
* new errors: 530
* "hidden" I guess is on protected profiles
* error 541 is "clue for father"
* https://www.wikitree.com/wiki/Space:DBE_541
* where you have structured sections like Possible Causes and Action Steps
* you have this for about xxx errors
* there is also a WikiTree+ area that is a little bit hardcore
* https://wikitree.sdms.si/default.htm?report=err6&Query=&MaxErrors=10...
* and on every WikiTree profile you can see the errors related to that person...
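The "total errors" and "new errors" figures in the list above suggest a comparison between two weekly runs. How WikiTree actually computes them is not stated here, so the following Python sketch is only an assumption about one simple way to do it: each run is modelled as a set of (profile, error code) pairs, and the new errors are the pairs that were absent from the previous run. The function name `summarize_run`, the profile IDs and the error code 999 are hypothetical; 541 is the error code mentioned above.

```python
def summarize_run(previous, current):
    """Summarize a run given two sets of (profile_id, error_code) pairs."""
    return {
        "total": len(current),                   # every error found in this run
        "new": sorted(current - previous),       # found now, absent last run
        "resolved": sorted(previous - current),  # present last run, gone now
    }


# Made-up results of two consecutive weekly runs.
last_week = {("profile-1", 541), ("profile-2", 541), ("profile-3", 999)}
this_week = {("profile-2", 541), ("profile-3", 999), ("profile-4", 541)}

summary = summarize_run(last_week, this_week)
print("total errors:", summary["total"])
print("new errors:", summary["new"])
print("resolved since last run:", summary["resolved"])
```

Keeping the error code in the key means a profile that trades one error for another shows up as both a resolved and a new entry, which is usually what a weekly report wants.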
The excellent thing about the WikiTree people doing genealogy is that they have a community feeling and care about the profiles, so when we mass-upload 600 000 maybe not-so-good profiles to Wikidata they get chaos... but after some Wikidata-related discussion (https://www.wikitree.com/g2g/tag/wikidata) about why we link to Wikidata, they fix the errors 😉
Regards Magnus Sälgö Stockholm, Sweden
________________________________
From: Lydia Pintscher Lydia.Pintscher@wikimedia.de
Sent: Thursday, May 27, 2021 1:57 PM
To: Discussion list for the Wikidata project wikidata@lists.wikimedia.org
Subject: [Wikidata] Re: Wikidata Digest, Vol 114, Issue 19
-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.