Hi,
Lists like these are gold. Yes, we are interested in such lists, and yes,
they indicate issues that we can solve.
I did the first one: Moritz Büsgen is now only one person. It is suggested
that bots might merge these. Possibly this is one of those issues where the
community may have an opinion.
It would be cool to have a workflow in which items that have been resolved
disappear from the list. As things stand, when I fix the first one, other
people will not know that it was fixed.
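A minimal sketch of the workflow suggested above, in which resolved pairs drop off a shared list. It assumes the list is kept as pairs of Wikidata item IDs and that someone can supply the set of IDs that have already been merged away; how that set is gathered is left open. `DuplicatePair` and `open_pairs` are hypothetical names, and `Q_EXAMPLE_A`/`Q_EXAMPLE_B` are placeholders, not real items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DuplicatePair:
    """Two Wikidata item IDs flagged as probable duplicates (hypothetical type)."""
    first: str
    second: str


def open_pairs(pairs: list[DuplicatePair], resolved_ids: set[str]) -> list[DuplicatePair]:
    """Return only the pairs in which neither item has been resolved yet.

    `resolved_ids` is assumed to hold IDs that were already merged away;
    how that set is obtained is not specified here.
    """
    return [
        p for p in pairs
        if p.first not in resolved_ids and p.second not in resolved_ids
    ]


pairs = [
    DuplicatePair("Q21518392", "Q21341290"),      # pair taken from the VIAF report
    DuplicatePair("Q_EXAMPLE_A", "Q_EXAMPLE_B"),  # placeholder IDs, not real items
]

# Suppose Q21341290 has been merged into Q21518392: that pair drops off the list.
print(open_pairs(pairs, resolved_ids={"Q21341290"}))
# [DuplicatePair(first='Q_EXAMPLE_A', second='Q_EXAMPLE_B')]
```

Filtering on "either ID resolved" keeps the check simple; a stricter version could also confirm that the two IDs now point to the same item.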
Thanks,
GerardM
On 23 December 2015 at 23:05, Proffitt,Merrilee <proffitm(a)oclc.org> wrote:
Hello colleagues,
During the most recent VIAF harvest we encountered a number of duplicate
records in Wikidata. Forwarding on in case this is of interest (there is an
attached file – not sure if that will go through on this list or not).
Some discussion from OCLC colleagues is included below.
Merrilee Proffitt, Senior Program Officer
OCLC Research
*From:* Toves,Jenny
*Sent:* Tuesday, December 22, 2015 6:02 AM
*To:* Proffitt,Merrilee
*Subject:* FW: 201551 vs 201552
Good morning Merrilee,
You probably know that we harvest Wikidata monthly for ingest into VIAF.
This month we found 315 pairs of records that appear to be duplicates, a
jump from previous months. I am not sure who would be interested in this,
but Thom & I thought you might be. The attached report has 630 lines
showing what VIAF saw as duplicates. So this pair of lines:
WKP|Q21518392 =998 $aCharles du Bois Larbalestier$2WKP|Q21341290$3duplicate
WKP|Q21341290 =998 $aCharles du Bois Larbalestier$2WKP|Q21518392$3duplicate
shows that those two Wikidata IDs are linked to one another by VIAF.
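Because each duplicate appears twice (A points to B, and B points to A), the report can be folded back into pairs. The Python sketch below assumes every line follows exactly the layout of the sample, `WKP|<QID> =998 $a<name>$2WKP|<QID>$3duplicate`, which is inferred from these two lines only; `parse_line` and `collapse_pairs` are hypothetical helpers. Each mirror-image pair is stored under one unordered key.

```python
import re

# Assumed layout of one report line, inferred from the sample only:
#   WKP|<QID> =998 $a<name>$2WKP|<other QID>$3duplicate
LINE_RE = re.compile(
    r"^WKP\|(?P<qid>Q\d+) =998 \$a(?P<name>.*?)\$2WKP\|(?P<other>Q\d+)\$3duplicate$"
)


def parse_line(line: str) -> tuple[str, str, str]:
    """Return (qid, name, other_qid) for one report line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised report line: {line!r}")
    return m["qid"], m["name"], m["other"]


def collapse_pairs(lines: list[str]) -> dict[frozenset[str], str]:
    """Fold the two mirror-image lines of each pair into one entry.

    Keys are unordered QID pairs, so A->B and B->A map to the same key.
    """
    pairs: dict[frozenset[str], str] = {}
    for line in lines:
        qid, name, other = parse_line(line)
        pairs[frozenset((qid, other))] = name
    return pairs


report = [
    "WKP|Q21518392 =998 $aCharles du Bois Larbalestier$2WKP|Q21341290$3duplicate",
    "WKP|Q21341290 =998 $aCharles du Bois Larbalestier$2WKP|Q21518392$3duplicate",
]
pairs = collapse_pairs(report)
print(len(report), "lines ->", len(pairs), "pair(s)")  # 2 lines -> 1 pair(s)
```

If the full attachment uses the same layout and every line has its mirror, its 630 lines should collapse to 315 keys, matching the count of pairs given above.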
I don’t think we expect you to do anything with this unless you find it
interesting. I suspect there are bots to clean this stuff up, but maybe not.
--Jenny.
*From:* Hickey,Thom
*Sent:* Monday, December 21, 2015 9:47 PM
*To:* Toves,Jenny
*Subject:* RE: 201551 vs 201552
She probably would be interested.
--Th
*From: *Toves,Jenny <tovesj(a)oclc.org>
*Sent: *Monday, December 21, 2015 9:35 PM
*To: *Hickey,Thom <hickey(a)oclc.org>
*Subject: *RE: 201551 vs 201552
Exact same name + dates. Do you have a list of them? Do you think Merrilee or
anyone would be interested?
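"Exact same name + dates" can be illustrated as grouping records on a (name, dates) key. This is a sketch under stated assumptions, not VIAF's actual matching: `Record` and `exact_duplicates` are hypothetical, the dates field is assumed to be a plain string, names are normalised only for case and whitespace, and the sample records are invented placeholders.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    """Minimal stand-in for an authority record (hypothetical fields)."""
    qid: str
    name: str
    dates: str  # e.g. "1900-1980"; this format is an assumption


def exact_duplicates(records: list[Record]) -> list[list[str]]:
    """Group IDs whose name and dates match after trivial normalisation.

    Only case and runs of whitespace are normalised; a real matcher
    would need far more care than this.
    """
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for r in records:
        key = (" ".join(r.name.split()).casefold(), r.dates.strip())
        groups[key].append(r.qid)
    return [qids for qids in groups.values() if len(qids) > 1]


records = [  # invented placeholder data
    Record("Q_EXAMPLE_1", "Example  Person", "1900-1980"),
    Record("Q_EXAMPLE_2", "example person", "1900-1980"),
    Record("Q_EXAMPLE_3", "Example Person", "1901-1980"),
]
print(exact_duplicates(records))  # [['Q_EXAMPLE_1', 'Q_EXAMPLE_2']]
```

The third record shares the name but not the dates, so it is not grouped; requiring both fields to agree is what keeps namesakes apart.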
*From:* Hickey,Thom
*Sent:* Monday, December 21, 2015 8:04 PM
*To:* Toves,Jenny
*Subject:* FW: 201551 vs 201552
Noticed WKP duplicates went way up
--Th
*From: *Jenny Toves <toves(a)orhddb01dxdu.dev.oclc.org>
*Sent: *Monday, December 21, 2015 5:12 PM
*To: *Hickey,Thom <hickey(a)oclc.org>; Toves,Jenny <tovesj(a)oclc.org>
*Subject: *201551 vs 201552
REPORT for records
Changed 13.51%: geographic 3369217.0 -> 3824513.0
Change in % of 8: NLR at_least_one_match 16% -> 24%
Changed 19.83%: NLR all_matches 181437.0 -> 217423.0
Change in % of 88: NLR with_bibs 0% -> 88%
Changed 17.99%: WKP geographic 2529990.0 -> 2985194.0
Changed -19.95%: WKP corporate 369224.0 -> 295579.0
REPORT for matches
Changed 12.70%: exact corporate name 1021239.0 -> 1150899.0
Changed 14.29%: XR viafid 7.0 -> 8.0
Changed -10.42%: XR expression title to sibling 48.0 -> 43.0
Changed -16.16%: PTBNP forced 229.0 -> 192.0
Changed -37.50%: NSZL forced 8.0 -> 5.0
Changed 38.46%: NLP suggested 13.0 -> 18.0
No longer zero: NLR standard number 0.0 -> 21479.0
No longer zero: NLR exact title 0.0 -> 5166.0
No longer zero: NLR partial date and partial title 0.0 -> 618.0
No longer zero: NLR name as subject 0.0 -> 62.0
No longer zero: NLR partial title and publisher 0.0 -> 88.0
No longer zero: NLR title 0.0 -> 5093.0
Changed -47.66%: NLR forced single date 37125.0 -> 19430.0
Changed 14.29%: NLR viafid 14.0 -> 16.0
No longer zero: NLR partial date and publisher 0.0 -> 15894.0
No longer zero: NLR joint author 0.0 -> 5228.0
Changed -14.49%: LC suggested 7594.0 -> 6494.0
Changed 33.33%: CYT viafid 12.0 -> 16.0
Changed -21.08%: NLA forced 223.0 -> 176.0
Changed 233.33%: LNL forced 3.0 -> 10.0
Changed 12.50%: NLB viafid 8.0 -> 9.0
Changed 16.67%: NLB ngram corporate name 6.0 -> 7.0
Changed 25.71%: VLACC forced 35.0 -> 44.0
Changed 19.13%: DNB exact corporate name 315872.0 -> 376304.0
Changed 14.29%: DNB expression title to sibling 7.0 -> 8.0
Changed 16.67%: BNF expression title to sibling 6.0 -> 7.0
Changed 15.91%: ICCU forced 44.0 -> 51.0
Changed 25.54%: NTA forced 9699.0 -> 12176.0
Changed 28.62%: WKP exact corporate name 224787.0 -> 289112.0
Changed 23.73%: WKP longer corporate name 76057.0 -> 94106.0
Changed 584.78%: WKP duplicate record 92.0 -> 630.0
Changed -18.92%: EGAXA forced 37.0 -> 30.0
REPORT for tags
Changed 11.56%: NSZL work links (993) 225.0 -> 251.0
No longer zero: NLR wrote about (955) 0.0 -> 106.0
No longer zero: NLR bibs (999) 0.0 -> 108202.0
No longer zero: NLR was a subject (960) 0.0 -> 16423.0
No longer zero: NLR relator code (941) 0.0 -> 103950.0
No longer zero: NLR language of work (940) 0.0 -> 108193.0
No longer zero: NLR issn (902) 0.0 -> 34.0
No longer zero: NLR bib title (910) 0.0 -> 107895.0
No longer zero: NLR joint corporate author (951) 0.0 -> 24235.0
Changed 146.67%: NLR compared (996) 27448.0 -> 67705.0
No longer zero: NLR rectype + biblvl (944) 0.0 -> 108194.0
No longer zero: NLR country of publication (922) 0.0 -> 108169.0
No longer zero: NLR publisher (921) 0.0 -> 93904.0
No longer zero: NLR isbn (901) 0.0 -> 78978.0
No longer zero: NLR publisher id (920) 0.0 -> 78978.0
Changed 50.05%: NLR matched (998) 19864.0 -> 29806.0
No longer zero: NLR name from statement of responsibility (930) 0.0 -> 72478.0
No longer zero: NLR noise title (912) 0.0 -> 3543.0
No longer zero: NLR lc class number (942) 0.0 -> 1.0
No longer zero: NLR joint author (950) 0.0 -> 69048.0
No longer zero: NLR was a subject (969) 0.0 -> 115.0
Changed -14.29%: XA work links (993) 7.0 -> 6.0
No longer zero: SRP work links (993) 0.0 -> 1.0
Changed 22.50%: BNL work links (993) 551.0 -> 675.0
Changed 11.16%: WKP auth title (919) 45779.0 -> 50890.0
Changed 12.56%: WKP noise title (912) 8249.0 -> 9285.0
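The report lines appear to use two different measures: "Changed X%" is a relative change, (new - old) / old × 100, while "Change in % of N" seems to be a difference in percentage points (16% -> 24% gives 8). A relative change is undefined when the old value is 0, which is presumably why those lines read "No longer zero" instead. The sketch below recomputes the "Changed" figures from the old and new counts; the line pattern is inferred from the lines above, and `check_changed_line` and `point_change` are hypothetical names.

```python
import re

# Pattern inferred from the report lines above, e.g.
#   Changed 13.51%: geographic 3369217.0 -> 3824513.0
CHANGED_RE = re.compile(
    r"^Changed (?P<pct>-?\d+\.\d+)%: (?P<label>.+) (?P<old>\d+\.\d+) -> (?P<new>\d+\.\d+)$"
)


def relative_change(old: float, new: float) -> float:
    """Relative change in percent; undefined when old == 0."""
    return (new - old) / old * 100


def point_change(old_pct: float, new_pct: float) -> float:
    """Difference in percentage points, e.g. 16% -> 24% gives 8."""
    return new_pct - old_pct


def check_changed_line(line: str) -> bool:
    """Recompute a 'Changed X%' line and compare with the stated figure."""
    m = CHANGED_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a 'Changed' line: {line!r}")
    recomputed = relative_change(float(m["old"]), float(m["new"]))
    # The report shows two decimals, so allow half a unit of rounding.
    return abs(recomputed - float(m["pct"])) <= 0.005 + 1e-9


for line in [
    "Changed 13.51%: geographic 3369217.0 -> 3824513.0",
    "Changed 584.78%: WKP duplicate record 92.0 -> 630.0",
    "Changed -47.66%: NLR forced single date 37125.0 -> 19430.0",
]:
    print(check_changed_line(line))  # True for each line

print(point_change(16, 24))  # 8
```

The 584.78% jump for "WKP duplicate record" is simply 92 -> 630, i.e. (630 - 92) / 92 × 100.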
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata