Hopefully this is the right mailing list for my topic.
The German Verein für Computergenealogie is the largest genealogical society in Germany with more than 3,700 members. We are currently considering whether Wikibase is a suitable system for us. We are most interested in using it for our *prosopographical data*.
Prosopographical data can be divided into three classes:
a) well-known and well-studied personalities, typically authors
b) lesser-known but well-studied personalities that can be clearly and easily identified in historical sources
c) persons whose identifiability in various sources (such as church records, civil records, city directories) has to be established using (mostly manual) record linkage
Data from (a) can be found in the GND of the German National Library. For data from class (b) systems such as FactGrid exist. The Verein für Computergenealogie mostly works with data from class (c). We have a huge amount of that kind of data, more than 40 million records. Currently it is stored in several MySQL and MongoDB databases.
This leads me to the crucial question: Is the performance of Wikibase sufficient for such an amount of data? One record for a person will typically result in maybe ten statements in Wikibase. Using QuickStatements or the WDI library I have not been able to insert more than two or three statements per second. It would take months, if not years, to import the data.
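To put rough numbers on it: 40 million records x about ten statements each is on the order of 400 million statements. At three statements per second that is roughly 130 million seconds, i.e. more than four years of continuous importing; even a single API call per record would still need well over a year.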
Another question is whether the edit history of the entries can be preserved. For some data sets the edit history goes back to 2004.
I hope someone can give me hints on these questions.
Best wishes Jesper
Hi,
I can't speak to the Wikibase capabilities directly, but QS via API will always take a bit of time.
One could adapt my Rust core of QuickStatements [1], which also comes with an (experimental but quite advanced) QS syntax parser, generate the JSON for each item, and manually insert it into the `page` table. Parsing and JSON generation would be very fast, and the data addition could be done in bulk SQL (e.g., INSERT with thousands of value tuples in one command).
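Very roughly, the idea would look something like this untested sketch: build minimal item JSON with serde_json and emit one multi-row INSERT. The table and column names (`imported_entities`, `entity_id`, `entity_json`) are only placeholders; a real import would have to target the actual MediaWiki/Wikibase tables (page, revision, slots, content, text) and do proper escaping.

// Untested sketch (needs the serde_json crate). Builds minimal Wikibase item
// JSON and one multi-row INSERT instead of one API call per statement.
use serde_json::json;

// Minimal item JSON; a real import would add claims, sitelinks, references.
fn item_json(qid: &str, label: &str) -> String {
    json!({
        "type": "item",
        "id": qid,
        "labels": { "de": { "language": "de", "value": label } },
        "claims": {}
    })
    .to_string()
}

// One INSERT with many value tuples; table/column names are placeholders.
fn bulk_insert(rows: &[(String, String)]) -> String {
    let values: Vec<String> = rows
        .iter()
        .map(|(id, json)| format!("('{}', '{}')", id, json.replace('\'', "''")))
        .collect();
    format!(
        "INSERT INTO imported_entities (entity_id, entity_json) VALUES\n{};",
        values.join(",\n")
    )
}

fn main() {
    // Three example items; in practice this would be the 40 million records.
    let rows: Vec<(String, String)> = (1..=3)
        .map(|i| (format!("Q{}", i), item_json(&format!("Q{}", i), "Beispielperson")))
        .collect();
    println!("{}", bulk_insert(&rows));
}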
Then you'd have to run the metadata update scripts that come with MediaWiki/Wikibase to get all the links etc. correct and updated. Probably something similar for the SPARQL endpoint. And some minor tweaks in the database, I guess.
Caveat: It would take a bit of fiddling to adapt the Rust QS to this. Might be worth it as a general solution though, if people would be interested in this.
Cheers, Magnus
[1] https://github.com/magnusmanske/petscan_rs