Hi,
Wikibase is amazing software, and if some of the tools being developed for Wikidata can be repurposed for Wikibase, its potential grows even further. (In the same way, some extensions and the ease of upgrading made MediaWiki a much more desirable CMS than it might otherwise have been.)
I was hoping people could share what their projects are, with a little bit about each project and its purpose. In that context, I was also hoping people might be willing to share their priorities as they relate to Wikibase and to meeting their project goals.
In case you don't know, Miguel and I are involved with ParaSports Data. ParaSports Data is a disability and disability sports knowledge base that can serve as a powerful resource for academics, NGOs and other stakeholders in the area of disability rights and disability sports. ParaSports Data was conceived in 2016 as a resource for structured data about disability and disability sport, including facts like dates, performance results, disability population sizes, event information, and classification-related data. Using Wikipedia and Wikidata as a model, it seeks to be the largest single knowledge base about disability and disability sport, one that allows stakeholders to search, analyze and reuse this data to raise awareness of disability, disability sports and, as an extension of those two things, human rights. The purpose of this project is to create a data set of Paralympic, Deaflympic, Special Olympics, other disability sports and general disability information for use by researchers.
As it relates to Wikibase, our immediate needs and concerns around these needs probably fall into the following four areas:
1. Upgrading MediaWiki, Wikibase and the Query Engine, and installing Extension:OAuth and QuickStatements. This is not explained anywhere; the best process we know of is outlined at https://www.researchgate.net/publication/329028278_Wikibase_Upgrade_Workflow . Doing it on our own is daunting: there is no established process, documentation is lacking, and it is hard to find others who have done this successfully.
2. Better optimizing the Wikibase software and query engine so they use fewer server resources. (This partly ties into point 3.)
3. Identifying funding sources for our installation. I currently pay for all hosting out of pocket, and this is not a long-term solution.
4. Improving bulk data import into Wikibase to reduce errors and the need to merge items after the fact. Either the Wikibase software or the way we bulk import matches items only on exact names. When bulk creating items, if an item's description differs from the existing one, a new item is created. Then, when statements are added by matching on item name, they are attached to the item with the lowest Q number.
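One way to think about point 4 is to separate "which existing item does this row refer to?" from "create or edit". The following is a minimal sketch under stated assumptions, not how Wikibase or QuickStatements actually work: existing items are assumed to have already been pulled into memory (e.g. from a query; that step isn't shown), the QIDs and labels are made up, and every function name is hypothetical. It matches on a normalized label only, so a differently worded description does not spawn a duplicate, and labels that already match several items go to a review list instead of defaulting to the lowest Q number.

```python
from collections import defaultdict

# Hypothetical snapshot of existing items (QID -> (label, description)).
# Made-up IDs for illustration; in practice this would be loaded from a query or dump.
EXISTING = {
    "Q12": ("Wheelchair basketball", "Paralympic sport"),
    "Q40": ("Wheelchair basketball", "team sport for wheelchair users"),
    "Q57": ("Goalball", "Paralympic sport"),
}


def normalize(label):
    """Case- and whitespace-insensitive key, so ' goalball' matches 'Goalball'."""
    return " ".join(label.split()).casefold()


def build_label_index(items):
    """Map each normalized label to the QIDs that carry it, in numeric order."""
    index = defaultdict(list)
    for qid, (label, _description) in items.items():
        index[normalize(label)].append(qid)
    for qids in index.values():
        qids.sort(key=lambda qid: int(qid[1:]))
    return dict(index)


def plan_import(rows, index):
    """Sort incoming (label, description) rows into reuse / create / review.

    The description is ignored for matching, so new wording does not create
    a duplicate item; labels that already match several items are flagged
    for a human instead of silently going to the lowest Q number.
    """
    plan = {"reuse": [], "create": [], "review": []}
    for label, description in rows:
        candidates = index.get(normalize(label), [])
        if not candidates:
            plan["create"].append((label, description))
        elif len(candidates) == 1:
            plan["reuse"].append((label, candidates[0]))
        else:
            plan["review"].append((label, candidates))
    return plan


incoming = [
    ("goalball", "team sport for athletes with visual impairments"),
    ("Boccia", "Paralympic sport"),
    ("Wheelchair basketball", "Paralympic sport"),
]
print(plan_import(incoming, build_label_index(EXISTING)))
```

Here the goalball row is reused as Q57, Boccia is queued for creation, and Wheelchair basketball lands in review because two items already carry that label. Matching on label alone trades duplicates for the risk of wrongly merging two different things with the same name (common with people), which is why the review bucket exists.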
What are your priorities when it comes to your own installations? :)
I'm interested in using Wikibase to build a database of elected and appointed officials at a very local level, down to a level where they don't even meet Wikidata's relaxed notability guidelines (Wikidata isn't interested in who sat on a small town's zoning commission 10 years ago). I don't have anything online yet; I'm just playing with the Docker container versions on my laptop to make sure I know what I'm doing. (Many thanks to the folks who put those containers together: being able to type two commands and get a working Wikibase instance is amazing.)
Beyond Wikibase being solid knowledge base management software, I am especially interested in connecting to Wikidata for two reasons:
1. Wikidata has a pretty well-established ontology for my domain: the properties and constraints that describe elected offices, term lengths, details about politicians, people's membership in those offices, etc. All of that likely applies directly to my small-scale database as well, and if there's something else I need, Wikidata will likely need it too, so I can get a community of linked-data and data-modeling folks to figure it out with me.
2. There is overlap between my dataset and what Wikidata is tracking. The members of a town's zoning commission might not be interesting to Wikidata, but Wikidata almost certainly has an item for the town. There's also some overlap between the people: long before she was a United States Senator, Tammy Baldwin was a member of the local county board in the late 1980s, so I'd like to be able to link to that. Finally, the qualifiers on the domains that properties are allowed to come from are an important dataset, and they are useful in my database as well.
For me, the most important thing to have is rock-solid backup and restore, with detailed, no-question-too-dumb documentation. I'm terrified of putting together a database, having it blow up, and having to reconstruct it. What makes me especially nervous is that Q and P IDs are assigned by Wikibase but are also used externally. If I screw up so badly that I have to completely re-import all of my data, and I'm not careful, the QID for an officeholder might change when I reload it, and anyone else who has a query using that QID will be out of luck.
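QIDs shifting after a re-import can at least be detected mechanically. This is a minimal sketch, assuming you keep your own stable key for each record (the keys and QIDs below are hypothetical) and save a key-to-QID mapping alongside each backup. After a restore or re-import, rebuilding the same mapping and comparing the two shows which QIDs moved or vanished. It only detects drift; it does not make Wikibase reuse the old IDs.

```python
import json


def save_mapping(mapping, path):
    """Write a {stable_key: qid} mapping to JSON next to a backup."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(mapping, f, indent=2, sort_keys=True)


def load_mapping(path):
    """Read a mapping previously written by save_mapping."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_qid_drift(before, after):
    """Compare {stable_key: qid} snapshots taken before and after a reload.

    Returns (changed, missing): keys whose QID differs, mapped to
    (old_qid, new_qid), and keys that no longer resolve to any QID.
    """
    changed = {
        key: (before[key], after[key])
        for key in before
        if key in after and before[key] != after[key]
    }
    missing = sorted(key for key in before if key not in after)
    return changed, missing


# Hypothetical keys and QIDs for illustration only.
before = {
    "zoning-commission/2008/member-a": "Q101",
    "zoning-commission/2008/member-b": "Q102",
    "county-board/1988/member-c": "Q103",
}
after = {
    "zoning-commission/2008/member-a": "Q101",
    "zoning-commission/2008/member-b": "Q117",
}
changed, missing = find_qid_drift(before, after)
print("changed:", changed)
print("missing:", missing)
```

A non-empty `changed` or `missing` after a restore is the signal to stop before other people's saved queries break. How the `after` mapping gets rebuilt (for example, by storing the stable key in an external-identifier property and querying for it) is left open here.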
It'd be especially nice to have an example backup of a very small site posted somewhere on the web: a set of example "fixtures" with a handful of items and properties that could optionally be used with the Docker containers to verify that everything is up and running end-to-end, along with sample queries and their expected output. Given how easy Docker makes it to blow everything away and start over, it'd be very nice to be able to bring up a site, modify the data to "experiment", and, if I feel like I've gotten myself into trouble, delete it and start over.
I'd echo Laura's interest in optimizing server resources. For funding, I'm just going to eat the costs of a couple of VMs in the cloud (I'm counting on being able to do it for about $100/month, but I don't know if that's realistic), so the smaller the footprint the better. I'd still like to keep some HA/disaster-recovery capability, or at least the ability to restore quickly if a VM crashes hard: I think I'm OK with my site going down for a while until something reboots, but I don't want to lose data.
Another thing I'm interested in is more federation support and examples, so I can more easily reuse properties and items from Wikidata. For performance reasons, I think I'd want to import most of them into my instance directly rather than use literal federation, where queries on my site make network calls back to wikidata.org. Instead, I'd like to have the Wikidata data imported into my instance, in a separate namespace to keep it apart, with a way to keep it up to date. I'd also like to import only a subset of Wikidata: I want all the properties and constraints around P39 (position held), which I'm going to use frequently, but I'd rather not import 20 gigs of data about genomes or fungus taxa. I'm not quite sure how to do that. I don't think Wikidata neatly separates into "core Wikidata" and "everything else", so I'd guess I just keep recursively walking the graph and pulling in the things I need.
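The "keep recursively walking the graph" idea can be sketched as a breadth-first search with a hop limit. This is a minimal sketch, assuming each entity is available as JSON in the shape Wikidata serves from Special:EntityData (a `claims` map from property ID to a list of statements). `get_entity` is a hypothetical fetch function; here it is backed by a tiny made-up store whose IDs are placeholders, not real Wikidata content. Against the live site it might fetch `https://www.wikidata.org/wiki/Special:EntityData/P39.json` and take `["entities"]["P39"]`, but that part is untested and not shown.

```python
from collections import deque


def linked_ids(entity):
    """Return the property and entity IDs one entity's statements point at.

    Looks at statement main values and qualifiers; snaks without a value
    (e.g. 'no value') and non-entity values such as strings are skipped.
    """
    found = set()
    for prop, statements in entity.get("claims", {}).items():
        found.add(prop)
        for statement in statements:
            snaks = [statement.get("mainsnak", {})]
            for qual_prop, qual_snaks in statement.get("qualifiers", {}).items():
                found.add(qual_prop)
                snaks.extend(qual_snaks)
            for snak in snaks:
                value = snak.get("datavalue", {})
                if value.get("type") == "wikibase-entityid":
                    found.add(value["value"]["id"])
    return found


def collect_subset(seeds, get_entity, max_depth=2):
    """Breadth-first walk from the seed IDs.

    Entities up to max_depth hops away are included; entities exactly at
    max_depth are kept but not expanded further.
    """
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        entity_id, depth = queue.popleft()
        if depth >= max_depth:
            continue
        entity = get_entity(entity_id)
        if entity is None:  # unknown or unfetchable ID
            continue
        for linked in sorted(linked_ids(entity) - seen):
            seen.add(linked)
            queue.append((linked, depth + 1))
    return seen


def item_snak(qid):
    """Build a minimal snak whose value is an item reference."""
    return {"snaktype": "value",
            "datavalue": {"type": "wikibase-entityid",
                          "value": {"entity-type": "item", "id": qid}}}


# Tiny made-up store in the Special:EntityData entity shape; placeholder IDs.
TOY_STORE = {
    "P1": {"claims": {"P2": [{"mainsnak": item_snak("Q1")}]}},
    "P2": {"claims": {}},
    "Q1": {"claims": {"P3": [{
        "mainsnak": item_snak("Q2"),
        "qualifiers": {"P4": [{"snaktype": "value",
                               "datavalue": {"type": "string", "value": "x"}}]},
    }]}},
    "Q2": {"claims": {"P2": [{"mainsnak": item_snak("Q3")}]}},
}

print(sorted(collect_subset(["P1"], TOY_STORE.get, max_depth=2)))
```

With `max_depth=2`, Q3 is left out because it is three hops from P1. The hop limit matters: most entities on Wikidata are only a few "instance of"/"subclass of" hops from a great many others, so an unbounded walk would likely sweep in far more than the P39 neighborhood. Restricting which properties are followed would be the natural next refinement.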
-Erik
On Mon, Nov 19, 2018 at 5:10 AM Laura Hale laura@fanhistory.com wrote:
--
twitter: purplepopple
_______________________________________________
Wikibaseug mailing list
Wikibaseug@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibaseug