Hi All!
I'm trying to figure out possible ways to launch the MediaWiki + Wikibase
software to allow collaborative creation of wiki pages and a
corresponding knowledge graph.
As far as I understand, it is possible to configure a single installation
of MediaWiki with the Wikibase extension, and have all wiki pages in the
Main namespace, like https://example.org/wiki/ , and all graph items in the
namespace https://example.org/wiki/Item:
I want to build something more similar to the Wikipedia-Wikidata pair --
wiki pages under https://wiki.example.org/wiki/ and the Wikibase graph
under https://graph.example.org/wiki/ . Am I right that I have to
launch two instances of MediaWiki for that, one without the Wikibase
extension and one with it?
Or is there a simpler way to configure the system to get such namespace
structure?
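To make my question concrete, this is roughly the two-instance setup I have
in mind. It is only a sketch: the paths and domains are examples, and the
WikibaseClient setting names (repoUrl, repoArticlePath, repoScriptPath) are
taken from my reading of the extension docs, so please correct me if I got
them wrong:

# On graph.example.org: a MediaWiki install acting as the Wikibase repository.
cat >> /var/www/graph/LocalSettings.php <<'PHP'
wfLoadExtension( 'WikibaseRepository', "$IP/extensions/Wikibase/extension-repo.json" );
PHP

# On wiki.example.org: a plain MediaWiki install that only uses the graph,
# via the Wikibase client extension pointed at the repository.
cat >> /var/www/wiki/LocalSettings.php <<'PHP'
wfLoadExtension( 'WikibaseClient', "$IP/extensions/Wikibase/extension-client.json" );
$wgWBClientSettings['repoUrl'] = 'https://graph.example.org';
$wgWBClientSettings['repoArticlePath'] = '/wiki/$1';
$wgWBClientSettings['repoScriptPath'] = '/w';
PHP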
Thank you for your help!
Victor Agroskin
Dear all,
I'm loading the whole Wikidata dataset into Blazegraph on a high-performance
computer. I gave 120 GB of RAM and 3 processing cores to the job.
After almost 24 hours of loading, the "wikidata.jnl" file is only 28 GB in
size. Initially the process was fast, but the loading speed has decreased as
the file has grown. I notice that only 14 GB of RAM is being used. I have
already implemented the recommendations given in
https://github.com/blazegraph/database/wiki/IOOptimization . Do you have any
other recommendations to increase the loading speed?
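For reference, beyond the page above I have been experimenting with the
write-path properties below. The property names come from the Blazegraph
wiki; the values are just what I tried, not recommendations:

# Appended to RWStore.properties before starting the load.
cat >> RWStore.properties <<'EOF'
# Keep more hot B+Tree nodes in memory during the bulk load:
com.bigdata.btree.writeRetentionQueue.capacity=4000
# Use more write cache buffers to smooth random writes to the journal:
com.bigdata.journal.AbstractJournal.writeCacheBufferCount=1000
EOF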
Leandro
Hi,
I have been studying Wikidata for the last few weeks in order to perform an
analysis. My main interest is the history and the discussion data of every
entity. I was wondering whether this information is included in the JSON
files here, https://dumps.wikimedia.org/wikidatawiki/entities/ , or whether
there are separate files for it. If so, where can I download them?
Thanks in advance.
Best wishes,
Elisavet
Hi,
I put a "wikidata.jnl" file of almost 60 GB size in the Blazegraph root
directory. When I run a query like "select ?s ?p ?o where {?s ?p ?o} limit
10" through the Blazegraph's query tab I get no results at all. Do I need
to do something for Blazegraph to recognize the database file?
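From the Blazegraph docs my understanding is that the server opens whatever
journal RWStore.properties names (the default file name is blazegraph.jnl,
not wikidata.jnl), and that queries only see the namespace the data was
loaded into (e.g. wdq rather than the default kb). So I suspect I need
something like the sketch below; the property and system-property names are
my reading of the docs, so please correct me:

# Tell Blazegraph which journal file to open:
cat >> RWStore.properties <<'EOF'
com.bigdata.journal.AbstractJournal.file=wikidata.jnl
EOF
# Start the server against that property file, then pick the right
# namespace (e.g. wdq) in the workbench before querying:
java -server -Xmx4g -Dbigdata.propertyFile=RWStore.properties -jar blazegraph.jar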
Leandro
Hi,
I have downloaded the precompiled Blazegraph distribution from [1]. I also
made the optimizations indicated at [2].
For the loading process I'm following the instructions given in the
"getting-started.md" file that comes in the "docs" folder of the compiled
distribution [1]. That means:
1- Munge the data with:
   ./munge.sh -f data/wikidata-20150427-all-BETA.ttl.gz -d data/split -l en -s
2- Start the loading process with:
   ./loadRestAPI.sh -n wdq -d `pwd`/data/split
The loading process then starts at a rate of 84352. However, the rate has
progressively gone down to 3362 after 36 hours of loading.
I'm running the process on an HPC with SSDs, and I'm giving the loading
process 3 cores and 120 GB of RAM. On the other hand, I notice that the
average processor usage doesn't go above 1.6 cores and the maximum RAM
usage is 14 GB.
I also saw [3], and I'm running the load natively (without containers). The
one difference from [3] is that I've reduced the JVM heap to 4 GB, as [2]
suggested.
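One thing I am considering, given that RAM is barely used, is raising the
heap back up just for the bulk load. My copy of runBlazegraph.sh seems to
read a HEAP_SIZE variable, though that may be version-specific:

# Restart Blazegraph with a larger heap, then resume the load.
# HEAP_SIZE is read by runBlazegraph.sh in my copy of the
# wikidata-query-rdf distribution (verify in yours).
HEAP_SIZE=16g ./runBlazegraph.sh &
./loadRestAPI.sh -n wdq -d "$(pwd)/data/split"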
Besides that, what else could I do to improve the loading performance?
Thanks,
Leandro
[1]
http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.wikidata.query.rdf%2…
[2] https://github.com/blazegraph/database/wiki/IOOptimization
[3]
https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits…
Dear all,
I am a researcher at Hasselt University doing research on Query Reverse
Engineering in the context of the Semantic Web [1]. I think that the
Wikidata dataset could be the ideal one on which to test the algorithms I
have developed. However, due to the limitations of the public SPARQL
endpoint [2], I cannot do this online, so I am setting up a standalone
instance. I realize that with my current computing power it is not possible
to load the dataset into my local Blazegraph instance. For these reasons, I
kindly request your assistance so that I can download a Blazegraph instance
with the dataset already loaded in it.
Kind regards,
Leandro Tabares Martín
[1]
https://www.uhasselt.be/UH/Research-groups/en-projecten_DOC/en-project_deta…
[2] https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual
*Apologies for cross-posting*
Hello all,
The Wikidata development team is currently doing some research to better
understand how people access and reuse Wikidata’s data from the code of
their applications and tools (for example through APIs), and how we can
improve the tools to make your workflows easier.
We are running a short survey to gather more information from people who
build tools based on Wikidata’s data. If you would like to participate,
please use this link
<https://docs.google.com/forms/d/e/1FAIpQLSfJ-I_Ib2EOuRVG4XfeUazhXTvgKsjcKhA…>
(Google Forms, estimated fill-in time: 5 minutes). If you don’t want to use
Google Forms, you can also send me a private email with your answers. We
would love to get as many answers as possible before June 9th.
The data will be collected anonymously and will only be shared in
aggregated form.
If you have any questions, feel free to reach out to me directly.
Cheers,
--
Mohammed Sadat Abdulai
*Community Communications Manager for Wikidata/Wikibase*
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Thank you!
The KG data built in my project will ultimately be used by people more
accustomed to Semantic Web-style IRIs. They will come from established SW
or OWL communities, sometimes with their own standards for IRIs. And they'd
like to have them dereferenceable! They can map their ontologies or add
other IDs as needed; I just want to make their lives a bit easier and avoid
some unnecessary discussion.
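Concretely, I expect we will end up pointing the repository's concept base
URI at our own domain, so the entity IRIs are stable and dereference there.
This is only a sketch: conceptBaseUri is the setting name I found in the
Wikibase repo documentation, and the domain is just an example:

# Make entity IRIs live under our own domain (appended to the repo's
# LocalSettings.php):
cat >> LocalSettings.php <<'PHP'
$wgWBRepoSettings['conceptBaseUri'] = 'https://graph.example.org/entity/';
PHP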