Hi,
I put a "wikidata.jnl" file of almost 60 GB in the Blazegraph root
directory. When I run a query like "select ?s ?p ?o where {?s ?p ?o} limit
10" through Blazegraph's query tab, I get no results at all. Do I need
to do something for Blazegraph to recognize the database file?
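For context: Blazegraph only opens the journal file named in its
properties file ("blazegraph.jnl" by default), so a journal dropped in
under another name is ignored. A minimal sketch of pointing the server
at wikidata.jnl; the property name is standard Blazegraph configuration,
but the paths and heap size below are assumptions:

# Tell Blazegraph which journal file to open (paths are assumptions).
cat > RWStore.properties <<'EOF'
com.bigdata.journal.AbstractJournal.file=wikidata.jnl
EOF
# Restart the server so it picks up the properties file.
java -server -Xmx4g -Dbigdata.propertyFile=RWStore.properties -jar blazegraph.jar

Note also that a journal produced by the WDQS loader holds its data in
the "wdq" namespace rather than Blazegraph's default "kb" namespace, so
the query tab must be pointed at that namespace to see any triples.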
Leandro
Hi,
I downloaded a precompiled Blazegraph from [1]. I also applied the
optimizations described at [2].
For the loading process I'm following the instructions given in the
"getting-started.md" file that comes in the "docs" folder of the compiled
distribution [1]. That means:
1- Munge the data with: ./munge.sh -f data/wikidata-20150427-all-BETA.ttl.gz -d data/split -l en -s
2- Start the loading process with: ./loadRestAPI.sh -n wdq -d `pwd`/data/split
The loading process starts at a rate of 84352; however, the rate has
progressively dropped to 3362 after 36 hours of loading.
I'm running the process on an HPC machine with an SSD, and I'm giving the
loading process 3 cores and 120 GB of RAM. However, I notice that the
average processor usage never exceeds 1.6 cores and the maximum RAM usage
is 14 GB.
I also saw [3], and I'm running the load natively (without containers).
Unlike [3], I've reduced the JVM heap to 4 GB as [2] suggested.
So what else could I do to improve the loading performance?
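For reference, a minimal sketch of the write-cache tuning that [2]
describes; the property names come from the Blazegraph IOOptimization
wiki, but the values below are illustrative assumptions rather than
recommendations:

# Give the journal more off-heap write cache buffers and the B+Tree a
# deeper write retention queue; both reduce random I/O during bulk load.
cat >> RWStore.properties <<'EOF'
com.bigdata.journal.AbstractJournal.writeCacheBufferCount=2000
com.bigdata.btree.writeRetentionQueue.capacity=8000
EOF
# The write cache is allocated from native memory, outside the JVM heap,
# which is why the heap itself can stay at 4 GB.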
Thanks,
Leandro
[1] http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.wikidata.query.rdf%2…
[2] https://github.com/blazegraph/database/wiki/IOOptimization
[3] https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits…
Dear all,
I am a researcher from Hasselt University performing research on Query
Reverse Engineering in the Context of the Semantic Web [1]. I think that
the Wikidata dataset could be the ideal one to test the algorithms I have
developed. However, due to the limitations of the public SPARQL endpoint
[2], I cannot do this online, so I am setting up a standalone instance. I
realize that with my current computing power it is not feasible to load
the dataset into my local Blazegraph instance. For these reasons, I
kindly request your assistance in obtaining a Blazegraph instance with
the dataset already loaded in it.
Kind regards,
Leandro Tabares Martín
[1] https://www.uhasselt.be/UH/Research-groups/en-projecten_DOC/en-project_deta…
[2] https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual
*Apologies for cross-posting*
Hello all,
The Wikidata development team is currently doing research to better
understand how people access and reuse Wikidata’s data from the code
of their applications and tools (for example, through APIs), and how we
can improve our tools to make your workflows easier.
We are running a short survey to gather more information from people who
build tools based on Wikidata’s data. If you would like to participate,
please use this link
<https://docs.google.com/forms/d/e/1FAIpQLSfJ-I_Ib2EOuRVG4XfeUazhXTvgKsjcKhA…>
(Google Forms, estimated fill-in time: 5 minutes). If you don’t want to
use Google Forms, you can also send me a private email with your answers.
We would love to get as many answers as possible before June 9th.
The data will be collected anonymously and will only be shared in
aggregated form.
If you have any questions, feel free to reach out to me directly.
Cheers,
--
Mohammed Sadat Abdulai
*Community Communications Manager for Wikidata/Wikibase*
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Thank you!
The KG data built in my project will ultimately be used by people more
accustomed to Semantic Web-style IRIs. They will come from established SW
or OWL communities, sometimes with their own standards for IRIs. And they'd
like to have them dereferenceable! They can map their ontologies or add
other IDs as needed; I just want to make their lives a bit easier and avoid
some unnecessary discussion.
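For context, "dereferenceable" here means that fetching the IRI itself
returns data via content negotiation, as Wikidata's concept IRIs already
do; a quick check (Q42 is just an example entity):

# Ask the entity IRI for Turtle; Wikidata redirects to the RDF document.
curl -sL -H 'Accept: text/turtle' https://www.wikidata.org/entity/Q42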
Hi all!
I’m Dan Shick ( https://w.wiki/RDs ), the new technical writer at
Wikimedia Deutschland. My goals are to discover, improve, unify and
round out documentation for the Wikibase & Wikidata development team;
my specific duties are defined by my team leadership and the
leadership of both products.
I see a lot of documentation out there, and it needs organizing so
that people in every audience can find the information they’re looking
for. Audiences include volunteers & the community, employees of
Wikimedia Deutschland and independent users of the products, and I see
plenty of overlap between those groups. Perhaps most importantly, if
the documentation someone needs doesn’t exist, I want to see it get
written.
My first task is to collect and improve the Wikibase post-install
documentation. I have a lot of resources already on the table, but of
course I welcome pointers to and feedback on any and all existing
documentation.
You'll find this text on my wiki page as well; if you want to say hi
or have any questions or comments, feel free to shoot me an email or
speak up on my talk page.
Wiki: https://meta.wikimedia.org/wiki/User:Dan_Shick_(WMDE)
Phabricator: https://phabricator.wikimedia.org/p/danshick-wmde/
--
Dan Shick
Technical Writer
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0 (reception)
https://wikimedia.de
Stay up to date with news and stories about Wikimedia, Wikipedia and
free knowledge by subscribing to our (German) newsletter.
We envision a world where all human beings can freely share in the sum
of all knowledge. Help us achieve that vision! Donate at
https://spenden.wikimedia.de .
Wikimedia Deutschland – Gesellschaft zur Förderung Freien Wissens e.
V. Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/029/42207.
Hi,
I wonder if there is any guidance on how to poll the recent changes
feed of a MediaWiki instance (in particular a Wikibase one) to keep
up with its stream of edits? In particular, how to do this responsibly
(without hammering the server), and how to ensure that all changes are
seen by the consumer?
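For illustration, a minimal polling sketch against the standard
recentchanges API; the endpoint and parameters are stock MediaWiki, but
the poll interval and the idea of deduplicating by rcid are assumptions
about what "responsible" polling should look like:

#!/bin/sh
# Walk the recent changes feed oldest-to-newest. The rccontinue token
# acts as a resume cursor, so nothing between polls is skipped; the last
# page is re-fetched on each poll, so consumers must deduplicate by rcid.
API="https://www.wikidata.org/w/api.php"
CONT=""
while true; do
  RESP=$(curl -s "$API?action=query&list=recentchanges&rcdir=newer&rcprop=ids|title|timestamp&rclimit=500&format=json$CONT")
  echo "$RESP" | jq -c '.query.recentchanges[]'   # hand each change to the consumer
  NEXT=$(echo "$RESP" | jq -r '.continue.rccontinue // empty')
  if [ -n "$NEXT" ]; then
    CONT="&rccontinue=$NEXT"   # more pages pending: keep paging
  else
    sleep 10                   # caught up: back off before the next poll
  fi
done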
EditGroups (https://tools.wmflabs.org/editgroups/) currently uses the
WMF Event Stream for this, which works well but has two downsides: it is
not available for non-WMF wikis, and it lacks server-side filtering
support. So I have been looking into implementing recent-changes polling
in EditGroups, so that it can be run on other wikis.
So far it looks like my RC polling strategy misses some edits that the
WMF Event Stream includes, so I need to improve this. RC polling is
implemented in the WDQS updater here:
https://github.com/wikimedia/wikidata-query-rdf/blob/master/tools/src/main/…
Is this the best implementation to look at?
And actually, is this really worth doing? Perhaps I should instead
require that the target Wikibase run the EventLogging extension
(https://www.mediawiki.org/wiki/Extension:EventLogging), which exposes
the edit stream in a Kafka instance, and then implement a Kafka topic
consumer in EditGroups. That does add requirements on the Wikibase
instance, but if RC polling is brittle, it would be wrong to promise
that EditGroups can be run off a stock MediaWiki instance anyway.
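For comparison, consuming the edit stream from Kafka is close to a
one-liner with the stock Kafka CLI; the broker address and topic name
below are assumptions that depend on how the Wikibase instance is set up:

# Tail edit events from the start of the (assumed) recentchange topic.
kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic mediawiki.recentchange \
  --from-beginning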
(Note that I still think EditGroups is not a long-term solution. We need
a MediaWiki extension to replace it:
https://phabricator.wikimedia.org/T203557. I am just looking into this
to help our OpenRefine GSoC intern Lu Liu, who will be working on
Wikibase support in OpenRefine this summer.)
Cheers,
Antonin
Hello all,
This is a follow-up on previous announcements related to the migration of
the wb_terms table (March 23rd
<https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2020/03#Importa…>,
April 6th
<https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2020/04#wb_term…>)
that concerns tool developers querying the database replicas on Labs.
As the last step of the migration, and as mentioned in the previous
discussions, the table wb_terms_no_longer_updated will be dropped. We are
doing this because the last steps of the database migration were completed
and announced a month ago, and we don’t want to keep a table containing
outdated information accessible for too long. The table has already been
deleted in production (without any impact on tools or users), and the
replicas on Labs will be deleted on April 29th. (Ticket:
<https://phabricator.wikimedia.org/T248086>)
If you maintain tools that query the database replicas, you can adapt
your code to the new table structure
<https://doc.wikimedia.org/Wikibase/master/php/md_docs_storage_terms.html>
(the previously shared link no longer works). You can also have a look at
suggestions for optimizing your queries
<https://www.wikidata.org/wiki/User:Amir_Sarabadani_(WMDE)/Database_normaliz…>.
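As an illustration of the new structure, a sketch that fetches an item's
English label from the normalized term store; the table and column names
follow the documented schema, but the replica host is an assumption and
the type id must be checked against wbt_type on your wiki:

# Look up the English label of Q42 via the normalized term-store tables.
mysql --defaults-file="$HOME/replica.my.cnf" \
      -h wikidatawiki.analytics.db.svc.wikimedia.cloud wikidatawiki_p <<'SQL'
SELECT wbx_text
  FROM wbt_item_terms
  JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
  JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
  JOIN wbt_text         ON wbxl_text_id = wbx_id
 WHERE wbit_item_id = 42         -- numeric part of Q42
   AND wbtl_type_id = 1          -- assumed id for 'label'; check wbt_type
   AND wbxl_language = 'en';
SQL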
If you have any questions or issues, feel free to reach out to me. Thanks
for your understanding,
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Hi everyone,
I hope you’re all having a great day!
I’m super excited to announce that I’ll be joining the software department at
Wikimedia Germany to help advance engagement between the software
development team and the communities using and contributing to
Wikidata/Wikibase.
Together with Léa, Sam and Lydia, I will be liaising with the different
user groups within the Wikibase community to provide information about
software changes and promote a smooth and productive collaboration between
stakeholders. You can share bug reports with us at <
https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team>
Please leave a note on my talk page <
https://www.wikidata.org/wiki/User_talk:Mohammed_Sadat_(WMDE)> or write to
me directly anytime you encounter issues with the Wikibase software so that
I can bring them to the development team:
* mohammed.sadat_ext@wikimedia.de
* Telegram: @masssly, IRC: mabdulai
Best Regards,
--
Mohammed Sadat Abdulai
*Community Communications Manager for Wikidata/Wikibase*
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de