topic: removed semantic-web, guile, and added wikidata-tech.
Let's move the conversation to wikidata-tech@
Please remove wikidata(a)lists.wikimedia.org next time you reply.
On Sun, Dec 22, 2019 at 11:35 PM, Ted Thibodeau Jr
<tthibodeau(a)openlinksw.com> wrote:
>
>
> On Dec 22, 2019, at 03:17 PM, Amirouche Boubekki <amirouche.boubekki(a)gmail.com> wrote:
> >
> > Hello all ;-)
> >
> >
> > I ported the code to Chez Scheme to do an apples-to-apples comparison
> > between GNU Guile and Chez and took the time to launch a few queries
> > against Virtuoso available in Ubuntu 18.04 (LTS).
>
> Hi, Amirouche --
>
> Kingsley's points about tuning Virtuoso to use available
> RAM [1] and other system resources are worth looking into,
> but a possibly more important first question is --
>
> Exactly what version of Virtuoso are you testing?
>
> If you followed the common script on Ubuntu 18.04, i.e., --
>
> sudo apt update
>
> sudo apt install virtuoso-opensource
>
> -- then you likely have version 6.1.6 of VOS, the Open Source
> Edition of Virtuoso, which shipped 2012-08-02 [2], and is far
> behind the latest version of both VOS (v7.2.5+) and Enterprise
> Edition (v8.3+)!
>
> The easiest way to confirm what you're running is to review
> the first "paragraph" of output from the command corresponding
> to the name of your Virtuoso binary --
>
> virtuoso-t -?
$ virtuoso-t -?
Virtuoso Open Source Edition (multi threaded)
Version 6.1.6.3127-pthreads as of Feb 6 2018
>
> virtuoso-iodbc-t -?
>
I do not have that command. I use isql-vt:
$ isql-vt --help
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
> If I'm right, and you're running 6.x, you'll get much better
> test results just by running a current version of Virtuoso.
>
> You can build VOS 7.2.6+ from source [3] (we'd recommend the
> develop/7 branch [4] for the absolute latest), or download a
> precompiled binary [5] of VOS 7.2.5.1 or 7.2.6.dev.
>
> You can also try Enterprise Edition at no cost for 30 days [5].
>
Next round I will try the develop branch.
As I said previously, those benchmarks must be taken with a grain of
salt: for one, the Virtuoso timings are reported by Virtuoso itself;
second, on the nomunofu side, I do not convert the internal
representation into the external representation; third, and most
importantly, this is just a glimpse of the full picture.
My mails are mainly trying to spark some interest or discussion with
Wikidata and Wikimedia, so that I can work full time on this. I
already described my intent, which is to create a benchmark tool based
on the Wikidata SPARQL logs [*], then use those to realistically
benchmark Virtuoso, the current solution, and a new solution (nomunofu)
that I am working on.
[*] https://iccl.inf.tu-dresden.de/web/Wissensbasierte_Systeme/WikidataSPARQL/en
Raw benchmarks would not tell the whole truth, because nomunofu can
rely on both WiredTiger and FoundationDB, which, as far as I know,
claim stronger guarantees than Virtuoso. The only way to know whether
Virtuoso is comparable to FoundationDB or WiredTiger would be for
Virtuoso to pass the Jepsen harness tests (https://jepsen.io/).
I am not putting all my eggs in one basket; I am considering other
options. But I think working for Wikimedia, on contract or in a
permanent position, would be best overall.
I will make another WDQS proposal, based on some feedback I was given
on IRC, to add more technical details (and improve the roadmap).
>
> [1] http://vos.openlinksw.com/owiki/wiki/VOS/VirtRDFPerformanceTuning
>
> [2] http://vos.openlinksw.com/owiki/wiki/VOS/VOSNews2012#2012-08-02%20--%20Anno….
>
> [3] http://vos.openlinksw.com/owiki/wiki/VOS/VOSBuild
>
> [4] https://github.com/openlink/virtuoso-opensource/tree/develop/7
>
> [5] https://sourceforge.net/projects/virtuoso/files/virtuoso/
>
>
>
>
>
> > Spoiler: the new code is always faster.
> >
> > The hard disk is SATA, and the CPU is dubbed: Intel(R) Xeon(R) CPU
> > E3-1220 V2 @ 3.10GHz
> >
> > I imported latest-lexeme.nt (6GB) using guile-nomunofu, chez-nomunofu
> > and Virtuoso:
> >
> > - Chez takes 40 minutes to import 6GB
> > - Chez is 3 to 5 times faster than Guile
> > - Chez is 11% faster than Virtuoso
>
>
> How did you load the data? Did you use Virtuoso's bulk-load
> facilities? This is the recommended method [6].
>
> [6] http://vos.openlinksw.com/owiki/wiki/VOS/VirtBulkRDFLoader
>
>
> > Regarding query time, Chez is still faster than Virtuoso with or
> > without cache. The query I am testing is the following:
> >
> > SELECT ?s ?p ?o
> > FROM <http://fu>
> > WHERE {
> > ?s <http://purl.org/dc/terms/language> <http://www.wikidata.org/entity/Q150> .
> > ?s <http://wikiba.se/ontology#lexicalCategory>
> > <http://www.wikidata.org/entity/Q1084> .
> > ?s <http://www.w3.org/2000/01/rdf-schema#label> ?o
> > };
> >
> > Virtuoso first query takes: 1295 msec.
> > The second query takes: 331 msec.
> > Then it stabilizes around: 200 msec.
> >
> > chez nomunofu takes around 200ms without cache.
> >
> > There is still an optimization I can do to speed up nomunofu a little.
> >
> >
> > Happy hacking!
>
>
> I'll be interested to hear your new results, with a current build,
> and with proper INI tuning in place.
Which INI options will I need to use? Thanks!
>
> Regards,
>
> Ted
>
>
>
> --
> A: Yes. http://www.idallen.com/topposting.html
> | Q: Are you sure?
> | | A: Because it reverses the logical flow of conversation.
> | | | Q: Why is top posting frowned upon?
>
> Ted Thibodeau, Jr. // voice +1-781-273-0900 x32
> Senior Support & Evangelism // mailto:tthibodeau@openlinksw.com
> // http://twitter.com/TallTed
> OpenLink Software, Inc. // http://www.openlinksw.com/
> 20 Burlington Mall Road, Suite 322, Burlington MA 01803
> Weblog -- http://www.openlinksw.com/blogs/
> Community -- https://community.openlinksw.com/
> LinkedIn -- http://www.linkedin.com/company/openlink-software/
> Twitter -- http://twitter.com/OpenLink
> Facebook -- http://www.facebook.com/OpenLinkSoftware
> Universal Data Access, Integration, and Management Technology Providers
>
>
>
>
Regards,
Amirouche ~ zig ~ https://hyper.dev
Hi,
as some of you might know, I wrote several tools for Wikidata, most recently:
MachtSinn[1] – a tool to generate Senses for Lexemes from Wikidata items
LexData[2] – a small Python library to allow for easy editing of Lexemes
Those two got more appreciation than I would have expected. That's
great, but it also means more bug reports, more feature wishes, and more
pull requests. I am currently writing my thesis and therefore cannot put
much time into these projects.
At the same time, I feel bad leaving these projects unmaintained for the
next three months and disappointing enthusiastic users.
Therefore I'm looking for people who would be interested in contributing
to and co-maintaining these projects. It's Python code, with a JavaScript
frontend for MachtSinn. If you are interested, write me a mail or create
an issue on GitHub.
Cheers,
User:MichaelSchoenitzer
[1] https://tools.wmflabs.org/machtsinn/
[2] https://github.com/Nudin/LexData
--
Michael F. Schönitzer
Henrik-Ibsen-Str. 2
80638 München
Mail: michael(a)schoenitzer.de
Jabber: schoenitzer(a)jabber.ccc.de
Tel: 089/37918949 - Mobil: 017657895702
Hello,
apologies for cross-posting!
I am writing to inform you that at Wikimedia Deutschland we have decided to
stop having weekly Technical Advice IRC Meetings (TAIM)[1] at
#wikimedia-tech after the final meeting on December 18th. In other words,
Technical Advice IRC Meetings will not continue in 2020.
We realized that the regular IRC meeting format has reached the limits of
its reach, and it is time for us to think about other means to better
support as many volunteer developers as possible contributing to Wikimedia
software.
At WMDE we'll be dedicating our efforts in 2020 to constantly improving the
documentation of our products, including Wikidata and Wikibase, to allow
easier usage and contribution. Our engineering teams are also exploring new
possibilities to support volunteer developers in a more asynchronous way
than regular IRC meetings. If you have any suggestions for this, please let
us know on our talk page.[2]
On this point we would like to send the warmest thank you to all the people
who hosted TAIM meetings, all participants asking and answering questions,
and all supporters of TAIM over the years.
Have a great seasonal break and see you around in 2020!
On behalf of TAIM crew at WMDE
Johanna
[1] https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[2] https://www.mediawiki.org/wiki/Talk:Technical_Advice_IRC_Meeting
--
Johanna Strodt
Project Manager for Community Communication / Technical Wishes Project
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Tel. (030) 219 158 26-0
https://wikimedia.de
Hello all,
We recently made some changes to the way the Wikidata Query Service UI (
https://query.wikidata.org/) loads the example queries
<https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples>.
This change can impact people who are maintaining these queries, as well as
people running their own Wikibase instance including the Query Service.
Before the change, our approach was to get the HTML of that wiki page from
Parsoid <https://www.mediawiki.org/wiki/Parsoid> (this includes some
template metadata which the normal parser output doesn’t include), and from
that extract the query parameters of all {{SPARQL}} and {{SPARQL2}}
transclusions.
With our improved approach, we get the HTML of that wiki page from the
parser <https://www.mediawiki.org/wiki/API:Parsing_wikitext>, and from that
extract the contents of all syntax highlighted blocks.
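For third-party setups curious about the mechanics, here is a minimal sketch
of that approach: fetch the rendered HTML of the examples page through the
MediaWiki action API (action=parse) and collect the text of the highlighted
blocks. It is only an illustration, not the actual Query Service UI code, and
it assumes the highlighted queries end up inside <pre> elements.

# Hypothetical sketch: fetch rendered HTML via action=parse and pull the
# text out of <pre> blocks (SyntaxHighlight output). The <pre> assumption
# is not taken from the Query Service UI source.
import requests
from html.parser import HTMLParser

API = "https://www.wikidata.org/w/api.php"
PAGE = "Wikidata:SPARQL query service/queries/examples"

class HighlightExtractor(HTMLParser):
    """Collects the text content of <pre> blocks."""
    def __init__(self):
        super().__init__()
        self.depth = 0
        self.blocks = []
    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.depth += 1
            self.blocks.append("")
    def handle_endtag(self, tag):
        if tag == "pre" and self.depth:
            self.depth -= 1
    def handle_data(self, data):
        if self.depth:
            self.blocks[-1] += data

def fetch_example_queries():
    resp = requests.get(API, params={
        "action": "parse", "page": PAGE, "prop": "text",
        "format": "json", "formatversion": "2",
    })
    resp.raise_for_status()
    extractor = HighlightExtractor()
    extractor.feed(resp.json()["parse"]["text"])
    return extractor.blocks

if __name__ == "__main__":
    for query in fetch_example_queries()[:3]:
        print(query)
        print("---")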
The improvements resulting from this change are the following:
- The queries no longer have to be specified directly on the page using
  {{SPARQL}} or {{SPARQL2}}; they can be transcluded indirectly, e.g. using
  {{query page}}
  <https://www.wikidata.org/wiki/Template:Query_page#Transclusion_usage>.
  You can see a comparison at
  User:TweetsFactsAndQueries/Queries-test-transclude
  <https://www.wikidata.org/wiki/User:TweetsFactsAndQueries/Queries-test-trans…>
  and User:TweetsFactsAndQueries/Queries-test-copy
  <https://www.wikidata.org/wiki/User:TweetsFactsAndQueries/Queries-test-copy>.
  If we go with the solution of one query per page, we should be aware that
  we can fit fewer queries on the examples page before we hit some parser
  limits.
- Examples can be loaded from wikis that don't have Parsoid / VisualEditor
  installed, making it much easier for third-party setups to manage their own
  lists of examples.
- Queries that contained an unescaped pipe character (|) were previously cut
  off at that character in the Query Service UI; this should now be fixed,
  and all queries should be displayed just like on the wiki page.
- If the examples page hits some limit of the parser, then some examples
  will not be loaded, whereas with the previous approach they would still be
  loaded and shown on the Query Service UI even though they weren't working
  correctly on the wiki page.
Configuration changes for other Wikibase instances: third-party setups may
have to update their configuration (custom-config.json). In the 'examples'
object, the 'endpoint' (pointing to the REST API for Parsoid) has been
replaced with 'apiPath' (the path to api.php after the 'server'; related
to $wgScriptPath
<https://www.mediawiki.org/wiki/Manual:$wgScriptPath>, but without a
leading slash, which should instead be at the end of the 'server', and
including the /api.php at the end).
If you encounter any issues with the examples page or while configuring
your own Query Service instance, please let us know by adding a comment under
this task <https://phabricator.wikimedia.org/T174298>.
Cheers,
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/029/42207.
Forwarding as this will also be relevant for people who consume Wikidata
XML dumps (but not entity dumps), and especially for people who are
interested in working with Structured Data on Commons from dumps.
---------- Forwarded message ---------
From: Ariel Glenn WMF <ariel(a)wikimedia.org>
Date: Wed, Nov 27, 2019 at 2:39 PM
Subject: [Wikitech-l] BREAKING CHANGE: schema update, xml dumps
To: Wikipedia Xmldatadumps-l <Xmldatadumps-l(a)lists.wikimedia.org>,
Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
We plan to move to the new schema for xml dumps for the February 1, 2020
run. Update your scripts and apps accordingly!
The new schema contains an entry for each 'slot' of content. This means
that, for example, the commonswiki dump will contain MediaInfo information
as well as the usual wikitext. See
https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/master/docs…
for the schema and
https://www.mediawiki.org/wiki/Requests_for_comment/Schema_update_for_multi…
for further explanation and example outputs.
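For dump consumers, a rough sketch of reading per-slot content from the new
format might look like the following. The element names (<content>, <role>,
<model>, <text>) and the namespace version are assumptions based on the
schema and RFC linked above; check the published schema before relying on
them.

# Rough sketch: iterate every content slot in a multi-content XML dump.
# Element names and the export-0.11 namespace are assumptions, not verified
# against the final schema.
import bz2
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.11/}"  # assumed schema version

def iter_slots(dump_path):
    """Yield (page_title, slot_role, content_model, text) for each slot."""
    title = None
    with bz2.open(dump_path, "rb") as f:
        for _event, elem in ET.iterparse(f, events=("end",)):
            if elem.tag == NS + "title":
                title = elem.text
            elif elem.tag == NS + "content":
                yield (title,
                       elem.findtext(NS + "role"),
                       elem.findtext(NS + "model"),
                       elem.findtext(NS + "text") or "")
            elif elem.tag == NS + "page":
                elem.clear()  # keep memory bounded on large dumps

if __name__ == "__main__":
    # Printing only non-main slots (e.g. MediaInfo on Commons); the role
    # name "main" is the usual default for wikitext.
    for title, role, model, text in iter_slots("commonswiki-pages-articles.xml.bz2"):
        if role != "main":
            print(title, role, model, len(text))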
Phabricator task for the update: https://phabricator.wikimedia.org/T238972
PLEASE FORWARD to other lists as you deem appropriate. Thanks!
Ariel Glenn
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
--
Lucas Werkmeister (he/er)
Full Stack Developer
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us to achieve our vision!
https://spenden.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/029/42207.
Hello all,
This message is important to everyone running an instance of Wikibase
including the Query Service GUI.
We just released a new version of the Wikidata Query Service GUI. This
release is primarily to fix several security issues described in T238822
<https://phabricator.wikimedia.org/T238822> and T238824
<https://phabricator.wikimedia.org/T238824> (these tasks will be made
public soon). These are different from the previous fix we deployed on
November 7th. The fix has been successfully deployed for the Wikidata Query
Service.
In order to keep your instance safe, please make sure to update your Query
Service GUI!
Git repositories, releases, and the currently active Docker images also
include the latest fixed code (see links below). If you have a local test
setup using the docker-compose example, then see:
https://gist.github.com/addshore/36f8d6fe2331d28ca8f70df5abda20fd
Gerrit repositories:
- https://gerrit.wikimedia.org/r/#/c/wikidata/query/gui/+/553311/
- https://gerrit.wikimedia.org/r/#/c/wikidata/query/gui-deploy/+/553313/
Docker images:
- latest: digest
  sha256:6570acb916b429f10ccb3bf3479b66aa6697b3fb3982166a09aba87eeaba7c90
- legacy: digest
  sha256:4503257bbe1744ce389f07f6dcbaf53db7569cc3e570e30dd5a85c8d0073a39d
If you have any questions or issues updating your code, please let us know
(you can write me an email, or ask in the Wikibase Telegram group
<https://t.me/joinchat/HGjGexZ9NE7BwpXzMsoDLA>)
Thanks for your understanding,
Cheers,
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/029/42207.
Hi,
I hope this is the right mailing list to discuss this issue.
Some time ago I ran into a series of temporary bans. I thought I had managed
to tackle this, basically by doing a full stop once the bot gets any response
status code other than 200.
However, this seems not to have fixed it, since I received the following
message:
"requests.exceptions.HTTPError: 403 Client Error: You have been banned
until 2019-10-18T10:21:36.495Z, please respect throttling and retry-after
headers. for url: https://query.wikidata.org/sparql"
I am looking into this from scratch to see if I can implement a better
solution, certainly one that really respects the Retry-After time
instead of coming to a full stop.
Whatever I try now, I keep getting 200 responses, and I don't want to start
an excessive bot run just to get into a banned state and see the exact header
that the bot needs to respect.
Is there an example of such a header which I can use to make my own test
script?
Or is there example Python code that successfully deals with a Retry-After
header?
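(For reference, a minimal sketch of one way to honour Retry-After with the
requests library; the User-Agent and the back-off limits are placeholders to
adapt to your own bot, not official guidance from the WDQS team.)

# Minimal sketch: retry a WDQS SPARQL request while respecting Retry-After.
import time
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
HEADERS = {"User-Agent": "example-bot/0.1 (you@example.org)"}  # placeholder

def sparql_query(query, max_attempts=5):
    for _attempt in range(max_attempts):
        resp = requests.get(ENDPOINT,
                            params={"query": query, "format": "json"},
                            headers=HEADERS)
        retry_after = resp.headers.get("Retry-After")
        if resp.status_code in (429, 403) and retry_after:
            # Retry-After can be a number of seconds or an HTTP date;
            # this sketch only handles the numeric form.
            time.sleep(int(retry_after))
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("still being throttled after several retries")

if __name__ == "__main__":
    print(sparql_query("SELECT ?s WHERE { ?s ?p ?o } LIMIT 1"))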
Regards,
Andra
Hello all,
-and sorry for cross-posting-
This message is important to everyone running an instance of Wikibase
including the Query Service GUI.
We just released a new version of the Wikidata Query Service GUI. This
release is primarily to fix a security issue described in T233213
<https://phabricator.wikimedia.org/T233213> (hidden task, will be made
public soon). The fix has been successfully deployed for the Wikidata Query
Service.
In order to keep your instance safe, please make sure to update your Query
Service GUI!
Git repositories, releases, and the currently active Docker images also
include the latest fixed code (see links below). If you have a local test
setup using the docker-compose example, then see:
https://gist.github.com/addshore/36f8d6fe2331d28ca8f70df5abda20fd
Gerrit repositories:
- wikidata/query/gui after commit d9f964b88c01748e278ca8c4b8929a8ef0ef0267
- wikidata/query/gui-deploy after commit
  7445472ab0ec61890b42e4d524416fbc6a18aa8a
- wikidata/query/deploy after 094d9cda98f3fb706cf9c25aefa3eb33f9f6999a
Docker images:
- wdqs:0.3.6 (all versions of this tag)
- wdqs:latest from digest
  sha256:04237b42d0b904a2c49ecb7059c82ace8265ba0b7f690ee2d4b3004ad39517ee
- wdqs-frontend:latest from digest
  sha256:1308a7d6622b1e141783336fb52cd6993973321077f58359fbf907b77e105ca3
- wdqs-frontend:legacy from digest
  sha256:f830abd53fe5e79299011211a2aab7ad947181e56785c06eed6e9bd6b430d4ce
Downloadable releases:
- https://archiva.wikimedia.org/repository/snapshots/org/wikidata/query/rdf/s…
If you have any questions or issues updating your code, please let us know
(you can write me an email, or ask in the Wikibase Telegram group
<https://t.me/joinchat/HGjGexZ9NE7BwpXzMsoDLA>)
Thanks for your understanding,
Cheers,
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/029/42207.