Hello,
We are having issues launching a local copy of Wikidata. When we use the
'importDump.php' tool, we run into the issues described below.
If somebody has an idea of how we could solve this, please let me know. We
are also considering professional services to get fixes for this released,
in case somebody offers professional consulting around Wikibase.
Thanks,
Miquel
Here are the issues.
If I try to load the full dump, the error I get is:
root@4fc8cc9b76b3:/var/www/html/maintenance# php importDump.php --conf
../LocalSettings.php
../images/wikidatawiki-20191101-pages-articles-multistream.xml.bz2
Warning: XMLReader::read():
uploadsource://d0cd78c216b067ffdd60946c258db6a7:45: parser error : Extra
content at the end of the document in
/var/www/html/includes/import/WikiImporter.php on line 646
Warning: XMLReader::read(): </siteinfo> in
/var/www/html/includes/import/WikiImporter.php on line 646
Warning: XMLReader::read(): ^ in
/var/www/html/includes/import/WikiImporter.php on line 646
Done!
You might want to run rebuildrecentchanges.php to regenerate RecentChanges,
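One possible explanation (an assumption on my part, not something the error
message proves): the *multistream* dump is a concatenation of many
independent bz2 streams, and a decompressor that only reads a single stream
stops at the first stream boundary, leaving the XML truncated right after
the header, which matches the parser complaining at </siteinfo>. Using the
non-multistream pages-articles dump, or piping through bzcat (which handles
concatenated streams), may sidestep this. A small Python sketch of the
difference:

```python
import bz2

# Emulate a "multistream" dump: two independently compressed bz2
# streams concatenated into one blob.
part1 = bz2.compress(b"<mediawiki><siteinfo/>")
part2 = bz2.compress(b"<page/></mediawiki>")
blob = part1 + part2

# A single-stream decompressor stops at the end of the first stream;
# the rest of the blob is left in unused_data.
d = bz2.BZ2Decompressor()
print(d.decompress(blob))  # b'<mediawiki><siteinfo/>' -- truncated XML
print(d.eof)               # True: it considers the input finished

# bz2.decompress (and bzcat) handle concatenated streams correctly.
print(bz2.decompress(blob))  # b'<mediawiki><siteinfo/><page/></mediawiki>'
```

Something like `bzcat dump.xml.bz2 | php importDump.php --conf
../LocalSettings.php` would feed the importer fully decompressed XML.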
If I try to load a partial dump, the warnings that I get (which, I think,
mean that nothing is being loaded) are:
root@4fc8cc9b76b3:/var/www/html/maintenance# php importDump.php --conf
../LocalSettings.php
../images/wikidatawiki-20191020-pages-meta-current1.xml-p1p235321.bz2
Revision 1033865598 using content model wikibase-item cannot be stored on
"Q15" on this wiki, since that model is not supported on that page.
Revision 1034542603 using content model wikibase-item cannot be stored on
"Q17" on this wiki, since that model is not supported on that page.
Revision 1032554298 using content model wikibase-item cannot be stored on
"Q18" on this wiki, since that model is not supported on that page.
Revision 1032534215 using content model wikibase-item cannot be stored on
"Q20" on this wiki, since that model is not supported on that page.
Revision 1026713626 using content model wikibase-item cannot be stored on
"Q21" on this wiki, since that model is not supported on that page.
Revision 1023703278 using content model wikibase-item cannot be stored on
"Q22" on this wiki, since that model is not supported on that page.
Revision 1032815802 using content model wikibase-item cannot be stored on
"Q25" on this wiki, since that model is not supported on that page.
Revision 1032910600 using content model wikibase-item cannot be stored on
"Q26" on this wiki, since that model is not supported on that page.
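These warnings mean the local wiki refuses the wikibase-item content model
on those titles: Wikibase only allows entity content in its configured
entity namespaces, and on Wikidata itself items live in the main namespace,
so an import target has to match that mapping. A hedged LocalSettings.php
sketch (the exact setting name and key shape vary across Wikibase versions,
so verify against your release before relying on it):

```php
<?php
// Assumption: Wikibase Repo is already loaded above this point, and
// 'entityNamespaces' is the settings key used by this Wikibase release.
// Map items to the main namespace (0), as on wikidata.org, so that
// pages like Q15, Q17, ... accept the wikibase-item model.
$wgWBRepoSettings['entityNamespaces']['item'] = NS_MAIN;
```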
topic: removed semantic-web, guile, and added wikidata-tech.
Let's move the conversation to wikidata-tech@
Please remove wikidata(a)lists.wikimedia.org next time you reply.
On Sun, Dec 22, 2019 at 23:35, Ted Thibodeau Jr
<tthibodeau(a)openlinksw.com> wrote:
>
>
> On Dec 22, 2019, at 03:17 PM, Amirouche Boubekki <amirouche.boubekki(a)gmail.com> wrote:
> >
> > Hello all ;-)
> >
> >
> > I ported the code to Chez Scheme to do an apple-to-apple comparison
> > between GNU Guile and Chez and took the time to launch a few queries
> > against Virtuoso available in Ubuntu 18.04 (LTS).
>
> Hi, Amirouche --
>
> Kingsley's points about tuning Virtuoso to use available
> RAM [1] and other system resources are worth looking into,
> but a possibly more important first question is --
>
> Exactly what version of Virtuoso are you testing?
>
> If you followed the common script on Ubuntu 18.04, i.e., --
>
> sudo apt update
>
> sudo apt install virtuoso-opensource
>
> -- then you likely have version 6.1.6 of VOS, the Open Source
> Edition of Virtuoso, which shipped 2012-08-02 [2], and is far
> behind the latest version of both VOS (v7.2.5+) and Enterprise
> Edition (v8.3+)!
>
> The easiest way to confirm what you're running is to review
> the first "paragraph" of output from the command corresponding
> to the name of your Virtuoso binary --
>
> virtuoso-t -?
$ virtuoso-t -?
Virtuoso Open Source Edition (multi threaded)
Version 6.1.6.3127-pthreads as of Feb 6 2018
>
> virtuoso-iodbc-t -?
>
I do not have that command. I use isql-vt:
$ isql-vt --help
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
> If I'm right, and you're running 6.x, you'll get much better
> test results just by running a current version of Virtuoso.
>
> You can build VOS 7.2.6+ from source [3] (we'd recommend the
> develop/7 branch [4] for the absolute latest), or download a
> precompiled binary [5] of VOS 7.2.5.1 or 7.2.6.dev.
>
> You can also try Enterprise Edition at no cost for 30 days [5].
>
Next round I will try the develop branch.
As I said previously, these benchmarks must be taken with a grain of salt:
For one, the Virtuoso timings are reported by Virtuoso itself. Second, on
the nomunofu side, I do not convert the internal representation into the
external representation. Third, and most important, this is just a glimpse
of the full picture.
My mails are mainly trying to spark some interest or discussion with
Wikidata and Wikimedia, so that I can work full time on this. I have
already described my intent, which is to create a benchmark tool based on
the Wikidata SPARQL logs [*], then use those to realistically benchmark
Virtuoso, the current solution, and a new solution (nomunofu) that I am
working on.
[*] https://iccl.inf.tu-dresden.de/web/Wissensbasierte_Systeme/WikidataSPARQL/en
Raw benchmarks would not tell the whole truth, because nomunofu can
rely on both WiredTiger and FoundationDB, which, as far as I know,
claim stronger guarantees than Virtuoso. The only way to know whether
Virtuoso is comparable to FoundationDB or WiredTiger would be for
Virtuoso to pass the Jepsen harness tests (https://jepsen.io/).
I have not put all my eggs in one basket; I am considering other
options. But I think working for Wikimedia, on contract or in a
permanent position, would be best overall.
I will make another WDQS proposal, based on some feedback I was given
on IRC, adding more technical details (and an improved road map).
>
> [1] http://vos.openlinksw.com/owiki/wiki/VOS/VirtRDFPerformanceTuning
>
> [2] http://vos.openlinksw.com/owiki/wiki/VOS/VOSNews2012#2012-08-02%20--%20Anno….
>
> [3] http://vos.openlinksw.com/owiki/wiki/VOS/VOSBuild
>
> [4] https://github.com/openlink/virtuoso-opensource/tree/develop/7
>
> [5] https://sourceforge.net/projects/virtuoso/files/virtuoso/
>
>
>
>
>
> > Spoiler: the new code is always faster.
> >
> > The hard disk is SATA, and the CPU is dubbed: Intel(R) Xeon(R) CPU
> > E3-1220 V2 @ 3.10GHz
> >
> > I imported latest-lexeme.nt (6GB) using guile-nomunofu, chez-nomunofu
> > and Virtuoso:
> >
> > - Chez takes 40 minutes to import 6GB
> > - Chez is 3 to 5 times faster than Guile
> > - Chez is 11% faster than Virtuoso
>
>
> How did you load the data? Did you use Virtuoso's bulk-load
> facilities? This is the recommended method [6].
>
> [6] http://vos.openlinksw.com/owiki/wiki/VOS/VirtBulkRDFLoader
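For reference, the bulk-load recipe at [6] boils down to registering the
dump files and then running the loader from isql. A minimal sketch, with a
placeholder directory and the graph IRI used elsewhere in this thread:

```sql
-- Run inside isql / isql-vt against a running Virtuoso server.
-- The directory must be listed under DirsAllowed in virtuoso.ini.
ld_dir ('/data/dumps', '*.nt', 'http://fu');  -- register files to load
rdf_loader_run ();                            -- perform the bulk load
checkpoint;                                   -- persist the loaded data
```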
>
>
> > Regarding query time, Chez is still faster than Virtuoso with or
> > without cache. The query I am testing is the following:
> >
> > SELECT ?s ?p ?o
> > FROM <http://fu>
> > WHERE {
> > ?s <http://purl.org/dc/terms/language> <http://www.wikidata.org/entity/Q150> .
> > ?s <http://wikiba.se/ontology#lexicalCategory>
> > <http://www.wikidata.org/entity/Q1084> .
> > ?s <http://www.w3.org/2000/01/rdf-schema#label> ?o
> > };
> >
> > Virtuoso's first query takes 1295 msec.
> > The second query takes 331 msec.
> > Then it stabilizes around 200 msec.
> >
> > chez nomunofu takes around 200ms without cache.
> >
> > There is still an optimization I can do to speed up nomunofu a little.
> >
> >
> > Happy hacking!
>
>
> I'll be interested to hear your new results, with a current build,
> and with proper INI tuning in place.
What will be the INI options I need to use? Thanks!
>
> Regards,
>
> Ted
>
>
>
> --
> A: Yes. http://www.idallen.com/topposting.html
> | Q: Are you sure?
> | | A: Because it reverses the logical flow of conversation.
> | | | Q: Why is top posting frowned upon?
>
> Ted Thibodeau, Jr. // voice +1-781-273-0900 x32
> Senior Support & Evangelism // mailto:tthibodeau@openlinksw.com
> // http://twitter.com/TallTed
> OpenLink Software, Inc. // http://www.openlinksw.com/
> 20 Burlington Mall Road, Suite 322, Burlington MA 01803
> Weblog -- http://www.openlinksw.com/blogs/
> Community -- https://community.openlinksw.com/
> LinkedIn -- http://www.linkedin.com/company/openlink-software/
> Twitter -- http://twitter.com/OpenLink
> Facebook -- http://www.facebook.com/OpenLinkSoftware
> Universal Data Access, Integration, and Management Technology Providers
>
>
>
>
Regards,
Amirouche ~ zig ~ https://hyper.dev
Hi,
as some of you might know, I have written several tools for Wikidata, most
recently:
MachtSinn[1] – a tool to generate Senses for Lexemes from Wikidata items
LexData[2] – a small Python library that allows for easy editing of Lexemes
These two got more appreciation than I would have expected. That’s great,
but it also means more bug reports, more feature wishes and more pull
requests. I am currently writing my thesis and therefore cannot put much
time into these projects.
At the same time, I feel bad leaving these projects unmaintained for the
next three months and disappointing enthusiastic users.
Therefore I’m looking for people who would be interested in contributing
to and co-maintaining these projects. It’s Python code with a JavaScript
frontend for MachtSinn. If you are interested, write me a mail or create
an issue on GitHub.
Cheers,
User:MichaelSchoenitzer
[1] https://tools.wmflabs.org/machtsinn/
[2] https://github.com/Nudin/LexData
--
Michael F. Schönitzer
Henrik-Ibsen-Str. 2
80638 München
Mail: michael(a)schoenitzer.de
Jabber: schoenitzer(a)jabber.ccc.de
Tel: 089/37918949 - Mobil: 017657895702
Hello,
apologies for cross-posting!
I am writing to inform you that at Wikimedia Deutschland we have decided to
stop having weekly Technical Advice IRC Meetings (TAIM)[1] at
#wikimedia-tech after the final meeting on December 18th. In other words,
Technical Advice IRC Meetings will not continue in 2020.
We realized that the regular IRC meeting format has reached the limits of
its reach, and it is time for us to think about other ways to better
support as many as possible of the volunteer developers contributing to
Wikimedia software.
At WMDE we’ll be dedicating our efforts in 2020 to continually improving
the documentation of our products, including Wikidata and Wikibase, to
allow for easier usage and contribution. Our engineering teams are also
exploring new ways to support volunteer developers more asynchronously
than regular IRC meetings allow. If you have any suggestions, please let
us know on our talk page.[2]
On this note, we would like to send a warm thank-you to all the people
who hosted TAIM meetings, all participants who asked and answered
questions, and all supporters of TAIM over the years.
Have a great seasonal break and see you around in 2020!
On behalf of TAIM crew at WMDE
Johanna
[1] https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[2] https://www.mediawiki.org/wiki/Talk:Technical_Advice_IRC_Meeting
--
Johanna Strodt
Project Manager for Community Communication / Technical Wishes Project
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Tel. (030) 219 158 26-0
https://wikimedia.de
Hello all,
We recently made some changes to the way the Wikidata Query Service UI (
https://query.wikidata.org/) loads the example queries
<https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples>.
This change can affect people who maintain these queries, as well as
people running their own Wikibase instance with the Query Service.
Before the change, our approach was to get the HTML of that wiki page from
Parsoid <https://www.mediawiki.org/wiki/Parsoid> (this includes some
template metadata which the normal parser output doesn’t include), and from
that extract the query parameters of all {{SPARQL}} and {{SPARQL2}}
transclusions.
With our improved approach, we get the HTML of that wiki page from the
parser <https://www.mediawiki.org/wiki/API:Parsing_wikitext>, and from that
extract the contents of all syntax highlighted blocks.
The improvements resulting from this change are the following:
- The queries no longer have to be specified directly on the page using
{{SPARQL}} or {{SPARQL2}}; they can be transcluded indirectly, e.g. using
{{query page}}
<https://www.wikidata.org/wiki/Template:Query_page#Transclusion_usage>.
You can see a comparison at
User:TweetsFactsAndQueries/Queries-test-transclude
<https://www.wikidata.org/wiki/User:TweetsFactsAndQueries/Queries-test-trans…>
and User:TweetsFactsAndQueries/Queries-test-copy
<https://www.wikidata.org/wiki/User:TweetsFactsAndQueries/Queries-test-copy>.
If we go with the solution of one query per page, we should be aware that
we can fit fewer queries on the examples page before we hit some parser
limits.
- Examples can be loaded from wikis that don’t have Parsoid / VisualEditor
installed, making it much easier for third-party setups to manage their own
lists of examples.
- Queries that contained an unescaped pipe character (|) were previously
cut off at that character in the query service UI; this is now fixed, and
all queries are displayed just like on the wiki page.
- If the examples page hits some limit of the parser, the affected examples
will simply not be loaded, whereas with the previous approach they would
still be loaded and shown in the query service UI even though they weren’t
working correctly on the wiki page.
Configuration changes for other Wikibase instances: third-party setups may
have to update their configuration (custom-config.json). In the 'examples'
object, the 'endpoint' entry (pointing to the REST API for Parsoid) has
been replaced with 'apiPath': the path to api.php after the 'server',
related to $wgScriptPath
<https://www.mediawiki.org/wiki/Manual:$wgScriptPath>, but without a
leading slash (which should instead be at the end of the 'server') and
including the /api.php at the end.
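For illustration, the new shape of the 'examples' object might look roughly
like this (the 'server' value is a made-up placeholder; only the 'endpoint'
to 'apiPath' change is taken from the description above):

```json
{
  "examples": {
    "server": "https://wiki.example.org/",
    "apiPath": "w/api.php"
  }
}
```

Note that 'apiPath' has no leading slash (the trailing slash belongs to
'server') and ends with api.php.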
If you encounter any issues with the examples page or while configuring
your own Query Service instance, please let us know by adding a comment under
this task <https://phabricator.wikimedia.org/T174298>.
Cheers,
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/029/42207.