Hello,
We are having issues launching a local copy of Wikidata. When we use the
'importDump.php' tool, we run into the issues described below.
If somebody has an idea of how we could solve this, please let me know. We
are also considering professional services to get fixes for this released,
in case somebody offers professional consulting around Wikibase.
Thanks,
Miquel
Here are the issues.
If I try to load the full dump, the error I get is:
root@4fc8cc9b76b3:/var/www/html/maintenance# php importDump.php --conf
../LocalSettings.php
../images/wikidatawiki-20191101-pages-articles-multistream.xml.bz2
Warning: XMLReader::read():
uploadsource://d0cd78c216b067ffdd60946c258db6a7:45: parser error : Extra
content at the end of the document in
/var/www/html/includes/import/WikiImporter.php on line 646
Warning: XMLReader::read(): </siteinfo> in
/var/www/html/includes/import/WikiImporter.php on line 646
Warning: XMLReader::read(): ^ in
/var/www/html/includes/import/WikiImporter.php on line 646
Done!
You might want to run rebuildrecentchanges.php to regenerate RecentChanges,
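One possible explanation (an assumption on my part, not something the error
message proves): the *multistream* dump is a concatenation of many
independent bz2 streams, and a decompressor that only reads a single stream
stops at the first stream boundary, leaving the XML truncated right after
the header, which matches the parser complaining at </siteinfo>. Using the
non-multistream pages-articles dump, or piping through bzcat (which handles
concatenated streams), may sidestep this. A small Python sketch of the
difference:

```python
import bz2

# Emulate a "multistream" dump: two independently compressed bz2
# streams concatenated into one blob.
part1 = bz2.compress(b"<mediawiki><siteinfo/>")
part2 = bz2.compress(b"<page/></mediawiki>")
blob = part1 + part2

# A single-stream decompressor stops at the end of the first stream;
# the rest of the blob is left in unused_data.
d = bz2.BZ2Decompressor()
print(d.decompress(blob))  # b'<mediawiki><siteinfo/>' -- truncated XML
print(d.eof)               # True: it considers the input finished

# bz2.decompress (and bzcat) handle concatenated streams correctly.
print(bz2.decompress(blob))  # b'<mediawiki><siteinfo/><page/></mediawiki>'
```

Something like `bzcat dump.xml.bz2 | php importDump.php --conf
../LocalSettings.php` would feed the importer fully decompressed XML.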
If I try to load a partial dump, the warnings that I get (which, I think,
mean that nothing is being loaded) are:
root@4fc8cc9b76b3:/var/www/html/maintenance# php importDump.php --conf
../LocalSettings.php
../images/wikidatawiki-20191020-pages-meta-current1.xml-p1p235321.bz2
Revision 1033865598 using content model wikibase-item cannot be stored on
"Q15" on this wiki, since that model is not supported on that page.
Revision 1034542603 using content model wikibase-item cannot be stored on
"Q17" on this wiki, since that model is not supported on that page.
Revision 1032554298 using content model wikibase-item cannot be stored on
"Q18" on this wiki, since that model is not supported on that page.
Revision 1032534215 using content model wikibase-item cannot be stored on
"Q20" on this wiki, since that model is not supported on that page.
Revision 1026713626 using content model wikibase-item cannot be stored on
"Q21" on this wiki, since that model is not supported on that page.
Revision 1023703278 using content model wikibase-item cannot be stored on
"Q22" on this wiki, since that model is not supported on that page.
Revision 1032815802 using content model wikibase-item cannot be stored on
"Q25" on this wiki, since that model is not supported on that page.
Revision 1032910600 using content model wikibase-item cannot be stored on
"Q26" on this wiki, since that model is not supported on that page.
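These warnings mean the local wiki refuses the wikibase-item content model
on those titles: Wikibase only allows entity content in its configured
entity namespaces, and on Wikidata itself items live in the main namespace,
so an import target has to match that mapping. A hedged LocalSettings.php
sketch (the exact setting name and key shape vary across Wikibase versions,
so verify against your release before relying on it):

```php
<?php
// Assumption: Wikibase Repo is already loaded above this point, and
// 'entityNamespaces' is the settings key used by this Wikibase release.
// Map items to the main namespace (0), as on wikidata.org, so that
// pages like Q15, Q17, ... accept the wikibase-item model.
$wgWBRepoSettings['entityNamespaces']['item'] = NS_MAIN;
```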
topic: removed semantic-web, guile, and added wikidata-tech.
Let's move the conversation to wikidata-tech@
Please remove wikidata(a)lists.wikimedia.org next time you reply.
On Sun, Dec 22, 2019 at 23:35, Ted Thibodeau Jr
<tthibodeau(a)openlinksw.com> wrote:
>
>
> On Dec 22, 2019, at 03:17 PM, Amirouche Boubekki <amirouche.boubekki(a)gmail.com> wrote:
> >
> > Hello all ;-)
> >
> >
> > I ported the code to Chez Scheme to do an apple-to-apple comparison
> > between GNU Guile and Chez and took the time to launch a few queries
> > against Virtuoso available in Ubuntu 18.04 (LTS).
>
> Hi, Amirouche --
>
> Kingsley's points about tuning Virtuoso to use available
> RAM [1] and other system resources are worth looking into,
> but a possibly more important first question is --
>
> Exactly what version of Virtuoso are you testing?
>
> If you followed the common script on Ubuntu 18.04, i.e., --
>
> sudo apt update
>
> sudo apt install virtuoso-opensource
>
> -- then you likely have version 6.1.6 of VOS, the Open Source
> Edition of Virtuoso, which shipped 2012-08-02 [2], and is far
> behind the latest version of both VOS (v7.2.5+) and Enterprise
> Edition (v8.3+)!
>
> The easiest way to confirm what you're running is to review
> the first "paragraph" of output from the command corresponding
> to the name of your Virtuoso binary --
>
> virtuoso-t -?
$ virtuoso-t -?
Virtuoso Open Source Edition (multi threaded)
Version 6.1.6.3127-pthreads as of Feb 6 2018
>
> virtuoso-iodbc-t -?
>
I do not have that command. I use isql-vt:
$ isql-vt --help
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
> If I'm right, and you're running 6.x, you'll get much better
> test results just by running a current version of Virtuoso.
>
> You can build VOS 7.2.6+ from source [3] (we'd recommend the
> develop/7 branch [4] for the absolute latest), or download a
> precompiled binary [5] of VOS 7.2.5.1 or 7.2.6.dev.
>
> You can also try Enterprise Edition at no cost for 30 days [5].
>
Next round I will try the develop branch.
As I said previously, these benchmarks must be taken with a grain of salt:
For one, the Virtuoso timings are reported by Virtuoso itself. Second, on
the nomunofu side, I do not convert the internal representation into the
external representation. Third, and most important, this is just a glimpse
of the full picture.
My mails are mainly trying to spark some interest or discussion with
Wikidata and Wikimedia, so that I can work full time on this. I have
already described my intent, which is to create a benchmark tool based on
the Wikidata SPARQL logs [*], then use those to realistically benchmark
Virtuoso, the current solution, and a new solution (nomunofu) that I am
working on.
[*] https://iccl.inf.tu-dresden.de/web/Wissensbasierte_Systeme/WikidataSPARQL/en
Raw benchmarks would not tell the whole truth, because nomunofu can
rely on both WiredTiger and FoundationDB, which, as far as I know,
claim stronger guarantees than Virtuoso. The only way to know whether
Virtuoso is comparable to FoundationDB or WiredTiger would be for
Virtuoso to pass the Jepsen harness tests (https://jepsen.io/).
I have not put all my eggs in one basket; I am considering other
options. But I think working for Wikimedia, on contract or in a
permanent position, would be best overall.
I will make another WDQS proposal, based on some feedback I was given
on IRC, adding more technical details (and an improved road map).
>
> [1] http://vos.openlinksw.com/owiki/wiki/VOS/VirtRDFPerformanceTuning
>
> [2] http://vos.openlinksw.com/owiki/wiki/VOS/VOSNews2012#2012-08-02%20--%20Anno….
>
> [3] http://vos.openlinksw.com/owiki/wiki/VOS/VOSBuild
>
> [4] https://github.com/openlink/virtuoso-opensource/tree/develop/7
>
> [5] https://sourceforge.net/projects/virtuoso/files/virtuoso/
>
>
>
>
>
> > Spoiler: the new code is always faster.
> >
> > The hard disk is SATA, and the CPU is dubbed: Intel(R) Xeon(R) CPU
> > E3-1220 V2 @ 3.10GHz
> >
> > I imported latest-lexeme.nt (6GB) using guile-nomunofu, chez-nomunofu
> > and Virtuoso:
> >
> > - Chez takes 40 minutes to import 6GB
> > - Chez is 3 to 5 times faster than Guile
> > - Chez is 11% faster than Virtuoso
>
>
> How did you load the data? Did you use Virtuoso's bulk-load
> facilities? This is the recommended method [6].
>
> [6] http://vos.openlinksw.com/owiki/wiki/VOS/VirtBulkRDFLoader
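For reference, the bulk-load recipe at [6] boils down to registering the
dump files and then running the loader from isql. A minimal sketch, with a
placeholder directory and the graph IRI used elsewhere in this thread:

```sql
-- Run inside isql / isql-vt against a running Virtuoso server.
-- The directory must be listed under DirsAllowed in virtuoso.ini.
ld_dir ('/data/dumps', '*.nt', 'http://fu');  -- register files to load
rdf_loader_run ();                            -- perform the bulk load
checkpoint;                                   -- persist the loaded data
```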
>
>
> > Regarding query time, Chez is still faster than Virtuoso with or
> > without cache. The query I am testing is the following:
> >
> > SELECT ?s ?p ?o
> > FROM <http://fu>
> > WHERE {
> > ?s <http://purl.org/dc/terms/language> <http://www.wikidata.org/entity/Q150> .
> > ?s <http://wikiba.se/ontology#lexicalCategory>
> > <http://www.wikidata.org/entity/Q1084> .
> > ?s <http://www.w3.org/2000/01/rdf-schema#label> ?o
> > };
> >
> > Virtuoso's first query takes 1295 msec.
> > The second query takes 331 msec.
> > Then it stabilizes around 200 msec.
> >
> > chez nomunofu takes around 200ms without cache.
> >
> > There is still an optimization I can do to speed up nomunofu a little.
> >
> >
> > Happy hacking!
>
>
> I'll be interested to hear your new results, with a current build,
> and with proper INI tuning in place.
What will be the INI options I need to use? Thanks!
>
> Regards,
>
> Ted
>
>
>
> --
> A: Yes. http://www.idallen.com/topposting.html
> | Q: Are you sure?
> | | A: Because it reverses the logical flow of conversation.
> | | | Q: Why is top posting frowned upon?
>
> Ted Thibodeau, Jr. // voice +1-781-273-0900 x32
> Senior Support & Evangelism // mailto:tthibodeau@openlinksw.com
> // http://twitter.com/TallTed
> OpenLink Software, Inc. // http://www.openlinksw.com/
> 20 Burlington Mall Road, Suite 322, Burlington MA 01803
> Weblog -- http://www.openlinksw.com/blogs/
> Community -- https://community.openlinksw.com/
> LinkedIn -- http://www.linkedin.com/company/openlink-software/
> Twitter -- http://twitter.com/OpenLink
> Facebook -- http://www.facebook.com/OpenLinkSoftware
> Universal Data Access, Integration, and Management Technology Providers
>
>
>
>
Regards,
Amirouche ~ zig ~ https://hyper.dev
Hi,
as some of you might know, I have written several tools for Wikidata, most
recently:
MachtSinn[1] – a tool to generate Senses for Lexemes from Wikidata items
LexData[2] – a small Python library that allows for easy editing of Lexemes
These two got more appreciation than I would have expected. That’s great,
but it also means more bug reports, more feature wishes and more pull
requests. I am currently writing my thesis and therefore cannot put much
time into these projects.
At the same time, I feel bad leaving these projects unmaintained for the
next three months and disappointing enthusiastic users.
Therefore I’m looking for people who would be interested in contributing
to and co-maintaining these projects. It’s Python code with a JavaScript
frontend for MachtSinn. If you are interested, write me a mail or create
an issue on GitHub.
Cheers,
User:MichaelSchoenitzer
[1] https://tools.wmflabs.org/machtsinn/
[2] https://github.com/Nudin/LexData
--
Michael F. Schönitzer
Henrik-Ibsen-Str. 2
80638 München
Mail: michael(a)schoenitzer.de
Jabber: schoenitzer(a)jabber.ccc.de
Tel: 089/37918949 - Mobil: 017657895702
Hello,
apologies for cross-posting!
I am writing to inform you that at Wikimedia Deutschland we have decided to
stop having weekly Technical Advice IRC Meetings (TAIM)[1] at
#wikimedia-tech after the final meeting on December 18th. In other words,
Technical Advice IRC Meetings will not continue in 2020.
We realized that the regular IRC meeting format has reached the limits of
its reach, and it is time for us to think about other ways to better
support as many as possible of the volunteer developers contributing to
Wikimedia software.
At WMDE we’ll be dedicating our efforts in 2020 to continually improving
the documentation of our products, including Wikidata and Wikibase, to
allow for easier usage and contribution. Our engineering teams are also
exploring new ways to support volunteer developers more asynchronously
than regular IRC meetings allow. If you have any suggestions, please let
us know on our talk page.[2]
On this note, we would like to send a warm thank-you to all the people
who hosted TAIM meetings, all participants who asked and answered
questions, and all supporters of TAIM over the years.
Have a great seasonal break and see you around in 2020!
On behalf of TAIM crew at WMDE
Johanna
[1] https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[2] https://www.mediawiki.org/wiki/Talk:Technical_Advice_IRC_Meeting
--
Johanna Strodt
Project Manager for Community Communication / Technical Wishes Project
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Tel. (030) 219 158 26-0
https://wikimedia.de
Hello all,
We recently made some changes to the way the Wikidata Query Service UI (
https://query.wikidata.org/) loads the example queries
<https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples>.
This change can affect people who maintain these queries, as well as
people running their own Wikibase instance with the Query Service.
Before the change, our approach was to get the HTML of that wiki page from
Parsoid <https://www.mediawiki.org/wiki/Parsoid> (this includes some
template metadata which the normal parser output doesn’t include), and from
that extract the query parameters of all {{SPARQL}} and {{SPARQL2}}
transclusions.
With our improved approach, we get the HTML of that wiki page from the
parser <https://www.mediawiki.org/wiki/API:Parsing_wikitext>, and from that
extract the contents of all syntax highlighted blocks.
The improvements resulting from this change are the following:
- The queries no longer have to be specified directly on the page using
{{SPARQL}} or {{SPARQL2}}; they can be transcluded indirectly, e.g. using
{{query page}}
<https://www.wikidata.org/wiki/Template:Query_page#Transclusion_usage>.
You can see a comparison at
User:TweetsFactsAndQueries/Queries-test-transclude
<https://www.wikidata.org/wiki/User:TweetsFactsAndQueries/Queries-test-trans…>
and User:TweetsFactsAndQueries/Queries-test-copy
<https://www.wikidata.org/wiki/User:TweetsFactsAndQueries/Queries-test-copy>.
If we go with the solution of one query per page, we should be aware that
we can fit fewer queries on the examples page before we hit some parser
limits.
- Examples can be loaded from wikis that don’t have Parsoid / VisualEditor
installed, making it much easier for third-party setups to manage their own
lists of examples.
- Queries that contained an unescaped pipe character (|) were previously
cut off at that character in the query service UI; this is now fixed, and
all queries are displayed just like on the wiki page.
- If the examples page hits some limit of the parser, the affected examples
will simply not be loaded, whereas with the previous approach they would
still be loaded and shown in the query service UI even though they weren’t
working correctly on the wiki page.
Configuration changes for other Wikibase instances: third-party setups may
have to update their configuration (custom-config.json). In the 'examples'
object, the 'endpoint' entry (pointing to the REST API for Parsoid) has
been replaced with 'apiPath': the path to api.php after the 'server',
related to $wgScriptPath
<https://www.mediawiki.org/wiki/Manual:$wgScriptPath>, but without a
leading slash (which should instead be at the end of the 'server') and
including the /api.php at the end.
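For illustration, the new shape of the 'examples' object might look roughly
like this (the 'server' value is a made-up placeholder; only the 'endpoint'
to 'apiPath' change is taken from the description above):

```json
{
  "examples": {
    "server": "https://wiki.example.org/",
    "apiPath": "w/api.php"
  }
}
```

Note that 'apiPath' has no leading slash (the trailing slash belongs to
'server') and ends with api.php.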
If you encounter any issues with the examples page or while configuring
your own Query Service instance, please let us know by adding a comment under
this task <https://phabricator.wikimedia.org/T174298>.
Cheers,
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/029/42207.