Wikidata-tech

wikidata-tech@lists.wikimedia.org

619 discussions

Fwd: [Wikitech-l] New search backend live on mediawiki.org

by Daniel Kinzler

Did I say it'll take until next year before we have the new search infrastructure? Looks like I was wrong: it's being beta tested on mediawiki.org already. Time to check how well it works with Wikibase, then! Let's ask about that in the call tomorrow.

-- daniel

-------- Original Message --------
Subject: [Wikitech-l] New search backend live on mediawiki.org
Date: Wed, 28 Aug 2013 14:20:10 -0400
From: Nikolas Everett <neverett(a)wikimedia.org>
Reply-To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>

Today we threw the big lever and turned on our new search backend at mediawiki.org. It isn't the default yet, but it is just about ready for you to try.

Here is what we think we've improved:
1. Templates are now expanded during search, so:
1a. You can search for text included in templates.
1b. You can search for categories included in templates.
2. The search engine is updated very quickly after articles change.
3. A few funky things around intitle and incategory:
3a. You can combine them with a regular query (incategory:kings peaceful).
3b. You can use prefix searches with them (incategory:norma*).
3c. You can use them anywhere in the query (roger incategory:normans).

What we think we've made worse and are working on fixing:
1. Because we're expanding templates, some things that probably shouldn't be searched are being searched. We've fixed a few of these issues, but I wouldn't be surprised if more come up. We opened Bug 53426 regarding audio tags.
2. The relative weighting of matches is going to be different. We're still fine-tuning this, and we'd appreciate any anecdotes describing search results that seem out of order.
3. We don't currently index headings beyond the article title in any special way. We'll be fixing that soon. (Bug 53481)
4. Searching for file names or clusters of punctuation characters doesn't work as well as it used to. It still works reasonably well if you surround your query in quotes, but it isn't as good as it was. (Bugs 53013 and 52948)
5. "Did you mean" suggestions currently aren't highlighted at all, and sometimes we'll suggest things that aren't actually better. (Bugs 52286 and 52860)
6. incategory:"category with spaces" isn't working. (Bug 53415)

What we've changed that you probably don't care about:
1. Updating search in bulk is much slower than before. This is the cost of expanding templates.
2. Search is now backed by a horizontally scalable search backend that is being actively developed (Elasticsearch), so we're in a much better place to expand on the new solution as time goes on.

Neat stuff if you run your own MediaWiki: CirrusSearch is much easier to install than our current search infrastructure.

So what will you notice? Nothing! That is because, while the new search backend (CirrusSearch) is indexing, we've left the current search infrastructure as the default while we work on our list of bugs. You can see the results from CirrusSearch by performing your search as normal and adding "&srbackend=CirrusSearch" to the URL parameters.

If you notice any problems with CirrusSearch, please file bugs directly against it:
https://bugzilla.wikimedia.org/enter_bug.cgi?product=MediaWiki%20extensions…

Nik Everett
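For comparing the two backends from a script, here is a minimal Python sketch that builds such a search URL with the standard library. Only the srbackend=CirrusSearch parameter comes from the announcement; the base URL, the Special:Search entry point and the function name cirrus_search_url are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Assumption: the usual index.php entry point on mediawiki.org.
BASE_URL = "https://www.mediawiki.org/w/index.php"


def cirrus_search_url(query: str, base_url: str = BASE_URL) -> str:
    """Build a Special:Search URL that asks for CirrusSearch results."""
    params = {
        "title": "Special:Search",
        "search": query,  # may mix free text with incategory:/intitle:
        "srbackend": "CirrusSearch",  # parameter named in the announcement
    }
    # urlencode percent-encodes ':' and turns spaces into '+'.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(cirrus_search_url("roger incategory:normans"))
```

Leaving out the srbackend entry gives the same search against the current default backend, so the two result pages can be compared side by side.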

10 years, 8 months

Database schema updating code review

by Jeroen De Dauw

Hey,

I wrote up a little design spike for a bunch of code needed for the store schema update functionality. This is nothing big or out of the ordinary [0]. Since I will only continue with this code later this week, and no real implementation has been done so far, this commit lends itself well to a design-level review.

https://gerrit.wikimedia.org/r/#/c/81254/3/src/TableDefinitionReader.php

[0] Someone please suggest writing a formal RFC, having 5 calls about it, a dedicated architecture review, a dedicated security review, and perhaps its own mailing list.

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
*Don't panic*. Don't be evil. ~=[,,_,,]:3
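The message links to the change but does not include its code, so the following is only a hypothetical Python sketch of the kind of abstraction a schema updater might use: a reader returns a table's current definition, and a planning step compares it with the desired one. Apart from the TableDefinitionReader name taken from the Gerrit path, every class, method, field and table name below is invented for illustration and does not describe the actual change.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FieldDefinition:
    name: str
    type_name: str


@dataclass(frozen=True)
class TableDefinition:
    name: str
    fields: Tuple[FieldDefinition, ...]


class TableDefinitionReader(ABC):
    """Hypothetical interface: read a table's definition as it exists now."""

    @abstractmethod
    def read_definition(self, table_name: str) -> TableDefinition:
        ...


class InMemoryTableDefinitionReader(TableDefinitionReader):
    """Test double that serves definitions from a dict, not a database."""

    def __init__(self, tables: Dict[str, TableDefinition]) -> None:
        self._tables = tables

    def read_definition(self, table_name: str) -> TableDefinition:
        return self._tables[table_name]


def plan_field_changes(
    current: TableDefinition, desired: TableDefinition
) -> Tuple[List[FieldDefinition], List[str]]:
    """Return (fields to add, names of fields to drop); ignores type changes."""
    current_names = {f.name for f in current.fields}
    desired_names = {f.name for f in desired.fields}
    to_add = [f for f in desired.fields if f.name not in current_names]
    to_drop = sorted(current_names - desired_names)
    return to_add, to_drop


reader = InMemoryTableDefinitionReader({
    "example_table": TableDefinition("example_table", (
        FieldDefinition("id", "int"), FieldDefinition("legacy", "blob"))),
})
desired = TableDefinition("example_table", (
    FieldDefinition("id", "int"), FieldDefinition("label", "text")))
to_add, to_drop = plan_field_changes(reader.read_definition("example_table"), desired)
print([f.name for f in to_add], to_drop)  # ['label'] ['legacy']
```

Keeping the reader behind an interface means the planning logic can be tested against in-memory definitions without a live database.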

10 years, 8 months

Loading DataTypes in WMF config

by Jeroen De Dauw

Hey,

We have a bunch of development TODOs that are blocked on the WMF deployment config being updated. Aude made the following commit 3 weeks ago. Can someone with the appropriate rights please have a look at it?

https://gerrit.wikimedia.org/r/#/c/76481/

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3

10 years, 8 months

Clarifying role of different Wikidata extensions (Re: Loading DataTypes in WMF config)

by Rob Lanphier

Hi Denny,

Comments below:

On Mon, Aug 26, 2013 at 9:26 AM, Denny Vrandečić <denny.vrandecic(a)wikimedia.de> wrote:
> 2013/8/26 Rob Lanphier <robla(a)wikimedia.org>
>> On Fri, Aug 23, 2013 at 8:03 AM, Denny Vrandečić
>> <denny.vrandecic(a)wikimedia.de> wrote:
>> > for Semantic MediaWiki we had this already listed for a while here:
>> > <https://meta.wikimedia.org/wiki/Wikidata/Notes/SMW_and_Wikidata>
>> > The list is not completely up-to-date, it reflects the previous package
>> > structure.
>>
>> It would be helpful if there was a wiki page that reflected the
>> current state of things. For MediaWiki core, we're trying to get into
>> the practice of having RFCs for changes of this magnitude, and
>> something of that level of rigor would be nice here too.
>
> The current state is described here:
> <http://www.mediawiki.org/wiki/Wikibase>, more specifically here:
> <http://www.mediawiki.org/wiki/Wikibase#Packages_and_code_structure>
>
> The new structure was described in emails on Wikidata-tech, most recently
> here:
> <http://www.mail-archive.com/wikidata-tech@lists.wikimedia.org/msg00200.html>
>
> We are indeed not yet using the RFC process, but I would prefer if we could
> agree to move to this process in the future, as this particular discussion
> is going on already a bit longer than what I would expect for a discussion
> regarding the question about how to organize code.

This is not just about how we organize code. This is about how we package our work. It's unusual for us to have extensions with hard dependencies on other extensions, let alone the complicated hierarchy you all have chosen. It may be OK to do that, but we should discuss it before setting the precedent.

This gets exposed to end users via Special:Version:
https://en.wikipedia.org/wiki/Special:Version

Since MediaWiki administrators often use this page to understand how to configure their wikis "like Wikipedia", it would be nice if we didn't clutter it up with a lot of the internals of our systems. Each of the links should point to a page that does a good job of describing what the extension does, and the vast majority of them do. Unfortunately, for most of the Wikidata extensions right now, the pages are little more than boilerplate plus a one-liner.

Rather than get too far into the implementation details of what your extensions are doing, I think I'll hold off until someone on my team has more time to think about this and comment on it.

Rob

10 years, 8 months

EntityId changes

by Jeroen De Dauw

Hey,

After discussion with Danwe and Denny, I decided to rewrite EntityId. According to PHPUnit, it was the highest-risk class in the DataModel component (CRAP index of 290), which is really bad considering this is one of our most basic classes, one that a lot of other code binds itself to. The goal of the rewrite is to get rid of needless complexity and needless bindings, and to make non-problematic use of the class easier.

The rewrite currently includes:

* Dropping id prefix configurability (as previously discussed)
* Separating the id class from the entity id DataValue (EntityId no longer derives from DataValue; a new EntityIdDataValue wraps an EntityId via composition)
* The EntityId class now has as fields the entity type (string) and the id serialization (string)
* New ItemId and PropertyId derivatives, whose constructors take only the id serialization. The long-term goal is to make EntityId abstract
* Serialization format compatibility where required. The format of the DataValue will remain the same for now

Comments on the work in progress are welcome: https://gerrit.wikimedia.org/r/#/c/80394/

It is possible this change will be done by tomorrow. More likely, it'll only be merged on Monday or later.

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
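To make the described structure concrete, here is a minimal sketch in Python (the actual change is PHP and is not reproduced here). It assumes ids look like "Q42" and "P1337" with the prefixes hardcoded, and that EntityId compares by its two string fields; the validation regexes and the get_value method name are illustrative assumptions, not the Gerrit change's code.

```python
import re


class EntityId:
    """An entity id as two plain strings: entity type and id serialization."""

    def __init__(self, entity_type: str, serialization: str) -> None:
        self.entity_type = entity_type
        self.serialization = serialization

    def __eq__(self, other: object) -> bool:
        return (
            isinstance(other, EntityId)
            and self.entity_type == other.entity_type
            and self.serialization == other.serialization
        )

    def __hash__(self) -> int:
        return hash((self.entity_type, self.serialization))

    def __repr__(self) -> str:
        return f"{type(self).__name__}({self.serialization!r})"


class ItemId(EntityId):
    """Takes only the serialization; the "Q" prefix is hardcoded."""

    def __init__(self, serialization: str) -> None:
        if not re.fullmatch(r"Q[1-9][0-9]*", serialization):
            raise ValueError(f"not an item id: {serialization!r}")
        super().__init__("item", serialization)


class PropertyId(EntityId):
    """Takes only the serialization; the "P" prefix is hardcoded."""

    def __init__(self, serialization: str) -> None:
        if not re.fullmatch(r"P[1-9][0-9]*", serialization):
            raise ValueError(f"not a property id: {serialization!r}")
        super().__init__("property", serialization)


class EntityIdDataValue:
    """A data value that holds an EntityId instead of being one."""

    def __init__(self, entity_id: EntityId) -> None:
        self.entity_id = entity_id

    def get_value(self) -> EntityId:
        return self.entity_id
```

Because EntityIdDataValue holds an EntityId rather than inheriting from it, code that only needs an id does not have to depend on the data value classes at all.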

10 years, 8 months

Re: [Wikidata-tech] [Wikidata-l] claims Datatypes inconsistency suspicion

by Daniel Kinzler

On 25.08.2013 19:19, Markus Krötzsch wrote:
> If we have an IRI DV, considering that URLs are special IRIs, it seems clear
> that IRI would be the best way of storing them.

The best way of storing them really depends on the storage platform. It may be a string or something else.

I think the real issue here is that we are exposing something that is really an internal detail (the data value type) instead of the high-level information we actually should be exposing, namely the property type. I think splitting the two was a mistake, and I think exposing the DV type while making the property type all but inaccessible makes things a lot worse.

In my opinion, data should be self-descriptive, so the *semantic* type of the property should be included along with the value. People expect this, and assume that this is what the DV type is. But it's not, and it should not be used or abused for this purpose.

Ideally, it should not matter at all to any third party whether we use a string or an IRI DV internally. The (semantic) property type would be URL, and that's all that matters.

I'm quite unhappy about the current situation; we are beginning to see the backlash of the decision not to include the property type inline. If we don't do anything about this now, I fear the confusion is going to get worse.

-- daniel
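A small Python sketch of the ambiguity described above. It assumes, as the thread implies, that URL values are stored with a string DV; the mapping, the serialize_value function and the "property-type"/"value-type" keys are hypothetical and do not describe Wikibase's actual serialization format.

```python
# Hypothetical mapping from (semantic) property types to the internal data
# value (DV) types used to store them. The url -> string pairing is the case
# implied in this thread; everything else here is illustrative.
PROPERTY_TYPE_TO_DV_TYPE = {
    "string": "string",
    "url": "string",
}


def serialize_value(property_type: str, value: str) -> dict:
    """Serialize a value so that it carries its own semantic type."""
    return {
        "property-type": property_type,  # what consumers should rely on
        "value-type": PROPERTY_TYPE_TO_DV_TYPE[property_type],  # internal detail
        "value": value,
    }


as_url = serialize_value("url", "http://www.wikidata.org/")
as_string = serialize_value("string", "http://www.wikidata.org/")

# The DV type alone cannot tell these two apart; the property type can.
print(as_url["value-type"] == as_string["value-type"])        # True
print(as_url["property-type"] == as_string["property-type"])  # False
```

If the internal DV for URLs later changed from string to IRI, only the "value-type" entry would change, while consumers keyed on the property type would be unaffected.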

10 years, 8 months

Drop id prefix configurability

by Jeroen De Dauw

Hey,

Are there any objections to, or concerns with, dropping the configurability of entity id prefixes? Hardcoding these in their associated entity classes will make a lot of things easier.

To clarify, entity id prefixes are things such as the "Q" in "Q42" for item ids or the "P" in "P1337" for property ids.

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
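With the prefixes fixed in the entity classes, turning an id string back into an entity type needs no configuration lookup. The following Python sketch shows that idea; the Item and Property classes, their ENTITY_TYPE and ID_PREFIX attributes, and parse_entity_id are hypothetical stand-ins for the real PHP entity classes.

```python
import re


class Item:
    ENTITY_TYPE = "item"
    ID_PREFIX = "Q"  # hardcoded in the entity class, not read from config


class Property:
    ENTITY_TYPE = "property"
    ID_PREFIX = "P"


ENTITY_CLASSES = (Item, Property)


def parse_entity_id(serialization: str) -> tuple:
    """Split an id such as "Q42" into (entity type, numeric part)."""
    for entity_class in ENTITY_CLASSES:
        pattern = re.escape(entity_class.ID_PREFIX) + r"([1-9][0-9]*)"
        match = re.fullmatch(pattern, serialization)
        if match:
            return entity_class.ENTITY_TYPE, int(match.group(1))
    raise ValueError(f"unknown entity id: {serialization!r}")


print(parse_entity_id("Q42"))    # ('item', 42)
print(parse_entity_id("P1337"))  # ('property', 1337)
```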

10 years, 8 months

Registering Value Formatters

by Daniel Kinzler

Since this topic came up again during the daily standup today, I'd like to share a few links.

We have discussed how to best manage and register formatters before. A very brief summary of our discussion is here:
* https://meta.wikimedia.org/wiki/Wikidata/Development/Data_values

The relevant tickets are:
* https://bugzilla.wikimedia.org/show_bug.cgi?id=49265
* https://bugzilla.wikimedia.org/show_bug.cgi?id=51799

Please consider these when working on value formatters. Thanks!

-- daniel
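The design recorded on the linked pages is not reproduced in this message, so the following is only a generic Python sketch of a formatter registry keyed by a type id. The ValueFormatterRegistry class, its register and format methods, and the example type ids are hypothetical.

```python
from typing import Callable, Dict

Formatter = Callable[[object], str]


class ValueFormatterRegistry:
    """Maps a type id (for example a data type id) to a formatter."""

    def __init__(self) -> None:
        self._formatters: Dict[str, Formatter] = {}

    def register(self, type_id: str, formatter: Formatter) -> None:
        # Refuse silent overrides so conflicting registrations surface early.
        if type_id in self._formatters:
            raise ValueError(f"formatter already registered for {type_id!r}")
        self._formatters[type_id] = formatter

    def format(self, type_id: str, value: object) -> str:
        formatter = self._formatters.get(type_id)
        if formatter is None:
            raise LookupError(f"no formatter registered for {type_id!r}")
        return formatter(value)


registry = ValueFormatterRegistry()
registry.register("string", str)
registry.register("url", lambda value: f"<{value}>")
print(registry.format("url", "http://www.wikidata.org/"))  # <http://www.wikidata.org/>
```

Rejecting duplicate registrations is one possible choice; a registry that lets later registrations override earlier ones would make it easier to customize formatting, at the cost of hiding conflicts.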

10 years, 8 months

Using @covers tags

by Jeroen De Dauw

Hey,

I'd like to reiterate the importance of using @covers tags [0] in test cases rather than writing "Test for SomeClass". These tags make coverage reporting a lot more accurate, and thus make it easier to spot areas lacking tests.

Today I added a bunch of @covers tags in the DataModel component. The reported coverage dropped from 85% to 57%, quite a difference.

[0] http://phpunit.de/manual/current/en/appendixes.annotations.html#appendixes.…

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
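Why the reported number drops can be shown with a toy model in Python. This is not PHPUnit's implementation: it only assumes that, without @covers, any line a test happens to execute counts as covered, while with @covers a test contributes coverage only for the units it names. The classes A and B, their line sets and the ToyTest type are made up.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set, Tuple

Line = Tuple[str, int]  # (class name, line number)


@dataclass(frozen=True)
class ToyTest:
    executed: FrozenSet[Line]  # every line run while the test executes
    covers: Optional[FrozenSet[str]] = None  # classes named in @covers, if any


def coverage(tests, all_lines: Set[Line]) -> float:
    """Fraction of all_lines that the tests count as covered."""
    covered: Set[Line] = set()
    for test in tests:
        if test.covers is None:
            # No @covers: every executed line counts, even incidental ones.
            covered |= test.executed
        else:
            # With @covers: only lines of the named classes count.
            covered |= {line for line in test.executed if line[0] in test.covers}
    return len(covered & all_lines) / len(all_lines)


a_lines = {("A", n) for n in range(1, 5)}
b_lines = {("B", n) for n in range(1, 5)}
all_lines = a_lines | b_lines

# The test for A also runs two lines of B, which A uses internally.
a_run = frozenset(a_lines | {("B", 1), ("B", 2)})
b_run = frozenset({("B", 1)})

print(coverage([ToyTest(a_run), ToyTest(b_run)], all_lines))  # 0.75
print(coverage([ToyTest(a_run, frozenset({"A"})),
                ToyTest(b_run, frozenset({"B"}))], all_lines))  # 0.625
```

Line B2 is only ever reached incidentally through A's test, so it stops counting once that test declares @covers A, which is the kind of effect that lowers a reported figure like 85% toward a more honest one.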

10 years, 8 months

Fwd: Java-based Wiktionary Library (JWKTL) 1.0.0 released as open source software

by Sumana Harihareswara

Forwarding to the Wikidata tech list in case this makes a future Wiktionary collaboration easier.

-------- Original Message --------
Subject: [Wiki-research-l] Java-based Wiktionary Library (JWKTL) 1.0.0 released as open source software (Wiki-research-l Digest, Vol 96, Issue 22)
Date: Tue, 20 Aug 2013 14:20:56 +0000
From: Judith Eckle-Kohler <eckle-kohler(a)ukp.informatik.tu-darmstadt.de>
Reply-To: wiki-research-l(a)lists.wikimedia.org
To: wiki-research-l(a)lists.wikimedia.org <wiki-research-l(a)lists.wikimedia.org>

[Apologies for X-posting]

We are pleased to announce the release of the Java-based Wiktionary Library (JWKTL) 1.0.0, an application programming interface for Wiktionary.

Project homepage: http://code.google.com/p/jwktl/

== Overview ==

JWKTL (Java-based Wiktionary Library) is an application programming interface for the free multilingual online dictionary Wiktionary (http://www.wiktionary.org). JWKTL enables efficient and structured access to the information encoded in the English, German, and Russian Wiktionary language editions, including sense definitions, part-of-speech tags, etymology, example sentences, translations, semantic relations, and many other types of lexical information. The Russian JWKTL parser is based on Wikokit (http://code.google.com/p/wikokit/).

Before being released as open source software, JWKTL was a research project at the Ubiquitous Knowledge Processing (UKP) Lab of the Technische Universität Darmstadt, Germany. The main contributors to this project are: Yevgen Chebotar, Iryna Gurevych, Christian M. Meyer, Christof Müller, Lizhen Qu, Torsten Zesch.

== Publications ==

A detailed description of Wiktionary and JWKTL is available in our scientific articles:

* Christian M. Meyer and Iryna Gurevych: Wiktionary: A new rival for expert-built lexicons? Exploring the possibilities of collaborative lexicography, Chapter 13 in S. Granger & M. Paquot (Eds.): Electronic Lexicography, pp. 259-291, Oxford: Oxford University Press, November 2012. (http://www.ukp.tu-darmstadt.de/publications/details/?no_cache=1&tx_bibtex_p…)
* Christian M. Meyer and Iryna Gurevych: OntoWiktionary: Constructing an Ontology from the Collaborative Online Dictionary Wiktionary, Chapter 6 in M. T. Pazienza and A. Stellato (Eds.): Semi-Automatic Ontology Development: Processes and Resources, pp. 131-161, Hershey, PA: IGI Global, February 2012. (http://www.ukp.tu-darmstadt.de/publications/details/?no_cache=1&tx_bibtex_p…)
* Torsten Zesch, Christof Müller, and Iryna Gurevych: Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary, in: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC), pp. 1646-1652, May 2008, Marrakech, Morocco. (http://www.ukp.tu-darmstadt.de/publications/details/?no_cache=1&tx_bibtex_p…)

== License and Availability ==

The latest version of JWKTL is available via Maven Central. If you use Maven as your build tool, you can add JWKTL as a dependency in your pom.xml file:

  <dependency>
    <groupId>de.tudarmstadt.ukp.jwktl</groupId>
    <artifactId>jwktl</artifactId>
    <version>1.0.0</version>
  </dependency>

JWKTL is available as open source software under the Apache License 2.0 (ASL). The software thus comes "as is" without any warranty (see the license text for more details). JWKTL makes use of Berkeley DB Java Edition 5.0.73 (Sleepycat License), Apache Ant 1.7.1 (ASL), Xerces 2.9.1 (ASL), and JUnit 4.10 (CPL). Some classes have been taken from the Wikokit project (available under multiple licenses, redistributed under the ASL). See NOTICE.txt for further details.

== Contact ==

Please direct any questions or suggestions to https://groups.google.com/forum/#!forum/jwktl-users

Group e-mail: jwktl-users(a)googlegroups.com

Best wishes,
Christian M. Meyer

--
Christian M. Meyer, M.Sc.
Doctoral Researcher
Ubiquitous Knowledge Processing (UKP Lab)
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
Phone [+49] (0)6151 16-5386, fax -5455, room S2/02/B113
meyer(a)ukp.informatik.tu-darmstadt.de
www.ukp.tu-darmstadt.de
Web Research at TU Darmstadt (WeRC)
www.werc.tu-darmstadt.de

10 years, 8 months
