Hey Steffen and Andy,
Continuing what I started on Twitter here, as some more characters might be
helpful :)
It seems that both our projects (FLOW3 and Wikidata) are in a similar
situation. We are using Gerrit as our code review tool, and TravisCI to
run our tests. And we both want to have Travis run tests for all patchsets
submitted to Gerrit, and then vote +1 or -1 on Verified based on the build
passing or failing. To what extent have you gotten such a thing to work on
your
project? Is there code available anywhere? If both projects can use the
same code for this, I'd be happy to contribute to what you already have.
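(For reference, the Gerrit side of this presumably boils down to its SSH
review command; a sketch, with host, user and change numbers as
placeholders:

ssh -p 29418 bot@gerrit.example.org gerrit review --verified +1 <change>,<patchset>

The interesting part would be wiring Travis up to run that with the right
patchset after each build.)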
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
Hi!
I'm getting closer and closer to writing the UI for badges in the repo.
Does anyone have any ideas?
The only thing I have come up with is an additional row beneath each site
link which behaves (and looks) like alias editing currently does, but I
think that this would take up too much space.
Thanks,
Michał
We are planning to deploy URLs as data values rather soon (i.e. September
9, if all goes well).
There was a discussion on wikidata-l mailing list:
<http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg02664.html>
The current implementation for URLs uses a string data value. There was
also an IRI data value developed (for this use case), but in a previous
(internal) discussion it was decided to use the string value instead.
The above thread included a few strong arguments by Markus for using the
IRI data value. If we want to do this, we need to decide that very quickly,
and change it accordingly.
Let's see if we can make the decision here on this list. We need to make
the decision by Monday at the latest, preferably earlier.
Here are my current thoughts (check also the above-mentioned thread if you
have not already). Just to point out my current bias: I currently have a
preference for using the string value, but I want wider input.
* I do not see the advantage of representing
'http://www.ietf.org/rfc/rfc1738.txt' as a structured data value of the
form { protocol : 'http', hierarchicalpart : 'www.ietf.org/rfc/rfc1738.txt',
query : '', fragment : '' }.
* If we use the string value, a number of necessary features come for free,
like the diffing, displaying it in the diffs, etc. Sure, there is the
argument that we can use the getString method for these, but then what is
the use case that we actually serve by using the structured data?
* I understand the advantages of being able to *identify* whether the value
of a snak is a string or a URL, but those seem to be the same advantages as
for knowing whether the value of a snak is a Commons media file name or a
string. None of the use cases, though, explain why using the
above data structure is advantageous over a simple string value.
Please let us collect the arguments for and against using the IRI data
value *structure* here (not about being able to *identify* whether a value
is an IRI or a plain string).
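To make the two options concrete, here is a minimal sketch in PHP (the
StringValue usage matches what we have; the IriValue constructor signature
is an assumption based on the structure shown above, not actual code):

// Option 1: the current approach, a plain string data value.
$url = new DataValues\StringValue( 'http://www.ietf.org/rfc/rfc1738.txt' );
$url->getValue(); // the full URL as one string

// Option 2: the structured IRI data value (field order assumed).
$url = new DataValues\IriValue(
	'http',                         // protocol
	'www.ietf.org/rfc/rfc1738.txt', // hierarchical part
	'',                             // query
	''                              // fragment
);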
Not completely independent of that, there are a few questions that need to
be answered but that are not as immediate, i.e. do not have to be decided
by next week:
* Should, in the external JSON structure, the data value type be listed
for every snak (as it currently is)? I.e. should it state "string" instead
of "Commons media filename"?
* Should, in the external JSON structure, the data type of the property
used be listed for every snak? This would then say URL, and it would solve
all the use cases mentioned by Markus, which rely on *identifying* this
distinction, not on the actual IRI data structure.
* Should, in the internal JSON structure, anything be changed?
The external JSON structure is the one used when communicating through the
API.
The internal JSON structure is the one that you get when using the dumps.
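To illustrate the first two questions, a snak carrying both pieces of
information could look roughly like this (sketched as a PHP array; the key
names and the property id are assumptions for illustration, not the actual
serialization format):

$snak = array(
	'snaktype' => 'value',
	'property' => 'P123',
	// data type of the property (second question):
	'datatype' => 'url',
	'datavalue' => array(
		// data value type (first question):
		'type' => 'string',
		'value' => 'http://www.ietf.org/rfc/rfc1738.txt',
	),
);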
We need to have an export of the whole Wikidata knowledge base in the
external JSON format, sooner rather than later, and hopefully also in RDF.
The lack of these dumps should not influence our decision right now, imho :)
Cheers,
Denny
--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as charitable by the
Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
Hey,
Now that the design issues of EntityId have been fixed, it's Entity's turn :)
(Note: this is about domain layer implementation details. Not considering
changing anything visible to the user here.)
While working on the QueryEntity together with addshore, we ran into a
number of issues with the current implementation of Entity. The main
problem is that Entity objects are constructed from their internal
"serialization". The constructor, which is marked as protected, takes in
this "serialization" (in array form). This is rather awkward, consider how
we now typically construct a Property:
$property = Property::newEmpty();
$property->setDataTypeId( $id );
(We also have a static newFromDataTypeId which wraps this.)
There is a bunch of code that assumes one can create empty Entity objects,
especially in tests. I now think it was a mistake to allow this at all for
Property, which should not be constructed without a dataTypeId. The same
goes for QueryEntity, which should not be constructed without an Ask\Query.
It'd be much nicer if people could just use the constructors of the objects
and have these enforce the list of required parameters. They'd just take
the actual objects and not serializations.
$property = new Property( $id );
$queryEntity = new QueryEntity( $askQuery );
And since these things are enforced, one now gets back a string when
calling getDataTypeId, and an Ask\Query when calling getQueryDefinition,
rather than either that type or null.
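A minimal sketch of what this could look like (simplified and hypothetical,
just to illustrate the constructor enforcing its parameter):

class Property extends Entity {

	private $dataTypeId;

	/**
	 * @param string $dataTypeId
	 */
	public function __construct( $dataTypeId ) {
		$this->dataTypeId = $dataTypeId;
	}

	/**
	 * @return string (never null, since the constructor enforces it)
	 */
	public function getDataTypeId() {
		return $this->dataTypeId;
	}

}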
Serialization and deserialization code can also go into dedicated service
objects. This is already done in QueryEntity, which uses the same
serializers as the ones the web API will use, saving us from implementing a
second format, which would not be of much help here anyway (it'd save some
disk space...).
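As a rough illustration of such a service object (interface name and
signature are assumptions, not the actual serializer code):

interface Serializer {

	/**
	 * @param mixed $object
	 * @return array
	 */
	public function serialize( $object );

}

The Entity classes themselves then no longer need to know anything about
any serialization format.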
There is also a lot of room to be more strict about things. Right now you
can happily construct an Entity that has ints as aliases, or as language
codes for labels. On top of that, there are currently still TODOs from the
first months of the project in Entity related to normalization and handling
of duplicates. We might want to clearly define responsibilities at this
point :)
Oh, and of course Entity, Item, Property and Query should each go into
their own git repo.
Any objections or concerns about the above rambling?
There has lately also been some talk about doing things with Entity that
we did not consider before, such as entities that contain other entities.
Is there a list of such thoughts? If not, let's compile one here so they
can be taken into account.
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
Hey,
This email is meant to provide an overview of the plans regarding the
reorganization of the components in the DataValues.git repository.
Current component situation:
* DataValues
* ValueParsers, depends on DataValues
* ValueValidators, depends on DataValues
* ValueFormatters, depends on DataValues
* DataTypes, depends on all the above
* ValueView, depends on all the above
All of these bundle inheritance hierarchies and define both interfaces
and complex implementations.
Reorganization plans:
* DataValues, will hold interfaces, exceptions and trivial implementations
of current DataValues
* DataValues interfaces (still need a good name for this), will hold
interfaces, exceptions and trivial implementations of ValueParsers,
ValueFormatters and ValueValidators. Depends on DataValues
* DataValues implementations (still need a good name for this), will hold
common non-trivial implementations of the interfaces defined by the above
two components
* DataTypes, unchanged, now only dependent on DataValues and DataValues
interfaces
* ValueView, unchanged, now only dependent on DataTypes, DataValues and
DataValues interfaces
Dependencies are thus minimized and users are no longer forced to depend on
unstable concrete classes for no reason. Coincidentally the number of
components also drops by one.
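To illustrate the split: the interfaces component would hold definitions
such as the ValueParser interface, roughly of this shape (sketched from
memory; the actual code may differ):

namespace ValueParsers;

interface ValueParser {

	/**
	 * @param mixed $value
	 * @return \DataValues\DataValue
	 * @throws ParseException
	 */
	public function parse( $value );

}

The concrete parsers, formatters and validators built on top of such
interfaces would live in the implementations component.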
In terms of git repositories, everything is currently in a single
repository. Each component will go into its own repo, with the exception of
ValueView and DataTypes, which we'll at least initially put together. This
means creating 3 new git repos. The DataTypes git repo has already been
created, and we are awaiting removal of the old DataTypes code from
DataValues.git, which is currently blocked by a WMF configuration update.
Once this is done we can proceed with the remaining two repos.
When this reorganization is done and the components reside in their own
repos, we can make the two abstract ones releasable. These are the ones
most depended upon, and some of the current users have their own releases
blocked due to the lack of any released version of their DataValues
dependencies.
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
Hello,
currently, there is a strong interdependence between the Wikibase
development and Wikidata deployment, which has caused, and continues to
cause, friction. Sorry for that.
A suggestion made yesterday by Rob was to introduce a build step for the
deployment of Wikidata (something ULS already seems to be doing). If done
right, this would allow us to decouple the way components are split from
the way components are deployed.
If I understood it correctly, we would basically have either a deployment
module or a deployment branch on an existing module, which (preferably)
automatically gathers all dependencies, e.g. into a lib/ or dependency/
folder, or similar, and remains the sole module to be deployed.
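If we end up using Composer for the dependency gathering, the deployment
module might be little more than a manifest along these lines (a sketch
under the assumption that we use Composer; package names and versions are
made up):

{
	"name": "wikibase/deployment",
	"require": {
		"data-values/data-values": "~0.1",
		"wikibase/data-model": "~0.1"
	},
	"config": {
		"vendor-dir": "lib"
	}
}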
There are obviously a number of details to be decided, but I first wanted
to gather consensus on whether this is the way forward for the short term.
If so, we would start to work on this very soon.
Cheers,
Denny
--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as charitable by the
Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
Hi Denny,
On Tue, Aug 27, 2013 at 4:27 AM, Denny Vrandečić <
denny.vrandecic(a)wikimedia.de> wrote:
> 2013/8/27 Rob Lanphier <robla(a)wikimedia.org>
>> On Mon, Aug 26, 2013 at 9:26 AM, Denny Vrandečić
>> <denny.vrandecic(a)wikimedia.de> wrote:
>> > We are indeed not yet using the RFC process, but I would prefer if we
>> > could agree to move to this process in the future, as this particular
>> > discussion has been going on a bit longer than I would expect for a
>> > discussion regarding the question of how to organize code.
>>
>>
>> This is not just about how we organize code. This is about how we
>> package our work. It's unusual for us to have extensions with hard
>> dependencies on other extensions, let alone the complicated hierarchy
>> you all have chosen. It may be ok to do that, but we should discuss
>> it before setting the precedent.
>>
>
> This was already discussed here:
> <http://www.mail-archive.com/wikitech-l@lists.wikimedia.org/msg69983.html>
Yes, it was discussed. I was traveling the week that thread broke out, and
never fully caught up on that particular thread (only fully read it this
morning), so my apologies.
That said, it didn't appear to me that we actually resolved anything there.
Did I miss something?
> Scribunto and Babel are other extensions that have dependencies, and there
> are more of them. We are not setting a sole precedent here.
Scribunto has optional integration with WikiEditor, SyntaxHighlight GeSHi,
and CodeEditor. Those three are all useful extensions to end users in their
own right, and Scribunto works fine without them. Moreover, I'm
comfortable that I could walk around the office and get a coherent and
mostly correct explanation about what all of these extensions do from many
(if not most) developers here.
I'm not as familiar with the dependencies that Babel has. In my cursory
look, the situation looks similar to Scribunto's dependencies. The
description for Babel could be a bit better, but it's sufficient for me to
know what end-user functionality it's providing.
I'm not saying that, because we haven't really managed dependencies between
internal libraries before, we can't do it now. However, I don't think
your examples support your case that this is already standard practice.
>> This gets exposed to end users via Special:Version:
>> https://en.wikipedia.org/wiki/Special:Version
>>
>> Since MediaWiki administrators often use this as a means of
>> understanding how to configure their wikis "like Wikipedia", it would
>> be nice if we didn't clutter that page up with a lot of the internals
>> of our systems. Each of the links should point to a page that does a
>> good job of describing what the extension does, and the vast majority
>> of them do. Unfortunately, for most of the Wikidata extensions right
>> now, the pages are pretty much boilerplate plus a one-liner in many
>> cases.
>>
>
> I just checked a random sample of other extensions, and most of them are
> just boilerplate plus a one-liner. I just started at the bottom:
> <https://www.mediawiki.org/wiki/Extension:ZeroRatedMobileAccess>
> <https://www.mediawiki.org/wiki/Extension:WikimediaMessages>
> <https://www.mediawiki.org/wiki/Extension:WikimediaShopLink>
Starting at the bottom isn't a representative sample. Moreover, the
documentation for ZeroRatedMobileAccess is mostly in the clearly linked
README, which is not the worst place for it.
For most of the other extensions I spot-checked, there was sufficient
information for me to understand the essential functionality provided.
> Also, this is a very different point than the one raised before, and we
> have, in several places, tried to improve our documentation, e.g. by
> creating this page:
> <https://www.mediawiki.org/wiki/Wikibase>
> But the quality of our documentation seems to be a shift in the focus of
> this discussion. I would be happy if we could define well what the actual
> point of discussion is, so that we can resolve it soon.
I'm trying to understand the breakup of the extensions, and in our
continuing discussion, you've pointed me at varying bits of documentation
that don't answer the questions that I have.
I'd like to understand what exactly is so terrible about the status quo
that you all are blocked in your development. If this refactoring is
really so urgent, why can't you clearly and concisely state not only what
you are doing, but *why* you are doing it?
>> Rather than get too far into the implementation details of what your
>> extensions are doing, I think maybe I'll hold off until someone on my
>> team has more time to think about this and comment on it.
>>
>
> As long as this does not contradict your other mail, where we said not to
> further delay the refactoring of the DataValues-related extensions, sure.
Well, we can move forward with this specifically:
https://gerrit.wikimedia.org/r/#/c/76481/
I would hope you can hold off on the other refactoring until there is at
least one person on my team who can confidently explain what the role of
each of the extensions you're proposing is. I was going to bite the bullet
and just get my head around it myself, but I need to be realistic about the
level of effort I can expend on this.
Rob
Did I say it'll take until next year before we have the new search
infrastructure? Looks like I was wrong; it's being beta-tested on
mediawiki.org already. Time to check how well it works with Wikibase, then!
Let's ask about that in the call tomorrow.
-- daniel
-------- Original Message --------
Subject: [Wikitech-l] New search backend live on mediawiki.org
Date: Wed, 28 Aug 2013 14:20:10 -0400
From: Nikolas Everett <neverett(a)wikimedia.org>
Reply-To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Today we threw the big lever and turned on our new search backend at
mediawiki.org. It isn't the default yet, but it is just about ready for you
to try. Here is what we think we've improved:
1. Templates are now expanded during search so:
1a. You can search for text included in templates
1b. You can search for categories included in templates
2. The search engine is updated very quickly after articles change.
3. A few funky things around intitle and incategory:
3a. You can combine them with a regular query (incategory:kings peaceful)
3b. You can use prefix searches with them (incategory:norma*)
3c. You can use them everywhere in the query (roger incategory:normans)
What we think we've made worse and we're working on fixing:
1. Because we're expanding templates, some things that probably shouldn't
be searched are being searched. We've fixed a few of these issues but I
wouldn't be surprised if more come up. We opened Bug 53426 regarding audio
tags.
2. The relative weighting of matches is going to be different. We're
still fine-tuning this and we'd appreciate any anecdotes describing search
results that seem out of order.
3. We don't currently index headings beyond the article title in any
special way. We'll be fixing that soon. (Bug 53481)
4. Searching for file names or clusters of punctuation characters doesn't
work as well as it used to. It still works reasonably well if you surround
your query in quotes but it isn't as good as it was. (Bugs 53013 and 52948)
5. "Did you mean" suggestions currently aren't highlighted at all and
sometimes we'll suggest things that aren't actually better. (Bugs 52286 and
52860)
6. incategory:"category with spaces" isn't working. (Bug 53415)
What we've changed that you probably don't care about:
1. Updating search in bulk is much slower than before. This is the
cost of expanding templates.
2. Search is now backed by a horizontally scalable search backend that is
being actively developed (Elasticsearch), so we're in a much better place
to expand on the new solution as time goes on.
Neat stuff if you run your own MediaWiki:
CirrusSearch is much easier to install than our current search
infrastructure.
So what will you notice? Nothing! That is because, while the new search
backend (CirrusSearch) is indexing, we've left the current search
infrastructure as the default while we work on our list of bugs. You can
see the results from CirrusSearch by performing your search as normal and
adding "&srbackend=CirrusSearch" to the url parameters.
If you notice any problems with CirrusSearch, please file bugs directly
against it:
https://bugzilla.wikimedia.org/enter_bug.cgi?product=MediaWiki%20extensions…
Nik Everett
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hey,
I wrote up a little design spike for a bunch of code needed for the store
schema update functionality. This is nothing big or out of the ordinary
[0]. Since I am only continuing with this code later this week, and no real
implementation has been done so far, this commit now lends itself well to a
design-level review.
https://gerrit.wikimedia.org/r/#/c/81254/3/src/TableDefinitionReader.php
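For those who just want the gist without opening Gerrit, the reader boils
down to something of this shape (a speculative sketch based on the file
name only; the real interface is in the commit above):

interface TableDefinitionReader {

	/**
	 * @param string $tableName
	 * @return TableDefinition
	 */
	public function readDefinition( $tableName );

}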
[0] Someone please suggest writing a formal RFC, having 5 calls about it, a
dedicated architecture review, a dedicated security review, and perhaps its
own mailing list.
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
Hey,
We have a bunch of development TODOs which are blocked on a WMF deployment
config update. Aude made the following commit 3 weeks ago. Can
someone with the appropriate rights please have a look at it?
https://gerrit.wikimedia.org/r/#/c/76481/
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--