Hey,
I've updated the documentation on the Wikidata browser tests. See [1] for
how to set up your system and run the tests using the "new" way with
cucumber.
If you have any questions, don't hesitate to ask me!
cheers, tobi
[1]
https://meta.wikimedia.org/wiki/Wikidata/Development/Testing#Browser_Testin…
--
Tobi Gritschacher
Software Developer - Wikidata - http://www.wikidata.org
Imagine a world, in which every single human being can freely
share in the sum of all knowledge. That's our commitment.
Wikimedia Deutschland e.V. | Obentrautstraße 72 | 10963 Berlin
Phone +49 (0)30 219 158 260
http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/681/51985.
Since this is potentially a blocker on Chinmay's GSoC project, I'm
cc'ing wikidata-tech...
-andrew
On Thu, Sep 12, 2013 at 3:12 AM, Merlijn van Deen <valhallasw(a)arctus.nl> wrote:
> On 11 September 2013 20:31, Chinmay Naik <chin.naik26(a)gmail.com> wrote:
>>
>> hi,
>> I am trying to retrieve wikidata items by label through wbsearchentities()
>> using the "continue" parameter from
>> https://www.wikidata.org/wiki/Special:ApiSandbox#action=wbsearchentities&fo…
>> Can I retrieve more than 100 items using this? I notice the
>> 'search-continue' returned by the search result disappears after 50 items.
>> For example:
>> https://www.wikidata.org/wiki/Special:ApiSandbox#action=wbsearchentities&fo…
>>
>
> The api docs at https://www.wikidata.org/w/api.php explicitly state the
> highest value for 'continue' is 50:
>
> limit - Maximal number of results
> The value must be between 0 and 50
> Default: 7
> continue - Offset where to continue a search
> The value must be between 0 and 50
> Default: 0
>
> which indeed suggests there is a hard limit of 100 entries. Maybe someone in
> the Wikidata dev team can explain the reason behind this?
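For anyone hitting the same wall, here is a minimal paging sketch in Python. The parameter names come from the api.php docs quoted above; the `fetch` callback and the function name are just illustrative:

```python
def search_entities(fetch, query, language="en"):
    """Page through wbsearchentities results via the 'continue' offset.

    `fetch(params)` performs the API request and returns the decoded
    JSON. Both 'limit' and 'continue' are capped at 50 by the API, so
    at most 100 results (offsets 0 and 50) are reachable this way.
    """
    results = []
    offset = 0
    while offset <= 50:  # the API rejects continue values above 50
        data = fetch({
            "action": "wbsearchentities",
            "search": query,
            "language": language,
            "limit": 50,
            "continue": offset,
            "format": "json",
        })
        results.extend(data.get("search", []))
        if "search-continue" not in data:
            break
        offset = data["search-continue"]
    return results
```

With the requests library, `fetch` could be `lambda params: requests.get("https://www.wikidata.org/w/api.php", params=params).json()`. The cap means at most 100 hits are reachable per query, which matches the reading of the docs above.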
>
> Merlijn
>
>
> _______________________________________________
> Wikidata-l mailing list
> Wikidata-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>
Hi all.
With today's deployment, the Wikibase API modules used on wikidata.org will
change from using lower-case IDs (q12345) to upper-case IDs (Q12345). This is
done for consistency with the way IDs are shown in the UI and used in URLs.
The API will continue to accept entity IDs in lower-case as well as upper-case.
Any bot or other client that has no property or item IDs hardcoded or configured
in lower case should be fine.
If, however, your code looks for a specific item or property in the output
returned from the API and uses a lower-case ID to do so, it may now fail
to match the respective ID.
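If your client does match IDs from API output, a defensive sketch in Python is to normalize before comparing (the helper names here are mine, not part of any API):

```python
def normalize_entity_id(entity_id):
    """Upper-case a Wikibase entity ID so q12345 and Q12345 compare equal."""
    return entity_id.strip().upper()

def find_entity(search_id, entities):
    """Return the entry from a decoded API result list whose 'id'
    matches search_id, ignoring case; None if there is no match."""
    wanted = normalize_entity_id(search_id)
    for entry in entities:
        if normalize_entity_id(entry["id"]) == wanted:
            return entry
    return None
```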
There is potential for similar problems with Lua code, depending on how the data
structure is processed by Lua. We are working to minimize the impact there.
Sorry for the short notice.
Please test your code against test.wikidata.org and let us know if you find any
issues.
Thanks,
Daniel
PS: issue report on bugzilla: https://bugzilla.wikimedia.org/show_bug.cgi?id=53894
Hi all,
for my latest toy [1] I would really like to get the item IDs (Qxxx) for
the items in the dropdown box when editing a claim that points to an item.
I can get the item IDs for the search dropdown via the link hrefs, but
didn't find anything similar in the "role='menuitem'" <li>s for the edit
box.
Is there a global data structure with the ID list I could access? Are the
IDs attached to DOM nodes via jQuery data()? A hook I can add my function
to? Or something else I could use?
Alternatively, could you add an attribute (e.g. "itemid='Qxxx'") to the
<li> nodes? Shouldn't be too hard, as it couldn't really break anything
AFAICT. Could be really handy.
Thanks,
Magnus
[1] http://magnusmanske.de/wordpress/?p=64
Hey,
The commit moving the files in the DataValues repo around has finally been
merged \o/. We can thus now create the 2 new git repos that still need to
be created (the ValueView one is already there). We have, however, still not
settled on names for these. Currently we have:
* DataValues(.git) (Composer: data-values/data-values): the DataValues
interface and trivial implementations (e.g. BooleanValue).
* DataValuesInterfaces(.git) (Composer: data-values/interfaces):
ValueParser/Formatter/Validator interfaces and trivial implementations
* DataValuesCommon(.git) (Composer: data-values/common): All currently
existing non-trivial implementations of the interfaces defined by the above
two packages that are not in Wikibase. Things might be split from this at a
later point if we deem this to be of use (for instance having a
DataValuesGeo data-values/geo with GlobeCoordinateValue, LatLongValue and
all the parsing and formatting code).
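Assuming the names above stick, a consumer's composer.json would then look roughly like this (the version constraints are hypothetical):

```json
{
    "require": {
        "data-values/data-values": "~0.1",
        "data-values/interfaces": "~0.1",
        "data-values/common": "~0.1"
    }
}
```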
We will not be able to change these names without a lot of hassle later on,
so if you do not agree with one of the latter two, speak up now. Unless
there are better suggestions, I'll be requesting repos with said names this
weekend.
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
Hey,
There has recently been quite some talk about having multiple components,
including some concerns that simply having multiple components is in itself
a bad idea. To those who are not familiar with the principles of component
design, I can recommend the "Principles of Component Design" talk by Robert
C. Martin [0]. While the whole talk is interesting and fun to watch, the
first half contains a lot of historical details and other concerns not
relevant to PHP development. So if you need to pick one half to watch, go
for the second one.
[0] https://vimeo.com/68236438
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
Hi!
I'm getting closer and closer to writing the UI for badges in the repo. Does
anyone have any ideas?
The only thing I came up with is an additional row beneath each site link that
behaves (and looks) like alias editing currently does, but I think that this
would take too much space.
Thanks,
Michał
We are planning to deploy URLs as data values rather soon (i.e. September
9, if all goes well).
There was a discussion on wikidata-l mailing list:
<http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg02664.html>
The current implementation for URLs uses a string data value. There was
also an IRI data value developed (for this use case), but in a previous
(internal) discussion it was decided to use the string value instead.
The above thread included a few strong arguments by Markus for using the
IRI data value. If we want to do this, we need to decide that very quickly,
and change it accordingly.
Let's see if we can make the decision here on this list. We need to make
the decision by Monday at the latest, preferably earlier.
Here are my current thoughts (check also the above-mentioned thread if you
have not done so already). Currently I have a preference for using the string
value, just to point out my current bias, but I want wider input.
* I do not see the advantage of representing
'http://www.ietf.org/rfc/rfc1738.txt' as a structured data value of the form
{ protocol : 'http', hierarchicalpart : 'www.ietf.org/rfc/rfc1738.txt',
query : '', fragment : '' }.
* If we use string value, a number of necessary features come for free,
like the diffing, displaying it in the diffs, etc. Sure, there is the
argument that we can use the getString method for these, but then what is
the use case that we actually serve by using the structured data?
* I understood the advantages of being able to *identify* whether the value
of a snak is a string or a URL, but that seems to be the same advantage as
knowing whether the value of a snak is a Commons media file name or a
string. None of the use cases, though, explains why using the
above data structure is advantageous over a simple string value.
Please let us collect the arguments for and against using the IRI data
value *structure* here (not for being able to *identify* whether a string
is an IRI or a string).
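To make the comparison concrete: the two representations of the example URL can be derived from each other, sketched here with Python's urllib.parse (the structured field names follow the example above, not any existing DataValues class):

```python
from urllib.parse import urlsplit

url = "http://www.ietf.org/rfc/rfc1738.txt"

# Plain string data value: the URL is stored and diffed as-is.
string_value = url

# Structured IRI-style value, analogous to the example above.
parts = urlsplit(url)
structured_value = {
    "protocol": parts.scheme,                        # 'http'
    "hierarchicalpart": parts.netloc + parts.path,   # 'www.ietf.org/rfc/rfc1738.txt'
    "query": parts.query,                            # ''
    "fragment": parts.fragment,                      # ''
}
```

Either form can be reconstructed from the other, so the question is which one the consumers actually need.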
Not completely independent of that, there are a few questions that need to
be answered but that are not as immediate, i.e. do not have to be decided
by next week:
* should, in the external JSON structure, for every snak the data value
type be listed (as it currently is)? I.e. should it state "string" instead
of "Commons media filename"?
* should, in the external JSON structure, for every snak the data type of
the property used be listed? This would then say URL, and this would solve
all the use cases mentioned by Markus, which rely on *identifying* this
distinction, not on the actual IRI data structure.
* should, in the internal JSON structure, something be changed?
The external JSON structure is the one used when communicating through the
API.
The internal JSON structure is the one that you get when using the dumps.
We need to have an export of the whole Wikidata knowledge base in the
external JSON format, rather sooner than later, and hopefully also in RDF.
The lack of these dumps should not influence our decision right now, imho :)
Cheers,
Denny
--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de
Hey,
Replying to DanielK's comment on the "Improvements to the implementation of
Entity" thread, which has nothing to do with the topic of said thread,
hence a new one.
I was asked what "high risk" in my last email means.
>
> These results are using the CRAP metric [0] to determine the CRAPiness of
> the code (which is indicated by the number in brackets). You can have
> PHPUnit generate this by running the following command in the root
> directory of WikibaseDataModel and clicking "dashboard" on the generated
> index.html page: phpunit --coverage-html /some/path
>
> [0] http://googletesting.blogspot.de/2011/02/this-code-is-crap.html
>
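For context, the CRAP number combines a method's cyclomatic complexity with its test coverage. A minimal sketch of the formula from the linked post, as I understand it:

```python
def crap_score(complexity, coverage_pct):
    """CRAP(m) = comp(m)^2 * (1 - cov(m)/100)^3 + comp(m)

    `complexity` is the cyclomatic complexity of the method and
    `coverage_pct` its test coverage in percent (0-100). A fully
    covered method scores just its complexity; an uncovered complex
    method scores dramatically higher.
    """
    uncovered = 1.0 - coverage_pct / 100.0
    return complexity ** 2 * uncovered ** 3 + complexity
```

For example, crap_score(5, 100) is 5, while crap_score(5, 0) is 30, which is why untested complex methods dominate the report.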
I suggest having this generated regularly and putting it online somewhere
(maybe on wikidata-docs.wikimedia.de).
I have suggested doing so many times in the past, though it turned out that
maintaining such infrastructure ourselves is too much work for our team. Our
initial go at it is still online, even though it has been broken for over
half a year:
http://wikidata-docs.wikimedia.de/testcoverage/
http://wikidata-docs.wikimedia.de/testcoverage/phpcoverage/20130320/extensi…
If we can get such a thing up and running again, and not have it break
every week, that'd be awesome. Going with a service we do not need to
maintain ourselves does seem to be the more sustainable approach though.
Coveralls.io [0] support is already a big step in that direction, though
this service unfortunately does not provide these CRAP reports at this time.
There are several other interesting metrics we could track, some of which
make good candidates for CI rules (e.g. commits with new methods with a
complexity above n get a -1). There are some people (of course I forgot
who) currently looking at PHPCS [1] integration into WMF Jenkins, which is
a promising development.
[0] https://coveralls.io/r/wikimedia
[1] http://pear.php.net/package/PHP_CodeSniffer
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
Hey,
Now that the design issues of EntityId have been fixed, it's Entity's turn :)
(Note: this is about domain layer implementation details. Not considering
changing anything visible to the user here.)
While working on the QueryEntity together with addshore, we ran into a
number of issues with the current implementation of Entity. The main
problem is that Entity objects are constructed from their internal
"serialization". The constructor, which is marked as protected, takes in
this "serialization" (in array form). This is rather awkward; consider how
we currently construct a Property:
$property = Property::newEmpty();
$property->setDataTypeId( $id );
(We also have a static newFromDataTypeId which wraps this.)
There is a bunch of code that assumes one can create empty Entity objects,
especially in tests. I now think it was a mistake to allow this at all for
Property, which should not be constructed without a dataTypeId. The same
goes for QueryEntity, which should not be constructed without a Ask\Query.
It'd be much nicer if people could just use the constructors of the objects
and have these enforce the list of required parameters. They'd just take
the actual objects and not serializations.
$property = new Property( $id );
$queryEntity = new QueryEntity( $askQuery );
And since these things are enforced, one now gets back a string when
calling getDataTypeId, and an Ask\Query when calling getQueryDefinition,
rather than either that type or null.
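The difference in a nutshell, sketched in Python rather than PHP for brevity (names are illustrative):

```python
class Property:
    """A property whose data type is enforced at construction time."""

    def __init__(self, data_type_id):
        if not data_type_id:
            raise ValueError("a Property requires a data type id")
        self._data_type_id = data_type_id

    def get_data_type_id(self):
        # Always a string: the constructor guarantees it was set,
        # so callers never need to handle a missing value.
        return self._data_type_id
```

Compare with the newEmpty() pattern, where the getter can return null until the setter has been called, and every caller has to account for that.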
Serialization and deserialization code can also go into dedicated service
objects. This is already done for QueryEntity, which is using the same
serializers as the ones the web API will use, saving us the implementation of
a second format, which would not be of much help here anyway (it'd save some
disk space...).
There is also a lot of room to be more strict about things. Right now you
can happily construct an Entity that has ints as aliases, or as language
code for labels. On top of that, there are currently still TODOs from the
first months of the project in Entity related to normalization and handling
of duplicates. We might want to clearly define responsibilities at this
point :)
Oh and, of course Entity, Item, Property and Query each should go into
their own git repo.
Any objections or concerns about the above rambling?
There has lately also been some talk about doing things with Entity that
we did not consider before, such as entities that contain other entities.
Is there a list of such ideas? If not, let's compile one here so they
can be taken into account.
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--