Hi!
I wonder if anybody has run or is running Wikibase without CirrusSearch
installed, and whether fulltext search is supposed to work in that
configuration. The suggester/prefix search, aka wbsearchentities, works
OK, but I can't make fulltext search, aka Special:Search, find anything on my VM
(which very well may be a consequence of me messing up, or some bug, or
both :)
So, I wonder - is it *supposed* to be working? Is anybody using it this
way, and does anybody care about such a use case?
Thanks,
--
Stas Malyshev
smalyshev(a)wikimedia.org
Dear Users,
I have the great honour to inform you that the Call for Proposals for WikiIndaba 2018 is now open. WikiIndaba 2018 is the 3rd conference of the African Wikimedia movement and will give participants the opportunity to share their Wikimedia-related experience and skills with a wide and active African Wikimedia audience. The conference will be held in Tunisia from 16 to 18 March 2018. If you want to participate in WikiIndaba and share your work and thoughts with African Wikimedians, feel free to submit your proposal at https://meta.m.wikimedia.org/wiki/WikiIndaba_conference_2018/Submissions. The deadline for submitting proposals is January 15th, 2018.
If you need a scholarship to attend WikiIndaba 2018, you can apply for it at https://docs.google.com/forms/d/e/1FAIpQLSdJJ2I0FBqp4SuiW5ypj-9lnLaAidUmhMs….
Looking forward to seeing you in Tunis next March.
Yours Sincerely,
Houcemeddine Turki
Felix Nartey
Isla Haddow-Flood
Dear all,
I just released a new version (0.8) of the Wikidata Toolkit [1]. Wikidata
Toolkit is a Java library that makes it easy to reuse Wikidata content from
dumps or the API, and that also provides helpers for editing.
This release mainly adds several fixes needed to keep Wikidata Toolkit
working with the changes done on Wikidata, and adds support for JDK 9.
It also provides two features related to the Wikibase API: it is now
possible to edit labels, descriptions and aliases using the
WikibaseDataEditor (this is a work in progress that is likely to change),
and there is now a wrapper for the wbsearchentities API action.
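For anyone curious, here is a rough sketch of what calling the new search
wrapper could look like; the method and class names below (searchEntities,
WbSearchEntitiesResult, getEntityId, getLabel) are assumptions based on the
library's usual naming and may differ in 0.8:

import org.wikidata.wdtk.wikibaseapi.WbSearchEntitiesResult;
import org.wikidata.wdtk.wikibaseapi.WikibaseDataFetcher;

public class SearchSketch {
    public static void main(String[] args) throws Exception {
        // Fetcher pointed at wikidata.org
        WikibaseDataFetcher fetcher = WikibaseDataFetcher.getWikidataDataFetcher();
        // Search for entities whose label or alias matches the text, in English.
        // (searchEntities is the assumed name of the wbsearchentities wrapper.)
        for (WbSearchEntitiesResult r : fetcher.searchEntities("Douglas Adams", "en")) {
            System.out.println(r.getEntityId() + "  " + r.getLabel());
        }
    }
}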
We have created a short survey to help us make technical choices for
future versions of the Wikidata Toolkit, especially related to Java 7
support and the RDF converter. Please fill it in if you are using Wikidata
Toolkit (it should take less than a minute):
https://docs.google.com/forms/d/e/1FAIpQLSdN25X2sTv2wQe-y56d0hC4QmU06s6crr1…
Best,
Thomas (Tpt)
[1] https://www.mediawiki.org/wiki/Wikidata_Toolkit
Dear Sir or Madam,
I thank you for your efforts. When I was at the AICCSA 2017 conference last month, I discussed several ideas with Arab computational linguists. I found that you may not be aware of these rules:
* The label of a proper entity (Person, Place, Trademark...) in Modern Standard Arabic is the same as the one of such an entity in the following Arabic dialects: South Levantine Arabic (ajp), Gulf Arabic (afb), Hejazi Arabic (acw), Najdi Arabic (ars), Hadhrami Arabic (ayh), Sanaani Arabic (ayn), Ta'izzi-Adeni Arabic (acq), and Mesopotamian Arabic (acm).
* Labels of places and people from Palestine, Jordan, Syria, Iraq, Kuwait, Yemen, Oman, Bahrain, Qatar, UAE, Saudi Arabia, Sudan, Djibouti, Comoros, Somalia, and Mauritania are the same in all Arabic dialects as in Modern Standard Arabic.
I ask if you can make a bot that automatically applies these two rules.
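As a starting point, here is a minimal dry-run sketch (using Wikidata Toolkit in Java) of how rule 1 could be applied; the item ID is only an example, and a real bot would still have to issue the actual label edits, e.g. via WikibaseDataEditor or the wbsetlabel API action:

import java.util.Arrays;
import java.util.List;
import org.wikidata.wdtk.datamodel.interfaces.ItemDocument;
import org.wikidata.wdtk.wikibaseapi.WikibaseDataFetcher;

public class ArabicLabelBotSketch {
    // Dialect codes from rule 1.
    static final List<String> DIALECTS =
            Arrays.asList("ajp", "afb", "acw", "ars", "ayh", "ayn", "acq", "acm");

    public static void main(String[] args) throws Exception {
        WikibaseDataFetcher fetcher = WikibaseDataFetcher.getWikidataDataFetcher();
        // Example item only; a real bot would iterate over candidate items.
        ItemDocument item = (ItemDocument) fetcher.getEntityDocument("Q7251");
        String arLabel = item.findLabel("ar");
        if (arLabel == null) {
            return; // no Modern Standard Arabic label to copy
        }
        for (String code : DIALECTS) {
            if (item.findLabel(code) == null) {
                // Dry run: only report the edit a real bot would make.
                System.out.println(item.getEntityId().getId() + ": " + code + " <- " + arLabel);
            }
        }
    }
}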
Yours Sincerely,
Houcemeddine Turki
Dear Sir or Madam,
I thank you for your interest. The proceedings paper, the presentation slides, and an overview of the discussions I have had about using Wikidata for the natural language processing of Arabic dialects are now available on ResearchGate. Please see https://www.researchgate.net/publication/321039195_Using_WikiData_as_a_mult… for the proceedings paper and https://www.researchgate.net/publication/321039289_AICCSA_2017_-_Wikidata_P… for the presentation slides and the overview of the discussions.
Yours Sincerely,
Houcemeddine Turki
Did you try to point the WDQS copy to your TDB/Fuseki endpoint?
On Thu, 7 Dec 2017 at 18:58, Andy Seaborne <andy(a)apache.org> wrote:
> Dell XPS 13 (model 9350) - the 2015 model.
> Ubuntu 17.10, not a VM.
> 1T SSD.
> 16G RAM.
> Two volumes = root and user.
> Swappiness = 10
>
> java version "1.8.0_151" (OpenJDK)
>
> Data: latest-truthy.nt.gz (version of 2017-11-24)
>
> == TDB1, tdbloader2
> 8 hours // 76,164 TPS
>
> Using SORT_ARGS: --temporary-directory=/home/afs/Datasets/tmp
> to make sure the temporary files are on the large volume.
>
> The run took 28877 seconds and resulted in a 173G database.
>
> All the index files are the same size.
>
> node2id : 12G
> OSP : 53G
> SPO : 53G
> POS : 53G
>
> Algorithm:
>
> Data phase:
>
> parse the file, create the node table and a temporary file of all triples
> (3x 64-bit numbers, written as text).
>
> Index phase:
>
> for each index, sort the temp file (using sort(1), an external sort
> utility), and make the index file by writing the sorted results, filling
> the data blocks and creating any tree blocks needed. This is a
> stream-write process - calculate the data block, write it out when full
> and never touch it again.
>
> This results in data blocks being completely full, unlike the standard
> B+Tree insertion algorithm. It is why indexes are exactly the same size.
>
> Building SPO is faster because the data is nearly sorted to start with.
> Data often tends to arrive grouped by subject.
>
> tdbloader2 is doing stream (append) I/O on index files, not a random
> access pattern.
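>
> In (very simplified, in-memory) Java terms the index phase is roughly the
> following - an illustrative sketch only, not the actual TDB code, which
> sorts temporary files with sort(1) rather than sorting in memory:
>
> import java.util.*;
>
> // Sketch: build one index (POS) from triples of node-table IDs.
> public class IndexBuildSketch {
>     static final int BLOCK_SIZE = 4; // entries per data block (tiny, for illustration)
>
>     public static void main(String[] args) {
>         List<long[]> triples = Arrays.asList(
>                 new long[]{1, 10, 100}, new long[]{1, 11, 101},
>                 new long[]{2, 10, 100}, new long[]{2, 12, 103});
>
>         // Permute S,P,O -> P,O,S and sort: the "sort" step of the index phase.
>         List<long[]> pos = new ArrayList<>();
>         for (long[] t : triples) pos.add(new long[]{t[1], t[2], t[0]});
>         pos.sort(Comparator.<long[]>comparingLong(k -> k[0])
>                 .thenComparingLong(k -> k[1]).thenComparingLong(k -> k[2]));
>
>         // Stream-write: fill each data block completely, emit it, never touch it again.
>         List<long[]> block = new ArrayList<>();
>         for (long[] key : pos) {
>             block.add(key);
>             if (block.size() == BLOCK_SIZE) { writeBlock(block); block = new ArrayList<>(); }
>         }
>         if (!block.isEmpty()) writeBlock(block);
>     }
>
>     static void writeBlock(List<long[]> block) {
>         // The real loader appends the block to the index file and builds tree blocks above it.
>         System.out.println("wrote block with " + block.size() + " entries");
>     }
> }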
>
> == TDB1 tdbloader1
> 29 hours 43 minutes // 20,560 TPS
>
> 106,975 seconds
> 297G DB-truthy
>
> node2id: 12G
> OSP: 97G
> SPO: 96G
> POS: 98G
>
> Same size node2id table, larger indexes.
>
> Algorithm:
>
> Data phase:
>
> parse the file and create the node table and the SPO index.
> The creation of SPO is by B+Tree insert, so blocks are partially full
> (average is empirically about 2/3 full). When a block fills up, it is
> split into 2. The node table is exactly the same as tdbloader2 because
> nodes are stored in the same order.
>
> Index phase:
>
> for each index, copy SPO to the index. This is a tree sort and the
> access pattern on blocks is fairly random, which is a bad thing. Doing
> one index at a time is faster than two together because more RAM in the
> OS-managed file system cache is devoted to caching one index. A cache
> miss is a possible write to disk, and always a read from disk, which is
> a lot of work even with an SSD.
>
> Stream reading SPO is efficient - it is not random I/O, it is stream I/O.
>
> Once the cache-efficiency of the OS disk cache drops, tdbloader slows
> down markedly.
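>
> Again in (very simplified, in-memory) Java terms - an illustrative sketch
> only, not the real code, which inserts into B+Tree blocks on disk:
>
> import java.util.*;
>
> // Sketch: copy an already S-sorted SPO index into POS by one-at-a-time inserts
> // into a sorted structure (a TreeSet here, a B+Tree on disk in TDB). Reading SPO
> // is sequential, but each insert can land anywhere in the target tree, which is
> // the random block access pattern described above.
> public class TreeSortSketch {
>     public static void main(String[] args) {
>         List<long[]> spo = Arrays.asList(
>                 new long[]{1, 10, 100}, new long[]{1, 11, 101},
>                 new long[]{2, 10, 100}, new long[]{2, 12, 103});
>
>         Comparator<long[]> order = Comparator.<long[]>comparingLong(k -> k[0])
>                 .thenComparingLong(k -> k[1]).thenComparingLong(k -> k[2]);
>         TreeSet<long[]> posIndex = new TreeSet<>(order);
>
>         for (long[] t : spo) {
>             posIndex.add(new long[]{t[1], t[2], t[0]}); // re-key S,P,O as P,O,S
>         }
>         System.out.println("POS entries: " + posIndex.size());
>     }
> }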
>
> == Comparison of TDB1 loaders.
>
> Building an index is a sort because the B+Trees hold data sorted.
>
> The approach of tdbloader2 is to use an external sort algorithm (i.e.
> sort larger than RAM using temporary files) done by a highly tuned
> utility, unix sort(1).
>
> The approach of tdbloader1 is to copy into a sorted data structure. For
> example, when copying index SPO to POS, it is creating a file with keys
> sorted by P then O then S, which is not the arrival order (which is
> S-sorted). tdbloader1 maximises OS caching of memory-mapped files by
> doing indexes one at a time. Experimentation shows that doing two at
> once is slower, and doing two in parallel is no better, and sometimes
> worse, than doing them sequentially.
>
> == TDB2
>
> TDB2 is experimental. The current TDB2 loader is a functional placeholder.
>
> It is writing all three indexes at the same time. While for SPO this is
> not a bad access pattern (subjects are naturally grouped), for POS and
> OSP the I/O pattern is random, not streaming. There is more than double
> the contention for the OS disk cache, hence it is slow and slows down at
> an increasing rate as the data grows.
>
> == More details.
>
> For more information, consult the Jena dev@ and user@ archives and the
> code.
>
--
---
Marco Neumann
KONA
Hello all,
Some information that may interest people who contribute to Wikidata and
Wikibase, or follow the development.
We have made some changes to our deployment setup; the most relevant things
for you are:
- we are now able to deploy new code for Wikidata every week, in the
regular MediaWiki train, instead of every two weeks. test.wikidata.org
will be updated on Tuesdays and wikidata.org on Wednesdays (details
<https://wikitech.wikimedia.org/wiki/Deployments>)
- beta.wikidata.org is now updated every 10min with new code (details
<https://integration.wikimedia.org/ci/view/Beta/>)
This basically means that new features and bug fixes will reach you faster :)
Thanks to Addshore who made this happen!
If you have questions or want further details, feel free to ask.
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognised as a non-profit
organisation by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.
Hi!
We are seeing more use of the Wikidata Query Service by Wikimedia
projects. This is excellent news, but the somewhat worse news is that the
maintainers of WDQS do not have a good idea of what these services are,
what their needs are, and so on. So, we have decided to start
tracking internal uses of the Wikidata Query Service.
To that end, if you run any functionality on Wikimedia sites
(Wikipedias, Wikidata, etc. - anything on a wikimedia domain) that sends
queries to the Wikidata Query Service, please go to:
https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Usage
and add your project there. This applies both if your project runs queries
by itself in the background and if it uses queries as part of a user
interaction scenario.
We do not currently include Labs tools unless they are absolutely vital
infrastructure (i.e. if the tool went down, would it substantially degrade
main site functionality or make some features unusable?). If you still
feel we should know about a particular Labs tool, please leave a note on the
talk page.
What's in it for you?
We want to know this in order to better understand the scope of
internal usage, and as preparation for T178492 (creating an internal WDQS
setup), with the goal of providing internal users with a more robust and
more flexible service. We also want to make sure we do not break anything
important when we do maintenance, and to know who to talk to if some
queries do not work as expected and we want to fix them.
What do we want to know?
- We'd like a general description of the functionality (i.e. what
the service is for).
- How can we recognize queries run by it - user agent? Source host? A
specific query pattern? Some other mark? It is recommended to make the
queries recognizable in some way; see the sketch after this list.
- What kind of queries does it run? (No need to list every possible one, of
course, but if there are typical cases it would help to see them.)
- How often do the queries run - are they periodic, and if the tool is
user-driven, what is its expected or typical usage?
- Where can we see the code behind it, and who maintains it?
- Feel free to add any other information about anything you think would
be useful for us to know.
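For example, a distinctive User-Agent header is an easy way to make your
queries recognizable. A minimal Java sketch (the tool name and contact URL
in the header are just placeholders - use your own):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class WdqsQuerySketch {
    public static void main(String[] args) throws Exception {
        String sparql = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 5";
        String url = "https://query.wikidata.org/sparql?format=json&query="
                + URLEncoder.encode(sparql, "UTF-8");
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        // A descriptive User-Agent lets the WDQS maintainers see who is querying.
        conn.setRequestProperty("User-Agent",
                "ExampleWikiGadget/1.0 (https://example.org/contact)");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}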
What was that page again?
https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Usage
Thanks in advance,
--
Stas Malyshev
smalyshev(a)wikimedia.org