Wikidata December 2019

wikidata@lists.wikimedia.org

27 participants
21 discussions

Weekly Summary #338
by Léa Lacroix 10 Oct '20

10 Oct '20

4 4

Accessing tabular data from SPARQL?
by Daniel Mietchen 07 Jul '20

07 Jul '20

Hi, I'm looking into ways to use tabular data like https://commons.wikimedia.org/wiki/Data:Zika-institutions-test.tab in SPARQL queries but could not find anything on that. My motivation here is in part coming from the time out limits, and the basic idea here would be to split queries that typically time out into sets of queries that do not time out and - if their results were aggregated - would yield the results that would be expected for the original query would it not time out. The second line of motivation here is that of keeping track of how things develop over time, which would be interesting for both content and maintenance queries as well as usage of things like classes, references, lexemes or properties. I would appreciate any pointers or thoughts on the matter. Thanks, Daniel

3 5

Wiki Workshop 2020 Announcement and Call for Papers
by Leila Zia 19 Apr '20

19 Apr '20

Hi everyone, We are delighted to announce that Wiki Workshop 2020 will be held in Taipei on April 20 or 21, 2020 (the date to be finalized soon) and as part of the Web Conference 2020 [1]. In the past years, Wiki Workshop has traveled to Oxford, Montreal, Cologne, Perth, Lyon, and San Francisco. You can read more about the call for papers and the workshops at http://wikiworkshop.org/2020/#call. Please note that the deadline for the submissions to be considered for proceedings is January 17. All other submissions should be received by February 21. If you have questions about the workshop, please let us know on this list or at wikiworkshop(a)googlegroups.com. Looking forward to seeing you in Taipei. Best, Miriam Redi, Wikimedia Foundation Bob West, EPFL Leila Zia, Wikimedia Foundation [1] https://www2020.thewebconf.org/

2 7

Call for Proposals for grants to support WikiCite satellite events
by Liam Wyatt 25 Mar '20

25 Mar '20

Dear Wikimedians, The call for proposals for grants to organise WikiCite satellite events <https://meta.wikimedia.org/wiki/WikiCite> to be held first half of 2020 is now open. WikiCite is an initiative to develop open citations and linked bibliographic data to serve free knowledge. Since 2016, there has been an annual conference in support of that goal. In 2020, this will become a series of independently organised Satellite events around the world. The steering committee, with the financial support of the Alfred P. Sloan Foundation and the Wikimedia Foundation, are offering grants to support WikiCite Satellite events in three categories for events happening in the first half of 2020: 1. *International events in association with professional conferences *[Maximum $10k each, Application deadline 1 February] (details <https://meta.wikimedia.org/wiki/WikiCite#1:_International_events_in_associa…> ) 2. Local workshops and meetups [Maximum $2k each, Open until funds are allocated] (details <https://meta.wikimedia.org/wiki/WikiCite#2:_Local_workshops_and_meetups> ) 3. Events seeking non-financial support (details <https://meta.wikimedia.org/wiki/WikiCite#3:_Events_seeking_non-financial_su…> ) Submissions are especially encouraged if they are related to marginalised knowledge and/or hosted at a location in the global South. To read the detailed criteria for each of these categories, and to apply for a WikiCite Satellite event grant, visit: https://meta.wikimedia.org/wiki/WikiCite Sincerely and on behalf of the steering committee <http://wikicite/administration#Coordination>, Liam Wyatt / Wittylama WikiCite program manager [WMF Contractor] Note: If you wish to host an event in support of WikiCite’s aims, but do not require any support, you do not need to apply for ‘permission’. Please add your event to the ongoing calendar of events <https://meta.wikimedia.org/wiki/WikiCite/media>.

5 5

Subject: New Office hours for WMF/Research&Analytics starting in January 2020
by Martin Gerlach 17 Jan '20

17 Jan '20

Hi all, We, the Research team at Wikimedia Foundation, have received some requests over the past months for making ourselves more available to answer some of the research questions that you as Wikimedia volunteers, affiliates' staff, and researchers face in your projects and initiatives. Starting January 2020, we will experiment with monthly office hours organized jointly by our team and the Analytics team where you can join us and direct your questions to us. We will revisit this experiment in June 2020 to assess whether to continue it or not. The scope We encourage you to attend the office hour if you have research related questions. These can be questions about our teams, our projects, or more importantly questions about your projects or ideas that we can support you with during the office hours. You can also ask us questions about how to use a specific dataset available to you, to answer a question you have, or some other question. Note that the purpose of the office hours is to answer your questions during the dedicated time of the office hour. Questions that may require many hours of back-and-forth between our team and you are not suited for this forum. For these bigger questions, however, we are happy to brainstorm with you in the office hour and point you to some good directions to explore further on your own (and maybe come back in the next office hour and ask more questions). Time and Location We meet on the 4th Wednesday of every month 17.00-18.00 (UTC) in #wikimedia-research IRC channel on freenode [1]. The first meeting will be on January 22. Up-to-date information on mediawiki [2] Archiving If you miss the office hour, you can read the logs of it at [3]. The future announcements about these office hours will only go to the following lists so please make sure you're subscribed to them if you like to receive a ping: * wiki-research-l mailing list [4] * analytics mailing list [5] * wikidata mailing list [6] * the Research category in Space [7] on behalf of Research and Analytics at WMF, Martin [1] irc://irc.freenode.net/wikimedia-research [2] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours [3] https://wm-bot.wmflabs.org/logs/%23wikimedia-research/ [4] https://lists.wikimedia.org/mailman/listinfo/wiki-research-l [5] https://lists.wikimedia.org/mailman/listinfo/analytics [6] https://lists.wikimedia.org/mailman/listinfo/wikidata [7] https://discuss-space.wmflabs.org/tags/research -- Martin Gerlach Research Scientist Wikimedia Foundation

2 1

Wikidata WikiProject Newsletter #2
by Bodhisattwa Mandal 01 Jan '20

01 Jan '20

Hi all, Happy New Year! You can find the 2nd issue of the Wikidata WikiProject India quarterly newsletter here <https://www.wikidata.org/wiki/Wikidata:WikiProject_India/Newsletter/2020_01…>, where you can get a quick overview of the Wikidata activities related to India which happened over the last 3 months. To get timely notification in your talk page, you can subscribe here. <https://meta.wikimedia.org/wiki/Global_message_delivery/Targets/Wikidata_In…> Regards, Bodhisattwa WIkidata Advisor, CIS-A2K

1 0

Weekly Summary #396
by Lydia Pintscher 31 Dec '19

31 Dec '19

1 0

[ANN] nomunofu v0.1.0
by Amirouche Boubekki 30 Dec '19

30 Dec '19

I am very pleased to announce the immediate availability of nomunofu. nomunofu is database server written in GNU Guile that is powered by WiredTiger ordered key-value store. It allows to store and query triples. The goal is to make it much easier, definitely faster to query as big as possible tuples of three items. To achieve that goal, the server part of the database is made very simple and it only knows how to do pattern matching. Also, it is possible to swap the storage engine to something that is horizontally scalable and resilient. The client must be smarter, and do as they please to full-fill user requests. Today release only include a minimal Python client. In the future, I plan to extend the Python client to fully support SPARQL 1.1. Preliminary tests over 100 000 and 1 000 000 triples are good looking. Next step is to reach 1 billion triples and eventually 9 billions wikidata triples. You can get the code with the following command: git clone https://github.com/amirouche/nomunofu After the installation of GNU Guix [0], you can do: make init && gunzip test.nt.gz && make index && make web And in another terminal: make query Thanks to Guix, portable binaries for amd64 Ubuntu 18.04 will be made available in a few weeks, along with this, a docker image will be built. The binary release will include wikidata pre-loaded. [0] https://guix.gnu.org/download/ Here is an example ipython session: $ ipython Python 3.7.3 (default, Oct 7 2019, 12:56:13) Type 'copyright', 'credits' or 'license' for more information IPython 7.10.1 -- An enhanced Interactive Python. Type '?' for help. In [1]: from nomunofu import Nomunofu In [2]: from nomunofu import var In [3]: nomunofu = Nomunofu('http://localhost:8080'); In [4]: nomunofu.query((var('uid'), " http://www.w3.org/2000/01/rdf-schema#label", "Belgium")) Out[4]: [{'uid': 'http://www.wikidata.org/entity/Q31'}] In [5]: nomunofu.query((var('uid'), " http://www.w3.org/2000/01/rdf-schema#label", "Belgium"), (var('about'), "http:// ...: schema.org/about", var('uid'))) Out[5]: [{'uid': 'http://www.wikidata.org/entity/Q31', 'about': 'https://www.wikidata.org/wiki/Special:EntityData/Q31'}, {'uid': 'http://www.wikidata.org/entity/Q31', 'about': 'https://it.wikivoyage.org/wiki/Belgio'}, {'uid': 'http://www.wikidata.org/entity/Q31', 'about': 'https://an.wikipedia.org/wiki/Belchica'}, {'uid': 'http://www.wikidata.org/entity/Q31', 'about': 'https://sl.wikipedia.org/wiki/Belgija'}, {'uid': 'http://www.wikidata.org/entity/Q31', 'about': 'https://pfl.wikipedia.org/wiki/Belgien'}, {'uid': 'http://www.wikidata.org/entity/Q31', 'about': 'https://crh.wikipedia.org/wiki/Bel%C3%A7ika'}, {'uid': 'http://www.wikidata.org/entity/Q31', 'about': 'https://fiu-vro.wikipedia.org/wiki/Belgi%C3%A4'}, {'uid': 'http://www.wikidata.org/entity/Q31', 'about': 'https://fr.wikipedia.org/wiki/Belgique'} ... Cheers, Amirouche ~ zig ~ https://hyper.dev

3 9

Concise/Notable Wikidata Dump
by Aidan Hogan 22 Dec '19

22 Dec '19

Hey all, As someone who likes to use Wikidata in their research, and likes to give students projects relating to Wikidata, I am finding it more and more difficult to (recommend to) work with recent versions of Wikidata due to the increasing dump sizes, where even the truthy version now costs considerable time and machine resources to process and handle. In some cases we just grin and bear the costs, while in other cases we apply an ad hoc sampling to be able to play around with the data and try things quickly. More generally, I think the growing data volumes might inadvertently scare people off taking the dumps and using them in their research. One idea we had recently to reduce the data size for a student project while keeping the most notable parts of Wikidata was to only keep claims that involve an item linked to Wikipedia; in other words, if the statement involves a Q item (in the "subject" or "object") not linked to Wikipedia, the statement is removed. I wonder would it be possible for Wikidata to provide such a dump to download (e.g., in RDF) for people who prefer to work with a more concise sub-graph that still maintains the most "notable" parts? While of course one could compute this from the full-dump locally, making such a version available as a dump directly would save clients some resources, potentially encourage more research using/on Wikidata, and having such a version "rubber-stamped" by Wikidata would also help to justify the use of such a dataset for research purposes. ... just an idea I thought I would float out there. Perhaps there is another (better) way to define a concise dump. Best, Aidan

9 12

Re: [Wikidata] Concise/Notable Wikidata Dump Wikidata Digest, Vol 97, Issue 13
by PWN 22 Dec '19

22 Dec '19

Hello all, Regarding the limiting of dumps, I fear it nullifies one of the huge advantages of wikidata, which is to expand structured, referenced data beyond the often too narrow confines of Wikipedia. Women and marginalized communities who are frequently eliminated for lack of “notability” by overzealous or misguided Wikipedia editors risk being accidentally re-eliminated by confining dumps to items with wikilinks. (Remember the female researcher whose Wikipedia page was rejected for “lack of notability” - just before she won a Noble prize?) I think Wikidata dumps should be complete, with a possibility of user-controlled selection by topic or period or other query, but not by what amounts to a kind of a “hidden” filter of approval by a Wikipedia editor somewhere outside of Wikidata in a widely disseminated dump marked, misleadingly, as “notable”. Selection is very powerful in the digital world, where people assume (wrongly) that what they see is what exists Sent from my iPad > On Dec 20, 2019, at 13:00, wikidata-request(a)lists.wikimedia.org wrote: > > Send Wikidata mailing list submissions to > wikidata(a)lists.wikimedia.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://lists.wikimedia.org/mailman/listinfo/wikidata > or, via email, send a message with subject or body 'help' to > wikidata-request(a)lists.wikimedia.org > > You can reach the person managing the list at > wikidata-owner(a)lists.wikimedia.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Wikidata digest..." > > > Today's Topics: > > 1. Re: Concise/Notable Wikidata Dump (Aidan Hogan) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 19 Dec 2019 19:15:09 -0300 > From: Aidan Hogan <aidhog(a)gmail.com> > To: wikidata(a)lists.wikimedia.org > Subject: Re: [Wikidata] Concise/Notable Wikidata Dump > Message-ID: <dc03559e-a670-b1dc-c88f-b73b9902fb30(a)gmail.com> > Content-Type: text/plain; charset=utf-8; format=flowed > > Hey all, > > Just a general response to all the comments thus far. > > - @Marco et al., regarding the WDumper by Benno, this is a very cool > initiative! In fact I spotted it just *after* posting so I think this > goes quite some ways towards addressing the general issue raised. > > - @Markus, I partially disagree regarding the importance of > rubber-stamping a "notable dump" on the Wikidata side. I would see it's > value as being something like the "truthy dump", which I believe has > been widely used in research for working with a concise sub-set of > Wikidata. Perhaps a middle ground is for a sporadic "notable dump" to be > generated by WDumper and published on Zenodo. This may be sufficient in > terms of making the dump available and reusable for research purposes > (or even better than the current dumps, given the permanence you > mention). Also it would reduce costs on the Wikidata side (I don't think > a notable dump would be necessary to generate on a weekly basis, for > example). > > - @Lydia, good point! I was thinking that filtering by wikilinks will > just drop some more obscure nodes (like Q51366847 for example), but had > not considered that there are some more general "concepts" that do not > have a corresponding Wikipedia article. All the same, in a lot of the > research we use Wikidata for, we are not particularly interested in one > thing or another, but more interested in facilitating what other people > are interested in. Examples would be query performance, finding paths, > versioning, finding references, etc. But point taken! Maybe there is a > way to identify "general entities" that do not have wikilinks, but do > have a high degree or centrality, for example? Would a degree-based or > centrality-based filter be possible in something like WDumper (perhaps > it goes beyond the original purpose; certainly it does not seem trivial > in terms of resources used)? Would it be a good idea? > > In summary, I like the idea of using WDumper to sporadically generate -- > and publish on Zenodo -- a "notable version" of Wikidata filtered by > sitelinks (perhaps also allowing other high-degree or high-PageRank > nodes to pass the filter). At least I know I would use such a dump. > > Best, > Aidan > >> On 2019-12-19 6:46, Lydia Pintscher wrote: >>> On Tue, Dec 17, 2019 at 7:16 PM Aidan Hogan <aidhog(a)gmail.com> wrote: >>> >>> Hey all, >>> >>> As someone who likes to use Wikidata in their research, and likes to >>> give students projects relating to Wikidata, I am finding it more and >>> more difficult to (recommend to) work with recent versions of Wikidata >>> due to the increasing dump sizes, where even the truthy version now >>> costs considerable time and machine resources to process and handle. In >>> some cases we just grin and bear the costs, while in other cases we >>> apply an ad hoc sampling to be able to play around with the data and try >>> things quickly. >>> >>> More generally, I think the growing data volumes might inadvertently >>> scare people off taking the dumps and using them in their research. >>> >>> One idea we had recently to reduce the data size for a student project >>> while keeping the most notable parts of Wikidata was to only keep claims >>> that involve an item linked to Wikipedia; in other words, if the >>> statement involves a Q item (in the "subject" or "object") not linked to >>> Wikipedia, the statement is removed. >>> >>> I wonder would it be possible for Wikidata to provide such a dump to >>> download (e.g., in RDF) for people who prefer to work with a more >>> concise sub-graph that still maintains the most "notable" parts? While >>> of course one could compute this from the full-dump locally, making such >>> a version available as a dump directly would save clients some >>> resources, potentially encourage more research using/on Wikidata, and >>> having such a version "rubber-stamped" by Wikidata would also help to >>> justify the use of such a dataset for research purposes. >>> >>> ... just an idea I thought I would float out there. Perhaps there is >>> another (better) way to define a concise dump. >>> >>> Best, >>> Aidan >> >> Hi Aiden, >> >> That the dumps are becoming too big is an issue I've heard a number of >> times now. It's something we need to tackle. My biggest issue is >> deciding how to slice and dice it though in a way that works for many >> use cases. We have https://phabricator.wikimedia.org/T46581 to >> brainstorm about that and figure it out. Input from several people >> very welcome. I also added a link to Benno's tool there. >> As for the specific suggestion: I fear relying on the existence of >> sitelinks will kick out a lot of important things you would care about >> like professions so I'm not sure that's a good thing to offer >> officially for a larger audience. >> >> >> Cheers >> Lydia >> > > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > Wikidata mailing list > Wikidata(a)lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikidata > > > ------------------------------ > > End of Wikidata Digest, Vol 97, Issue 13 > ****************************************

1 0

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

Wikidata December 2019