Wiki-research-l July 2019

wiki-research-l@lists.wikimedia.org

23 participants
16 discussions

soweego: link Wikidata to large catalogs
by Marco Fossati 11 Jul '19

11 Jul '19

[You can safely skip this message if you have already seen it in the Wikidata mailing list, and pardon for the spam] Dear all, ----------------------------------------------------------------------- TL;DR: soweego version 1 will be released soon. In the meanwhile, why don't you consider endorsing the next steps? https://meta.wikimedia.org/wiki/Grants:Project/Rapid/Hjfocs/soweego_1.1 ----------------------------------------------------------------------- This is a pre-release notification for early feedback. Does the name *soweego* ring you a bell? It is a machine learning-based pipeline that links Wikidata to large catalogs [1]. It is a close friend of Mix'n'match [2], which mainly caters for small catalogs. The first version is almost done, and will start uploading results soon. Confident links are going to feed Wikidata via a bot [3], while others will get into Mix'n'match for curation. The next short-term steps are detailed in a rapid grant proposal [4], and I would be really grateful if you could consider an endorsement there. The soweego team has also tried its best to address the following community requests: 1. plan a sync mechanism between Wikidata and large catalogs / implement checks against external catalogs to find mismatches in Wikidata; 2. enable users to add links to new catalogs in a reasonable time. So, here is the most valuable contribution you can give to the project right now: understand how to *import a new catalog* [5]. Can't wait for your reactions. Cheers, Marco [1] https://soweego.readthedocs.io/ [2] https://tools.wmflabs.org/mix-n-match/ [3] see past contributions: https://www.wikidata.org/w/index.php?title=Special:Contributions/Soweego_bo… [4] https://meta.wikimedia.org/wiki/Grants:Project/Rapid/Hjfocs/soweego_1.1 [5] https://soweego.readthedocs.io/en/latest/new_catalog.html

1 0

Analytics clients (stat/notebook hosts) and backups of home directories
by Luca Toscano 11 Jul '19

11 Jul '19

Hi everybody, as part of https://phabricator.wikimedia.org/T201165 the Analytics team thought to reach out to everybody to make it clear that all the home directories on the stat/notebook nodes are not backed up periodically. They run on a software RAID configuration spanning multiple disks of course, so we are resilient on a disk failure, but even if unlikely if might happen that a host could loose all its data. Please keep this in mind when working on important projects and/or handling important data that you care about. I just added a warning to https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Analytics_clients. If you have really important data that is too big to backup, keep in mind that you can use your home directory (/user/your-username) on HDFS (that replicates data three times across multiple nodes). Please let us know if you have comments/suggestions/etc.. in the aforementioned task. Thanks in advance! Luca (on behalf of the Analytics team)

2 1

Firewall on stat100x and notebook100x hosts
by Luca Toscano 10 Jul '19

10 Jul '19

TL;DR: In https://phabricator.wikimedia.org/T170826 the Analytics team wants to add base firewall rules to stat100x and notebook100x hosts, that will cause any non-localhost or known traffic to be blocked by default. Please let us know in the task if this is a problem for you. Hi everybody, the Analytics team has always left the stat100x and notebook100x hosts without a set of base firewall rules to avoid impacting any research/test/etc.. activity on those hosts. This choice has a lot of downsides, one of the most problematic ones is that usually environments like the Python venvs can install potentially any package, and if the owner does not pay attention to security upgrades then we may have a security problem if the environment happens to bind to a network port and accept traffic from anywhere. One of the biggest problems was Spark: when somebody launches a shell using Hadoop Yarn (--master yarn), a Driver component is created that needs to bind to a random port to be able to communicate with the workers created on the Hadoop cluster. We assumed that instructing Spark to use a predefined range of random ports was not possible, but in https://phabricator.wikimedia.org/T170826 we discovered that there is a way (that seems to work fine from our tests). The other big use case that we know, Jupyter notebooks, seems to require only localhost traffic flow without restrictions. Please let us know in the task if you have a use case that requires your environment to bind to a network port on stat100x or notebook100x and accept traffic from other hosts. For example, having a python app that binds to port 33000 on stat1007 and listens/accepts traffic from other stat or notebook hosts. If we don't hear anything, we'll start adding base firewall rules to one host at the time during the upcoming weeks, tracking our work on the aforementioned task. Thanks! Luca (on behalf of the Analytics team)

2 3

open source alternatives to dbpedia?
by James Salsman 07 Jul '19

07 Jul '19

Given some of the recent under-performance noticed by toxicity-sniffing tools, I thought I would ask what people here think of http://chat.dbpedia.org Have a look: https://i.imgur.com/jKqRRTw.png Is anyone else working on an open source text chatbot based on Wikidata? I can offer no promises about how much it will teach you about Wikidata or Natural Language Processing, but a good starter task would be e.g. https://github.com/dbpedia/GSoC/issues/11

1 0

New paper - Indigenous knowledge on Wikipedia
by Nathalie Casemajor 06 Jul '19

06 Jul '19

Hello, For those of you who are interested in "small" Wikipedias and Indigenous languages, here's a new academic paper co-signed by yours truly. Published in an open access journal :) Nathalie Casemajor (Seeris) - *Openness, Inclusion and Self-Affirmation: Indigenous knowledge in Open Knowledge Projects <http://peerproduction.net/editsuite/issues/issue-13-open/peer-reviewed-pape…>* This paper is based on an action research project (Greenwood and Levin, 1998) conducted in 2016-2017 in partnership with the Atikamekw Nehirowisiw Nation and Wikimedia Canada. Built into the educational curriculum of a secondary school on the Manawan reserve, the project led to the launch of a Wikipedia encyclopaedia in the Atikamekw Nehirowisiw language. We discuss the results of the project by examining the challenges and opportunities raised in the collaborative process of creating Wikimedia content in the Atikamekw Nehirowisiw language. What are the conditions of inclusion of Indigenous and traditional knowledge in open projects? What are the cultural and political dimensions of empowerment in this relationship between openness and inclusion? How do the processes of inclusion and negotiation of openness affect Indigenous skills and worlding processes? Drawing from media studies, indigenous studies and science and technology studies, we adopt an ecological perspective (Star, 2010) to analyse the complex relationships and interactions between knowledge practices, ecosystems and infrastructures. The material presented in this paper is the result of the group of participants’ collective reflection digested by one Atikamekw Nehirowisiw and two settlers. Each co-writer then brings his/her own expertise and speaks from what he or she knows and has been trained for. Casemajor N., Gentelet K., Coocoo C. (2019), « Openness, Inclusion and Self-Affirmation: Indigenous knowledge in Open Knowledge Projects », *Journal of Peer Production*, no13, pp. 1-20. More info about the Atikamekw Wikipetcia project and the involvement of Wikimedia Canada: https://ca.wikimedia.org/…/Atikamekw_knowledge,_culture_and… <https://ca.wikimedia.org/wiki/Atikamekw_knowledge,_culture_and_language_in_…>

9 15

Need your opinion on Wikipedia’s gender gap
by Bowen Yu 04 Jul '19

04 Jul '19

Hello, I’m with a group of researchers <https://grouplens.org/> working on using Artificial Intelligence (AI) tools to promote gender diversity in Wikipedia contents and thus to close the gender gap <https://en.wikipedia.org/wiki/Gender_bias_on_Wikipedia>. We want to build a recommender system that targets the gender gap in content, while creating personalized article recommendations for editors. To ensure that our tool addresses real community issues, we plan to design the recommender algorithms by incorporating the feedback from stakeholders in the community, such as members of the WikiProject Women in Red, related WikiProjects, and others who are concerned with this issue. We want to understand your concerns and values as we come up with effective algorithmic designs. For more details about our project, please refer to our Wikimedia project meta page <https://meta.wikimedia.org/wiki/Research:Closing_the_Gender_Content_Gap_in_…> . If you are interested or have any thoughts and suggestions, please feel free to reach out to me at bowen-yu(a)umn.edu and we can plan a time to connect. Thanks!

1 0

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Wiki-research-l July 2019