Wiki-research-l June 2018

wiki-research-l@lists.wikimedia.org

17 participants
16 discussions

by Leila Zia

Hi all, We just added a new formal collaboration with Ramtin Yazdanian from EPFL to design, develop and test models that can help us learn about New Editor Interests. While the applications of these models are numerous, we expect to use them at least in our line of research on addressing Wikipedia contributor diversity gaps. This research aims to address the cold start problem in Wikimedia projects: a user enters the system, you have almost no information about them, and yet you want to engage them in the areas they're interested in. Please see project details at https://meta.wikimedia.org/wiki/Research:Voice_and_exit_in_a_voluntary_work… Best, Leila -- Leila Zia Senior Research Scientist, Lead Wikimedia Foundation

5 years, 10 months

Open position - Research Scientist

by Leila Zia

[apologies for cross-posting.] Hi all, The Research team at the Wikimedia Foundation has opened a Research Scientist position. Please review the job description at https://boards.greenhouse.io/wikimedia/jobs/1173279?gh_src=a41847991 , apply if you're interested or share it with colleagues and friends. Best, Leila -- Leila Zia Senior Research Scientist, Lead Wikimedia Foundation

5 years, 10 months

Presentation on UX design, mental models, and behavioral vs. survey data

by Pine W

I thought that this video, published in May 2018, was somewhat interesting and I am sharing it in case others are also interested. The presenter uses a 2010 design change to Wikipedia's front page search box (see https://blog.wikimedia.org/2010/06/15/usability-why-did-we-move-the-search-…) as an example, though I would hope that the lesson from this video isn't that it's okay to frequently disrupt the workflows of existing users with design changes regardless of the number of complaints from those users. The main points that I drew from this presentation are that interfaces should be intuitive and should impose a relatively light cognitive load. Those points may sound obvious to experienced UX designers, but may be of interest to people whose areas of expertise are in other domains. I also appreciated that the presenter shared an example of a situation in which people said one thing in surveys but behaved in the opposite way in practice. Here is the link to the video: https://www.youtube.com/watch?v=mxzK4sWfvH8 Regards, Pine ( https://meta.wikimedia.org/wiki/User:Pine )

5 years, 10 months

What percentage of digital assistants cite Wikipedia?

by Stella Yu

Curious, what percentage of digital assistants (Alexa, Siri, Cortana, Google) cite Wikipedia when a person asks a question? Does the current Wikipedia mobile app support voice search? Are there any reports on this? Thanks in advance! Sincere regards, Stella -- Stella Yu | STELLARESULTS | 415 690 7827 "Chronicling heritage brands and legendary people."

5 years, 10 months

EventStreams offset reset - June 5 2018

by Andrew Otto

Hi all! *If you are not an active user of the EventStreams service, you can ignore this email.* We’re in the process of upgrading <https://phabricator.wikimedia.org/T152015> the backend infrastructure that powers the EventStreams service. When we switch EventStreams to the new infrastructure <https://phabricator.wikimedia.org/T185225>, the ‘offsets’ AKA Last-Event-IDs will change. Connected EventStreams SSE clients will reconnect and not be able to automatically consume from the exact position in the stream where they left off. Instead, reconnecting clients will begin consuming from the latest messages in the stream. This means that connected clients will likely miss any messages that occurred during the reconnect period. Hopefully this will be a very small number of messages, as your SSE client should reconnect quickly. This switch is scheduled to happen on June 5 2018, at around 17:30 UTC. Let us know if you have any questions. Thanks! - Andrew Otto Senior Systems Engineer, WMF
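For readers less familiar with how SSE clients resume a stream, here is a minimal Python sketch of the general Server-Sent Events mechanism the announcement refers to. It is not EventStreams client code: the function names and the sample ids are made up for illustration, and real EventStreams ids may look different. The parser remembers the most recent `id:` field, and a reconnecting client sends that value back in the `Last-Event-ID` request header.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (last_event_id, data) pairs from the lines of an SSE response body."""
    event_id: Optional[str] = None  # persists across events, as in the SSE model
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event_id, "\n".join(data)
            data = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "id":
            event_id = value
        elif field == "data":
            data.append(value)


def reconnect_headers(last_event_id: Optional[str]) -> Dict[str, str]:
    """Build request headers for a (re)connect (hypothetical helper)."""
    headers = {"Accept": "text/event-stream"}
    if last_event_id is not None:
        # Asks the server to resume after this event, if it still recognizes the id.
        headers["Last-Event-ID"] = last_event_id
    return headers


# Illustrative stream body; the ids "41" and "42" are invented.
body = ["id: 41\n", "data: a\n", "\n", ": ping\n", "id: 42\n", "data: b\n", "data: c\n", "\n"]
events = list(parse_sse(body))
last_id = events[-1][0]
print(reconnect_headers(last_id))
```

Per the announcement, ids handed out before the switch will not let a client resume at its old position afterwards; a reconnecting client simply starts from the latest messages.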

5 years, 10 months

Machine-utilizable Crowdsourced Lexicons

by Adam Sobieski

INTRODUCTION

Machine-utilizable lexicons can enhance a great number of speech and natural language technologies. Scientists, engineers and technologists – linguists, computational linguists and artificial intelligence researchers – eagerly await the advancement of machine lexicons which include rich, structured metadata and machine-utilizable definitions.

Wiktionary, a collaborative project to produce a free-content multilingual dictionary, aims to describe all words of all languages using definitions and descriptions. The Wiktionary project, brought online in 2002, includes 139 spoken languages and American sign language [1].

This letter hopes to inspire exploration into and discussion regarding machine wiktionaries, machine-utilizable crowdsourced lexicons, and services which could exist at https://machine.wiktionary.org/ .

LEXICON EDITIONING

The premise of editioning is that one version of the resource can be more or less frozen, e.g. a 2018 edition, while wiki editors collaboratively work on a next version, e.g. a 2019 edition. Editioning can provide stability for complex software engineering scenarios utilizing an online resource. Some software engineering teams, however, may choose to utilize fresh dumps or data exports of the freshest edition.

SEMANTIC WEB

A machine-utilizable lexicon could include a semantic model of its contents and a SPARQL endpoint.

MACHINE-UTILIZABLE DEFINITIONS

Machine-utilizable definitions, available in a number of knowledge representation formats, can be granular, detailed and nuanced. There exist a large number of use cases for machine-utilizable definitions. One use case is providing natural language processing components with the capabilities to semantically interpret natural language, to utilize automated reasoning to disambiguate lexemes, phrases and sentences in contexts.
Some contend that the best output after a natural language processing component processes a portion of natural language is each possible interpretation, perhaps weighted via statistics. In this way, (1) natural language processing components could process ambiguous language, (2) other components, e.g. automated reasoning components, could narrow sets of hypotheses utilizing dialogue contexts, (3) other components, e.g. automated reasoning components, could narrow sets of hypotheses utilizing knowledgebase content, and (4) mixed-initiative dialogue systems could also ask users questions to narrow sets of hypotheses. Such disambiguation and interpretation would utilize machine-utilizable definitions of senses of lexemes.

CONJUGATION, DECLENSION AND THE URL-BASED SPECIFICATION OF LEXEMES AND LEXICAL PHRASES

A grammatical category [2] is a property of items within the grammar of a language; it has a number of possible values, sometimes called grammemes, which are normally mutually exclusive within a given category. Verb conjugation, for example, may be affected by the grammatical categories of: person, number, gender, tense, aspect, mood, voice, case, possession, definiteness, politeness, causativity, clusivity, interrogativity, transitivity, valency, polarity, telicity, volition, mirativity, evidentiality, animacy, associativity, pluractionality, reciprocity, agreement, polypersonal agreement, incorporation, noun class, noun classifiers, and verb classifiers in some languages [3].

By combining the grammatical categories from each and every language together, we can precisely specify a conjugation or declension. For example, the URL:

https://machine.wiktionary.org/wiki/lookup.php?edition=2018&language=en-US&…

includes an edition, a language of a lemma, a lemma, a lexical category, and conjugates (with ellipses) the verb in a language-independent manner.
We can further specify, via URL query string, the semantic sense of a grammatical element:

https://machine.wiktionary.org/wiki/lookup.php?edition=2018&language=en-US&…

Specifying a grammatical item fully in a URL query string, as in the previous examples, could result in a redirection to another URL. That is, the URL:

https://machine.wiktionary.org/wiki/lookup.php?edition=2018&language=en-US&…

could redirect to:

https://machine.wiktionary.org/wiki/index.php?edition=2018&id=12345678

or to:

https://machine.wiktionary.org/wiki/2018/12345678/

and the URL with a specified semantic sense:

https://machine.wiktionary.org/wiki/lookup.php?edition=2018&language=en-US&…

could redirect to:

https://machine.wiktionary.org/wiki/index.php?edition=2018&id=12345678&sens…

or to:

https://machine.wiktionary.org/wiki/2018/12345678/4/

The URL https://machine.wiktionary.org/wiki/2018/12345678/ is intended to indicate a conjugation or declension with one or more meanings or senses. The URL https://machine.wiktionary.org/wiki/2018/12345678/4/ is intended to indicate a specific sense or definition of a conjugation or declension.

One benefit of having URLs both for conjugations or declensions and for specific meanings or senses is that HTTP request headers [4] can specify the languages and content types of the output desired for a particular URL.

The examples above are intended to indicate that each complete, language-independent conjugation or declension can have its own ID number, rather than each headword or lemma. Instead of one ID number for all variations of “fly”, there is one ID number for “flew”, another for “have flown”, another for “flying”, and so on, one for each conjugation or declension. Reasons for indexing the conjugations and declensions instead of traditional headwords or lemmas include that, at least for some knowledge representation formats, the formal semantics of the definitions vary per conjugation or declension.
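As a rough illustration of the path-style URLs and header-based negotiation described above, the following Python sketch builds the edition/ID/sense URLs and a pair of standard HTTP request headers. It is only a sketch under stated assumptions: machine.wiktionary.org is the service proposed in this letter, not a live one, the helper names are hypothetical, and the content type in the usage lines is just an example value. No request is actually sent.

```python
from typing import Dict, Optional

# Base URL of the service proposed in the letter (assumed, not a live endpoint).
BASE = "https://machine.wiktionary.org/wiki"


def item_url(edition: int, item_id: int, sense: Optional[int] = None) -> str:
    """Return the path-style URL for a conjugation/declension, optionally one sense."""
    url = f"{BASE}/{edition}/{item_id}/"
    if sense is not None:
        url += f"{sense}/"
    return url


def negotiation_headers(language: str, content_type: str) -> Dict[str, str]:
    """Standard HTTP request headers asking for an output language and format."""
    return {"Accept-Language": language, "Accept": content_type}


# Usage with the ID and sense numbers from the letter's examples.
all_senses = item_url(2018, 12345678)
one_sense = item_url(2018, 12345678, sense=4)
headers = negotiation_headers("en-US", "application/ld+json")  # example content type
print(all_senses)
print(one_sense)
print(headers)
```

Keeping the identifier in the path and the output preferences in headers means the same URL can serve several representations of the same conjugation or sense.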
CONCLUSION

This letter broached machine wiktionaries and some of the services which could exist at https://machine.wiktionary.org/ . It is my hope that this letter indicated a few of the many exciting topics with regard to machine-utilizable crowdsourced lexicons.

REFERENCES

[1] https://en.wiktionary.org/wiki/Index:All_languages#List_of_languages
[2] https://en.wikipedia.org/wiki/Grammatical_category
[3] https://en.wikipedia.org/wiki/Grammatical_conjugation
[4] https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Request_fields

5 years, 10 months
