Dear folks,
Is there a way to compute content similarity between two Wikipedia articles?
For example, I can think of representing each article as a vector of likelihoods over possible topics.
But I wonder whether there is other work people have already explored.
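A simple baseline for the idea above is cosine similarity between term-count vectors of the two articles; the topic-likelihood vectors you describe would slot into the same comparison. A minimal sketch (the article texts below are placeholder strings, not fetched from Wikipedia; in practice you would retrieve the page text first):

```python
from collections import Counter
import math
import re

def tokenize(text):
    """Lowercase and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two term-count vectors."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Placeholder texts standing in for article content.
article_1 = "The cat sat on the mat. Cats are small mammals."
article_2 = "A cat is a small domesticated mammal."
article_3 = "Stock markets fell sharply on Monday."

print(cosine_similarity(article_1, article_2))  # related pair
print(cosine_similarity(article_1, article_3))  # unrelated pair
```

The same `cosine_similarity` works unchanged if each article is instead represented as a vector of topic probabilities (e.g. from a topic model), which tends to be more robust than raw word counts.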
Thanks,
Haifeng
Call For Papers
1st International Workshop on Approaches for Making Data Interoperable
(AMAR 2019)
https://events.tib.eu/amar2019/
co-located with SEMANTiCS 2019
September 09 – 12, 2019 Karlsruhe, Germany
------------------------------------------------------------------------------------------------------------------------
Overview
------------------------------------------------------------------------------------------------------------------------
Recently, there has been a rapid growth in the amount of data available on
the Web. Data is produced by different communities working in a wide range
of domains, using several techniques. As a result, a large volume of data in
different formats and languages is generated. Accessibility of such
heterogeneous and multilingual data becomes an obstacle for reuse due to
the incompatibility of data formats and the language gap. This
incompatibility of data formats impedes the accessibility of data sources
to the right community. For instance, most open-domain question
answering systems are developed to be effective when data is represented in
RDF. They cannot operate on data in the very common CSV format or
in unstructured formats. Usually, the data they draw on is in
English, rendering them unable to answer questions in, e.g., Spanish. On the
other hand, NLP applications in Spanish cannot make use of a knowledge
graph in English. Different communities have different requirements in
terms of data representation and modeling. It is crucial to make the data
interoperable so that it is accessible to a variety of applications.
------------------------------------------------------------------------------------------------------------------------
Topics of Interest
------------------------------------------------------------------------------------------------------------------------
We invite paper submissions from two communities: (i) data consumers and
(ii) data providers. This includes practitioners, such as data scientists,
who have experience in fitting the available data to their use case;
Semantic Web researchers, who have been investigating the reuse of
heterogeneous data in tools; researchers in the field of data linking and
translation; and other researchers working in the general field of data
integration.
We invite submissions in the following areas:
- Data Integration
- Multilingual Data
- Data Linking
- Ontology and Knowledge Engineering
We welcome original contributions about all topics related to data
interoperability, including but not limited to:
- Approaches to convert data between formats, languages, and schemas
- Best practices for processing heterogeneous data
- Translation of data between different languages
- Cross-lingual applications
- Recommendations for language modeling in linked data
- Labeling of data with natural language information
- Datasets for different communities' data needs
- Tools reusing different data formats
- Converting datasets between different formats
- Applications in different domains, e.g., Life Sciences, Scholarly,
Industry 4.0, Humanities
------------------------------------------------------------------------------------------------------------------------
Author Instructions
------------------------------------------------------------------------------------------------------------------------
Paper submission for this workshop will be via EasyChair (
https://easychair.org/conferences/?conf=amar2019). Papers should follow
the Springer LNCS format and be submitted in PDF on or before July 9, 2019
(midnight Hawaii time).
We accept papers of the following formats:
- Full research papers (8 - 12 pages)
- Short research papers (3 - 5 pages)
- Position papers (6 - 8 pages)
- Resource papers (8 - 12 pages, including the publication of the dataset)
- In-Use papers (6 - 8 pages)
Accepted papers will be published as CEUR workshop proceedings. We target
the creation of a special issue including the best papers of the workshop.
------------------------------------------------------------------------------------------------------------------------
Important Dates
------------------------------------------------------------------------------------------------------------------------
Submission: July 9, 2019
Notification: July 30, 2019
Workshop: September 9, 2019
------------------------------------------------------------------------------------------------------------------------
Workshop Organizers
------------------------------------------------------------------------------------------------------------------------
Lucie-Aimée Kaffee, University of Southampton, UK & TIB Leibniz Information
Centre for Science and Technology, Hannover, Germany
Kemele M. Endris, TIB Leibniz Information Centre for Science and Technology
and L3S Research Centre, University of Hannover, Germany
Maria-Esther Vidal, TIB Leibniz Information Centre for Science and
Technology and L3S Research Centre, University of Hannover, Germany
Please contact us if you have any questions.
--
Lucie-Aimée Kaffee
Web and Internet Science Group
School of Electronics and Computer Science
University of Southampton
Hi all,
As you may be aware, over the last three weeks I've been looking into the
accuracy of active user statistics on English Wikipedia.
I haven't had a chance to upload the final results to
https://en.wikipedia.org/wiki/User:RhinosF1/activeuser but I have completed
the gathering of statistics and have attached a .pdf of the results to this
email.
I've found the sudden drop in the number of active users interesting; I
half expected it and set out to find it, but I want to look deeper.
I'd like to see whether this is down to blocks or to users simply not
continuing, and to assess whether time requirements or edit requirements
have the bigger impact.
I look forward to any feedback and help in the research.
The plan for the next stages are as follows:
1. About 10-14 days for people getting this email to respond.
2. Run the new list of queries for about 2-3 weeks to gather some data to
show
3. Show the data to enwiki users and ask for feedback / help collecting
data
4. Present results in 2-3 months' time.
5. Gather wide feedback on results
6. Maybe take action to improve it if we can see what action needs doing
As you will see, most of the data is from around 9pm UTC, so in future
stages I would appreciate data collection across a wider range of times.
Thanks in advance,
RhinosF1