Wiki-research-l March 2015

wiki-research-l@lists.wikimedia.org

31 participants
23 discussions

CFP: 2nd Workshop on Linked Data Quality at ESWC #LDQ2015
by Amrapali J Zaveri 05 Mar '15

LDQ 2015 CALL FOR PAPERS

2nd Workshop on Linked Data Quality
co-located with ESWC 2015, Portorož, Slovenia
June 1, 2015
http://ldq.semanticmultimedia.org/

News flash: Invited talk by Prof. Dr. Felix Naumann on "Brave new data, revisited"
http://ldq.semanticmultimedia.org/program/keynote_felix_naumann

*Important Dates*
* Submission of research papers: March 16, 2015
* Notification of paper acceptance: April 9, 2015
* Submission of camera-ready papers: April 30, 2015

Since the start of the Linked Open Data (LOD) Cloud, we have seen an unprecedented volume of structured data published on the web, in most cases as RDF and Linked (Open) Data. The integration across this LOD Cloud, however, is hampered by the ‘publish first, refine later’ philosophy, which leaves the published data with various quality problems such as incompleteness, inconsistency and incomprehensibility. These problems affect every application domain, be it scientific (e.g., life science, environment), governmental, or industrial.

We see linked datasets originating from crowdsourced content like Wikipedia and OpenStreetMap, such as DBpedia and LinkedGeoData, and also from highly curated sources, e.g. from the library domain. Quality is defined as “fitness for use”; thus DBpedia may currently be appropriate for a simple end-user application but could never be used in the medical domain for treatment decisions. Quality is nonetheless key to the success of the data web, and poor quality is a major barrier to further industry adoption.

Although quality is an essential concept in Linked Data, few efforts currently exist to standardize how data quality tracking and assurance should be implemented. Ensuring data quality is particularly challenging in Linked Data, as it involves a set of autonomously evolving data sources. Assessing the quality of the available datasets and making that information explicit is yet another challenge, which includes the (semi-)automatic identification of problems. Moreover, none of the current approaches uses the assessment to ultimately improve the quality of the underlying dataset.

The goal of the Workshop on Linked Data Quality is to raise awareness of quality issues in Linked Data and to promote approaches to assess, monitor, maintain and improve Linked Data quality.

The workshop *topics* include, but are not limited to:
* Concepts
  - Quality modeling vocabularies
* Quality assessment
  - Methodologies
  - Frameworks for quality testing and evaluation
  - Inconsistency detection
  - Tools/Data validators
* Quality improvement
  - Refinement techniques for Linked Datasets
  - Linked Data cleansing
  - Error correction
  - Tools
* Quality of ontologies
* Reputation and trustworthiness of web resources
* Best practices for Linked Data management
* User experience, empirical studies

*Submission guidelines*
We seek novel technical research papers in the context of Linked Data Quality, with a length of up to 8 pages (long papers) or 4 pages (short papers). Papers should be submitted in PDF format. Other supplementary formats (e.g. HTML) are also accepted, but a PDF version is required. Submissions must be formatted in the style of the Springer Publications format for Lecture Notes in Computer Science (LNCS). Please submit your paper via EasyChair at https://easychair.org/conferences/?conf=ldq2015. Submissions that do not comply with the LNCS formatting or that exceed the page limit will be rejected without review. We note that the author list does not need to be anonymized, as we do not have a double-blind review process in place. Submissions will be peer reviewed by three independent reviewers. Accepted papers have to be presented at the workshop.

*Organizing Committee*
* Anisa Rula – University of Milano-Bicocca, IT
* Amrapali Zaveri – AKSW, University of Leipzig, DE
* Magnus Knuth – Hasso Plattner Institute, University of Potsdam, DE
* Dimitris Kontokostas – AKSW, University of Leipzig, DE

English Wikipedia NLP markup
by Leila Zia 05 Mar '15

Hi,

If you want to do more NLP research on enwiki and the lack of an NLP markup of Wikipedia is the bottleneck, you should look at the WIKI dataset just released by Chris Re's team at Stanford, based on a snapshot of enwiki from late January 2015. You can find this and other interesting datasets released by the team at http://deepdive.stanford.edu/doc/opendata/ The data format is explained at the top of the page.

Making the WIKI dataset required 24K machine hours. The team has access to more machine hours and is actively gathering feedback from the NLP community to generate more datasets. If you're interested in the recent release or have suggestions for other datasets the team could generate from publicly available data, please contact the team.

Best,
Leila

[Release]
by Oliver Keyes 04 Mar '15

Hey all!

We've released a highly-aggregated dataset of readership data: specifically, data about where, geographically, traffic to each of our projects (and all of our projects) comes from. The data can be found at http://dx.doi.org/10.6084/m9.figshare.1317408 . Additionally, I've put together an exploration tool for it at https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/

Hope it's useful to people!

--
Oliver Keyes
Research Analyst
Wikimedia Foundation
