LDQ 2015 CALL FOR PAPERS
2nd Workshop on Linked Data Quality
co-located with ESWC 2015, Portorož, Slovenia
June 1, 2015
http://ldq.semanticmultimedia.org/
News flash: Invited talk by Prof. Dr. Felix Naumann on "Brave new data,
revisited"
http://ldq.semanticmultimedia.org/program/keynote_felix_naumann
*Important Dates*
* Submission of research papers: March 16, 2015
* Notification of paper acceptance: April 9, 2015
* Submission of camera-ready papers: April 30, 2015
Since the start of the Linked Open Data (LOD) Cloud, we have seen an
unprecedented volume of structured data published on the web, in most
cases as RDF and Linked (Open) Data. The integration across this LOD
Cloud, however, is hampered by the ‘publish first, refine later’
philosophy. This is due to various quality problems in the
published data, such as incompleteness, inconsistency, and
incomprehensibility. These problems affect every application
domain, be it scientific (e.g., life science, environment),
governmental, or industrial applications.
We see linked datasets originating from crowdsourced content such as
Wikipedia and OpenStreetMap (for example, DBpedia and LinkedGeoData) as
well as from highly curated sources, e.g. from the library domain.
Quality is defined as “fitness for use”: DBpedia can currently be
appropriate for a simple end-user application but could never be used in
the medical domain for treatment decisions. Quality is nevertheless a key
to the success of the data web, and quality problems are a major barrier
to further industry adoption.
Although quality is an essential concept in Linked Data, few efforts
currently exist to standardize how data quality tracking and assurance
should be implemented. In Linked Data in particular, ensuring data
quality is a challenge because it involves a set of autonomously
evolving data sources. Assessing the quality of the available datasets
and making that information explicit is yet another challenge; this
includes the (semi-)automatic identification of problems. Moreover, none
of the current approaches uses the assessment to ultimately improve the
quality of the underlying dataset.
The goal of the Workshop on Linked Data Quality is to raise the
awareness of quality issues in Linked Data and to promote approaches to
assess, monitor, maintain and improve Linked Data quality.
The workshop *topics* include, but are not limited to:
* Concepts
  - Quality modeling vocabularies
* Quality assessment
  - Methodologies
  - Frameworks for quality testing and evaluation
  - Inconsistency detection
  - Tools/Data validators
* Quality improvement
  - Refinement techniques for Linked Datasets
  - Linked Data cleansing
  - Error correction
  - Tools
* Quality of ontologies
* Reputation and trustworthiness of web resources
* Best practices for Linked Data management
* User experience, empirical studies
*Submission guidelines*
We seek novel technical research papers in the context of Linked Data
Quality, with a length of up to 8 pages (long papers) or 4 pages (short
papers). Papers should be submitted in PDF format. Other supplementary
formats (e.g. HTML) are also accepted, but a PDF version is required.
Paper submissions must be formatted in the style of the Springer
Publications format for Lecture Notes in Computer Science (LNCS). Please
submit your paper via EasyChair at
https://easychair.org/conferences/?conf=ldq2015. Submissions that do
not comply with the formatting of LNCS or that exceed the page limit
will be rejected without review. We note that the author list does not
need to be anonymized, as we do not have a double-blind review process
in place. Submissions will be peer reviewed by three independent
reviewers. Accepted papers have to be presented at the workshop.
*Organizing Committee*
* Anisa Rula – University of Milano-Bicocca, IT
* Amrapali Zaveri – AKSW, University of Leipzig, DE
* Magnus Knuth – Hasso Plattner Institute, University of Potsdam, DE
* Dimitris Kontokostas – AKSW, University of Leipzig, DE
Hi,
If you want to do more NLP research on enwiki and having an NLP markup
of Wikipedia is the bottleneck, you should look at the WIKI dataset just
released by Chris Re's team at Stanford based on a snapshot of enwiki as of
late January 2015. You can find this and other interesting datasets
released by the team at http://deepdive.stanford.edu/doc/opendata/. The data
format is explained at the top of the page.
Making the WIKI dataset required 24K machine hours. The team has access
to more machine hours and is actively receiving feedback from the NLP
community to generate more datasets. If you're interested in the recent
release or have suggestions for the team to generate other datasets based
on publicly available data, please contact the team.
Best,
Leila
Hey all!
We've released a highly-aggregated dataset of readership data -
specifically, data about where, geographically, traffic to each of our
projects (and all of our projects) comes from. The data can be found
at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've
put together an exploration tool for it at
https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/
Hope it's useful to people!
--
Oliver Keyes
Research Analyst
Wikimedia Foundation