Dear folks,
Is there a way to compute content similarity between two Wikipedia articles?
For example, I can think of representing each article as a vector of likelihoods over possible topics.
But I wonder whether there is other work people have already explored.
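A simple baseline for the idea above is cosine similarity between term-count vectors of the two articles; the topic-likelihood vectors you describe would slot into the same comparison. A minimal sketch (the article texts below are placeholder strings, not fetched from Wikipedia; in practice you would retrieve the page text first):

```python
from collections import Counter
import math
import re

def tokenize(text):
    """Lowercase and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two term-count vectors."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Placeholder texts standing in for article content.
article_1 = "The cat sat on the mat. Cats are small mammals."
article_2 = "A cat is a small domesticated mammal."
article_3 = "Stock markets fell sharply on Monday."

print(cosine_similarity(article_1, article_2))  # related pair
print(cosine_similarity(article_1, article_3))  # unrelated pair
```

The same `cosine_similarity` works unchanged if each article is instead represented as a vector of topic probabilities (e.g. from a topic model), which tends to be more robust than raw word counts.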
Thanks,
Haifeng
Call For Papers
1st International Workshop on Approaches for Making Data Interoperable
(AMAR 2019)
https://events.tib.eu/amar2019/
co-located with SEMANTiCS 2019
September 09 – 12, 2019 Karlsruhe, Germany
------------------------------------------------------------------------------------------------------------------------
Overview
------------------------------------------------------------------------------------------------------------------------
Recently, there has been a rapid growth in the amount of data available on
the Web. Data is produced by different communities working in a wide range
of domains, using several techniques. As a result, a large volume of data in
different formats and languages is generated. Accessibility of such
heterogeneous and multilingual data becomes an obstacle for reuse due to
the incompatibility of data formats and the language gap. This
incompatibility of data formats impedes the accessibility of data sources
to the right community. For instance, most open-domain question
answering systems are developed to be effective when data is represented in
RDF. They cannot operate on data in the very common CSV format or
in unstructured formats. Usually, the data they draw on is in
English, rendering them unable to answer questions in, e.g., Spanish. On the
other hand, NLP applications in Spanish cannot make use of a knowledge
graph in English. Different communities have different requirements in
terms of data representation and modeling. It is crucial to make the data
interoperable so that it is accessible to a variety of applications.
------------------------------------------------------------------------------------------------------------------------
Topics of Interest
------------------------------------------------------------------------------------------------------------------------
We invite paper submissions from two communities: (i) data consumers and
(ii) data providers. This includes practitioners, such as data scientists,
who have experience in fitting the available data to their use case;
Semantic Web researchers, who have been investigating the reuse of
heterogeneous data in tools; researchers in the field of data linking and
translation; and other researchers working in the general field of data
integration.
We invite submissions in the following areas:
- Data Integration
- Multilingual Data
- Data Linking
- Ontology and Knowledge Engineering
We welcome original contributions about all topics related to data
interoperability, including but not limited to:
- Approaches to convert data between formats, languages, and schemas
- Best practices for processing heterogeneous data
- Translation of data between different languages
- Cross-lingual applications
- Recommendations for language modeling in linked data
- Labeling of data with natural language information
- Datasets for different communities' data needs
- Tools reusing different data formats
- Converting datasets between different formats
- Applications in different domains, e.g., Life Sciences, Scholarly,
Industry 4.0, Humanities
------------------------------------------------------------------------------------------------------------------------
Author Instructions
------------------------------------------------------------------------------------------------------------------------
Paper submission for this workshop will be via EasyChair (
https://easychair.org/conferences/?conf=amar2019). Papers should follow
the Springer LNCS format and be submitted in PDF on or before July 9, 2019
(midnight Hawaii time).
We accept papers of the following formats:
- Full research papers (8 - 12 pages)
- Short research papers (3 - 5 pages)
- Position papers (6 - 8 pages)
- Resource papers (8 - 12 pages, including the publication of the dataset)
- In-Use papers (6 - 8 pages)
Accepted papers will be published as CEUR workshop proceedings. We target
the creation of a special issue including the best papers of the workshop.
------------------------------------------------------------------------------------------------------------------------
Important Dates
------------------------------------------------------------------------------------------------------------------------
Submission: July 9, 2019
Notification: July 30, 2019
Workshop: September 9, 2019
------------------------------------------------------------------------------------------------------------------------
Workshop Organizers
------------------------------------------------------------------------------------------------------------------------
Lucie-Aimée Kaffee, University of Southampton, UK & TIB Leibniz Information
Centre for Science and Technology, Hannover, Germany
Kemele M. Endris, TIB Leibniz Information Centre for Science and Technology
and L3S Research Centre, University of Hannover, Germany
Maria-Esther Vidal, TIB Leibniz Information Centre for Science and
Technology and L3S Research Centre, University of Hannover, Germany
Please contact us if you have any questions.
--
Lucie-Aimée Kaffee
Web and Internet Science Group
School of Electronics and Computer Science
University of Southampton
Hi all,
As you may be aware, over the last three weeks I've been looking into the
accuracy of active user statistics on English Wikipedia.
I haven't had a chance to upload the final results to
https://en.wikipedia.org/wiki/User:RhinosF1/activeuser but I have completed
the gathering of statistics and have attached a .pdf of the results to this
email.
I've found the sudden drop in the number of active users interesting; I
half expected it and set out to find it, but I want to look deeper.
I'd like to see whether this is down to blocks or to users simply not
continuing, and to assess whether time requirements or edit requirements
have the bigger impact.
I look forward to any feedback and help in the research.
The plan for the next stages are as follows:
1. About 10-14 days for people getting this email to respond.
2. Run the new list of queries for about 2-3 weeks to gather some data to
show
3. Show the data to enwiki users and ask for feedback / help collecting
data
4. Present results in 2-3 months' time.
5. Gather wide feedback on results
6. Maybe take action to improve it if we can see what action needs doing
As you will see, most of the data is from around 9pm UTC, so in future
stages I would appreciate data collection across a wider range of times.
Thanks in advance,
RhinosF1