Hello all,
I'm planning to write a proposal for a Wikidata-to-DBpedia project in GSoC 2013.
I found the following on the change propagation page
<http://meta.wikimedia.org/wiki/Wikidata/Notes/Change_propagation>:
> Support for 3rd party clients, that is, client wikis and other consumers
> outside of Wikimedia, is currently not essential and will not be
> implemented for now. It shall however be kept in mind for all design
> decisions.
I wanted to know two things:
1. What would be the time frame for change propagation to be ready, even as a
rough estimate? Could it be ready within 2 or 3 months?
2. Is there any design pattern or a brief outline of the change propagation
design? That would let me make a rough plan and estimate of how it could be
consumed on the DBpedia side.
Thanks and regards,
-------------------------------------------------
Hady El-Sahar
Research Assistant
Center of Informatics Sciences | Nile University<http://nileuniversity.edu.eg/>
email : hadyelsahar(a)gmail.com
Phone : +2-01220887311
http://hadyelsahar.me/
<http://www.linkedin.com/in/hadyelsahar>
Apologies for cross-posting!
=======================
NLP & DBpedia Workshop 2013
=======================
Free, open, interoperable and multilingual NLP for DBpedia and DBpedia
for NLP:
http://nlp-dbpedia2013.blogs.aksw.org/
Collocated with the International Semantic Web Conference 2013 (ISWC 2013)
21-22 October 2013, in Sydney, Australia (*Submission deadline July 8th*)
**********************************
Recently, the DBpedia community has experienced an immense increase in
activity, and we believe that the time has come to explore the
connection between DBpedia and Natural Language Processing (NLP) at an
unprecedented depth. The goal of this workshop can be summarized by this
(pseudo-) formula:
NLP & DBpedia == DBpedia4NLP && NLP4DBpedia
http://db0.aksw.org/downloads/CodeCogsEqn_bold2.gif
DBpedia has a long-standing tradition of providing useful data as well as
a commitment to reliable Semantic Web technologies and living best
practices. With the rise of Wikidata, DBpedia is step by step relieved
of the tedious extraction of data from Wikipedia's infoboxes and can
shift its focus to new challenges such as extracting information from
the unstructured article text, as well as becoming a testing ground for
multilingual NLP methods.
Contribution
=========
Within the timeframe of this workshop, we hope to mobilize a community
of stakeholders from the Semantic Web area. We envision that the workshop
will produce the following items:
* an open call to the DBpedia data consumer community will generate a
wish list of data which is to be generated from Wikipedia by NLP
methods. This wish list will be broken down into tasks and benchmarks,
and a gold standard will be created.
* the benchmarks and test data created will be collected and published
under an open license for future evaluation (inspired by OAEI and
UCI-ML). An overview of the benchmarks can be found here:
http://nlp-dbpedia2013.blogs.aksw.org/benchmarks
Please sign up for our mailing list if you are interested in discussing
guidelines and NLP benchmarking:
http://lists.informatik.uni-leipzig.de/mailman/listinfo/nlp-dbpedia-public
Important dates
===========
8 July 2013, Paper Submission Deadline
9 August 2013, Notification of accepted papers sent to authors
Motivation
=======
The central role of Wikipedia (and therefore DBpedia) for the creation
of a Translingual Web has recently been recognized by the Strategic
Research Agenda (cf. section 3.4, page 23) and most of the contributions
of the recently held Dagstuhl seminar on the Multilingual Semantic Web
also stress the role of Wikipedia for Multilingualism. As more and more
language-specific chapters of DBpedia appear (currently 14 language
editions), DBpedia is becoming a driving factor for a Linguistic Linked
Open Data cloud as well as localized LOD clouds with specialized domains
(e.g. the Dutch windmill domain ontology created from
http://nl.dbpedia.org ).
The data contained in Wikipedia and DBpedia have ideal properties for
making them a controlled testbed for NLP. Wikipedia and DBpedia are
multilingual and multi-domain, the communities maintaining these
resources are very open, and it is easy to join and contribute. The open
license allows data consumers to benefit from the content, and many parts
are collaboratively editable. In particular, the data in DBpedia is widely
used and disseminated throughout the Semantic Web.
NLP4DBpedia
==========
DBpedia has been around for quite a while, infusing the Web of Data with
multi-domain data of decent quality. These triples are, however, mostly
extracted from Wikipedia infoboxes. To unlock the full potential of
Wikipedia articles for DBpedia, the information contained in the
remaining parts of the articles needs to be analysed and triplified.
This is where NLP techniques can help.
DBpedia4NLP
==========
On the other hand, NLP, and information extraction techniques in
particular, rely on various resources while processing texts from
various domains. These resources may be used as an element of a
solution (e.g. a gazetteer forming an important part of an expert-written
rule, or a disambiguation resource) or while delivering a solution (e.g.
within machine learning approaches). DBpedia easily fits both of these
roles.
We invite papers from both of these areas, including:
1. Knowledge extraction from text and HTML documents (especially
unstructured and semi-structured documents) on the Web, using
information in the Linked Open Data (LOD) cloud, and especially in DBpedia.
2. Representation of NLP tool output and NLP resources as RDF/OWL, and
linking the extracted output to the LOD cloud.
3. Novel applications using the extracted knowledge, the Web of Data or
DBpedia-based NLP methods.
The specific topics are listed below.
Topics
=====
- Improving DBpedia with NLP methods
- Finding errors in DBpedia with NLP methods
- Annotation methods for Wikipedia articles
- Cross-lingual data and text mining on Wikipedia
- Pattern and semantic analysis of natural language, reading the Web,
learning by reading
- Large-scale information extraction
- Entity resolution and automatic discovery of Named Entities
- Multilingual recognition of real-world entities
- Frequent pattern analysis of entities
- Relationship extraction, slot filling
- Entity linking, Named Entity disambiguation, cross-document
co-reference resolution
- Disambiguation through knowledge bases
- Ontology representation of natural language text
- Analysis of ontology models for natural language text
- Learning and refinement of ontologies
- Natural language taxonomies modeled as Semantic Web ontologies
- Use cases for potential data extracted from Wikipedia articles
- Use cases of entity recognition for Linked Data applications
- Impact of entity linking on information retrieval, semantic search
Furthermore, an informal list of NLP tasks can be found on this
Wikipedia page:
http://en.wikipedia.org/wiki/Natural_language_processing#Major_tasks_in_NLP
These are relevant for the workshop as long as they fit into the
DBpedia4NLP and NLP4DBpedia frame (i.e. the data used revolves around
Wikipedia and DBpedia).
Submission formats
==============
Paper submission
-----------------------
All papers must represent original and unpublished work that is not
currently under review. Papers will be evaluated according to their
significance, originality, technical content, style, clarity, and
relevance to the workshop. At least one author of each accepted paper is
expected to attend the workshop.
* Full research paper (up to 12 pages)
* Position papers (up to 6 pages)
* Use case descriptions (up to 6 pages)
* Data/benchmark paper (2-6 pages, depending on the size and complexity)
Note: data and benchmark papers are meant to provide a citable
reference for your data and benchmarks. We kindly require that you
upload any data you use to our benchmark repository in parallel to the
submission. We recommend using an open license (e.g. CC-BY), but the
minimum requirement is free use. Please write to the mailing list if
you have any problems.
Full instructions are available at:
http://nlp-dbpedia2013.blogs.aksw.org/submission/
Submission of data and use cases
--------------------------------------------
This workshop also targets non-academic users and developers. If you
have any (open) data (e.g. texts or annotations) that can be used for
benchmarking NLP tools, but do not want or need to write an academic
paper about it, please feel free to just add it to this table:
http://tinyurl.com/nlp-benchmarks or upload it to our repository:
http://github.com/dbpedia/nlp-dbpedia
Full instructions are available at:
http://nlp-dbpedia2013.blogs.aksw.org/benchmarks/
Also if you have any ideas, use cases or data requests please feel free
to just post them on our mailing list: nlp-dbpedia-public [at]
lists.informatik.uni-leipzig.de or send them directly to the chairs:
nlp-dbpedia2013 [at] easychair.org
Program committee
==============
* Guadalupe Aguado, Universidad Politécnica de Madrid, Spain
* Chris Bizer, Universität Mannheim, Germany
* Volha Bryl, Universität Mannheim, Germany
* Paul Buitelaar, DERI, National University of Ireland, Galway
* Charalampos Bratsas, OKFN Greece, Αριστοτέλειο Πανεπιστήμιο
Θεσσαλονίκης (Aristotle University of Thessaloniki), Greece
* Philipp Cimiano, CITEC, Universität Bielefeld, Germany
* Samhaa R. El-Beltagy, جامعة النيل (Nile University), Egypt
* Daniel Gerber, AKSW, Universität Leipzig, Germany
* Jorge Gracia, Universidad Politécnica de Madrid, Spain
* Max Jakob, Neofonie GmbH, Germany
* Anja Jentzsch, Hasso-Plattner-Institut, Potsdam, Germany
* Ali Khalili, AKSW, Universität Leipzig, Germany
* Daniel Kinzler, Wikidata, Germany
* David Lewis, Trinity College Dublin, Ireland
* John McCrae, Universität Bielefeld, Germany
* Uroš Milošević, Institut Mihajlo Pupin, Serbia
* Roberto Navigli, Sapienza, Università di Roma, Italy
* Axel Ngonga, AKSW, Universität Leipzig, Germany
* Asunción Gómez Pérez, Universidad Politécnica de Madrid, Spain
* Lydia Pintscher, Wikidata, Germany
* Elena Montiel Ponsoda, Universidad Politécnica de Madrid, Spain
* Giuseppe Rizzo, Eurecom, France
* Harald Sack, Hasso-Plattner-Institut, Potsdam, Germany
* Felix Sasaki, Deutsches Forschungszentrum für künstliche Intelligenz,
Germany
* Mladen Stanojević, Institut Mihajlo Pupin, Serbia
* Hans Uszkoreit, Deutsches Forschungszentrum für künstliche
Intelligenz, Germany
* Rupert Westenthaler, Salzburg Research, Austria
* Feiyu Xu, Deutsches Forschungszentrum für künstliche Intelligenz, Germany
Contact
=====
Of course we would prefer that you post any questions and comments
regarding NLP and DBpedia to our public mailing list at:
nlp-dbpedia-public [at] lists.informatik.uni-leipzig.de
If you want to contact the chairs of the workshop directly, please write
to:
nlp-dbpedia2013 [at] easychair.org
Kind regards,
Sebastian Hellmann, Agata Filipowska, Caroline Barrière,
Pablo N. Mendes, Dimitris Kontokostas
--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org,
Deadline: *July 8th*)
Come to Germany as a PhD student: http://bis.informatik.uni-leipzig.de/csf
Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
http://dbpedia.org/Wiktionary , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
Forwarding this to the discussion list for Wikidata.
Sven
On Sat, Apr 27, 2013 at 10:53 AM, Maarten Dammers <maarten(a)mdammers.nl> wrote:
> Hi everyone,
>
> Now is a good time to start adding information about our cultural heritage to
> Wikidata. Not everything is possible yet, but we can at least start with
> the simple things that need to be done anyway:
> * Have items for every monument article and list
> * Add claims to every monument article (P31) and list (P360) linking them
> to the article about the local cultural heritage (example: Rijksmonument)
> * Add country (P17) to all lists and articles
>
> Some things that are also possible (but not in all countries yet):
> * Add the identifier
> * Add the type of building
> * Add the administrative unit (state/province/municipality, etc)
> * Image
> * Commons category
>
> To keep track of this, we created
> https://www.wikidata.org/wiki/Wikidata:Cultural_heritage_task_force .
> Who wants to help out? You don't need to be a bot or a Wikidata wizard,
> just start by looking up the article about cultural heritage in your
> region. You can see several examples in the list.
>
> For bot owners: I'm using claimit.py. It's in the rewrite branch of
> Pywikipedia. You can use it to mass-add claims to Wikidata, for example for
> the lists of Rijksmonumenten:
> claimit.py -catr:Lijsten_van_rijksmonumenten -namespace:0 P360 Q916333
>
> Maarten
>
>
> _______________________________________________
> Wiki Loves Monuments mailing list
> WikiLovesMonuments(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikilovesmonuments
> http://www.wikilovesmonuments.org
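For those who don't have claimit.py handy, Maarten's command above boils down to
roughly the following pywikibot (rewrite branch) sketch. The category title and
the P360 -> Q916333 claim are taken from his example; everything else is only an
illustration, and exact function names may differ between pywikibot versions.

import pywikibot
from pywikibot import pagegenerators

# Rough sketch of what the claimit.py call above does: for every list page in
# the category tree, add "is a list of" (P360) -> Rijksmonument (Q916333) to
# the Wikidata item connected to that page.
site = pywikibot.Site("nl", "wikipedia")
repo = site.data_repository()

category = pywikibot.Category(site, "Categorie:Lijsten van rijksmonumenten")
for page in pagegenerators.CategorizedPageGenerator(category, recurse=True):
    if page.namespace() != 0:                   # -namespace:0 in the command above
        continue
    item = pywikibot.ItemPage.fromPage(page)    # the Wikidata item for this page
    item.get()
    if "P360" in item.claims:                   # don't add the claim twice
        continue
    claim = pywikibot.Claim(repo, "P360")
    claim.setTarget(pywikibot.ItemPage(repo, "Q916333"))
    item.addClaim(claim)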
On Sun, Apr 28, 2013 at 5:05 AM, Paul Selitskas <p.selitskas(a)gmail.com> wrote:
> How about if I don't want such fallback to work for me? What if I'd
> like to see what is labeled and what is not? Have you considered this
> a user option with a flexible fallback schema or a site-wide
> preference with a fixed one?
>
> In general, this is a very good Wikidata feature that is not yet implemented.
> And thanks for raising the category redirects once again! :)
>
I guess being able to see what's translated and what's not could be
handled by appending the language name to a label when it falls back
to another language.
On Sun, Apr 28, 2013 at 5:55 AM, Lukas Benedix
<benedix(a)zedat.fu-berlin.de> wrote:
> Hi,
>
> If I understand your proposal on User:Liangent/wb-lang right, you want to
> write an extension or implement the language fallback directly in Wikibase.
>
> I thought about tackling the missing-language issue myself and think
> writing a gadget would have a better chance of getting deployed on
> wikidata.org. Users who don't like it could easily disable the gadget in
> their preferences.
I guess I prefer to patch Wikibase directly.
>
> You should keep in mind that the user interface of such a feature is not easy
> to design.
>
That's why my proposal allocates two weeks for collecting feedback on designs.
On Sun, Apr 28, 2013 at 10:03 AM, Daniel Friesen
<daniel(a)nadir-seen-fire.com> wrote:
> We already have a way to handle that kind of thing with normal language
> fallbacks. &uselang=qqx.
> Wikidata should be able to do something similar trivially.
>
I don't understand how &uselang=qqx would work for this. Any explanation?
-Liangent
Hello,
I've drafted my proposal about language fallback and conversion issues
for Wikidata at [1].
Currently Wikidata stores multilingual content. Labels (names,
descriptions, etc.) are expected to be written in every language, so
every user can read them in their own language. But there are currently
some problems:
* If some content doesn't exist in a specific language, users with
that exact language set in their preferences see something meaningless
(the item's ID instead). This makes languages with fewer users (and thus
fewer labels filled in) practically unusable.
* Some similar languages often share the same value. Populating strings
for every such language one by one wastes resources and may let them
fall out of sync later.
* Even for languages which are not "that similar", MediaWiki already
has facilities to transliterate (a.k.a. convert) content from a sister
language (a.k.a. variant), which can be used to provide better results
for users.
This proposal aims to resolve these issues by displaying content
from another language based on user preferences (some users may know
more than one language), language similarity (the language fallback
chain), or the possibility of transliteration, and by allowing proper
editing of that content.
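To make the intended behaviour concrete, here is a minimal sketch of the label
resolution I have in mind. The real implementation would live in Wikibase itself;
the fallback chains, the transliteration stub, and the idea of marking fallbacks
with the source language (mentioned earlier in this thread) are illustrative
assumptions, not final design.

# Minimal sketch of per-language label fallback; the chains and the converter
# stub are illustrative only, not Wikibase's actual configuration.
FALLBACK_CHAINS = {
    "zh-hant": ["zh-hant", "zh", "zh-hans", "en"],
    "zh-hans": ["zh-hans", "zh", "zh-hant", "en"],
    "de-ch":   ["de-ch", "de", "en"],
}

def transliterate(text, source_lang, target_lang):
    """Placeholder for MediaWiki's language converter (e.g. zh-hans <-> zh-hant)."""
    return text  # real conversion omitted in this sketch

def resolve_label(labels, user_lang, mark_fallback=True):
    """Pick the best label for user_lang from a {language: label} dict."""
    for lang in FALLBACK_CHAINS.get(user_lang, [user_lang, "en"]):
        if lang in labels:
            label = labels[lang]
            if lang != user_lang:
                label = transliterate(label, lang, user_lang)
                if mark_fallback:
                    # Keep the fallback visible, as suggested earlier in this thread.
                    label = "%s [%s]" % (label, lang)
            return label
    return None  # nothing usable; the caller may show the bare item ID as today

# A zh-hant user sees the zh-hans label, converted and marked as a fallback:
print(resolve_label({"zh-hans": "地球", "en": "Earth"}, "zh-hant"))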
Although Wikidata is still in a phase of fast development, lots of data
have already been added to it. The later we resolve these issues, the
more duplication may be created, which will require more clean-up work
in the future, similar to what we had to face when the language
converter (that transliteration system) was introduced for the Chinese
Wikipedia. So I'm planning to do this project this summer.
There's also a backup proposal about category redirects at [2]. I
wrote it because I really want to see it implemented too, either by me
or by someone else. Some of its content may also be useful for other
participants willing to take on this project.
Comments are welcome and appreciated.
[1] https://www.mediawiki.org/wiki/User:Liangent/wb-lang
[2] https://www.mediawiki.org/wiki/User:Liangent/cat-redir
-Liangent
Hello,
I've been having discussions about my GSoC 2013 project with the Wikidata
group on IRC (#mediawiki-wikidata) for a few days and have completed the
first draft of my proposal. I'd really appreciate some feedback on it. I
welcome any queries that you may have and would love to get tips on how to
improve it.
http://www.mediawiki.org/wiki/User:Pragunbhutani/GSoC_2013_Proposal
On Sumanah's suggestion, I've limited the scope of the project to 6 weeks
to allow time for code review and bug fixes. I thought it best to run it by
wikidata-l before sharing it with wikitech-l for their opinion.
Many thanks!
--
Pragun Bhutani
http://pragunbhutani.in
Skype : pragun.bhutani
Heya folks :)
Lots of good stuff happened around Wikidata this week. Your summary
is here: http://meta.wikimedia.org/wiki/Wikidata/Status_updates/2013_04_26
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Technical Projects
Wikimedia Deutschland e.V.
Obentrautstr. 72
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations at the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.
CCing wikidata.
I don't think this is a good approach. We shouldn't be breaking the API just
because there is a new under-the-hood feature (Wikibase). From the API
client's perspective, it should work as before, plus there should be an
extra flag indicating whether the sitelink is stored in Wikidata or locally.
Sitelinks might be the first such change, but not the last (e.g. categories,
etc.).
As for the implementation, it seems the hook approach might not satisfy all
the usage scenarios:
* Given a set of pages (pageset), return all the sitelinks (possibly filtered
by a set of wanted languages). Rendering a page for the UI would use this
approach with just one page.
* langbacklinks - get a list of pages linking to a site.
* Filtering based on having/not having a specific langlink, for other modules.
E.g. list all pages that have/don't have a link to site X.
* alllanglinks (not yet implemented, but might be, to match the corresponding
allcategories, ...) - list all existing langlinks on the site.
We could debate the need for some of these scenarios, but I feel that we
shouldn't be breaking the existing API.
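To illustrate the kind of existing client call that would change behaviour, here
is a minimal sketch of fetching langlinks through the API. The opt-in parameter
for Wikidata-provided links is not named in the changes quoted below, so the
commented-out flag is purely a placeholder, not a real parameter.

import requests

# Fetch the language links for one page via prop=langlinks. Under the proposed
# change, links that come from Wikidata would no longer appear here unless an
# extra parameter is passed.
params = {
    "action": "query",
    "titles": "Berlin",
    "prop": "langlinks",
    "lllimit": "max",
    "format": "json",
    # Hypothetical opt-in flag for Wikidata-stored sitelinks; the real
    # parameter name is still in review and may differ:
    # "llincludewikidata": 1,
}
resp = requests.get("https://en.wikipedia.org/w/api.php", params=params).json()
page = next(iter(resp["query"]["pages"].values()))
for link in page.get("langlinks", []):
    print(link["lang"], link["*"])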
On Thu, Apr 25, 2013 at 2:24 PM, Brad Jorsch <bjorsch(a)wikimedia.org> wrote:
> Language links added by Wikidata are currently stored in the parser
> cache and in the langlinks table in the database, which means they
> work the same as in-page langlinks but also that the page must be
> reparsed if these wikidata langlinks change. The Wikidata team has
> proposed to remove the necessity for the page reparse, at the cost of
> changing the behavior of the API with regard to langlinks.
>
> Gerrit change 59997[1] (still in review) will make the following
> behavioral changes:
> * action=parse will return only the in-page langlinks by default.
> Inclusion of Wikidata langlinks may be requested using a new
> parameter.
> * list=allpages with apfilterlanglinks will only consider in-page
> langlinks.
> * list=langbacklinks will only consider in-page langlinks.
> * prop=langlinks will only list in-page langlinks.
>
> Gerrit change 60034[2] (still in review) will make the following
> behavioral changes:
> * prop=langlinks will have a new parameter to request inclusion of the
> Wikidata langlinks in the result.
>
> A future change, not coded yet, will allow for Wikidata to flag its
> langlinks in various ways. For example, it could indicate which of the
> other-language articles are Featured Articles.
>
> At this time, it seems likely that the first change will make it into
> 1.22wmf3.[3] The timing of the second and third changes is less
> certain.
>
>
> [1]: https://gerrit.wikimedia.org/r/#/c/59997
> [2]: https://gerrit.wikimedia.org/r/#/c/60034
> [3]: https://www.mediawiki.org/wiki/MediaWiki_1.22/Roadmap
>
> --
> Brad Jorsch
> Software Engineer
> Wikimedia Foundation
>
> _______________________________________________
> Mediawiki-api-announce mailing list
> Mediawiki-api-announce(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce
>
Heya folks :)
The start of phase 2 has just been deployed on all 274 remaining Wikipedias \o/
http://blog.wikimedia.de/2013/04/24/wikidata-all-around-the-world/
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Technical Projects
Wikimedia Deutschland e.V.
Obentrautstr. 72
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations at the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.
I am completely amazed by a particularly brilliant way that Wikipedia uses
Wikidata. Instead of simply displaying the data from Wikidata and removing
the local data, a template and workflow are proposed which:
* grabs the relevant data from Wikidata
* compares it with the data given locally in the Wikipedia
* displays the Wikipedia data
* adds a maintenance category in case the data is different
This allows both communities to check the maintenance category, provides a
safety net against vandal changes, still makes it noticeable if some data has
changed, etc., and lets them phase out the local data over time when they get
comfortable and if they want to. It is a balance of maintenance effort and
data quality.
I am not saying that this is the right solution in every use case, for every
topic, for every language. But it is a perfect example of how the community
will surprise us by coming up with ingenious solutions if they get enough
flexibility, powerful tools, and enough trust.
Yay, Wikipedia!
The workflow is described here:
<http://en.wikipedia.org/wiki/Template_talk:Commons_category#Edit_request_on…>
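To make the comparison step explicit, here is a conceptual sketch of that
workflow. The real thing is a wikitext template on the English Wikipedia, not
Python, and the function and maintenance-category names below are made up for
illustration only.

# Conceptual sketch of the template workflow described above; the function and
# maintenance-category names are illustrative, not the actual enwiki ones.
def render_commons_category(local_value, wikidata_value):
    """Return (value to display, maintenance categories) for one article."""
    categories = []
    if local_value:
        display = local_value                  # the local value wins for display
        if wikidata_value and wikidata_value != local_value:
            # Both sources exist but disagree: flag for human review instead of
            # silently preferring either one.
            categories.append("Category:Commons category differs from Wikidata")
        elif not wikidata_value:
            categories.append("Category:Commons category missing on Wikidata")
    else:
        display = wikidata_value               # no local data: fall back to Wikidata
    return display, categories

# Example: local and Wikidata values disagree, so the article gets flagged.
print(render_commons_category("Berlin", "Berlin, Germany"))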
There is an RFC currently going on about whether and how to use Wikidata
data in the English Wikipedia, coming out of the discussion that was here a
few days ago. If you are an English Wikipedian, you might be interested:
<http://en.wikipedia.org/wiki/Wikipedia:Requests_for_comment/Wikidata_Phase_2>
--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations at the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.