Dear all,
Thank you for your efforts. I invite you to our session about Wikidata and Health, which will take place at 10:30 in Tu's Room. It will be an honour to hear your opinions on this important area.
Yours Sincerely,
Houcemeddine Turki (he/him)
Medical Student, Faculty of Medicine of Sfax, University of Sfax, Tunisia
Undergraduate Researcher, UR12SP36
GLAM and Education Coordinator, Wikimedia TN User Group
Member, WikiResearch Tunisia
Member, Wiki Project Med
Member, WikiIndaba Steering Committee
Member, Wikimedia and Library User Group Steering Committee
Co-Founder, WikiLingua Maghreb
Founder, TunSci
____________________
+21629499418
Hi!
As part of our Wikidata Query Service setup, we maintain the namespace
serving DCAT-AP (DCAT Application Profile) data[1]. (If you don't know
what I'm talking about, you can safely ignore the rest of this message.)
A recent check showed that this namespace is virtually unused: over the
last two months, only 3 queries per month were served from it, all of
them coming from WMF servers (I'm not sure whether it's a tool or
somebody querying manually; I did not dig further).
So I wonder whether it makes sense to continue maintaining this
namespace. While it does not require very significant effort - it's
mostly automated - it does need occasional attention when maintenance is
performed, and some scripts and configurations become slightly more
complex because of it. That's no big deal if somebody is using it -
that's what the service is for - but if it is completely unused, there
is no point in spending even minimal effort on it, at least on the main
production servers (of course, it would be possible to set up a simple
SPARQL server in labs with the same data).
In any case, the RDF dcatap data will remain available at
https://dumps.wikimedia.org/wikidatawiki/entities/dcatap.rdf - no change
is planned there - but if the namespace is phased out, the data can no
longer be queried using WDQS. One could still download it and, since
it's a very small dataset, use any tool that can read RDF to parse it
and work with it.
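For illustration, a minimal sketch of parsing such a small RDF/XML file with the Python standard library (the inline sample is invented and only mimics the rough shape of DCAT-AP data; a proper RDF library would be more robust than raw XML parsing):

```python
import xml.etree.ElementTree as ET

# Namespace-qualified tag prefixes used by DCAT-AP RDF/XML.
DCAT = "{http://www.w3.org/ns/dcat#}"
DCT = "{http://purl.org/dc/terms/}"

# Hypothetical inline sample; a real run would read the downloaded
# dcatap.rdf file instead.
sample = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Dataset rdf:about="https://www.wikidata.org/about#catalog">
    <dct:title>Wikidata</dct:title>
  </dcat:Dataset>
</rdf:RDF>"""

root = ET.fromstring(sample)
# Collect the dct:title of every dcat:Dataset in the file.
titles = [ds.findtext(DCT + "title") for ds in root.iter(DCAT + "Dataset")]
print(titles)  # ['Wikidata']
```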
I'd like to hear from anybody interested in this whether they are using
this namespace or plan to use it and what for. Please either answer here
or even better in the task[2] on Phabricator.
[1]
https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual#DCAT-AP
[2] https://phabricator.wikimedia.org/T228297
--
Stas Malyshev
smalyshev(a)wikimedia.org
Dear all,
We would like to share consolidated updates for the GlobalFactSync (GFS)
project with you (copied from
https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE/News)
We polished everything for our presentation at Wikimania tomorrow:
https://wikimania.wikimedia.org/wiki/2019:Technology_outreach_%26_innovatio…
All feedback welcome!
-- Sebastian (with the team: Tina, Włodzimierz, Krzysztof, Johannes and
Marvin)
User Script, Data Browser, Reference Web Service (15 August 2019)
Following the kick-off note at the end of July
<https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE/New…>,
which described our first edit and the concept in more detail, we have
shaped the technical microservices and data into more concise tools that
are easier to use and demo during our Wikimania presentation
<https://wikimania.wikimedia.org/wiki/2019:Technology_outreach_%26_innovatio…>:
1. User Script <https://en.wikipedia.org/wiki/User_scripts> available
at User:JohannesFre/global.js
<https://meta.wikimedia.org/wiki/User:JohannesFre/global.js>: it adds
links from each article and from Wikidata to the Data Browser and the
Reference Web Service.
[Screenshot: User script linking to the GFS Data Browser]
2. GFS Data Browser <https://global.dbpedia.org/> Github
<https://github.com/dbpedia/gfs> now accepts any URI in subject from
Wikipedia, DBpedia or Wikidata, see the Boys Don't Cry example from
Kick-Off Note
<https://global.dbpedia.org/?s=https%3A%2F%2Fglobal.dbpedia.org%2Fid%2F2nrbo…>,
Berlin/Geo-coords lat
<https://global.dbpedia.org/?s=https%3A%2F%2Fglobal.dbpedia.org%2Fid%2F4pafr…>
long
<https://global.dbpedia.org/?s=https%3A%2F%2Fglobal.dbpedia.org%2Fid%2F4pafr…>,
Albert Einstein's Religion
<https://global.dbpedia.org/?s=https%3A%2F%2Fglobal.dbpedia.org%2Fid%2F55LmB…>.
*Not Live yet, edits/fixes are not reflected*
3. Reference Web Service (Albert Einstein:
http://dbpedia.informatik.uni-leipzig.de:8111/infobox/references?article=ht…)
extracts (1) all references from a Wikipedia page, (2) matches them
to the infobox parameters, and (3) also extracts the corresponding
fact. The service will remain stable, so you can use it.
Furthermore, we are designing a friendly fork of HarvestTemplates
<https://github.com/Pascalco/harvesttemplates> to effectively import all
that data into Wikidata.
Kick-off note (25 July 2019)
*GlobalFactSync - Synchronizing Wikidata and Wikipedia's infoboxes*
How is data edited in Wikipedia/Wikidata? Where does it come from? And
how can we synchronize it globally?
The GlobalFactSync
<https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE>
(GFS) Project — funded by the Wikimedia Foundation — started in June
2019 and has two goals:
* Answer the above-mentioned three questions.
* Build an information system to synchronize facts between all
Wikipedia language-editions and Wikidata.
Now we are seven weeks into the project (10+ more months to go) and we
are releasing our first prototypes to gather feedback.
/How – Synchronization vs Consensus/
We follow an absolute *Human(s)-in-the-loop* approach when we talk about
synchronization. The final decision whether to synchronize a value or
not should rest with a human editor who understands consensus and the
implications. There will be no automatic imports. Our focus is to
drastically reduce the time to research all references for individual
facts.
A trivial example is the release date of the single “Boys Don’t Cry”
(March 16th, 1989) in the English
<https://en.wikipedia.org/wiki/Boys_Don%27t_Cry_(Moulin_Rouge_song)>,
Japanese
<https://ja.wikipedia.org/wiki/%E6%B6%99%E3%82%92%E3%81%BF%E3%81%9B%E3%81%AA…'t_Cry%E3%80%9C>,
and French
<https://fr.wikipedia.org/wiki/Namida_wo_Misenaide_(Boys_Don%27t_Cry)>
Wikipedia, Wikidata <https://www.wikidata.org/wiki/Q3020026#P577> and
finally in the external open database MusicBrainz
<https://musicbrainz.org/artist/e57182dc-2693-46fc-a739-a81c734a4326>. A
human editor might need 15-30 minutes to find and open all the different
sources, while our current prototype can spot the differences and
display them in 5 seconds.
We already had our first successful edit where a Wikipedia editor fixed
the discrepancy with our prototype: “I’ve updated Wikidata so that all
five sources are in agreement.” We are now working on the following tasks:
* Scaling the system to all infoboxes, Wikidata and selected external
databases (see below on the difficulties there)
* Making the system:
o “live” without stale information
o “reliable” with less technical errors when extracting and
indexing data
o “better referenced” by not only synchronizing facts but also
references
/Contributions and Feedback/
To ensure that GlobalFactSync will serve and help the Wikiverse, we
encourage everyone to try our data and microservices and leave us some
feedback, either on our Meta-Wiki page
<https://meta.wikimedia.org/wiki/Grants_talk:Project/DBpedia/GlobalFactSyncRE>
or via gfs(a)infai.org <mailto:gfs@infai.org>. In the following 10+
months, we intend to improve and build upon these initial results. At
the same time, these microservices are available for every developer to
exploit and to build useful applications on. The most promising
contributions will be rewarded and receive the book “Engineering Agile
Big-Data Systems”. Please post your feedback, or any tool or GUI you
build, here. In case you need changes to be made to the API, please let
us know, too. For the ambitious future developers among you, we have
some budget left that we will dedicate to an internship. To apply, just
mention it in your feedback post.
Finally, to talk to us and other GlobalFactSync users, you may want to
visit WikidataCon and Wikimania, where we will present the latest
developments and the progress of our project.
/Data, APIs & Microservices (Technical prototypes)/
Data Processing and Infobox Extraction:
For GlobalFactSync we use data from Wikipedia infoboxes in different
languages, as well as Wikidata and DBpedia, and fuse them into one big,
consolidated dataset - a PreFusion dataset
<https://databus.dbpedia.org/dbpedia/prefusion> (in JSON-LD). More
information on the fusion process, which is the engine behind GFS, can
be found in the FlexiFusion paper
<https://svn.aksw.org/papers/2019/ISWC_FlexiFusion/public.pdf>. One of
our next steps is to integrate MusicBrainz into this process as an
external dataset. We hope to implement even more such external datasets
to increase the amount of available information and references.
*First microservices:*
We deployed a set of microservices to show the current state of our
toolchain.
* [Initial User Interface] The GFS Data Browser is our GlobalFactSync
UI prototype (available at http://global.dbpedia.org), which shows
all extracted information available for one entity from different
sources. It can be used to analyze the factual consensus between
different Wikipedia articles about the same thing. Example: look at
the variety of population counts for Grimma
<https://global.dbpedia.org/?s=https%3A%2F%2Fglobal.dbpedia.org%2Fid%2F9QwA&…>.
* [PreFusion JSON API] While the UI allows simple, fast and easy
browsing for one entity at a time, we also provide raw access to the
underlying data (the PreFusion dump). The query UI
(http://global.dbpedia.org:8990; user: read, pw: gfs) can be
used to run simple analytical queries. Thus, we can determine
the number of locations having at least one population value
<http://global.dbpedia.org:8990/db/prefusion/provenance?query=%7B%0D%0A++++%…>
(1,194,007) but can also focus on examples with data quality
problems (e.g. one of the 4,268 locations with more than 10
population values
<http://global.dbpedia.org:8990/db/prefusion/provenance?query=%7B%0D%0A++++%…>).
Moreover, documentation about the PreFusion dataset and the download
link for the data are available on the Databus website
<https://databus.dbpedia.org/dbpedia/prefusion>.
* [Reference Data Download] We ran the Reference Extraction Service
over 10 Wikipedia languages. Download dumps here
<http://dbpedia.informatik.uni-leipzig.de/repo/lewoniewski/gfs/infobox-refs/…>.
* [Reference Extraction Service] Good references are crucial for
importing facts from Wikipedia to Wikidata. We are currently working
with colleagues from Poznań University of Economics and Business on
reference extraction for facts from Wikipedia. A current development
version of the reference extraction microservice
<http://dbpedia.informatik.uni-leipzig.de:8111/infobox/references?article=ht…>
shows all references and the locations where they were spotted in the
infobox – ad hoc – for a given article:
http://dbpedia.informatik.uni-leipzig.de:8111/infobox/references?article=ht…
(‘&format=tsv’ is also available)
* [Infobox Extraction Service] A similar ad hoc extraction of factual
information from infoboxes and other Wikipedia article content is
available. This microservice displays the information that can be
extracted from an infobox with the help of DBpedia mappings, e.g.
from the German Facebook Wikipedia article:
http://dbpedia.informatik.uni-leipzig.de:9998/server/extraction/en/extract?….
See here for more options:
http://dbpedia.informatik.uni-leipzig.de:9999/server/extraction/.
* [ID service] Last but not least, we offer the Global ID Resolution
Service
<https://global.dbpedia.org/same-thing/lookup/?uri=http://dbpedia.org/resour…>.
It ties together all available identifiers for one thing (i.e. at
the moment all DBpedia/Wikipedia and Wikidata identifiers –
MusicBrainz coming soon…) and shows their stable DBpedia Global ID.
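As a small illustration, request URLs for services like the reference extraction endpoint above can be assembled programmatically. This sketch assumes only the `article` and `format` query parameters mentioned in this note; the example article URL is hypothetical:

```python
from urllib.parse import urlencode

# Base endpoint of the reference extraction microservice, as given above.
BASE = "http://dbpedia.informatik.uni-leipzig.de:8111/infobox/references"

def references_url(article_url, fmt=None):
    """Build a request URL for the reference extraction microservice.

    `article_url` is the full URL of a Wikipedia article; `fmt` can be
    set to 'tsv' to request tab-separated output.
    """
    params = {"article": article_url}
    if fmt:
        params["format"] = fmt
    return BASE + "?" + urlencode(params)

# Hypothetical example article.
url = references_url("https://en.wikipedia.org/wiki/Albert_Einstein", fmt="tsv")
print(url)
```

`urlencode` takes care of percent-encoding the embedded article URL, which is easy to get wrong by hand.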
/Finding sync targets/
In order to test our algorithms, we started by looking at various
groups of subjects, our so-called sync targets. Based on the different
subjects, a set of problems was identified, with varying layers of
complexity:
* identity check/check for ambiguity — Are we talking about the same
entity?
* fixed vs. varying property — Some properties vary depending on
nationality (e.g., release dates), or point in time (e.g.,
population count).
* reference — Depending on the entity’s identity check and the
property’s fixed or varying state, the reference might vary. Also,
for some targets, no queryable online reference might be available.
* normalization/conversion of values — Depending on
language/nationality of the article properties can have varying
units (e.g., currency, metric vs imperial system).
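To illustrate the last point, a normalization step might look like the following sketch. The conversion table and function are hypothetical, not part of the GFS codebase; they only show how values in different units can be compared after conversion to a common one:

```python
# Hypothetical conversion factors to a common unit (kilometres).
TO_KM = {"km": 1.0, "mi": 1.609344}

def normalize_length(value, unit):
    """Convert a length to kilometres for cross-wiki comparison."""
    return value * TO_KM[unit]

# 100 miles and 160.9344 km should compare as (nearly) equal after
# normalization, even though the raw infobox values differ.
a = normalize_length(100, "mi")
b = normalize_length(160.9344, "km")
print(abs(a - b) < 1e-6)  # True
```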
The check for ambiguity is the most crucial step to ensure that the
infoboxes being compared do refer to the same entity. We found
instances where the Wikipedia page and the infobox shown on that page
were presenting information about different subjects (e.g., see here
<https://en.wikipedia.org/wiki/Boys_Don%27t_Cry_(Moulin_Rouge_song)>).
/Examples/
The group ‘NBA players’ was identified as a good sync target to start
with. There are no ambiguity issues, it is a clearly defined group
of persons, and the number of varying properties is very limited. The
information seems to be derived mainly from two websites (nba.com and
basketball-reference.com), and normalization is only a minor issue.
‘Video games’ also proved to be an easy sync target, with the main
problem being varying properties such as different release dates for
different platforms (Microsoft Windows, Linux, MacOS X, XBox) and
different regions (NA vs EU).
More difficult topics, such as ‘cars’, ’music albums’, and ‘music
singles’ showed more potential for ambiguity as well as property
variability. A major concern we found was Wikipedia pages that contain
multiple infoboxes (often seen for pages referring to a certain type of
car, such as this one <https://en.wikipedia.org/wiki/Volkswagen_Polo>).
Reference and fact extraction can be done for each infobox, but
currently, we run into trouble once we fuse this data.
Further information about sync targets and their challenges can be found
on our Meta-Wiki discussion page
<https://meta.wikimedia.org/wiki/Grants_talk:Project/DBpedia/GlobalFactSyncR…>,
where Wikipedians that deal with infoboxes on a regular basis can also
share their insights on the matter. Some issues were also found
regarding the mapping of properties. In order to make GlobalFactSync as
applicable as possible, we rely on the DBpedia community to help us
improve the mappings. If you are interested in participating, you can
connect with us at http://mappings.dbpedia.org and in the DBpedia forum
<https://forum.dbpedia.org/>.
Bottom line: we value your feedback!
Hello:
What needs to happen so someone can use {{cite Q|Q...}} in
languages other than English, especially in es.wikipedia.org?
I translated the Wikipedia article on Julia Cagé from French into
English. I started to translate it into Spanish but stopped, because I
could not figure out how to use a Wikicite in Spanish.
Thanks,
Spencer Graves
aka DavidMCEddy
p.s. I'm currently at Wikimania. I currently have German mobile phone
number 004915739830186. After August 20, I'll have US cell number
408-655-4567.
-------- Forwarded Message --------
Subject: Julia Cagé in spanish wikipedia
Date: Wed, 14 Aug 2019 14:12:14 +0200
From: Adrián Estévez Iglesias <adrian.estevez.iglesias(a)gmail.com>
To: spencer.graves(a)effectivedefense.org
This <https://es.wikipedia.org/wiki/Usuario_discusi%C3%B3n:Alelapenya>
is the user-talk page of User:Alelapenya, who deleted Julia Cagé's
article <https://es.wikipedia.org/wiki/Julia_Cag%C3%A9> on the
Spanish-language Wikipedia. Write to him and ask whether he can restore
the text.
Cheers
{{Q|12068060}},
For Indian Independence Day 2019, a Wikidata-thon is planned for 15-22
August 2019. The objective is to improve India-related Wikidata items
(a task list has been created; please check it on the event page).
Please see the event page here:
https://www.wikidata.org/wiki/Wikidata:WikiProject_India/Events/Indian_Inde…
If you are interested please join as a participant.
Thanks
Tito Dutta
Note: If I don't reply to your email in 2 days, please feel free to remind
me over email or phone call.
Hello Wikidata,
Sorry in advance if I am using the wrong mailing list. I need the
Wikidata ontology in XML form. Can you please tell me from which link I
can download it? Thanks in advance.
Dear Madam,
Thank you for your answer. Wikidata is organized in RDF format. However, what Mr. Bob DuCharme said is that there is a lack of organization of the information provided by Wikidata. In fact, several Wikidata items and statements are missing or not entirely accurate.
Yours Sincerely,
Houcemeddine Turki (he/him)
Medical Student, Faculty of Medicine of Sfax, University of Sfax, Tunisia
Undergraduate Researcher, UR12SP36
GLAM and Education Coordinator, Wikimedia TN User Group
Member, WikiResearch Tunisia
Member, Wiki Project Med
Member, WikiIndaba Steering Committee
Member, Wikimedia and Library User Group Steering Committee
Co-Founder, WikiLingua Maghreb
Founder, TunSci
____________________
+21629499418
-------- Original Message --------
From: Marijane White <whimar(a)ohsu.edu>
Date: 2019/08/10 18:18 (GMT+01:00)
To: Discussion list for the Wikidata project <wikidata(a)lists.wikimedia.org>
Subject: Re: [Wikidata] Ontology in XML
Perhaps someone can correct me if I am wrong, but I am under the impression that such a thing doesn’t exist and that Wikidata’s models are intentionally not documented as an ontology. I gathered this understanding from Bob DuCharme’s blog post about extracting RDF models from Wikidata with SPARQL queries: http://www.bobdc.com/blog/extracting-rdf-data-models-fro/
Marijane White, M.S.L.I.S.
Data Librarian, Assistant Professor
Oregon Health & Science University Library
Phone: 503.494.3484
Email: whimar(a)ohsu.edu<mailto:whimar@ohsu.edu>
ORCiD: https://orcid.org/0000-0001-5059-4132
From: Wikidata <wikidata-bounces(a)lists.wikimedia.org> on behalf of Manzoor Ali <manzoorali29(a)gmail.com>
Reply-To: Discussion list for the Wikidata project <wikidata(a)lists.wikimedia.org>
Date: Saturday, August 10, 2019 at 2:38 AM
To: "wikidata(a)lists.wikimedia.org" <wikidata(a)lists.wikimedia.org>
Subject: [Wikidata] Ontology in XML
Hello Wikidata,
Sorry in advance if I am using wrong mail. I need Wikidata ontology in XML form. can you please tell me that from which link I can download it. Thanks in advance.
Dear Sir,
Thank you for your answer. The latest XML dump can be found at https://dumps.wikimedia.org/wikidatawiki/20190720/.
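For illustration, processing such a dump incrementally might look like the sketch below. The inline sample is invented and only mimics the rough shape of a MediaWiki XML export; the real dump is far larger, uses XML namespaces, and should be streamed rather than loaded at once:

```python
import io
import xml.etree.ElementTree as ET

# Tiny invented sample standing in for a (decompressed) dump file.
sample = io.BytesIO(b"""<mediawiki>
  <page><title>Q1</title></page>
  <page><title>Q2</title></page>
</mediawiki>""")

titles = []
# iterparse streams the file, so memory stays bounded even for
# multi-gigabyte dumps.
for event, elem in ET.iterparse(sample, events=("end",)):
    if elem.tag == "page":
        titles.append(elem.findtext("title"))
        elem.clear()  # discard the processed page element

print(titles)  # ['Q1', 'Q2']
```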
Yours Sincerely,
Houcemeddine Turki (he/him)
Medical Student, Faculty of Medicine of Sfax, University of Sfax, Tunisia
Undergraduate Researcher, UR12SP36
GLAM and Education Coordinator, Wikimedia TN User Group
Member, WikiResearch Tunisia
Member, Wiki Project Med
Member, WikiIndaba Steering Committee
Member, Wikimedia and Library User Group Steering Committee
Co-Founder, WikiLingua Maghreb
Founder, TunSci
____________________
+21629499418
-------- Original Message --------
From: Manzoor Ali <manzoorali29(a)gmail.com>
Date: 2019/08/09 16:53 (GMT+01:00)
To: wikidata(a)lists.wikimedia.org
Subject: [Wikidata] Ontology in XML
Hello Wikidata,
Sorry in advance if I am using wrong mail. I need Wikidata ontology in XML form. can you please tell me that from which link I can download it. Thanks in advance.