Hi Marco,
On October 1, 2019 11:48:02 PM GMT+02:00, Marco Fossati <fossati(a)spaziodati.eu> wrote:
Hi Denny,
Thanks for publishing your Colab notebook!
I went through it and would like to share my first thoughts here. We can
then move further discussion somewhere else.
1. in general, how can we compare datasets with totally different time
stamps? Wikidata is alive, Freebase is dead, and the latest DBpedia dump
is old;
DBpedia has made monthly releases for the past three months, and these will continue to
improve and grow in an agile manner; so far we have focused on debugging and integration.
The maximum age would be 30 days, which I think is OK. Denny validated against the live
endpoint. That is OK for driving growth, but, unlike validating against dumps, it is not
scientifically reproducible.
2. given that all datasets contain Wikipedia links, perhaps we could use
them as a bridge for the comparison, instead of Wikidata mappings. I'm
assuming that Freebase and DBpedia entities with Wikidata mappings are
subsets of the whole datasets (but this should be verified);
3. we could use record linkage techniques to connect Wikidata entities
with Freebase and DBpedia ones, then assess the agreement in terms of
statements per entity. There has been some experimental work (different
use case and goal) in the soweego project:
https://soweego.readthedocs.io/en/latest/validator.html
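To make points 2 and 3 concrete, here is a minimal sketch of joining two datasets on their shared Wikipedia links and scoring agreement per entity. It is only an illustration, not how soweego or the Colab notebook works: the function name, the dict-of-sets layout, the toy data, and the choice of Jaccard overlap as the agreement measure are all assumptions. It also assumes properties and values have already been normalized to a common vocabulary, which is a hard problem in itself.

```python
def agreement_by_wikipedia_link(dataset_a, dataset_b):
    """Score statement overlap for entities that share a Wikipedia link.

    Each dataset maps a Wikipedia URL to a set of (property, value) pairs.
    This layout is hypothetical; real dumps would need parsing and
    vocabulary alignment first.
    """
    scores = {}
    # Only entities present in both datasets can be compared.
    for link in dataset_a.keys() & dataset_b.keys():
        a, b = dataset_a[link], dataset_b[link]
        union = a | b
        # Jaccard overlap: shared statements divided by all statements.
        scores[link] = len(a & b) / len(union) if union else 1.0
    return scores


# Toy data, invented for illustration only.
wikidata = {
    "https://en.wikipedia.org/wiki/Douglas_Adams": {
        ("birthYear", "1952"),
        ("occupation", "writer"),
    },
    "https://en.wikipedia.org/wiki/Berlin": {("country", "Germany")},
}
dbpedia = {
    "https://en.wikipedia.org/wiki/Douglas_Adams": {
        ("birthYear", "1952"),
        ("occupation", "novelist"),
    },
}

scores = agreement_by_wikipedia_link(wikidata, dbpedia)
for link, score in sorted(scores.items()):
    print(f"{link}\t{score:.2f}")
```

Entities that appear in only one dataset are skipped here; counting them separately would give a first answer to the "subsets" question in point 2.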
On 10/1/19 1:13 AM, Denny Vrandečić wrote:
Marco, I totally agree with what you said - the project has stalled, and
there is plenty of opportunity to harvest more data from Freebase and
bring it to Wikidata, and this should be reignited.
Yeah, that would be great.
There is known work to do, but it's hard to sustain such a big project
without allocated resources:
https://phabricator.wikimedia.org/maniphest/query/CPiqkafGs5G./#R
BTW, there is also version 2 of the Wikidata primary sources tool that
needs love, although I'm now skeptical that it will be an effective way
to achieve the Freebase harvesting.
We should probably rethink the whole thing, and restart small with very
simple use cases, pretty much like the Harvest templates tool you
mentioned:
https://tools.wmflabs.org/pltools/harvesttemplates/
Cheers,
Marco
P.S.: I *might* have found the freshest relevant DBpedia datasets:
https://databus.dbpedia.org/dbpedia/mappings/mappingbased-objects
I said *might* because it was really painful to find a download button
and to guess among multiple versions of the same dataset:
https://downloads.dbpedia.org/repo/lts/mappings/mappingbased-objects/2019.0…
@Sebastian may know if it's the good one :-)