Hi all,
On 15.03.21 02:57, Hydriz Scholz wrote:
> I also intend to
> integrate a "watchlist" feature that can automatically notify users
> when new datasets are available.
I'm not sure whether this, i.e. mailbox notification, is a killer
feature for human users. We have been using the Wikimedia dumps for
DBpedia for 13 years now and implemented a download function [1].
However, it does not run optimally. I think it still uses the links in
the HTML page to find the download URLs.
The way we implemented it is: you run download (2021-01-01), and it then
tries to download the dumps from the beginning of the month. It fails if
it doesn't find some of them, and you need to re-run it later.
It would be nice to have an API to check for availability and to define
sets.
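To make "check for availability and define sets" a bit more concrete,
here is a minimal sketch in Python. It is not how our download function
or any existing API works. The names DumpSet, missing_files and
is_complete, as well as the file names, are hypothetical, and the list of
available files is assumed to come from whatever listing or API is at
hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DumpSet:
    """A named set of dump files that must all be present (hypothetical)."""
    name: str
    required: frozenset


def missing_files(dump_set, available):
    """Return the required files that are not yet available, sorted by name."""
    return sorted(dump_set.required - set(available))


def is_complete(dump_set, available):
    """True only when every required file of the set is available."""
    return not missing_files(dump_set, available)


# Illustrative file names only, not a real dump listing.
example_set = DumpSet(
    name="example-2021-01-01",
    required=frozenset({"pages-articles.xml.bz2", "page.sql.gz"}),
)
example_available = ["pages-articles.xml.bz2"]
```

With something like this, a client could ask for the missing files up
front and wait, instead of starting a download and failing halfway.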
We are in the process of open-sourcing databus.dbpedia.org, which is a
registry offering this functionality for any files, i.e. shasums,
downloadUrls, an API for querying, machine-readable and actionable
licenses, etc. We will put the Wikimedia dumps on the bus eventually.
For our part, we would value the ability to work with them
programmatically over yet another notification, but others might have
different opinions.
-- Sebastian
[1]
https://github.com/dbpedia/extraction-framework/blob/a334ac2af877531a082dc9…