Hi all,
On 15.03.21 02:57, Hydriz Scholz wrote:
> I also intend to integrate a "watchlist" feature that can automatically notify users when new datasets are available.
I'm not sure whether this, i.e. mailbox notification, is a killer feature for human users. We have been using the Wikimedia dumps for DBpedia for 13 years now and implemented a download function [1]. However, it does not run optimally. I think it still uses the links in the HTML page to find the download URLs.
The way we implemented it: you ask for a download with a date (e.g. 2021-01-01), it then tries to download the dumps from the beginning of that month, and it fails if it doesn't find some of them, so you need to re-run it later.
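To make the behaviour above concrete, here is a minimal sketch in Python. It is not the code from [1]; the function names and the file names are made up for illustration. It maps a requested date to the start of its month and then reports which of the wanted files are missing, instead of failing the whole run.

```python
from datetime import date
from typing import List, Set, Tuple


def month_start_stamp(requested: date) -> str:
    """Map a requested date to the first day of its month, as YYYYMMDD."""
    return requested.replace(day=1).strftime("%Y%m%d")


def split_by_availability(wanted: List[str], available: Set[str]) -> Tuple[List[str], List[str]]:
    """Split the wanted file names into those already published and those still missing."""
    found = [name for name in wanted if name in available]
    missing = [name for name in wanted if name not in available]
    return found, missing


# Hypothetical example: file names below are illustrative only.
stamp = month_start_stamp(date(2021, 1, 15))
wanted = [
    f"enwiki-{stamp}-pages-articles.xml.bz2",
    f"enwiki-{stamp}-redirect.sql.gz",
]
available = {f"enwiki-{stamp}-pages-articles.xml.bz2"}

found, missing = split_by_availability(wanted, available)
print("download now:", found)
print("retry later:", missing)
```

With a split like this, a later re-run only has to fetch what is in `missing`.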
It would be nice to have an API to check for availability and define sets. We are in the process of open-sourcing databus.dbpedia.org, which is a registry offering this functionality for any files, including shasums, downloadUrls, an API for querying, machine-readable and actionable licenses, etc. We will put the Wikimedia dumps on the bus eventually.
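As a rough idea of what such an availability check could look like from the client side, here is a small Python sketch. The layout of the status document (a "jobs" map with a "status" field and per-file "url" and "sha1" entries) is an assumption made up for this example, not a documented API of Wikimedia or of the Databus, and `ready_files` is a hypothetical helper.

```python
import json

# Hypothetical status document; the structure is an assumption for illustration.
SAMPLE_STATUS = json.loads("""
{
  "jobs": {
    "articles": {
      "status": "done",
      "files": {
        "pages-articles.xml.bz2": {"url": "/example/pages-articles.xml.bz2", "sha1": "abc123"}
      }
    },
    "redirects": {"status": "in-progress", "files": {}}
  }
}
""")


def ready_files(status, wanted_jobs):
    """Return (files, pending): download info for finished jobs, names of unfinished ones."""
    files = {}
    pending = []
    for job in wanted_jobs:
        info = status.get("jobs", {}).get(job)
        # A job that is unknown or not finished is reported, not treated as an error.
        if info is None or info.get("status") != "done":
            pending.append(job)
            continue
        for name, meta in info.get("files", {}).items():
            files[name] = {"url": meta["url"], "sha1": meta["sha1"]}
    return files, pending


files, pending = ready_files(SAMPLE_STATUS, ["articles", "redirects", "categories"])
print("ready:", sorted(files))
print("pending:", pending)
```

The list passed as `wanted_jobs` plays the role of a defined set: a client can poll until `pending` is empty and verify each download against its checksum.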
For our part, we would value the ability to work with the dumps programmatically over yet another notification, but others might have different opinions.
-- Sebastian
[1] https://github.com/dbpedia/extraction-framework/blob/a334ac2af877531a082dc9a...