Hi Everyone,
Is there a mechanism in place for producing and publishing delta-centric dumps for Wikidata?
A delta-centric dump would comprise new triples for relevant Wikipedia pages that can be applied progressively to existing Wikidata instances. For instance, we maintain a Wikidata instance [1][2] that we would like to keep up to date by applying deltas rather than performing wholesale instance reloads.
Looking forward to any insights regarding this important matter.
Related Links
[1] https://wikidata.demo.openlinksw.com/fct
[2] https://wikidata.demo.openlinksw.com/sparql
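To illustrate what applying such a delta might look like, here is a minimal Python sketch. It is only an assumption about the shape of a delta: the message above mentions new triples, while this sketch also allows removals, since edits can retract statements. The names `Delta` and `apply_delta` and the `ex:` identifiers are hypothetical and not part of any Wikidata tooling.

```python
from typing import FrozenSet, NamedTuple, Set, Tuple

# A triple is modelled as a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


class Delta(NamedTuple):
    """Hypothetical delta format: triples to add and triples to remove."""
    added: FrozenSet[Triple]
    removed: FrozenSet[Triple]


def apply_delta(store: Set[Triple], delta: Delta) -> Set[Triple]:
    """Return a new triple set with removals applied first, then additions."""
    return (store - delta.removed) | delta.added


# Illustrative data only; the identifiers are made up.
store = {
    ("ex:item1", "ex:label", "old label"),
    ("ex:item1", "ex:type", "ex:Thing"),
}
delta = Delta(
    added=frozenset({("ex:item1", "ex:label", "new label")}),
    removed=frozenset({("ex:item1", "ex:label", "old label")}),
)
updated = apply_delta(store, delta)
```

A real triple store would apply the same two steps through its own update interface rather than on an in-memory set.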
Kingsley Idehen via Wikidata, 25/02/21 19:26:
Is there a mechanism in place for producing and publishing delta-centric dumps for Wikidata?
There's https://phabricator.wikimedia.org/T72246
Magnus Manske used to maintain some biweekly dumps as part of his WDQ service, IIRC.
Federico
Hello!
We are working on a new update process for WDQS, based on a stream of changes [1]. While not exactly the solution you are looking for, this might be a building block for differential dumps. For example by aggregating the stream of changes over a period of time.
Note that at this point, the stream of changes that we construct is published to an internal Kafka that isn't exposed to the internet. If there is enough interest, we might be able to expose it in some form.
Have fun!
Guillaume
[1] https://phabricator.wikimedia.org/T244590
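As a rough illustration of aggregating a stream of changes over a period of time, here is a minimal Python sketch. It is not the WDQS updater and does not reflect the actual format of the internal stream: the event shape (`"add"` or `"remove"` plus a triple) and the function name `aggregate_changes` are assumptions made for the example. It keeps only the last operation seen for each triple, so the resulting delta gives the right final state whatever the starting state was.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
# Assumed event shape: ("add" | "remove", triple), in the order the changes happened.
Change = Tuple[str, Triple]


def aggregate_changes(changes: Iterable[Change]) -> Tuple[Set[Triple], Set[Triple]]:
    """Collapse an ordered change stream into one (added, removed) delta.

    Only the last operation per triple is kept ("last operation wins").
    """
    last_op: Dict[Triple, str] = {}
    for op, triple in changes:
        if op not in ("add", "remove"):
            raise ValueError(f"unknown operation: {op!r}")
        last_op[triple] = op
    added = {t for t, op in last_op.items() if op == "add"}
    removed = {t for t, op in last_op.items() if op == "remove"}
    return added, removed


# Illustrative events only; the identifiers are made up.
t1 = ("ex:item1", "ex:label", "a")
t2 = ("ex:item2", "ex:label", "b")
t3 = ("ex:item3", "ex:label", "c")
events = [("add", t1), ("remove", t1), ("add", t2), ("remove", t3), ("add", t3)]
added, removed = aggregate_changes(events)
```

Keeping the last operation per triple can produce a slightly larger delta than strictly necessary, but it avoids having to know the state of the store before the window started.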
On Fri, Feb 26, 2021 at 8:49 AM Federico Leva (Nemo) nemowiki@gmail.com wrote:
Kingsley Idehen via Wikidata, 25/02/21 19:26:
Is there a mechanism in place for producing and publishing delta-centric dumps for Wikidata?
There's https://phabricator.wikimedia.org/T72246
Magnus Manske used to maintain some biweekly dumps as part of his WDQ service, IIRC.
Federico
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
On 2/26/21 3:46 AM, Guillaume Lederrey wrote:
Hello!
We are working on a new update process for WDQS, based on a stream of changes [1]. While not exactly the solution you are looking for, this might be a building block for differential dumps. For example by aggregating the stream of changes over a period of time.
Note that at this point, the stream of changes that we construct is published to an internal Kafka that isn't exposed to the internet. If there is enough interest, we might be able to expose it in some form.
Have fun!
Guillaume
[1] https://phabricator.wikimedia.org/T244590
Hi Guillaume,
I am very interested in exposure right now since we are trying to have an up-to-date mirror of Wikidata.
We can discuss offline if you like.
Kingsley
-- *Guillaume Lederrey* (he/him) Engineering Manager Wikimedia Foundation https://wikimediafoundation.org/
Hi, first time chiming in here. This topic is very relevant for our Wikidata use case as well; having daily or weekly diffs would be very useful.
Cheers, CH
On Fri, Feb 26, 2021 at 2:47 PM Kingsley Idehen via Wikidata <wikidata@lists.wikimedia.org> wrote:
Hi Guillaume,
I am very interested in exposure right now since we are trying to have an up-to-date mirror of Wikidata.
We can discuss offline if you like.
Kingsley
-- Regards,
Kingsley Idehen Founder & CEO OpenLink Software Home Page: http://www.openlinksw.com Community Support: https://community.openlinksw.com Weblogs (Blogs): Company Blog: https://medium.com/openlink-software-blog Virtuoso Blog: https://medium.com/virtuoso-blog Data Access Drivers Blog: https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
Personal Weblogs (Blogs): Medium Blog: https://medium.com/@kidehen Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/ http://kidehen.blogspot.com
Profile Pages: Pinterest: https://www.pinterest.com/kidehen/ Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen Twitter: https://twitter.com/kidehen Google+: https://plus.google.com/+KingsleyIdehen/about LinkedIn: http://www.linkedin.com/in/kidehen
Web Identities (WebID): Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i : http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
+1 to this. Incremental dumps (even if just weekly) would be extremely useful.
On Fri, Feb 26, 2021 at 9:58 AM Chris Hokamp chris.hokamp@gmail.com wrote:
Hi, first time chiming in here. This topic is very relevant for our Wikidata use case as well; having daily or weekly diffs would be very useful.
Cheers, CH
Hi all,
Thanks for bringing this topic up. While we’re unfortunately not ready to expose anything yet, this seems like a good feature to provide in the future, given the interest.
If you’re able to provide a bit more information about your use cases and problems/pain points with the current way things work, it’d be helpful for us in planning a good solution.
Thanks!
—
Mike Pham (he/him) Sr Product Manager, Search Platform Wikimedia Foundation https://wikimediafoundation.org/
On 26 February, 2021 at 10:37:16, Samuel Klein (meta.sj@gmail.com) wrote:
+1 to this. Incremental dumps (even if just weekly) would be extremely useful.