GerardM's post prompted me to write to the mailing list. As you might know, I am working on a functional quadstore, that is, a quadstore that keeps old versions of the data around, like a wiki but in a directed acyclic graph. It only stores the differences between commits, and it relies on a snapshot of the latest version for fast reads. My ultimate goal is to build some kind of portable knowledge base: something like Wikibase + Blazegraph, but that you can spin up on a regular machine at the press of a button.
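To make the design above concrete, here is a minimal in-memory Python sketch, not datae's actual code. Assumptions: quads are 4-tuples of strings, each commit records only the quads it added and removed, a snapshot of the latest version serves fast reads, and a past version is rebuilt by replaying diffs from the root. For simplicity it assumes a linear history (at most one parent per commit), so the merge commits a real DAG needs are left out; `VersionedQuadStore`, `commit` and `as_of` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Set, Tuple

# A quad is (graph, subject, predicate, object); plain strings for simplicity.
Quad = Tuple[str, str, str, str]


@dataclass(frozen=True)
class Commit:
    parent: Optional[int]       # index of the parent commit (None for the root)
    added: FrozenSet[Quad]      # quads introduced by this commit
    removed: FrozenSet[Quad]    # quads deleted by this commit


@dataclass
class VersionedQuadStore:
    commits: List[Commit] = field(default_factory=list)
    snapshot: Set[Quad] = field(default_factory=set)  # materialized latest version

    def commit(self, added=(), removed=()) -> int:
        """Record only the difference against the current head."""
        added, removed = frozenset(added), frozenset(removed)
        parent = len(self.commits) - 1 if self.commits else None
        self.commits.append(Commit(parent, added, removed))
        # Keep the snapshot of the latest version current, so reads stay cheap.
        self.snapshot -= removed
        self.snapshot |= added
        return len(self.commits) - 1

    def as_of(self, commit_id: int) -> Set[Quad]:
        """Time-travel read: rebuild a past version by replaying diffs from the root."""
        chain = []
        cursor: Optional[int] = commit_id
        while cursor is not None:
            chain.append(self.commits[cursor])
            cursor = self.commits[cursor].parent
        state: Set[Quad] = set()
        for c in reversed(chain):  # oldest commit first
            state -= c.removed
            state |= c.added
        return state


store = VersionedQuadStore()
c0 = store.commit(added=[("g", "Q42", "label", "Douglas Adams")])
c1 = store.commit(added=[("g", "Q42", "occupation", "writer")])
c2 = store.commit(removed=[("g", "Q42", "label", "Douglas Adams")])
print(len(store.snapshot))   # 1: only the occupation quad is left
print(len(store.as_of(c1)))  # 2: both quads existed at c1
```

Replaying from the root is the simplest correct choice; a real store would presumably keep periodic checkpoints so old versions do not require a full replay.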
Enough bragging about me. I won't reply to every message in the thread one by one, but:
Here is what SHOULD BE possible:
- incremental dumps
- time-traveling queries
- full dumps
- federation of Wikibase SHOULD BE possible, since the data is stored in a git-like history and git pull / git push are planned in the ROADMAP
And online editing of the quadstore.
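The difference between a full dump and an incremental dump can be shown with plain Python sets, assuming each version is materialized as a set of quad 4-tuples. This is a hypothetical sketch (`full_dump`, `incremental_dump` and `apply_incremental` are illustrative names); in a diff-based store the incremental dump could instead be produced by composing the per-commit diffs.

```python
from typing import Set, Tuple

Quad = Tuple[str, str, str, str]


def full_dump(version: Set[Quad]) -> Set[Quad]:
    """A full dump is every quad present in one version."""
    return set(version)


def incremental_dump(old: Set[Quad], new: Set[Quad]) -> Tuple[Set[Quad], Set[Quad]]:
    """An incremental dump is only what changed: (added, removed)."""
    return new - old, old - new


def apply_incremental(old: Set[Quad], added: Set[Quad], removed: Set[Quad]) -> Set[Quad]:
    """A consumer holding `old` can catch up to `new` from the incremental dump alone."""
    return (old - removed) | added


old = {("g", "Q1", "p", "a"), ("g", "Q1", "p", "b")}
new = {("g", "Q1", "p", "b"), ("g", "Q1", "p", "c")}
added, removed = incremental_dump(old, new)
assert apply_incremental(old, added, removed) == new
```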
Access control lists are not designed yet; I expect this should be enforced by the application layer.
I plan to start working on a data management system (something like CKAN) with a search feature, but I would gladly work with Wikimedia instead.
Also, given that it is modeled after git, one can build merge-request-like features, e.g. for the massive imports, which are currently crippled.
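Merge requests on a git-like quad history come down to a three-way merge. A minimal sketch, assuming each version is a plain Python set of quads and `three_way_merge` is a hypothetical helper: each branch's changes are set differences against the common base. Real Wikidata edits would likely need extra conflict rules (for example for values that should be unique), which this ignores.

```python
from typing import Set, Tuple

Quad = Tuple[str, str, str, str]


def three_way_merge(base: Set[Quad], ours: Set[Quad], theirs: Set[Quad]) -> Set[Quad]:
    """Merge two branches that both started from `base`.

    An added quad was absent from `base` and a removed quad was present in it,
    so with plain sets no quad can be both added and removed: no conflicts.
    """
    added = (ours - base) | (theirs - base)
    removed = (base - ours) | (base - theirs)
    return (base - removed) | added


base = {("g", "Q1", "label", "Foo")}
ours = base | {("g", "Q1", "p", "x")}  # e.g. an import prepared on a branch
theirs = set()                          # meanwhile someone removed the label
print(three_way_merge(base, ours, theirs))  # {('g', 'Q1', 'p', 'x')}
```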
What I would need are query logs, ideally with timings, for both reads and writes, so that I can run benchmarks.
Maybe I should ask for funding at MediaWiki?
FWIW, I measured 2x the speed of Blazegraph on a microbenchmark.
Hoi, Wikidata grows like mad. This is something we all experience in the really bad response times we are suffering. It is so bad that people are asked what kind of updates they are running, because it makes a difference to the lag times.
Given that Wikidata is growing like a weed, it follows that there are two issues. Technical: what is the maximum that the current approach supports, and how long will this last us? Fundamental: what funding is available to sustain Wikidata?
For the financial guys, growth like Wikidata is experiencing is not something you can reliably forecast. As an organisation we have more money than we need to spend, so there is no credible reason to be stingy.
For the technical guys, consider our growth and plan for at least one year. When the impression exists that the current architecture will not scale beyond two years, start a project to future-proof Wikidata.
It will grow and the situation will get worse before it gets better. Thanks, GerardM
PS I know about Phabricator tickets; they do not give the answers to the questions we need to address.
Sounds interesting, is there a GitHub repo?
On Fri, May 3, 2019 at 8:19 PM Amirouche Boubekki <amirouche.boubekki@gmail.com> wrote:
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
On Sat, May 4, 2019 at 04:00, Yuri Astrakhan yuriastrakhan@gmail.com wrote:
Sounds interesting, is there a GitHub repo?
Thanks for your interest. This is still a work in progress. In the v0 branch, I made a prototype demonstrating that the history significance measure makes time-traveling queries possible. Right now, I am working on putting it all together.
https://github.com/awesome-data-distribution/datae/tree/master/docs/SCHEME20...
On Fri, May 3, 2019 at 8:19 PM Amirouche Boubekki <amirouche.boubekki@gmail.com> wrote:
What I would need are query logs, ideally with timings, for both reads and writes, so that I can run benchmarks.
Request:
- Is it possible to get logs of the read queries run against Blazegraph, with timings?
- Is it possible to get logs of the write queries run against MySQL, with timings?
Ideally, I would have logs that make it possible to replicate the workload of the production databases.
Maybe I should ask for funding at MediaWiki?
What about this? Is there any possibility of having my project funded by the Foundation somehow?
Hi Amirouche,
The version history and time-travel features sound a lot like the "integrated versioning system" of Freebase, circa 2009 when they (Metaweb) presented at WWW. As Freebase's data was transferred to Wikidata, this sounds a little circular; I wonder what advantages datae would offer vis-a-vis Freebase. Disclaimer: this is coming from a Wikidata lurker who just happens to like the Freebase approach to versioning of knowledge graphs, similar to what you have described.
Josh
On Sat, May 4, 2019 at 5:49 AM Amirouche Boubekki <amirouche.boubekki@gmail.com> wrote:
Hi Joshua,
Thanks for your input.
On Thu, May 16, 2019 at 17:02, Joshua Shinavier josh@fortytwo.net wrote:
Hi Amirouche,
The version history and time-travel features sound a lot like the "integrated versioning system" of Freebase, circa 2009 when they (Metaweb) presented at WWW.
Reading through [0], it seems Freebase only allowed undo, whereas in datae it will be possible both to query the full history and to undo commits.
[0] https://www.aaai.org/Papers/AAAI/2007/AAAI07-355.pdf
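The two capabilities mentioned above can be sketched on a plain list of per-commit diffs. The names `Change`, `undo` and `history_of` are hypothetical, not datae's API: undo appends the inverse of an old commit as a new commit, so history is never rewritten, while a history query scans every commit for changes to one subject. This naive undo does not check whether later commits touched the same quads.

```python
from dataclasses import dataclass
from typing import List, Tuple

Quad = Tuple[str, str, str, str]  # (graph, subject, predicate, object)


@dataclass(frozen=True)
class Change:
    added: frozenset
    removed: frozenset


def undo(log: List[Change], index: int) -> Change:
    """Undo by appending the inverse of an old commit; nothing is erased."""
    target = log[index]
    inverse = Change(added=target.removed, removed=target.added)
    log.append(inverse)
    return inverse


def history_of(log: List[Change], subject: str) -> List[Tuple[int, str, Quad]]:
    """Full-history query: every (commit index, action, quad) touching `subject`."""
    events = []
    for i, change in enumerate(log):
        for quad in sorted(change.added):
            if quad[1] == subject:
                events.append((i, "added", quad))
        for quad in sorted(change.removed):
            if quad[1] == subject:
                events.append((i, "removed", quad))
    return events


log = [Change(frozenset({("g", "Q1", "p", "a")}), frozenset())]
undo(log, 0)
print(history_of(log, "Q1"))
```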
As Freebase's data was transferred to Wikidata, this sounds a little circular; I wonder what advantages datae would offer vis-a-vis Freebase.
Freebase data was transferred to Wikidata. What I am looking for is to replace Wikibase + Blazegraph, so that both editing and querying happen against the same database, making it much easier to set up and maintain.
Disclaimer: this is coming from a Wikidata lurker who just happens to like the Freebase approach to versioning of knowledge graphs, similar to what you have described.
Josh
I agree that the paper is pretty short on details, but from my chat with Jamie Taylor way back then, I gathered that one could not only roll back to any point in transaction time, but also query from any point in time. The Datomic https://www.datomic.com/ database, which appeared later, has similar functionality.
Josh
On Thu, May 16, 2019 at 8:19 AM Amirouche Boubekki <amirouche.boubekki@gmail.com> wrote:
Yes, Freebase search supported "as_of_time" in its MQL syntax. (If I recall correctly, it did not during its first 5 months of life, but it was added later and the community loved it because it helped with abuse mitigation.) https://developers.google.com/freebase/v1/search
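An as_of_time parameter takes a point in time rather than a commit id. On a commit-based history, one way to support that is to map the timestamp to the last commit made at or before it. A sketch using Python's standard `bisect` module, assuming commit timestamps are kept in ascending order (e.g. Unix seconds); `commit_as_of` is a hypothetical helper, not a description of how Freebase implemented it.

```python
import bisect
from typing import List, Optional


def commit_as_of(timestamps: List[float], as_of_time: float) -> Optional[int]:
    """Return the index of the last commit made at or before `as_of_time`.

    `timestamps` holds commit times in ascending order (transaction time).
    Returns None if `as_of_time` predates the first commit.
    """
    i = bisect.bisect_right(timestamps, as_of_time)
    return i - 1 if i > 0 else None


timestamps = [10.0, 20.0, 30.0]
print(commit_as_of(timestamps, 25.0))  # 1
print(commit_as_of(timestamps, 5.0))   # None
```

`bisect_right` makes the bound inclusive, so a query at exactly a commit's timestamp sees that commit.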
I created another draft proposal for a *prototype* to scale Wikidata, using the tools I have been building, that goes beyond scaling only the Wikidata Query Service (WDQS). The first quarter would be reserved for WDQS.
As you might have seen, the first proposal (https://meta.wikimedia.org/wiki/Grants:Project/WDQS_On_FoundationDB) spans 6 months, whereas in this proposal WDQS should be replaced within 4 months. I take that into account in the last quarter, which is meant to be reserved for bug fixing.
https://meta.wikimedia.org/wiki/Grants:Project/Iamamz3/Prototype_A_Scalable_...
Feedback welcome!
On Thu, May 16, 2019 at 19:03, Thad Guidry thadguidry@gmail.com wrote:
Thad https://www.linkedin.com/in/thadguidry/