Hello comrades, I've run into a challenge too interesting to keep to myself ;) My immediate goal is to prototype an "offline" wikipedia, similar to Kiwix, which allows the end-user to make edits and synchronize them back to a central repository like enwiki.
The catch is, how to insert these changes without edit conflicts? With linear revision numbering, I can't imagine a natural representation of the data, only some kind of ad-hoc sandbox solution.
Extending the article revision numbering to represent a branching history would be the natural way to handle optimistic replication.
Non-linear revisioning might also facilitate simpler models for page protection, and would allow the formation of multiple, independent consensuses.
-Adam Wight
On 17/07/12 00:22, Adam Wight wrote:
> Extending the article revision numbering to represent a branching history would be the natural way to handle optimistic replication.
Actually, the revision table already allows for non-linear development (it stores which version you edited the article from). You could even make a version other than the one with the latest timestamp "win" (by changing page_latest). You will need to change the way history is viewed, however, and add a system to keep track of "heads" and "merges". There may be some assumptions across the codebase about the latest revision being the active one, too.
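To illustrate (a minimal sketch in plain Python, not actual MediaWiki code, with the revision table simplified to tuples): rev_parent_id already encodes a tree, and "heads" are just revisions that no other revision names as its parent.

    revisions = [
        # (rev_id, rev_parent_id); parent 0 marks a page's first revision
        (1, 0),
        (2, 1),
        (3, 1),  # also edited from rev 1, so the history has branched
    ]

    parents = {parent for _, parent in revisions}
    heads = [rev for rev, _ in revisions if rev not in parents]
    print(heads)  # [2, 3]: two heads, so something must decide which one "wins"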
On 07/16/2012 04:10 PM, Platonides wrote:
> Actually, the revision table already allows for non-linear development (it stores which version you edited the article from).
Cool! That's a nice solution because it's transparent to the end-user's system. However, if we use the current schema as you're describing, we would have to reconcile rev_id conflicts during the merge. This seems like a nasty problem if the merge is asynchronous, for example a batched changeset sent in email. -adam
On 17/07/12 01:49, Adam Wight wrote:
> However, if we use the current schema as you're describing, we would have to reconcile rev_id conflicts during the merge.
Not really. They would be lost in favour of the target ones. You keep a list of the rev_ids in the source wiki and the ids they get in the target wiki, adjusting the following rev_parent_id values to the target wiki numbers. It could be a problem for merges after the first one, but it's good enough for a first version.
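Sketched in plain Python (purely illustrative; the function name and tuple layout are made up, and a real implementation would operate on the revision table):

    def import_revisions(source_revs, next_target_id, id_map):
        """source_revs: (rev_id, rev_parent_id) pairs from the source wiki.
        id_map: source rev_id -> target rev_id for revisions already imported."""
        imported = []
        for rev_id, parent_id in source_revs:
            id_map[rev_id] = next_target_id
            # Parents already known on the target keep their mapped id;
            # parent 0 (a page's first revision) is left alone.
            new_parent = id_map.get(parent_id, parent_id)
            imported.append((next_target_id, new_parent))
            next_target_id += 1
        return imported

    id_map = {}
    print(import_revisions([(10, 0), (11, 10)], next_target_id=500, id_map=id_map))
    # [(500, 0), (501, 500)]: source ids are discarded, parent links preserved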
The nasty problem I see is how to determine the winner in a version conflict:

  B
 /
A
 \
  C
B and C both are revisions with common parent A. How do you handle the merge? What revision should be shown in the title?
This is all a fantastic idea. Distributing Wikipedia in a fashion similar to git will make it a lot easier to use in areas where Internet connections are not so common.
I wonder whether this sort of feature could be implemented in the existing Kiwix codebase? That would be ideal, I think.
Thank you, Derric Atzrott
> I wonder whether this sort of feature could be implemented in the existing Kiwix codebase?
Ward is working on it. :) http://wardcunningham.github.com/ https://github.com/WardCunningham/Smallest-Federated-Wiki
On Tue, Jul 17, 2012 at 4:32 AM, Derric Atzrott <datzrott@alizeepathology.com> wrote:
> This is all a fantastic idea. Distributing Wikipedia in a fashion similar to git will make it a lot easier to use in areas where Internet connections are not so common.
I have added this thread to https://en.wikipedia.org/wiki/User:HaeB/Timeline_of_distributed_Wikipedia_pr... .
On 2012-07-17 07:32, Derric Atzrott wrote:
> This is all a fantastic idea. Distributing Wikipedia in a fashion similar to git will make it a lot easier to use in areas where Internet connections are not so common.
It always surprises me when people express enthusiasm for this kind of idea, since my instinctive assumption is the exact opposite: that this couldn't possibly be feasible or practical.
Just out of curiosity, how large are the git-managed projects that you have successfully handled this way? Number of files, lines of code, bytes or commits per day? Did you ever run into a software project where a fully decentralized git solution was impractical, e.g. because pulling in the daily updates took more than an hour on your available bandwidth?
> Just out of curiosity, how large are the git-managed projects that you have successfully handled this way?
I can't say that I've handled any large git-managed projects this way, but I am given to understand that this is the very thing for which git was designed. Given this, I would hope that a git-like model would be good for decentralized editing.
Thank you, Derric Atzrott
On Mon, Jul 23, 2012 at 7:25 AM, Derric Atzrott datzrott@alizeepathology.com wrote:
> Given this, I would hope that a git-like model would be good for decentralized editing.
It's really not. Things that are (relatively) simple in the database tend to require walking the entire revision tree in Git in order to figure the same data out.
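A toy illustration in plain Python (not git internals, just the shape of the problem): the database keeps one pointer per page, while a bare commit graph only has parent links, so answering the same question means walking history from a head.

    # Database model: one pointer per page, O(1).
    page_latest = {"Foo": 3}

    # Commit-graph model: walk parents until a commit touches the page.
    commits = {
        "c3": {"parent": "c2", "touches": {"Bar"}},
        "c2": {"parent": "c1", "touches": {"Foo"}},
        "c1": {"parent": None, "touches": {"Foo", "Bar"}},
    }

    def latest_touching(head, title):
        c = head
        while c is not None:
            if title in commits[c]["touches"]:
                return c
            c = commits[c]["parent"]

    print(page_latest["Foo"], latest_touching("c3", "Foo"))  # 3 c2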
Git is awesome for software development, but trying to use it as an article development tool is really a bad solution in search of a problem. We could've had the same argument years ago and said "why use a database? SVN stores information in a linear history that's useful for articles." Having diverging articles may be cool/desired, but using Git is not the answer.
-Chad
> Git is awesome for software development, but trying to use it as an article development tool is really a bad solution in search of a problem.
Fair enough. I learn something new every day. I definitely think that distributed article editing is a great idea, even if a git-like system is not the answer to it.
Thank you, Derric Atzrott
> It's really not. Things that are (relatively) simple in the database tend to require walking the entire revision tree in Git in order to figure the same data out.
Git is almost never used in a truly decentralized fashion, so it isn't optimized for that type of use. See git "hub", for example. Actual peer-to-peer is infinitely more scalable ;) because you don't have one poor enterprise Java server getting hit by everyone in the world; instead, individuals distribute the load among themselves.
That would be a difficult model for Wikipedia however, because maintaining an authoritative edition would require centralized cryptography, at the least.
Allowing articles on our central server to diverge temporarily is easily achievable, with very little overhead. In fact, when you consider the savings in revert wars, maybe there is a net gain.
I'm interested in writing a mediawiki extension to allow us to experiment with this idea.
-Adam
On 07/16/2012 04:49 PM, Adam Wight wrote:
> Cool! That's a nice solution because it's transparent to the end-user's system. However, if we use the current schema as you're describing, we would have to reconcile rev_id conflicts during the merge. This seems like a nasty problem if the merge is asynchronous, for example a batched changeset sent in email.
And that would be the core problem of asynchronous optimistic replication ;) Simple last-write-wins or union (for shopping carts..) strategies are still manageable, but merging textual changes is harder. Manual intervention will often be needed.
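For concreteness, rough sketches of those strategies in plain Python (purely illustrative; the function names are my own); the three-way text case shows where manual intervention comes in:

    def last_write_wins(a, b):
        # a, b: (timestamp, value) pairs for the same item
        return max(a, b)  # the later timestamp wins outright

    def union_merge(a, b):
        # shopping-cart style: never lose anything, just take everything
        return set(a) | set(b)

    def merge_text(base, ours, theirs):
        if ours == theirs:
            return ours
        if ours == base:
            return theirs
        if theirs == base:
            return ours
        raise ValueError("conflict: needs editor-guided merge")

    print(last_write_wins((1, "x"), (2, "y")))        # (2, 'y')
    print(union_merge({"apple"}, {"apple", "pear"}))  # {'apple', 'pear'}
    try:
        merge_text("A", "B", "C")
    except ValueError as e:
        print(e)  # conflict: needs editor-guided merge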
The editor, rather than some unsuspecting reader, is best equipped to resolve these conflicts, so some degree of synchrony in the 'push' stage might make sense, to provide an opportunity for editor-guided merging.
Gabriel
Gabriel Wicke <wicke@wikidev.net> wrote:
> The editor, rather than some unsuspecting reader, is best equipped to resolve these conflicts, so some degree of synchrony in the 'push' stage might make sense, to provide an opportunity for editor-guided merging.
Although it might be simpler for the original editor to merge their own changes, that's not always what we want. The most flexible arrangement would be to separate the process into three workflows: edit, synchronize, and merge. Different people could perform each stage, or they can be folded together when appropriate.
On protected pages, for example, we specifically want some amount of peer review before deciding to merge. This could be seen as positive feedback also, if each successfully merged change comes with a bit of validation by the community.
Even a simple branching model will offer some delicious low-hanging fruit, for example, editors could "Save Draft" for any article and resume editing later.
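As a sketch (hypothetical Python, not actual extension code): a draft is just a new revision whose parent is the current head, saved without moving the page's "latest" pointer.

    page = {"latest": 7}
    revisions = {7: {"parent": 6, "text": "published text"}}

    def save_draft(page, revisions, text, new_rev_id):
        # An ordinary revision branching off the current head; page["latest"]
        # is deliberately untouched, so readers still see rev 7 while the
        # editor can resume from rev 8 later.
        revisions[new_rev_id] = {"parent": page["latest"], "text": text}
        return new_rev_id

    draft = save_draft(page, revisions, "work in progress...", new_rev_id=8)
    print(page["latest"], draft)  # 7 8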
-adam
Hi, I've started working on an extension to manage branching history, calling it "Nonlinear". Here's the crude code, https://github.com/adamwight/Nonlinear
[Screenshot of the effect on revision history]
Excerpts from Adam Wight's message of Mon Jul 16 18:22:22 -0400 2012:
> My immediate goal is to prototype an "offline" wikipedia, similar to Kiwix, which allows the end-user to make edits and synchronize them back to a central repository like enwiki.
There is a tool for managing non-linear history in mediawiki data sets. It's actually a combination of git, the version control system, and the MediaWiki API. It's called git-remote-mediawiki.
First, I'll quote its documentation:
<quote>
Getting started with Git-Mediawiki

Then, the first operation you should do is cloning the remote mediawiki. To do so, run the command

    git clone mediawiki::http://yourwikiadress.com

You can commit your changes locally as usual with the command

    git commit
</quote>
You can read more here: https://github.com/Bibzball/Git-Mediawiki/wiki/User-manual
I've been enjoying it lately, though it has some rough edges. It is under periodic development, and in the near future I plan to make more of a user community around it.
It is probably entirely unwieldy to use on English Wikipedia directly, but it could be adjusted to permit the importing of database dumps, and then let people branch off those.
-- Asheesh.