Hi Platonides,
Good point, thanks!
2017-08-31 19:40 GMT+02:00 Platonides <platonides@gmail.com>:
> On Thu, Aug 31, 2017 at 3:10 PM, Jérémie Roquet <jroquet@arkanosis.net>
> wrote:
>>
>> PS: what could be incredibly useful to dive into article histories
>> would be to import them into git², as it would allow the user to see
>> diffs between revisions the way you see them online, to look for when
>> a given sentence has been added / removed, etc. There are some very
>> user-friendly tools to present the histories to non-technical users
>> once the import has been made.
>
> Not as much as you think. I did that once, but the results were worse than
> expected. git (and other scms) diffing is line-based. You have many
> relatively-independent lines of code, and diff based on that. Whereas on
> Wikipedia articles, each line is a full paragraph. Thus, as soon as someone
> added a sentence (or a word), the full paragraph showed as changed.
Did you try with git's builtin diff UI, or with some other frontend? I
have never tried on Wikimedia dumps (I really should!) but I have to
diff XML files with horribly long lines on a regular basis — which is
something I naively believe to be very close to what diffing Wikimedia
dumps would look like — and diff-so-fancy and vimdiff do wonders with
that. Unfortunately, “user-friendly” GUIs like GitKraken, which I'd
have recommended to non-technical users, appear to handle diffs as
poorly as git's builtin UI…
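
To make the line-based vs. word-based distinction concrete, here is a
minimal sketch using Python's standard difflib module. The sample
paragraph and the helper names (line_diff, word_diff) are made up for
illustration; this is not how git or any of the tools above work
internally, only a toy showing why a one-word edit flags a whole
paragraph when the unit of comparison is the line.

```python
import difflib

# Hypothetical one-line "paragraphs", standing in for two revisions
# of an article where each paragraph is stored on a single line.
OLD = "The cat sat on the mat."
NEW = "The black cat sat on the mat."


def _changes(old_units: list[str], new_units: list[str]) -> list[str]:
    """Return only the removed ('- ') and added ('+ ') units."""
    return [
        entry
        for entry in difflib.ndiff(old_units, new_units)
        if entry.startswith(("- ", "+ "))
    ]


def line_diff(old: str, new: str) -> list[str]:
    """Line-based diff: one changed word marks the whole line as changed."""
    return _changes(old.splitlines(), new.splitlines())


def word_diff(old: str, new: str) -> list[str]:
    """Word-based diff: only the words actually added or removed show up."""
    return _changes(old.split(), new.split())


if __name__ == "__main__":
    print("line-based:", line_diff(OLD, NEW))
    print("word-based:", word_diff(OLD, NEW))
```

With this input, the line-based variant reports the entire old and new
paragraphs as removed and added, whereas the word-based variant reports
only the inserted word.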
Best regards,
--
Jérémie