On Thu, Aug 31, 2017 at 7:50 PM, Jérémie Roquet <jroquet(a)arkanosis.net> wrote:
> Hi Platonides,
>
> 2017-08-31 19:40 GMT+02:00 Platonides <platonides(a)gmail.com>:
> > On Thu, Aug 31, 2017 at 3:10 PM, Jérémie Roquet <jroquet(a)arkanosis.net> wrote:
> >>
> >> PS: what could be incredibly useful for digging into article histories
> >> would be to import them into git², as that would let users see
> >> diffs between revisions the way they appear online, find when
> >> a given sentence was added or removed, etc. There are some very
> >> user-friendly tools for presenting the histories to non-technical users
> >> once the import has been done.
> >
> > Not as much as you think. I did that once, but the results were worse than
> > expected. git (like other SCMs) diffs line by line. That works well for
> > code, which consists of many relatively independent lines. In Wikipedia
> > articles, however, each line is a full paragraph, so as soon as someone
> > added a sentence (or even a word), the whole paragraph showed up as changed.
>
> Good point, thanks!
>
> Did you try with git's built-in diff UI, or with some other frontend? I
> have never tried it on Wikimedia dumps (I really should!), but I regularly
> have to diff XML files with horribly long lines, which I naively believe
> to be very close to what diffing Wikimedia dumps would look like, and
> diff-so-fancy and vimdiff do wonders with that. Unfortunately,
> “user-friendly” GUIs like GitKraken, which I'd have recommended to
> non-technical users, appear to handle diffs as poorly as git's built-in UI…
>
> Best regards,
>
> --
> Jérémie
>
I think I tried git gui blame, and perhaps git bisect. I'm not sure how I
finally handled whatever I was looking for; it was a long time ago.
You might be able to get better results with some preprocessing, though.
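For instance, the preprocessing could put each sentence on its own line before committing each revision, so that a line-based diff points at the sentence that changed rather than at the whole paragraph. Below is a minimal sketch in Python, assuming a naive regex sentence splitter. The helper names `one_sentence_per_line` and `changed_lines` are made up for illustration, and Python's `difflib` stands in for git's own diff.

```python
import difflib
import re

# Naive sentence boundary: '.', '!' or '?' followed by whitespace.
# This is an illustrative assumption; real wikitext needs more care
# (abbreviations, <ref> tags, templates, etc.).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def one_sentence_per_line(wikitext: str) -> str:
    """Hypothetical preprocessing step: put every sentence on its own line."""
    out_lines = []
    for paragraph in wikitext.splitlines():
        if paragraph.strip():
            out_lines.extend(_SENTENCE_END.split(paragraph.strip()))
        else:
            out_lines.append("")  # keep blank lines between paragraphs
    return "\n".join(out_lines) + "\n"


def changed_lines(old: str, new: str) -> list[str]:
    """Return only the added/removed lines of a line-based unified diff."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


if __name__ == "__main__":
    old = "Alpha is first. Beta is second.\n"
    new = "Alpha is first. Beta is second. Gamma is third.\n"
    # Raw revisions: the whole paragraph is reported as removed and re-added.
    print(changed_lines(old, new))
    # Preprocessed revisions: only the new sentence is reported.
    print(changed_lines(one_sentence_per_line(old), one_sentence_per_line(new)))
```

With a paragraph that gains one sentence, the raw diff reports the whole paragraph as removed and re-added, while the diff of the preprocessed text reports only the added sentence. The same preprocessed files could then be committed to git so that blame and log work per sentence.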
Cheers