On Thu, Aug 31, 2017 at 7:50 PM, Jérémie Roquet <jroquet(a)arkanosis.net>
wrote:
> Hi Platonides,
> 2017-08-31 19:40 GMT+02:00 Platonides <platonides(a)gmail.com>:
> > On Thu, Aug 31, 2017 at 3:10 PM, Jérémie Roquet <jroquet(a)arkanosis.net>
> > wrote:
> >> PS : what could be incredibly useful to dive into articles histories
> >> would be to import them in git², as it would allow the user to see
> >> diffs between revisions the way you see them online, to look for when
> >> a given sentence has been added / removed, etc. There are some very
> >> user-friendly tools to present the histories to non-technical users
> >> once the import has been made.
> > Not as much as you think. I did that once, but the results were worse
> > than expected. git (and other SCMs) diffing is line-based. With code,
> > you have many relatively independent lines, and the diff is based on
> > those. Whereas in Wikipedia articles, each line is a full paragraph.
> > Thus, as soon as you added a sentence (or a word), the full paragraph
> > showed as changed.
> Good point, thanks!
> Did you try with git's builtin diff UI, or with some other frontend? I
> have never tried on Wikimedia dumps (I really should!) but I have to
> diff XML files with horribly long lines on a regular basis — which is
> something I naively believe to be very close to what diffing Wikimedia
> dumps would look like — and diff-so-fancy and vimdiff do wonders with
> that. Unfortunately, “user-friendly” GUIs like GitKraken, which I'd
> have recommended to non-technical users, appear to handle diffs as
> poorly as git's builtin UI…
> Best regards,
I think I attempted to use git gui blame, and perhaps git bisect. Not sure
how I finally handled whatever I was looking for. It has been a long time.
You might be able to get better results with some preprocessing, though.
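
A minimal sketch of what such preprocessing could look like, assuming the
goal is simply to put each sentence on its own line before committing a
revision to git, so that a line-based diff only flags the sentence that
changed. The function names (split_sentences, changed_lines) and the naive
sentence-splitting rule are illustrative assumptions, not something taken
from an actual import script; real wikitext (templates, references,
abbreviations) would need a more careful splitter.

```python
import difflib
import re

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
# This is deliberately naive and will mis-split abbreviations like "e.g. x".
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(wikitext: str) -> list[str]:
    """Rewrite paragraph-per-line text as sentence-per-line text."""
    lines: list[str] = []
    for paragraph in wikitext.split("\n"):
        stripped = paragraph.strip()
        if not stripped:
            lines.append("")  # keep blank lines as paragraph separators
            continue
        lines.extend(_SENTENCE_END.split(stripped))
    return lines


def changed_lines(old: list[str], new: list[str]) -> list[str]:
    """Return only the added/removed lines of a line-based diff."""
    diff = list(difflib.unified_diff(old, new, lineterm="", n=0))
    body = diff[2:]  # skip the '---' / '+++' file header lines
    return [line for line in body if line.startswith(("+", "-"))]


if __name__ == "__main__":
    old = "Git is a version control system. It was created in 2005."
    new = "Git is a distributed version control system. It was created in 2005."

    # One paragraph per line: the whole paragraph is reported as changed.
    print(changed_lines([old], [new]))

    # One sentence per line: only the edited sentence is reported.
    print(changed_lines(split_sentences(old), split_sentences(new)))
```

Each revision would be passed through split_sentences before being written
to the file that gets committed. Independently of any preprocessing, git's
own --word-diff option to git diff may also help when viewing changes
within long lines.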
My name is Sean Buchanan. I am a professor of Business Administration at the Asper School of Business at the University of Manitoba.
My colleagues and I are trying to collect a certain type of data from Wikipedia and would like some advice on the most efficient and user-friendly way of collecting this data.
We are looking to collect data on the differences between revisions over the lifetime of three Wikipedia pages (see attached screenshot).
We haven't found a way to do that through the channels on the web page and were wondering if you have any ideas on how such data could be collected?
We are interested in the revision history for the following pages:
> Thank you for your help! I look forward to hearing from you.
> Sean Buchanan