Dear all,
we have developed a tool that is (in some cases) capable of checking if formulae in <math/>-tags in the context of a wikitext fragment are likely to be correct or not. We would like to test the tool on the recent changes. From
https://www.mediawiki.org/wiki/API:Recent_changes_stream
we can get the stream of recent changes. However, I did not find a way to get the diff (either in HTML or wikitext) to figure out how the content was changed. The only option I see is to request the revision text separately for each change. This would result in unnecessary requests, since most changes do not touch <math/>-tags. I assume that others, e.g., ORES
https://www.mediawiki.org/wiki/ORES,
compute the diffs anyhow, and I wonder if there is an easier way to get the diffs from the recent changes stream without additional requests.
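A minimal sketch of the extra request described above, in Python. It assumes that an event from the stream carries the fields `type`, `revision.old`, `revision.new`, `server_url` and `server_script_path`; that is a reading of the recentchange event schema and should be checked against the stream documentation. `revision_request_url` is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlencode


def revision_request_url(event):
    """Build an Action API URL that fetches the old and new wikitext of an edit.

    Returns None for events that are not edits (log entries, new pages, ...),
    so the caller can skip them without issuing a request.
    """
    if event.get("type") != "edit":
        return None
    revision = event.get("revision") or {}
    if "old" not in revision or "new" not in revision:
        return None
    params = {
        "action": "query",
        "prop": "revisions",
        "revids": f"{revision['old']}|{revision['new']}",
        "rvprop": "ids|content",
        "rvslots": "main",
        "format": "json",
    }
    # Assumption: the script path defaults to "/w" when the event omits it.
    script_path = event.get("server_script_path", "/w")
    return f"{event['server_url']}{script_path}/api.php?{urlencode(params)}"
```

Both revisions are requested in one call, so each edit costs one additional request rather than two.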
All the best Physikerwelt (Moritz Schubotz)
This isn't helpful now, but your use case is relevant to something I hope to pursue in the future: comprehensive MediaWiki change events, including content. I don't have a great place yet for collecting these use cases, so I added it to the Modern Event Platform parent ticket https://phabricator.wikimedia.org/T185233 so I don't forget. :)
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
On Thu, Jul 1, 2021 at 3:10 PM Andrew Otto otto@wikimedia.org wrote:
This isn't helpful now, but your use case is relevant to something I hope to pursue in the future: comprehensive mediawiki change events, including content. I don't have a great place yet for collecting these use cases, so I added it to Modern Event Platform parent ticket https://phabricator.wikimedia.org/T185233 so I don't forget. :)
I don't think this is the use case at all. As someone else already pointed out, diffs don't always give you the context and might be unparsable wikitext. So what you can do is either:
1) Always send the full content of the changed page in the stream, along with the diff. This is IMHO extremely wasteful, but it's also easy to implement.
2) Find a way to analyze the edits and emit specialized event tags that define what has changed. This is the correct way to go forward, IMHO, but it requires much more engineering time.
I don't think there is much value in adding the full content of the page to every edit event. I'd rather suggest that people fetch the Parsoid HTML from the API, and that we ensure good edge-side caching.
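A minimal sketch of fetching the Parsoid HTML for one revision, in Python. It assumes the REST endpoint `/api/rest_v1/page/html/{title}/{revision}` on the wiki's own domain; the helper names and the User-Agent string are hypothetical.

```python
import urllib.request
from urllib.parse import quote


def parsoid_html_url(server_url, title, revision_id):
    """Build the REST URL for the Parsoid HTML of one revision of a page."""
    # The title is a single path segment, so "/" and ":" are percent-encoded.
    encoded_title = quote(title.replace(" ", "_"), safe="")
    return f"{server_url}/api/rest_v1/page/html/{encoded_title}/{revision_id}"


def fetch_parsoid_html(url):
    """Download the HTML; a descriptive User-Agent is expected from clients."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "math-check-sketch/0.1 (hypothetical tool)"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

Because the URL is fully determined by title and revision ID, responses for old revisions can be cached at the edge, which is the point made above.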
Cheers,
Giuseppe
P.S. Please note that I'm only referring to streams offered to tools and, in general, to the public internet. Inside the production cluster, the use of content in events might (or might not) prove directly useful in some cases.
I’m no expert, but I believe the only way to get a diff via the API is through https://www.mediawiki.org/wiki/API:Compare. I haven’t worked with it to any great degree, though, so I’m afraid I can’t help beyond pointing you in that direction.
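A minimal sketch of a request to API:Compare, in Python. It assumes the parameters `fromrev` and `torev`, and that the diff HTML comes back under `compare["*"]` (or `compare["body"]` with `formatversion=2`); both helper names are hypothetical and the response layout should be checked against the API documentation.

```python
from urllib.parse import urlencode


def compare_url(api_url, from_rev, to_rev):
    """Build an action=compare request for two revision IDs."""
    params = {
        "action": "compare",
        "fromrev": from_rev,
        "torev": to_rev,
        "format": "json",
    }
    return f"{api_url}?{urlencode(params)}"


def diff_html(response):
    """Extract the diff table rows from a decoded JSON response, or None."""
    compare = response.get("compare", {})
    # Assumption: the key depends on the formatversion used for the request.
    return compare.get("body") or compare.get("*")
```

This still costs one request per edit, so it does not avoid the extra round trip; it only moves the diffing to the server.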
I'm not sure diffs are going to be useful here. For example, this diff https://en.wikipedia.org/w/index.php?title=User:RoySmith/sandbox&diff=1031413484&oldid=1031413445&diffmode=source ostensibly introduces an error in the math markup, but due to the way I've formatted the wikisource, it's not obvious from the diff that this is within <math>...</math> tags.
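The effect can be reproduced with a small Python sketch. The wikitext below is made up for illustration (the sandbox page may be formatted differently), and the diff is computed with no context lines, which is a simplification of what MediaWiki shows.

```python
import difflib

# Hypothetical wikitext: the formula body sits on its own lines.
OLD = r"""<math>
\sum_{i=1}^{n} i
= \frac{n(n+1)}{2}
</math>"""

# Same text with the closing brace of \frac removed.
NEW = r"""<math>
\sum_{i=1}^{n} i
= \frac{n(n+1)}{2
</math>"""


def changed_lines(old, new):
    """Return only the added and removed lines of a line-based diff."""
    diff = list(
        difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="", n=0)
    )
    # The first two entries are the "---" and "+++" file headers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


for line in changed_lines(OLD, NEW):
    print(line)
```

Neither changed line mentions the math tag, so a filter that only looks at the changed lines cannot tell that a formula was edited.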
You might end up having to do this using the database dumps https://meta.wikimedia.org/wiki/Data_dumps, which is going to entail looking at a lot more data (extreme understatement) than the recent changes stream.
On Jul 1, 2021, at 9:18 AM, Robin Hood RobinHood70@LIVE.CA wrote:
I’m no expert, but I believe the only way to get a diff via the API is through https://www.mediawiki.org/wiki/API:Compare. I haven’t worked with it to any great degree, though, so I’m afraid I can’t help beyond pointing you in that direction.
Note on ORES as one of its maintainers: ORES doesn't use recent changes for getting content and scoring edits. It hits the API.
HTH
Hi all,
thank you for your feedback.
@Roy this is a good point, I honestly did not think about it before. However, in the back of my mind, I remembered this problem had been solved before. There is a beta feature called visual diffs.
If you enable this feature and navigate to https://en.wikipedia.org/w/index.php?title=User:RoySmith/sandbox&type=re...
you see the text "math formula changed", which is exactly what I was looking for. Unfortunately, I have not yet been able to figure out whether there is an API to get the visual diffs. I had expected it to be in RESTBase, but found nothing there.
If I could get to that API, the problem would be simple enough to start with the implementation.
All the best Moritz
On Thu, Jul 1, 2021 at 3:39 PM Amir Sarabadani ladsgroup@gmail.com wrote:
Note on ORES as one of its maintainers: ORES doesn't use recent changes for getting content and scoring edits. It hits the API.
HTH
I'm afraid that the visual differ isn't helpfully set up for this. Its approach is to fetch the parsoid HTML for the revisions to compare, and then generate the comparison output client-side. It's not terribly reusable outside of the VisualEditor context -- all of the describing of changes that it does leans heavily on interrogating VisualEditor's data model for information about the things that changed.
The best I can say about this for your purposes is that using the parsoid HTML *would* relieve you of having to parse wikitext to work out whether the contents of a math tag were what changed. 🤷🏻
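A minimal sketch of that idea, in Python with only the standard library. It assumes that Parsoid marks math output with `typeof="mw:Extension/math"` and stores the original formula source in the `data-mw` attribute under `body.extsrc`; both details should be verified against the Parsoid DOM specification. The class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class MathCollector(HTMLParser):
    """Collect the source of every math extension element in Parsoid HTML."""

    def __init__(self):
        super().__init__()
        self.formulas = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        types = (attributes.get("typeof") or "").split()
        if "mw:Extension/math" not in types:
            return
        # Assumption: data-mw is a JSON object with the source in body.extsrc.
        data_mw = json.loads(attributes.get("data-mw") or "{}")
        self.formulas.append(data_mw.get("body", {}).get("extsrc", ""))


def math_sources_from_html(parsoid_html):
    """Return the formula sources in document order."""
    collector = MathCollector()
    collector.feed(parsoid_html)
    collector.close()
    return collector.formulas


def math_changed(old_html, new_html):
    """True if the list of formulas differs between two revisions."""
    return math_sources_from_html(old_html) != math_sources_from_html(new_html)
```

Comparing the two lists answers "did any formula change" without parsing wikitext, at the cost of fetching the HTML of both revisions.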
If you do want to dig into this further, check out: https://doc.wikimedia.org/VisualEditor/master/js/source/ve.init.mw.DiffLoade...
~David
On that topic, I'll share some of my experience.
First, parsing wikitext is way more difficult than you probably imagine. People are often tempted to do a poor-man's job of it with regular expressions and the like. Down that path lies madness. Don't go there.
There are only two rational ways I know of to parse wikitext.
Parsoid is one. It's complicated to get your head around, but it is the one true officially supported way.
The other is mwparserfromhell https://github.com/earwig/mwparserfromhell. It has the advantage of being much simpler to use. It has the disadvantage of not getting every possible edge case correct. It also is only usable in Python, which is fine if you're using Python and a problem otherwise.
In either case, once you've got parsed versions of two revisions, you'll then be faced with the problem of diffing them. That's going to be non-trivial.
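A minimal sketch of the mwparserfromhell route, in Python. It extracts the contents of the math tags from two revisions and compares the resulting lists with `difflib`, which sidesteps a full tree diff but only answers which formulas changed. The function names are hypothetical, and the list comparison is a simplification rather than a general solution to diffing parsed wikitext.

```python
import difflib


def math_sources_from_wikitext(wikitext):
    """Return the contents of every <math> tag in document order."""
    # Third-party dependency (pip install mwparserfromhell); imported here so
    # the rest of the module loads even where the library is not installed.
    import mwparserfromhell

    code = mwparserfromhell.parse(wikitext)
    tags = code.filter_tags(matches=lambda node: node.tag == "math")
    return [str(tag.contents) for tag in tags]


def changed_formulas(old_formulas, new_formulas):
    """List (operation, old items, new items) for every non-equal block."""
    matcher = difflib.SequenceMatcher(a=old_formulas, b=new_formulas, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append((op, old_formulas[i1:i2], new_formulas[j1:j2]))
    return changes
```

Only the formulas reported by `changed_formulas` would need to be passed to a checker; edits that leave every formula untouched produce an empty list.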
On Jul 8, 2021, at 7:01 PM, David Lynch dlynch@wikimedia.org wrote:
The best I can say about this for your purposes is that using the parsoid HTML would relieve you of having to parse wikitext to work out whether the contents of a math tag were what changed. 🤷🏻
Parsoid has a linter extension https://www.mediawiki.org/wiki/Help:Extension:Linter which is well suited for something like this and was effectively developed with something like this in mind. It is currently enabled on *all* parses, but in the future, depending on how expensive lints become, we may find alternate ways of running this.
As it turns out, Parsoid's extension API exposes a lintHandler entry point https://www.mediawiki.org/wiki/Parsoid/Extension_API#ExtensionTagHandler_abstract_class for extensions to run their lints. So, of course, you can only support this in the context of Parsoid's parses. The other caveat is that we haven't fully thought through all the details, but if you are interested, this would be a good use case for exploring it. For an example, see the implementation for Cite's ref tag: https://github.com/wikimedia/parsoid/blob/20a4384b1f81366ccd72b1153c8a342f46a0318f/src/Ext/Cite/Ref.php#L59-L81 That implementation effectively calls Parsoid's default handlers on the wikitext encountered in the <ref> tag, but extensions that deal with their own wikitext might do other processing here without calling the default handler.
This is my recommended approach rather than messing with change streams, diffs, etc.
Subbu.