Hi Scott,
Thank you so much for your reply and your offer to help with Parsoid. I used DizzyLogic as an easy parser to get the content of Wikipedia articles stripped of wiki markup. The results were plain text files. I used it to parse the whole English and Arabic Wikipedia dumps back in January. It was easy to use, which mattered because my coding knowledge is limited.
I read the link you kindly provided about Parsoid and I think it can help me with parsing. However, I'm not sure how to start on testing this.
Thank you :)
Best,
Reem

On 11 November 2016 at 19:55, C. Scott Ananian <cananian@wikimedia.org> wrote:
It was removed from that article recently (19 Oct 2016: https://www.mediawiki.org/w/index.php?title=Alternative_parsers&type=revision&diff=2265815&oldid=2247632 ) with the following comment:
"That link has been dead for over a year now as per this stackoverflow comment: http://stackoverflow.com/questions/13546254/whats-a-fast-way-to-parse-a-wikipedia-xml-dump-for-article-content-and-populate"
If you'd like to explain what you would have used DizzyLogic for, I'd love to help you figure out how to use Parsoid to accomplish your goals. It's an officially-supported WMF parser which has much better correctness than any 'alternative' parser out there, implements a friendly API similar to mwparserfromhell (see https://doc.wikimedia.org/Parsoid/master/#!/guide/jsapi ), and has a well-documented AST (https://www.mediawiki.org/wiki/Specs/HTML/1.2.1 ) which can be directly fetched via the REST API (cf https://en.wikipedia.org/api/ ). I believe dumps have also been planned, but I'm not sure what the current status is.
--scott

On Fri, Nov 11, 2016 at 7:57 AM, Reem Al-Kashif <reemalkashif@gmail.com> wrote:
Hi Pine,
Thank you for your reply. It is an alternative parser. I believe I first saw it on MediaWiki (here).
Best,
Reem

On 11 November 2016 at 09:47, Pine W <wiki.pine@gmail.com> wrote:
Was this something on Labs? If so, it might have been purged during one of the Labs cleanups.
Pine

On Tue, Nov 8, 2016 at 2:33 PM, Reem Al-Kashif <reemalkashif@gmail.com> wrote:
Hi,
I'm just wondering if anybody knows what happened to the DizzyLogic wiki parser? The website and the program have vanished. I used it in January 2016, so I know it was available then.
Best,
Reem
--
Kind regards,
Reem Al-Kashif
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics