Hi Scott,
Thank you so much for your reply and offer to help with Parsoid. I used
DizzyLogic as an easy parser to get Wikipedia articles' content stripped
of the wiki markup. The results were in plain text files. I used it to
parse the whole English and Arabic Wikipedia dumps back in January. It was
easy to use because my coding knowledge is limited.
I read the link you kindly provided about Parsoid and I think it can help
me with parsing. However, I'm not sure how to start on testing this.
Thank you :)
Best,
Reem
On 11 November 2016 at 19:55, C. Scott Ananian <cananian(a)wikimedia.org>
wrote:
It was removed from that article recently (19 Oct 2016:
https://www.mediawiki.org/w/index.php?title=Alternative_parsers&type=revision&diff=2265815&oldid=2247632)
with the following comment:
"That link has been dead for over a year now as per this stackoverflow
comment:
http://stackoverflow.com/questions/13546254/whats-a-fast-way-to-parse-a-wikipedia-xml-dump-for-article-content-and-populate"
If you'd like to explain what you would have used DizzyLogic for, I'd
love to help you figure out how to use Parsoid to accomplish your goals.
It's an officially supported WMF parser with much better correctness
than any 'alternative' parser out there. It implements a friendly API similar
to mwparserfromhell (see
https://doc.wikimedia.org/Parsoid/master/#!/guide/jsapi) and has a
well-documented AST (https://www.mediawiki.org/wiki/Specs/HTML/1.2.1), which
can be fetched directly via the REST API (cf.
https://en.wikipedia.org/api/ ). I believe
dumps have also been planned, but I'm not sure what the current status is.
--scott
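As a starting point for the plain-text use case discussed above, here is a minimal Python sketch (not part of the original messages). It assumes the REST API serves Parsoid HTML at a path of the form /api/rest_v1/page/html/{title}; that path, the User-Agent string, and the helper names page_html_url, html_to_text, and fetch_plain_text are illustrative assumptions, not something stated in this thread. The text extraction is a crude tag stripper built on the standard library, not Parsoid's own API.

```python
from html.parser import HTMLParser
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed endpoint shape for fetching Parsoid HTML; not given in the thread.
REST_HTML = "https://{lang}.wikipedia.org/api/rest_v1/page/html/{title}"


def page_html_url(title, lang="en"):
    """Build the (assumed) REST URL for one article's Parsoid HTML."""
    return REST_HTML.format(lang=lang, title=quote(title.replace(" ", "_"), safe=""))


class _TextExtractor(HTMLParser):
    # Elements whose content is dropped entirely (CSS, scripts, tables, footnote markers).
    SKIP = {"style", "script", "table", "sup"}
    # Elements after which a line break is emitted.
    BLOCK = {"p", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in self.BLOCK and not self._skip_depth:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Strip tags from an HTML string and return one line per block element."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).split("\n"))
    return "\n".join(line for line in lines if line)


def fetch_plain_text(title, lang="en"):
    """Download one article's HTML (network access required) and return plain text."""
    req = Request(page_html_url(title, lang), headers={"User-Agent": "plain-text-sketch/0.1"})
    with urlopen(req) as resp:
        return html_to_text(resp.read().decode("utf-8"))
```

Calling fetch_plain_text("Cairo") or fetch_plain_text("القاهرة", lang="ar") would then return the article text for one page at a time. For whole-dump processing this per-page approach would be slow, so it is only meant for trying Parsoid output on a handful of articles.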
On Fri, Nov 11, 2016 at 7:57 AM, Reem Al-Kashif <reemalkashif(a)gmail.com>
wrote:
Hi Pine,
Thank you for your reply. It is an alternative parser. I believe I first
saw it on MediaWiki (here
<https://www.mediawiki.org/wiki/Alternative_parsers>).
Best,
Reem
On 11 November 2016 at 09:47, Pine W <wiki.pine(a)gmail.com> wrote:
Was this something on Labs? If so, it might have been purged during one
of the Labs cleanups.
Pine
On Tue, Nov 8, 2016 at 2:33 PM, Reem Al-Kashif <reemalkashif(a)gmail.com>
wrote:
> Hi,
>
> I'm just wondering if anybody knows what happened to DizzyLogic wiki
> parser? The website and program vanished. I used it in January 2016, so I
> know it was there at that time.
>
> Best,
> Reem
>
> --
>
> *Kind regards, Reem Al-Kashif*
>
> _______________________________________________
> Analytics mailing list
> Analytics(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
--
*Kind regards, Reem Al-Kashif*
--
(http://cscott.net)