Hi Scott,
Thank you so much for your reply and your offer to help with Parsoid. I used
DizzyLogic as an easy parser to get the content of Wikipedia articles
stripped of wiki markup; the output was plain text files. I used it to
parse the whole English and Arabic Wikipedia dumps back in January. It was
easy to use, which mattered because my coding knowledge is limited.
I read the link you kindly provided about Parsoid, and I think it can help
me with parsing. However, I'm not sure how to start testing it.
Thank you :)
Best,
Reem
On 11 November 2016 at 19:55, C. Scott Ananian <cananian(a)wikimedia.org>
wrote:
It was removed from that article recently (19 Oct 2016:
https://www.mediawiki.org/w/index.php?title=Alternative_parsers&type=revision&diff=2265815&oldid=2247632)
with the following comment:
"That link has been dead for over a year now as per this stackoverflow
comment:
http://stackoverflow.com/questions/13546254/whats-a-fast-way-to-parse-a-wikipedia-xml-dump-for-article-content-and-populate"
If you'd like to explain what you would have used DizzyLogic for, I'd
love to help you figure out how to use Parsoid to accomplish your goals.
It's an officially supported WMF parser which has much better correctness
than any 'alternative' parser out there, implements a friendly API similar
to mwparserfromhell (see
https://doc.wikimedia.org/Parsoid/master/#!/guide/jsapi), and has a
well-documented AST (https://www.mediawiki.org/wiki/Specs/HTML/1.2.1) which
can be fetched directly via the REST API (cf. https://en.wikipedia.org/api/).
I believe dumps have also been planned, but I'm not sure what the current
status is.
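A minimal sketch of one way to start testing this, assuming Python 3 and the REST API's /api/rest_v1/page/html/{title} endpoint, which returns an article's Parsoid HTML. It fetches one article and strips it down to plain text, roughly the kind of output DizzyLogic produced. The names rest_html_url, fetch_parsoid_html and html_to_text, the User-Agent string, and the choice of which tags to skip (tables, footnote markers, and so on) are all hypothetical choices for illustration, not Parsoid's documented way of extracting text.

```python
# Minimal sketch (assumptions: Python 3, the Wikimedia REST endpoint
# /api/rest_v1/page/html/{title}, and a rough hand-picked tag filter).
from html.parser import HTMLParser
from urllib.parse import quote
from urllib.request import Request, urlopen

# Elements whose text is usually not article prose (an assumption; tune as needed).
SKIP_TAGS = {"head", "script", "style", "sup", "table"}
# Elements that should start a new line in the plain-text output.
BLOCK_TAGS = {"br", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "p", "section"}


def rest_html_url(title, lang="en"):
    """Build the REST URL for an article's Parsoid HTML."""
    # Titles use underscores; '/' must be percent-encoded, hence safe="".
    return (f"https://{lang}.wikipedia.org/api/rest_v1/page/html/"
            + quote(title.replace(" ", "_"), safe=""))


def fetch_parsoid_html(title, lang="en"):
    """Download the Parsoid HTML for one article (needs network access)."""
    # Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
    req = Request(rest_html_url(title, lang),
                  headers={"User-Agent": "plaintext-sketch/0.1 (you@example.org)"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


class PlainTextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Reduce an HTML string to plain text, one block element per line."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

For example, print(html_to_text(fetch_parsoid_html("Cairo"))) would print the plain text of that English article, and passing lang="ar" targets Arabic Wikipedia instead. Only the standard library is used, so there is nothing extra to install; the trade-off is that this makes one request per article, which is probably much slower than working from a dump.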
--scott
On Fri, Nov 11, 2016 at 7:57 AM, Reem Al-Kashif <reemalkashif(a)gmail.com>
wrote:
Hi Pine,
Thank you for your reply. It is an alternative parser. I believe I
first saw it on MediaWiki (here:
https://www.mediawiki.org/wiki/Alternative_parsers).
Best,
Reem
On 11 November 2016 at 09:47, Pine W <wiki.pine(a)gmail.com> wrote:
> Was this something on Labs? If so, it might have been purged during
> one of the Labs cleanups.
>
> Pine
>
>
> On Tue, Nov 8, 2016 at 2:33 PM, Reem Al-Kashif <reemalkashif(a)gmail.com>
> wrote:
>
>> Hi,
>>
>> I'm just wondering if anybody knows what happened to the DizzyLogic wiki
>> parser? The website and program vanished. I used it in January 2016, so I
>> know it was there at that time.
>>
>> Best,
>> Reem
>>
>> --
>>
>> Kind regards,
>> Reem Al-Kashif
>>
>> _______________________________________________
>> Analytics mailing list
>> Analytics(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
--
Kind regards,
Reem Al-Kashif
--
(http://cscott.net)