Hi Scott,

Thank you very much. This does the job! I wonder whether this option already existed back in January and I simply missed it; I remember looking at the book creator then and seeing fewer options.

I will probably have to figure out a way to remove the references, external links, and notes sections. Regular expressions could probably help (other ideas and suggestions are welcome). DizzyLogic also had a nice feature: it added #Article at the beginning of each article to mark where it starts. That would be a great feature to consider adding to the book creator.
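
A minimal sketch of the regular-expression idea, in Python. It assumes that each unwanted heading sits alone on its own line in the plain-text output and that these sections come at the end of an article, which is typical on Wikipedia but not guaranteed. The function name is hypothetical, and the heading layout of the "plaintext" backend has not been checked here.

```python
import re

# Assumption: each unwanted heading appears alone on its own line.
TRAILING_HEADING = re.compile(
    r"^[ \t]*(?:References|External links|Notes)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_trailing_sections(article_text):
    """Cut the article at the first References/External links/Notes heading.

    Everything from that heading to the end of the text is dropped, so this
    only works when those sections are the last ones in the article.
    """
    match = TRAILING_HEADING.search(article_text)
    if match is None:
        # No such heading found: return the text unchanged.
        return article_text
    return article_text[:match.start()].rstrip() + "\n"


if __name__ == "__main__":
    sample = (
        "Jack Bosden\n\nHe was a player.\n\n"
        "References\n\n1. Some source\n\nExternal links\n\nexample\n"
    )
    print(strip_trailing_sections(sample))
```

Because the pattern only matches a heading that fills a whole line, a sentence that merely contains the word "notes" is left alone.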

Best,
Reem

On 18 November 2016 at 17:17, C. Scott Ananian <cananian@wikimedia.org> wrote:

OCG contains a "plaintext" backend which generates quite nice plain-text versions of WP articles.  Try clicking "create a book" in the enwiki sidebar, "start book creator", go to some article, click "add this page to your book" in the header then "show book", then change the format in the drop down to "Word processor (plain text)" and click "download".

You can also take the "download as PDF" link, something like
https://en.wikipedia.org/w/index.php?title=Special:Book&bookcmd=render_article&arttitle=Jack+Bosden&returnto=Jack+Bosden&oldid=741271566&writer=rdf2latex
and replace the 'writer=rdf2latex' part at the end with 'writer=rdf2text', like:
https://en.wikipedia.org/w/index.php?title=Special:Book&bookcmd=render_article&arttitle=Jack+Bosden&returnto=Jack+Bosden&oldid=741271566&writer=rdf2text
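
If you have many such links, the same substitution can be scripted. This is only a sketch using Python's standard library; the function name `with_writer` is made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_writer(url, writer):
    """Return url with its 'writer' query parameter set to the given value."""
    parts = urlsplit(url)
    # Keep every parameter in its original order, swapping only 'writer'.
    params = [
        (key, writer if key == "writer" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    if not any(key == "writer" for key, _ in params):
        params.append(("writer", writer))
    # safe=":" keeps "Special:Book" readable instead of "Special%3ABook".
    return urlunsplit(parts._replace(query=urlencode(params, safe=":")))


if __name__ == "__main__":
    pdf_url = (
        "https://en.wikipedia.org/w/index.php?title=Special:Book"
        "&bookcmd=render_article&arttitle=Jack+Bosden&returnto=Jack+Bosden"
        "&oldid=741271566&writer=rdf2latex"
    )
    print(with_writer(pdf_url, "rdf2text"))
```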

These tools can be used from the command-line, as described at
https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-text_renderer

I hope that helps!
  --scott

On Fri, Nov 18, 2016 at 3:15 AM, Reem Al-Kashif <reemalkashif@gmail.com> wrote:
Hi Scott,

Thank you so much for your reply and offer to help with Parsoid. I used DizzyLogic as an easy parser to get the content of Wikipedia articles stripped of the wiki markup. The results were plain text files. I used it to parse the whole English and Arabic Wikipedia dumps back in January. It was easy to use because my coding knowledge is limited.
I read the link you kindly provided about Parsoid and I think it can help me with parsing. However, I'm not sure how to start testing it.

Thank you :)

Best,
Reem

On 11 November 2016 at 19:55, C. Scott Ananian <cananian@wikimedia.org> wrote:
It was removed from that article recently (19 Oct 2016: https://www.mediawiki.org/w/index.php?title=Alternative_parsers&type=revision&diff=2265815&oldid=2247632) with the following comment:


If you'd like to explain what you would have used DizzyLogic for, I'd love to help you figure out how to use Parsoid to accomplish your goals.  It's an officially supported WMF parser which has much better correctness than any 'alternative' parser out there, implements a friendly API similar to mwparserfromhell (see https://doc.wikimedia.org/Parsoid/master/#!/guide/jsapi), and has a well-documented AST (https://www.mediawiki.org/wiki/Specs/HTML/1.2.1) which can be directly fetched via the REST API (cf https://en.wikipedia.org/api/ ).  I believe dumps have also been planned, but I'm not sure what the current status is.
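
As a rough starting point, here is a sketch in Python of fetching the Parsoid HTML for one page over the REST API and reducing it to text with the standard library only. The endpoint layout (`/api/rest_v1/page/html/{title}`), the User-Agent string, and all function names are assumptions made for illustration; please check the REST API documentation before relying on them. The text extraction is deliberately crude and keeps every section, including references.

```python
from html.parser import HTMLParser
from urllib.parse import quote
from urllib.request import Request, urlopen


def parsoid_html_url(title, site="en.wikipedia.org"):
    # Assumed endpoint layout; verify against the REST API documentation.
    page = quote(title.replace(" ", "_"), safe="")
    return "https://{}/api/rest_v1/page/html/{}".format(site, page)


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping <style>/<script> content."""

    SKIP = {"style", "script"}
    BLOCK = {"p", "h2", "h3", "li"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            # End of a block element: start a new line in the output.
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


def fetch_plain_text(title):
    # Needs network access; the User-Agent value is a placeholder.
    request = Request(
        parsoid_html_url(title),
        headers={"User-Agent": "example-script/0.1 (contact address here)"},
    )
    with urlopen(request) as response:
        return html_to_text(response.read().decode("utf-8"))
```

Calling `fetch_plain_text("Jack Bosden")` would request one page. For whole-wiki processing, per-page requests like this are slow, so dumps would be the better route if they are available.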
 --scott


On Fri, Nov 11, 2016 at 7:57 AM, Reem Al-Kashif <reemalkashif@gmail.com> wrote:
Hi Pine,

Thank you for your reply. It is an alternative parser. I believe I first saw it on MediaWiki (here).

Best,
Reem

On 11 November 2016 at 09:47, Pine W <wiki.pine@gmail.com> wrote:
Was this something on Labs? If so, it might have been purged during one of the Labs cleanups.

Pine


On Tue, Nov 8, 2016 at 2:33 PM, Reem Al-Kashif <reemalkashif@gmail.com> wrote:
Hi,

I'm just wondering if anybody knows what happened to the DizzyLogic wiki parser. The website and the program have vanished. I used it in January 2016, so I know it was there at that time.

Best,
Reem

--
Kind regards,
Reem Al-Kashif


_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


