Hello everyone,
how can I extract just clean plain text from a Wikipedia article? Without wiki-stuff, without HTML, without pictures, without JSON.
Just clean text.
I can't seem to find this exact solution.
Best regards
Mikoto
On 13.12.2014 at 3:46 PM, "Ricordisamoa" ricordisamoa@openmailbox.org wrote, in reply to misaka83@hush.com's message of 13/12/2014 13:30:
Use the TextExtracts API <https://www.mediawiki.org/wiki/Extension:TextExtracts#API>. For example, this query <https://en.wikipedia.org/w/api.php?action=query&titles=Douglas+Adams&prop=extracts&explaintext=1&exintro=1> returns the lead section of the English Wikipedia article Douglas Adams <https://en.wikipedia.org/wiki/Douglas_Adams> in plain text.
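The example query can also be assembled programmatically. What follows is only a minimal sketch using Python's standard library: the function name build_extract_url and the explicit format=json parameter are assumptions added for illustration, while the remaining parameters are the ones from the example query.

```python
from urllib.parse import urlencode

# Endpoint taken from the example query in this thread.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title: str, intro_only: bool = True) -> str:
    """Build a TextExtracts query URL for a single article title."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "extracts",
        "explaintext": 1,  # ask for plain text rather than limited HTML
        "format": "json",  # assumption: request JSON output explicitly
    }
    if intro_only:
        params["exintro"] = 1  # restrict the extract to the lead section
    # urlencode escapes the title, turning spaces into "+".
    return API_ENDPOINT + "?" + urlencode(params)


url = build_extract_url("Douglas Adams")
print(url)
```

Passing intro_only=False drops exintro from the query, which should request text beyond the lead section.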
On 13/12/2014 at 15:16, misaka83@hush.com wrote:
Sorry, this is not what I was looking for. I need it without the JSON stuff and everything. But thank you.
On 13.12.2014 at 5:05 PM, "Ricordisamoa" ricordisamoa@openmailbox.org wrote:
Without JSON? Do you mean action=raw <https://en.wikipedia.org/w/index.php?title=Douglas+Adams&action=raw>, maybe? But that contains wiki markup.
On 13/12/2014 at 16:15, misaka83@hush.com wrote:
Yes, something like that, but without the markup, just clean text. I can't imagine that there is no setting for this. There has to be one, doesn't there?
On 13.12.2014 at 5:36 PM, "Ricordisamoa" ricordisamoa@openmailbox.org wrote:
Sorry, as far as I know it's not possible to achieve what you want via the API. Can't you use one of the supported output formats <https://www.mediawiki.org/wiki/API:Data_formats#Output>? If you absolutely need raw text, a Labs tool could be set up for that.
misaka83@hush.com wrote:
Well, I could do a workaround for the JSON, but it's not the best solution, you know. Thank you very much for your attention and help. If you've got another idea, just let me know.
Best regards
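The JSON workaround mentioned in this message can be quite small. The following is a minimal sketch, assuming the default JSON shape of the query API, where page data sits under query and pages, keyed by page ID, and TextExtracts places the text in an extract field. The function name plain_text_from_response and the sample payload are made up for illustration and are not real API output.

```python
import json


def plain_text_from_response(payload: str) -> str:
    """Return the plain-text extracts contained in a query API JSON response."""
    data = json.loads(payload)
    pages = data.get("query", {}).get("pages", {})
    # Pages are keyed by page ID; entries without an "extract" field are skipped.
    extracts = [page["extract"] for page in pages.values() if "extract" in page]
    return "\n\n".join(extracts)


# Hypothetical, shortened response body used only to exercise the function.
sample = json.dumps({
    "query": {
        "pages": {
            "12345": {
                "pageid": 12345,
                "title": "Example",
                "extract": "Example is a placeholder article.",
            }
        }
    }
})

print(plain_text_from_response(sample))
```

In practice the payload would be the body returned by a TextExtracts query with explaintext=1, and the returned string could be written to a file or to standard output as is.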
mediawiki-api@lists.wikimedia.org