Hi,
It's really hard to scrape the English Wikiquote for quotes, because pages contain several kinds of non-quote items: credits, external links, citations, and quotes in their original language.
The French Wikiquote seems careful about always using the Template:Citation for actual quotes, which clearly marks them, and can be consumed by machines easily.
Have you considered doing the same for the English Wikiquote? (I understand that converting all pages would be a big task, but the convention could still be adopted gradually.)
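To illustrate why template-marked quotes are easy for machines to consume, here is a minimal Python sketch using only the standard library. It assumes, without having checked the template documentation, that a quote on the French Wikiquote appears as a top-level `{{citation|...}}` call whose text is either a `citation=` parameter or the first positional parameter. The function names and the sample page are hypothetical. The sketch does not strip wiki markup inside the quote, and a bullet-list page like those on the English Wikiquote yields nothing at all, which is exactly the problem described above.

```python
import re

# Matches a named parameter such as "citation=..." (key made of word chars or hyphens).
# Caveat: a positional parameter that happens to start like "word=..." is misread as named.
NAMED_PARAM = re.compile(r"\s*([\w-]+)\s*=(.*)", re.DOTALL)


def iter_top_level_templates(wikitext):
    """Yield the inner text of each top-level {{...}} call; nested calls stay inside it."""
    pos = 0
    while True:
        start = wikitext.find("{{", pos)
        if start == -1:
            return
        depth, j = 0, start
        while j < len(wikitext):
            if wikitext.startswith("{{", j):
                depth += 1
                j += 2
            elif wikitext.startswith("}}", j):
                depth -= 1
                j += 2
                if depth == 0:
                    break
            else:
                j += 1
        if depth != 0:  # unbalanced braces: stop rather than guess
            return
        yield wikitext[start + 2 : j - 2]
        pos = j


def split_top_level(text, sep="|"):
    """Split on `sep`, ignoring separators inside nested {{...}} or [[...]]."""
    parts, buf, depth, i = [], [], 0, 0
    while i < len(text):
        pair = text[i : i + 2]
        if pair in ("{{", "[["):
            depth += 1
            buf.append(pair)
            i += 2
        elif pair in ("}}", "]]") and depth > 0:
            depth -= 1
            buf.append(pair)
            i += 2
        elif text[i] == sep and depth == 0:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(text[i])
            i += 1
    parts.append("".join(buf))
    return parts


def extract_quotes(wikitext, template_names=("citation",)):
    """Return the raw text of every quote template (names compared case-insensitively)."""
    wanted = {name.lower() for name in template_names}
    quotes = []
    for body in iter_top_level_templates(wikitext):
        name, *params = split_top_level(body)
        if name.strip().lower() not in wanted:
            continue
        named, positional = {}, []
        for param in params:
            match = NAMED_PARAM.match(param)
            if match:
                named[match.group(1)] = match.group(2)
            else:
                positional.append(param)
        # Assumed convention: "citation=" if present, else the first positional parameter.
        text = named.get("citation") or (positional[0] if positional else "")
        if text.strip():
            quotes.append(text.strip())
    return quotes


# Hypothetical sample page, not copied from fr.wikiquote.
SAMPLE_PAGE = """== Pensées ==
{{citation|citation=L'homme n'est qu'un roseau, le plus faible de la nature ; mais c'est un roseau pensant.}}
{{Réf Livre|titre=Pensées|auteur=Blaise Pascal}}
{{citation|Un texte avec un [[lien|libellé]] et {{lang|la|verbum}}.}}
"""

if __name__ == "__main__":
    for quote in extract_quotes(SAMPLE_PAGE):
        print(quote)
```

The `template_names` parameter is there because other wikis may use differently named quote templates; under the same assumptions, passing `("cita", "dita")` would target templates like the Catalan ones mentioned in this thread.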
On Tue, Apr 5, 2016 at 11:34 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Ori Avtalion, 05/04/2016 22:18:
The French Wikiquote seems careful about always using the Template:Citation for actual quotes, which clearly marks them, and can be consumed by machines easily.
Were there any assessments of what was gained and lost with this method?
I'm sorry, I don't have any information about that process, specifically about what goes on inside the French Wikiquote.
I was made aware of the difference after using the python-wikiquote [1] library, which has completely different methods for parsing en and fr Wikiquote.
[1] https://github.com/federicotdn/python-wikiquotes/
I was hoping this mailing list would be aware of what's going on there. Is there an official channel for contacting them and sharing this information?
Ori Avtalion, 05/04/2016 22:51:
I was made aware of the difference after using the python-wikiquote [1] library, which has completely different methods for parsing en and fr Wikiquote.
Thank you, I was not aware of this tool. I've added it to http://wikipapers.referata.com/wiki/Python-wikiquotes .
I was hoping this mailing list would be aware of what's going on there. Is there an official channel for contacting them and sharing this information?
Who is "them"? Wikiquote communities usually know perfectly well that their content is difficult to parse programmatically, but they have other priorities. For instance, the Italian Wikiquote intentionally keeps its use of templates very low, even for the most common ones, to help new users. The French Wikiquote adopted the template system mainly for reasons of "control", AFAIK.
The matter has been discussed many times; the most recent substantial discussion was https://meta.wikimedia.org/wiki/Structured_Wikiquote . If I read it correctly, the only Wikiquote editor who has expressed interest in this approach so far is BD2412 (most Wikiquote users probably felt like Ningauble). I can't find any discussion of [[m:Structured_Wikiquote]] on the English Wikiquote other than https://en.wikiquote.org/wiki/Wikiquote:Village_pump_archive_42#Updates_to_W... , so you may want to open one at the village pump.
If you want to seriously work in this field, I can provide some suggestions if requested. It's probably easier to propose such big reforms on smaller but fairly active Wikiquotes, such as Czech or Vietnamese, or on bigger ghost towns like the German Wikiquote.
Nemo
On Wed, Apr 6, 2016 at 12:34 AM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Ori Avtalion, 05/04/2016 22:51:
I was hoping this mailing list would be aware of what's going on there. Is there an official channel for contacting them and sharing this information?
Who is "them"?
I was referring to the French Wikiquote editors.
If you want to seriously work in this field, I can provide some suggestions if requested. It's probably easier to propose such big reforms on smaller but fairly active Wikiquotes, such as Czech or Vietnamese, or on bigger ghost towns like the German Wikiquote.
Thanks for offering links to existing discussions. Since the idea has already been rejected, for good reasons, I don't see the point in bringing it up again. I'd rather focus any efforts on improving the parsing of existing pages using the Python library I mentioned.
Ori Avtalion, 06/04/2016 00:08:
I'd rather focus any efforts on improving the parsing of existing pages using the Python library I mentioned.
That's very useful work too, comparable to DBpedia or the many Wiktionary-based thesauri! Let us know how the work proceeds and whether there are small changes that might make your work easier. (Maybe some wikitext conventions, fixable by a bot?)
Nemo
Hey guys,
Just to come back to the usage of templates on Wikiquote in French: it is linked to the reopening of this Wikiquote instance a few years ago. Indeed, it was agreed to reopen fr.wq only under the condition that all quotes would be appropriately referenced. We decided early on to use templates to facilitate this, as those templates can automatically identify and highlight incomplete references. On top of that, they ensure that all pages have identical formatting. AFAIK, this hasn't been emulated on other Wikiquotes.
Cheers, Matthieu
On Wed, Apr 6, 2016 at 12:13 AM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Ori Avtalion, 06/04/2016 00:08:
I'd rather focus any efforts on improving the parsing of existing pages using the Python library I mentioned.
That's very useful work too, comparable to DBpedia or the many Wiktionary-based thesauri! Let us know how the work proceeds and whether there are small changes that might make your work easier. (Maybe some wikitext conventions, fixable by a bot?)
Nemo
Wikiquote-l mailing list Wikiquote-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikiquote-l
Hi,
"AFAIK, this hasn't been emulated on other Wikiquotes."
ca.wikiquote also uses templates for quotes ({{cita}}, or {{dita}} for popular quotes), although 53 pages still need to be converted (https://ca.wikiquote.org/wiki/Categoria:P%C3%A0gines_que_no_usen_la_plantill...).
On Thursday, April 7, 2016 at 16:12, Matthieu André matth.andre@gmail.com wrote:
Hey guys,
Just to come back to the usage of templates on Wikiquote in French: it is linked to the reopening of this Wikiquote instance a few years ago. Indeed, it was agreed to reopen fr.wq only under the condition that all quotes would be appropriately referenced. We decided early on to use templates to facilitate this, as those templates can automatically identify and highlight incomplete references. On top of that, they ensure that all pages have identical formatting. AFAIK, this hasn't been emulated on other Wikiquotes.
Cheers, Matthieu