<span class="Apple-style-span" style>Thank you all for the tips.</span><div style><br></div><div style>Would you also have a tip for getting the parsed (HTML) version of these entries easily?</div><div style><br></div><div style>
Thanks again</div><div style><br></div><div style>Sebastien</div><br><div class="gmail_quote">On 9 January 2012 10:51, Jérémie Roquet <span dir="ltr"><<a href="mailto:arkanosis@gmail.com">arkanosis@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Sebastien,<br>
<br>
2012/1/9 Sébastien Druon <<a href="mailto:druon.sebastien@gmail.com">druon.sebastien@gmail.com</a>>:<br>
<div class="im">> How can I get the list of all the entries (words) of a<br>
> Wiktionary?<br>
> For example, for the Russian Wiktionary, I want to get the list of all the<br>
> Russian entries (no other languages)<br>
<br>
</div>You can download the latest dump¹ and pass it through a simple awk<br>
script that prints the titles of the pages that contain the {{-ru-}}<br>
template, i.e. something like:<br>
<br>
----8<----<br>
<br>
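# Print the titles of pages whose wikitext contains {{-ru-}}.<br>
# Expects one <title>...</title> per line, as in the XML dumps.<br>
BEGIN {<br>
ru = 0<br>
}<br>
<br>
END {<br>
# the last page has no following title line: flush it here<br>
if (ru) {<br>
print title<br>
}<br>
}<br>
<br>
/<title>.*<\/title>/ {<br>
# a new page starts: flush the previous one if it matched<br>
if (ru) {<br>
print title<br>
}<br>
<br>
# strip the tags instead of relying on a fixed indentation width<br>
title = $0<br>
sub(/^[ \t]*<title>/, "", title)<br>
sub(/<\/title>.*$/, "", title)<br>
ru = 0<br>
}<br>
<br>
# literal search: "{{" would otherwise be read as a regex interval<br>
index(tolower($0), "{{-ru-}}") {<br>
ru = 1<br>
}<br>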
<br>
----8<----<br>
<br>
You would still have to filter the output to keep only the titles in the main (article) namespace.<br>
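If every page in your dump carries a per-page ns (namespace number) element right after its title, the namespace filter can be folded into the same awk pass. I believe recent export schemas include it, but that is an assumption, so check the first pages of your dump. Here is a minimal sketch wrapped in a shell function. The name ru_titles is only illustrative. Pages without an ns line are skipped:<br>
<br>
```shell
#!/bin/sh
# Sketch only. Assumption: each page has an <ns>N</ns> line after its
# <title> line. Pages without one are skipped.
# ru_titles is a hypothetical helper name. It reads a pages-articles XML
# dump on stdin and prints the titles of main-namespace (ns 0) pages
# whose wikitext contains {{-ru-}}.
ru_titles() {
  awk '
    # Flush the previous page if it matched, then start a new one.
    /<title>.*<\/title>/ {
      if (ru && ns == 0) print title
      title = $0
      sub(/^[ \t]*<title>/, "", title)
      sub(/<\/title>.*$/, "", title)
      ns = -1
      ru = 0
    }
    # Namespace number of the current page (0 = main namespace).
    /<ns>[0-9]+<\/ns>/ {
      n = $0
      gsub(/[^0-9]/, "", n)
      ns = n + 0
    }
    # Literal, case-insensitive search for the language template.
    index(tolower($0), "{{-ru-}}") { ru = 1 }
    # The last page has no following title line, so flush it here.
    END { if (ru && ns == 0) print title }
  '
}

# Usage (decompresses on the fly, no need to unpack the dump first):
#   bzcat ruwiktionary-20120107-pages-articles.xml.bz2 | ru_titles
```

Markup inside the wikitext is XML-escaped in the dump, so only real title and ns lines can match those two patterns. Titles are printed exactly as they appear in the XML, so any escaped entities in them are not decoded.<br>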
<br>
It should also be possible using categories, but that would be neither<br>
easier nor more reliable, and it would be much slower.<br>
<br>
Best regards,<br>
<br>
¹<a href="http://dumps.wikimedia.org/ruwiktionary/20120107/ruwiktionary-20120107-pages-articles.xml.bz2" target="_blank">http://dumps.wikimedia.org/ruwiktionary/20120107/ruwiktionary-20120107-pages-articles.xml.bz2</a><br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Jérémie<br>
</font></span></blockquote></div><br>