<span class="Apple-style-span" style>Thank you all for the tips.</span><div style><br></div><div style>Do you also have a good tip for getting the parsed (HTML) version of these entries in an easy way?</div><div style><br></div><div style>

Thanks again</div><div style><br></div><div style>Sebastien</div><br><div class="gmail_quote">On 9 January 2012 10:51, Jérémie Roquet <span dir="ltr">&lt;<a href="mailto:arkanosis@gmail.com">arkanosis@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Sebastien,<br>

<br>

2012/1/9 Sébastien Druon &lt;<a href="mailto:druon.sebastien@gmail.com">druon.sebastien@gmail.com</a>&gt;:<br>

<div class="im">&gt; How is it possible to get the list of all the entries (words) of a<br>

&gt; wiktionary?<br>

&gt; For example, for the russian wiktionary, I want to get the list of all the<br>

&gt; russian entries (no other languages)<br>

<br>

</div>You can download the latest dump¹ and pass it through a simple awk<br>

script that prints the titles of the pages that contain the {{-ru-}}<br>

template, i.e. something like:<br>

<br>

----8&lt;----<br>

<br>

BEGIN {<br>

  ru = 0<br>

}<br>

<br>

END {<br>

  if (ru) {<br>

    print title<br>

  }<br>

}<br>

<br>

/&lt;title&gt;.*&lt;\/title&gt;/ {<br>

  if (ru) {<br>

    print title<br>

  }<br>

<br>

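  # skip the 4-space indent plus &lt;title&gt; (11 chars) and drop the closing &lt;/title&gt; (8 chars)<br>

  title = substr($0, 12, length($0) - 19)<br>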

  ru = 0<br>

}<br>

<br>

index(tolower($0), "{{-ru-}}") {<br>

  ru = 1<br>

}<br>

<br>

----8&lt;----<br>
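<br>
For anyone who would rather use Python than awk, the same scan could be sketched roughly as below. It is an illustration only, not part of the script above: the names <code>local_name</code>, <code>titles_with_template</code> and <code>titles_from_dump</code> are made up for the example, and it assumes the dump uses the usual page / title / text elements and that the marker is passed in lowercase.<br>
<pre>
```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace that the export format puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def titles_with_template(stream, marker="{{-ru-}}"):
    """Yield the title of each page whose wikitext contains the marker.

    The comparison is case-insensitive on the page text, so the marker
    itself is assumed to be lowercase.
    """
    title, found = None, False
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            found = found or marker in (elem.text or "").lower()
        elif name == "page":
            if found and title is not None:
                yield title
            title, found = None, False
            elem.clear()  # drop the finished page's subtree


def titles_from_dump(path, marker="{{-ru-}}"):
    """Same as above, reading directly from a .xml.bz2 dump file."""
    with bz2.open(path, "rb") as stream:
        yield from titles_with_template(stream, marker)
```
</pre>
Iterating over <code>titles_from_dump(...)</code> with the path of the downloaded dump would then give the titles one by one, without decompressing the whole file first.<br>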

<br>

You&#39;d still have to filter the output to only keep titles in the main namespace.<br>
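<br>
One possible way to do that filtering, sketched in Python: read the namespace names from the siteinfo header at the top of the dump and drop every title that starts with one of them followed by a colon. The function names <code>namespace_prefixes</code> and <code>in_main_namespace</code> are made up for this example, and it assumes the header lists the namespaces in namespace elements, as the export format normally does.<br>
<pre>
```python
import xml.etree.ElementTree as ET


def namespace_prefixes(stream):
    """Collect the non-empty namespace names from the dump's siteinfo header."""
    prefixes = set()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = elem.tag.rsplit("}", 1)[-1]  # ignore the XML namespace
        if name == "namespace" and elem.text:
            prefixes.add(elem.text)
        elif name == "siteinfo":
            break  # header finished, no need to read the pages
    return prefixes


def in_main_namespace(title, prefixes):
    """True unless the title starts with a known namespace name and a colon."""
    head, sep, _rest = title.partition(":")
    return not (sep and head in prefixes)
```
</pre>
Titles that merely contain a colon are kept, because only prefixes listed in the header count as namespaces.<br>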

<br>

It should be possible using categories too, but this would be neither<br>

easier nor more reliable, and it would be much slower.<br>

<br>

Best regards,<br>

<br>

¹ <a href="http://dumps.wikimedia.org/ruwiktionary/20120107/ruwiktionary-20120107-pages-articles.xml.bz2" target="_blank">http://dumps.wikimedia.org/ruwiktionary/20120107/ruwiktionary-20120107-pages-articles.xml.bz2</a><br>


<span class="HOEnZb"><font color="#888888"><br>

--<br>

Jérémie<br>

</font></span></blockquote></div><br>