Hi Cristina,
Happy to see you here :) Just to add to Jaime's answer, here is an example
of a Python-based app in Toolforge:
<https://wikitech.wikimedia.org/wiki/Help:Toolforge/My_first_Flask_OAuth_tool>
Hope this helps,
Best,
Diego
On Fri, Sep 17, 2021 at 3:12 PM Jaime Crespo <jcrespo(a)wikimedia.org> wrote:
On Fri, Sep 17, 2021 at 3:03 PM Cristina Gava via Analytics <analytics(a)lists.wikimedia.org> wrote:
Hi Jaime,
Thank you so much for the thorough reply :) All the references are super
useful and I'll go through them now. I'll start with Toolforge, since it
seems there is consensus on it being the most appropriate tool, and leave
the dumps for later if needed.
I'll keep you posted.
It will depend a lot on the type of research needed. For example (to play
devil's advocate, with a simple example), if you wanted to count the
total number of words written in Wikipedia and observe their frequency
(which means reading every edit in the history), dumps would be a much
better option, as wikireplicas only give access to metadata, not the actual
content. On top of that, reading all edits sequentially will be much faster
from a downloaded bundle, while on the live MariaDB database access is
faster for small portions with specific conditions or small to medium
ranges.
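
As a rough illustration of the sequential-read case, here is a minimal
sketch in Python that streams through dump-style XML and counts words in
every revision. It assumes the usual <page>/<revision>/<text> layout of the
XML dumps; the function names and the tiny in-memory sample are made up for
the example, and the word pattern is deliberately naive.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Naive tokenizer: any run of word characters counts as one word.
WORD_RE = re.compile(r"\w+")


def local_name(tag):
    """Drop an XML namespace prefix, e.g. '{ns}text' -> 'text'."""
    return tag.rsplit("}", 1)[-1]


def count_words(dump_stream):
    """Stream a dump-style XML file object and count words in all revisions."""
    counts = Counter()
    for _event, elem in ET.iterparse(dump_stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "text" and elem.text:
            counts.update(w.lower() for w in WORD_RE.findall(elem.text))
        elif name == "page":
            # Free the finished page so memory stays flat on large files.
            elem.clear()
    return counts


# Hypothetical two-revision sample standing in for a real dump file.
SAMPLE = b"""<mediawiki>
  <page>
    <title>Example</title>
    <revision><text>Hello world</text></revision>
    <revision><text>Hello again, world</text></revision>
  </page>
</mediawiki>"""

word_counts = count_words(io.BytesIO(SAMPLE))
total_words = sum(word_counts.values())
print(total_words, word_counts.most_common(2))
```

For a real compressed dump, the same function could be given a file object
from bz2.open(path, "rb"), where path is wherever the dump was downloaded.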
In general, I think starting with wikireplicas and moving to the dumps
later if they are not working for you is a totally reasonable decision, as
it will require less investment in your local setup.
--
Jaime Crespo
<http://wikimedia.org>
_______________________________________________
Analytics mailing list -- analytics(a)lists.wikimedia.org
To unsubscribe send an email to analytics-leave(a)lists.wikimedia.org