Hi,
trying to improve the mess of our docs for developers on mediawiki.org, I've been wondering if anyone's aware of any visualization tool that draws a graph showing which wiki pages are linked from which other wiki pages (up to a certain depth), ignores pages which include {{Outdated}} or {{Historical}} templates, ignores pages in certain namespaces like "Talk:" or "User:", and ignores pages which are just translations (like "PageName/qqx"). Or at least some of all this. :)
Thanks in advance for any ideas! andre
Andre Klapper, 21/11/2017 17:15:
I've been wondering if anyone's aware of any visualization tool that draws a graph showing which wiki pages are linked from which other wiki pages (up to a certain depth)
The closest thing I can think of is Erik's chart of category links, generated with a script which is published somewhere and could be adapted at least for simple regex filters. https://stats.wikimedia.org/EN/CategoryOverviewIndex.htm https://stats.wikimedia.org/wikimedia/pageviews/categorized/
There's also http://www.chrisharrison.net/index.php/Visualizations/ClusterBall and a graph of links between user pages, which was made perhaps in 2014.
Federico
Hi Andre, I'm not aware of any tool like the one you describe. However, I think it would be super useful! I'll think about it some more and possibly draft a ticket. Cheers Joseph
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Hi Andre. Jaime's query is a good starting point; it would get you the data you need for one wiki. We can import the templatelinks table and then run the query on Hadoop to get all wikis at once (we already have the other tables).
But once we have that, we'd have a graph with millions of nodes and edges. That's not possible to consume in visual form, so you could serve slices of the data and visualize parts of the graph. The question is, then, what purpose would this visualization have? If that's well defined, maybe we can figure out what slices of the data would be most useful.
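One way to picture "serving a slice" is to keep only the pages within a few links of a page of interest and emit that subgraph in Graphviz DOT format. The following is only a minimal sketch under assumptions: the function name `neighbourhood_dot`, the `(source, target)` edge-list input, and the sample page titles are hypothetical and not part of any existing tool.

```python
def neighbourhood_dot(edges, focus, radius):
    """Return DOT text for the pages within `radius` links of `focus`.

    `edges` is a list of (source, target) page-title pairs. Links are
    followed in both directions when deciding which pages to keep.
    Titles containing double quotes would need escaping (not handled here).
    """
    neighbours = {}
    for src, dst in edges:
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)

    keep = {focus}
    frontier = {focus}
    for _ in range(radius):
        # Expand one hop at a time, ignoring pages already kept.
        frontier = {n for page in frontier for n in neighbours.get(page, ())} - keep
        keep |= frontier

    lines = ["digraph slice {"]
    for src, dst in edges:
        if src in keep and dst in keep:
            lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample edges, for illustration only.
SAMPLE_EDGES = [("A", "B"), ("B", "C"), ("C", "D")]
print(neighbourhood_dot(SAMPLE_EDGES, "A", 1))
```

The resulting text can be passed to a Graphviz renderer. Which focus pages and radii are worth serving depends on the purpose of the visualization, which is the open question here.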
A problem with the category hierarchy is that any out-of-place subcategory brings in a full branch of anomalous subjects below it.
Thus, making a report like https://stats.wikimedia.org/wikimedia/pageviews/categorized/wp-en/2015-06/pa... involved repeatedly pruning weird sub-branches by manually building a blacklist of nodes not to follow, which I could feed into the script.
I gave up on the other project (https://stats.wikimedia.org/EN/CategoryOverviewIndex.htm) as it became too unwieldy. Also, sometimes a category reappeared as a great-grandchild of itself. I had to detect that in order to avoid loops.
Those are two of the pitfalls.
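Both pitfalls can be illustrated with a small sketch. This is not the script mentioned above, only an assumed minimal version: the function name `walk_categories`, the dictionary of subcategories, and the sample category names are all hypothetical.

```python
def walk_categories(root, subcats, blacklist, max_depth):
    """Depth-first walk of a category hierarchy.

    Skips any category in `blacklist` (and therefore its whole branch)
    and stops when a category reappears among its own ancestors.
    Returns (depth, category) pairs in visit order.
    """
    tree = []

    def visit(cat, depth, ancestors):
        if cat in blacklist or cat in ancestors or depth > max_depth:
            return
        tree.append((depth, cat))
        for child in subcats.get(cat, []):
            visit(child, depth + 1, ancestors | {cat})

    visit(root, 0, frozenset())
    return tree


# Hypothetical sample hierarchy: "Optics" loops back to "Science",
# and "Pseudoscience" is an out-of-place branch we do not want to follow.
SAMPLE_SUBCATS = {
    "Science": ["Physics", "Pseudoscience"],
    "Physics": ["Optics"],
    "Optics": ["Science"],
    "Pseudoscience": ["Astrology"],
}

for depth, cat in walk_categories("Science", SAMPLE_SUBCATS, {"Pseudoscience"}, 5):
    print("  " * depth + cat)
```

Tracking only the ancestors of the current node, rather than every visited node, lets a category legitimately appear under two different parents while still breaking loops.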
As for navigation: even a smallish ('concise') example from my second project (https://stats.wikimedia.org/EN/CategoryOverview_EN_Concise.htm) shows how daunting it is. What I often see is a dynamic navigator that shows one level up or down, which makes me feel like I'm in a maze.
Erik
From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Dan Andreescu Sent: Monday, November 27, 2017 16:14 To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. analytics@lists.wikimedia.org Subject: Re: [Analytics] Tool to visualize which wiki pages link to which wiki pages?
Andre,
I do not have a tool, but maybe I can give you a query on Quarry to start doing that:
https://quarry.wmflabs.org/query/23197
Obviously it requires several iterations, but since recursive queries are not yet available on MariaDB, a script would have to do that for you.
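The iteration such a script would perform might look roughly like the sketch below. It is an assumption-laden illustration, not the Quarry query itself: the in-memory `LINKS` and `TEMPLATES` dictionaries stand in for the results of per-page queries, and all names and sample page titles are hypothetical.

```python
from collections import deque

# Hypothetical sample data; a real script would fill these from query results.
LINKS = {
    "How to contribute": ["Gerrit/Tutorial", "Talk:How to contribute", "Old guide"],
    "Gerrit/Tutorial": ["Gerrit/Tutorial/qqx", "Code review"],
    "Old guide": ["Code review"],
    "Code review": ["How to contribute"],
}
TEMPLATES = {"Old guide": {"Outdated"}}

SKIP_NAMESPACES = ("Talk:", "User:")
SKIP_TEMPLATES = {"Outdated", "Historical"}
TRANSLATION_SUFFIXES = ("/qqx",)  # assumed list of translation subpage suffixes


def is_ignored(title):
    """True for pages in skipped namespaces, translations, or flagged pages."""
    if title.startswith(SKIP_NAMESPACES):
        return True
    if title.endswith(TRANSLATION_SUFFIXES):
        return True
    return bool(TEMPLATES.get(title, set()) & SKIP_TEMPLATES)


def link_edges(start, max_depth):
    """Breadth-first walk returning (source, target) links up to max_depth."""
    edges = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the requested depth
        for target in LINKS.get(page, []):
            if is_ignored(target):
                continue
            edges.append((page, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return edges


for source, target in link_edges("How to contribute", 2):
    print(f"{source} -> {target}")
```

Each level of the breadth-first walk corresponds to one round of the query, and the `seen` set keeps pages that link back to each other from being expanded twice.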