It depends on the size of the dataset. If you already know the pages or
users you want to compare (a limited dataset of reasonable size), then
there is no scalability issue and it should not be too difficult to
implement. Otherwise it requires a lot of resources to merge the numbers
for every page or user over the period with a large dataset.
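The scalability point about uniqueness can be illustrated with a short Python sketch; the months and user names below are invented for illustration:

```python
# Per-month sets of distinct editors (invented data). Additive stats such as
# raw edit counts roll up by summing, but distinct-user counts do not:
monthly_editors = {
    "2017-01": {"Alice", "Bob"},
    "2017-02": {"Bob", "Carol"},
    "2017-03": {"Alice", "Carol", "Dave"},
}

# Naively summing the per-month figures counts repeat editors several times...
naive_sum = sum(len(users) for users in monthly_editors.values())

# ...so a yearly figure needs a union over the raw per-month sets, which is
# why long ranges over large wikis get expensive.
yearly_editors = set().union(*monthly_editors.values())

print(naive_sum)            # 7
print(len(yearly_editors))  # 4
```

This is why a month can be built from days for additive stats, while uniqueness stats force the tool back to the underlying rows (or to an approximation such as HyperLogLog).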
2017-07-31 16:59 GMT+02:00 יגאל חיטרון <khitron(a)gmail.com>:
> Thank you. I will not say that I understood your explanation, but I'll
> try: if you have the number of viewers of some page for every year, can't
> you take the sum of them and compare it with another page's to sort them?
> And the same for the number of some user's edits?
> Igal
>
> On Jul 31, 2017 17:19, "Akeron" <akeron.wp(a)gmail.com> wrote:
>
> > Hi Igal,
> > All suggestions are welcome :)
> > Supporting this feature shouldn't be too difficult in theory, because
> > the tool already works with this kind of aggregation (months are built
> > from days, years from months...). The main problem is scalability for
> > stats which require uniqueness, like the number of users or the number
> > of edits *per page*. That's why yearly stats can actually be disabled on
> > some big wikis. So it would be feasible, but with edit limits for the
> > range (like 3-5 million), and it would be very slow to load with lots of
> > edits.
> >
> > Akeron
> >
> > 2017-07-31 14:29 GMT+02:00 יגאל חיטרון <khitron(a)gmail.com>:
> >
> > > Hello. It's amazing, thank you very much!
> > > Could I suggest one more feature, please? With it, the tool will be
> > > perfect. I'm talking about aggregation. Any kind of historical
> > > statistics for some day, month or year could also be shown for a range
> > > of time. For example, if we have monthly statistics, we could fill the
> > > From field with Jan 2008 and the To field with May 2011, and get the
> > > aggregated numbers for this range. Is it possible?
> > > Thank you very much again,
> > > Igal (User:IKhitron)
> > >
> > > On Jul 30, 2017 22:18, "Pine W" <wiki.pine(a)gmail.com> wrote:
> > >
> > > > Wikiscan is an interesting tool for statistics fans. I suggest
> > > > briefly reading this IEG page
> > > > <https://meta.wikimedia.org/wiki/Grants:IEG/Wikiscan_multi-wiki>,
> > > > then playing with the tool on https://wikiscan.org/
> > > >
> > > > Pine
> > > > _______________________________________________
> > > > Wikitech-l mailing list
> > > > Wikitech-l(a)lists.wikimedia.org
> > > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
Good idea, but the * character is allowed and sometimes used in user names.
The only remaining available characters seem to be "# < > [ ] | { } / @".
Another possibility is to use a new dedicated field which would match all
user names starting with the entered string.
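As a sketch of how such a dedicated prefix field could behave next to a true wildcard pattern (the user names here are invented; Wikiscan would read them from its database):

```python
from fnmatch import fnmatchcase

# Invented user names standing in for a wiki's user table.
users = ["TNSE Kumar", "TNSE Priya", "TNSEBot", "Akeron", "Pine"]

# Dedicated field: every user name starting with the entered string.
prefix_matches = [u for u in users if u.startswith("TNSE")]

# Glob-style wildcard, which is ambiguous when * itself appears in names.
glob_matches = [u for u in users if fnmatchcase(u, "TNSE *")]

print(prefix_matches)  # ['TNSE Kumar', 'TNSE Priya', 'TNSEBot']
print(glob_matches)    # ['TNSE Kumar', 'TNSE Priya']
```

The prefix variant also maps directly onto an indexed SQL query (`WHERE user_name LIKE 'TNSE%'`), whereas an arbitrary glob generally cannot use an index, which matters at Wikiscan's scale.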
2017-07-31 10:27 GMT+02:00 Sibi Kanagaraj <commonssibi(a)gmail.com>:
> Hi Team,
>
> @User:akeron
>
> I would like to know if there is any possibility of allowing wildcard
> entries in the Users category.
>
> Example:
> Over here
>
> https://ta.wikiscan.org/users
>
> will we be able to filter out users whose user name starts with TNSE -
>
> say something like TNSE * or with a particular pattern, say TNSE * ABC
> *XYZ
>
> Regards,
> K.Sibi
>
>
>
Hi Sibi,
It is safe to skip the SSL warning; the free certificate I use for all
subdomains is limited, it actually covers only the biggest wikis.
The good news is that wildcard certificates should be available for free in
January 2018:
https://letsencrypt.org/2017/07/06/wildcard-certificates-coming-jan-2018.ht…
Akeron
2017-07-31 10:10 GMT+02:00 Sibi Kanagaraj <commonssibi(a)gmail.com>:
> Hi team and User:Akeron,
>
> A great effort. But when I try to go to my home wiki - Tamil Wikipedia
> stats - it prompts an insecure connection warning.
>
> https://ta.wikiscan.org/
>
> Though I accept the risk and move ahead, I am not sure why an insecure
> connection warning pops up for Tamil and not for English / Commons.
>
> -Sibi
>
> _______________________________________________
> Analytics mailing list
> Analytics(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
Hi Everyone,
The next Research Showcase will be live-streamed this Wednesday, July 26,
2017, at 11:30 AM PDT (18:30 UTC).
YouTube stream: https://www.youtube.com/watch?v=yC1jgK8C8aQ
As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#July_2017>.
This month's presentation:

Freedom versus Standardization: Structured Data Generation in a Peer
Production Community
By *Andrew Hall*

In addition to encyclopedia articles and software, peer production
communities produce *structured data*, e.g., Wikidata and OpenStreetMap's
metadata. Structured data from peer production communities has become
increasingly important due to its use by computational applications, such
as CartoCSS, MapBox, and Wikipedia infoboxes. However, this structured data
is usable by applications only if it follows *standards*. We did an
interview study focused on OpenStreetMap's knowledge production processes
to investigate how – and how successfully – this community creates and
applies its data standards. Our study revealed a fundamental tension
between the need to produce structured data in a standardized way and
OpenStreetMap's tradition of contributor freedom. We extracted six themes
that manifested this tension and three overarching concepts, *correctness,
community,* and *code,* which help make sense of and synthesize the themes.
We also offer suggestions for improving OpenStreetMap's knowledge
production processes, including new data models, sociotechnical tools, and
community practices.
Kindly,
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
srodlund(a)wikimedia.org
Dear list,
I'm posting a recent conversation with Dan below, as well as a few follow-up questions.
Dan was kind enough to point out this list. I apologize that the post is "backward" (in
email-thread format) due to my ignorance; I will use this list from now on.
Thanks, Daniel
----
Hi Dan
Thanks for getting back to me so quickly!
>Thanks for writing. In general these questions are best asked on our public list, so other
>people can see and benefit from any answers:
>https://lists.wikimedia.org/mailman/listinfo/analytics
Thanks, I've joined this list and will ask subsequent questions there.
>* pairs of pages: we have two datasets that are mentioned in this task
>https://phabricator.wikimedia.org/T158972 which should be very interesting for this
>purpose. They aren't being updated right now, and the task is to do just that. We'll
>probably get to that within the next 3 months, but a bunch of us are on paternity leave
>this summer, so things are a little slower than normal
This seems close to what I need. From the descriptions I gather the linkage is by session.
Is there also a linkage by IP (with the IPs themselves removed, of course)?
>* country data for pageviews: for privacy reasons we only allow access to this with an
>NDA. We have good data on it, but you need to sign this NDA and use our cluster to access
>it, being careful about what you report about it to the world at large. Here's information
>on that: https://wikitech.wikimedia.org/wiki/Volunteer_NDA
I've read this and am happy to sign an NDA. I understand it is best to be as specific as
possible about the reasoning, intentions with the data, and permissions required. For me to
figure this out it would be useful to know the relevant parts of the database schema, and
perhaps a hint as to which data might be most interesting there. Would you be able to point
me towards that?
>Hope that helps, and feel free to write back to the public list in the future.
Definitely, very helpful and thank you!
Best, Daniel
On Wed, Jul 19, 2017 at 9:51 AM, Oberski, D.L. (Daniel) <d.l.oberski(a)uu.nl> wrote:
Dear Dan,
My name is Daniel Oberski, I'm an associate professor of data science methodology in the
department of statistics at Utrecht University in the Netherlands.
I've been using your incredibly useful pageviews API to study correlations between the
amount of interest people show in a topic (pageviews) with other data such as political
party preference over time. That has yielded some interesting results (which I have yet to
write up).
However, to do a better study it would be very helpful to have slightly more information
than is in the API. Specifically, it would be very useful to be able to query, for each
_pair_ of pages, how many people (or IPs) viewed _both_ of those pages. That way I can find
out which pages are really indicative of interest in a specific common topic, rather than
just correlated by accident. In addition, I've found it hard to figure out pageviews for
specific pages by country rather than by language.
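The pair query described above could, in principle, be computed from per-session view logs along these lines (session IDs and page titles here are invented):

```python
from collections import Counter
from itertools import combinations

# Invented session logs: session id -> pages viewed in that session.
sessions = {
    "s1": ["Elections", "Polling", "Economy"],
    "s2": ["Elections", "Polling"],
    "s3": ["Economy", "Elections"],
}

# For each unordered pair of pages, count the sessions that viewed both.
pair_counts = Counter()
for pages in sessions.values():
    for a, b in combinations(sorted(set(pages)), 2):
        pair_counts[(a, b)] += 1

print(pair_counts[("Elections", "Polling")])  # 2
print(pair_counts[("Economy", "Polling")])    # 1
```

With n pages per session this generates O(n^2) candidate pairs, which hints at why such a dataset would need to be precomputed rather than served live.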
My question is, would you happen to know if there is any way to obtain this information?
(It does not necessarily have to be through the API.) Or do you know if there are people
to whom I might talk about this?
Thanks for reading (to) the end and best regards,
Daniel