Please note that from the operations side of things (disclaimer: I am
*not* a netops), my understanding is that pure bandwidth usage is
currently a non-issue (it is mostly a fixed cost, rather than a variable
one). Repeatedly hitting a server is far more "costly" (all things
considered, such as server purchase and maintenance) than a one-time dump
download. All dump users: use as much as you need (without wasting it) to
meet your goals and do not worry too much about bandwidth.
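That said, if anyone does want to avoid re-downloading a merged daily file that has not changed since the last fetch, a conditional GET is the usual trick. A minimal sketch, assuming the dump server honors If-Modified-Since (the URL below is a placeholder, not the real dump path):

```python
# Sketch: skip re-downloading a dump file unless it changed since our last fetch.
# DUMP_URL is a hypothetical placeholder; substitute the real dump file URL.
import email.utils
import urllib.error
import urllib.request

DUMP_URL = "https://dumps.wikimedia.org/other/pagecounts-ez/EXAMPLE"  # placeholder


def conditional_headers(last_fetch_epoch):
    """Build an If-Modified-Since header from the previous fetch time (epoch seconds)."""
    return {"If-Modified-Since": email.utils.formatdate(last_fetch_epoch, usegmt=True)}


def fetch_if_changed(url, last_fetch_epoch):
    """Return the file bytes on HTTP 200, or None on 304 Not Modified."""
    req = urllib.request.Request(url, headers=conditional_headers(last_fetch_epoch))
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read()  # 200: the file changed, download it
    except urllib.error.HTTPError as e:
        if e.code == 304:
            return None  # 304: unchanged since last fetch, skip the download
        raise
```

Whether this actually saves anything depends on the server sending Last-Modified and answering 304s, which I have not verified for dumps.wikimedia.org.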
> Also want to say that we're very thankful for the work you all are doing
> publishing this dataset, it's enormously useful for entity popularity in
> our search engine for publishers <https://graphiq.com/search>.

My personal opinion is that, indeed, Analytics' work is very important for
our mission (spreading free knowledge) and they are doing a great job. I do
not know if that is said enough.
On Tue, Aug 16, 2016 at 11:06 PM, Dylan Wenzlau <dylan(a)graphiq.com> wrote:
> Thank you for the update. No one from our team is on the mailing list, and
> we have not viewed the /other/analytics page before (only the pagecounts-all-sites
> page
> <https://wikitech.wikimedia.org/wiki/Analytics/Data/Pagecounts-all-sites>
> and pages linked from there), which explains why we didn't know about this.
> I do see you recently added a link to the Phabricator issue though, which is
> helpful!
>
> I am currently rewriting our scripts to utilize the new pagecounts-ez
> format, although I think that this new format means that we will be taking
> up more Wikimedia bandwidth than we did previously, since we will have to
> re-download this merged daily file once per hour in order to utilize the
> hourly stats. Previously, we only had to download ~100MB per hour, and now
> it seems we'll be downloading ~350MB per hour. Please correct me if I'm
> missing something obvious here!
>
> Also want to say that we're very thankful for the work you all are doing
> publishing this dataset, it's enormously useful for entity popularity in
> our search engine for publishers <https://graphiq.com/search>.
>
> On Tue, Aug 16, 2016 at 1:48 PM, Dan Andreescu <dandreescu(a)wikimedia.org>
> wrote:
>
>> Dylan, there's also been a deprecation message on the page that links to
>> these datasets, since last winter:
>> https://dumps.wikimedia.org/other/analytics/
>>
>> If you know of other places that these datasets are referenced, I'd be
>> happy to update the docs and add links to the email threads. We usually
>> publish information about this kind of deprecation on this list well in
>> advance, but are open to reaching out in other ways.
>>
>> On Tue, Aug 16, 2016 at 4:13 PM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
>>
>>>
>>> Dylan,
>>>
>>> (cc-ing analytics@ public list)
>>>
>>> Please see announcement about deprecation of datasets:
>>>
>>> https://lists.wikimedia.org/pipermail/analytics/2016-August/005339.html
>>>
>>>
>>> Thanks,
>>>
>>> Nuria
>>>
>>> On Tue, Aug 16, 2016 at 12:53 PM, Dylan Wenzlau <dylan(a)graphiq.com>
>>> wrote:
>>>
>>>> It seems the pagecounts-all-sites dumps have completely stopped
>>>> updating, and I don't see any warning or message about why this is the case
>>>> or whether it's currently being resolved. Our company relies pretty heavily
>>>> on this data, as I imagine other projects & companies do as well, so I
>>>> think it would be useful to at least display a big warning message on the
>>>> documentation pages explaining why these are no longer updating.
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>> *Dylan Wenzlau* | Director of Engineering |
>>>>
>>>
>>>
>>> _______________________________________________
>>> Analytics mailing list
>>> Analytics(a)lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>
>>>
>>
>
>
> --
> *Dylan Wenzlau* | Director of Engineering |
>
> _______________________________________________
> Analytics mailing list
> Analytics(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
--
Jaime Crespo
<http://wikimedia.org>