Petr, I played with your library a bit. It has some interesting and
creative pieces and uses some cool tech (love Roslyn). It might need a bit
of love and polishing, as I think the syntax is too verbose, but that's
irrelevant here.
This is your code to list link titles from all non-redirect pages in a wiki.
    var source = wiki.Query.allpages()
        .Where(p => p.filterredir == allpagesfilterredir.nonredirects)
        .Pages
        .Select(p => PageResult.Create(
            p.info,
            p.links().Select(l => l.title).ToEnumerable()));

    foreach (var page in source.Take(2000)) // just the first 2000 pages
        foreach (var linkTitle in page.Data.Take(1)) // first link from each page
            Console.WriteLine(linkTitle);
The "page" foreach starts by getting
http://en.wikipedia.org/w/api.php: action=query & meta=siteinfo &
siprop=namespaces
The linkTitle foreach then causes 18 more API calls to fetch the links,
all with plcontinue, before it yields even a single link.
The reason for it, as Brad correctly noted, is that links are sorted in
a different order from titles. At this point you are halfway through the
current block and have made 19 fairly expensive API calls. If (and that's
a big if) you decide, based on the first link you get, to continue with the
next gapcontinue, you still need to follow every plcontinue so that you
don't miss any pages.
The only thing you can really do with minimal calls is get a block of
data, take a RANDOM page with links on it, check the first link, and
decide to go on to the next block. I see absolutely no sense in this use.
In short, there is no way you can say "next page" until you iterate
through every plcontinue in the current set. EXCEPT if you go one page at
a time (gaplimit=1), in which case you can safely skip to the next
gapcontinue. But this is exactly what I am trying to avoid, because then
there is no benefit whatsoever in using the generator. I even suspect that
it costs much more, because running a generator, even with limit=1, has a
bigger cost than just querying one specific page's info and filling it out.
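
To make the ordering problem concrete, here is a toy in-memory sketch in
C#. It is not LINQ-to-Wiki code and makes no real API calls:
ContinuationSketch, Page, RequestsUntilFirstLink and the sample data are
all made up for illustration. It assumes the generator picks the block by
title while the links of that block come back ordered by page id, and it
counts how many requests go by before the first link of the alphabetically
first page shows up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of generator=allpages&prop=links. Hypothetical names and data;
// the ordering rule (links by page id) is an assumption of this sketch.
public static class ContinuationSketch
{
    public sealed class Page
    {
        public string Title;
        public int Id;
        public string[] Links;
    }

    // Titles are alphabetical, page ids deliberately are not.
    public static readonly Page[] Pages =
    {
        new Page { Title = "A", Id = 30, Links = new[] { "a1", "a2" } },
        new Page { Title = "B", Id = 10, Links = new[] { "b1", "b2", "b3" } },
        new Page { Title = "C", Id = 20, Links = new[] { "c1", "c2", "c3" } },
        new Page { Title = "D", Id = 40, Links = new[] { "d1" } },
    };

    // Number of requests made before the first link of the alphabetically
    // first page in the block is returned.
    public static int RequestsUntilFirstLink(int gapLimit, int plLimit)
    {
        // The generator selects the block of pages by title...
        List<Page> block = Pages.OrderBy(p => p.Title).Take(gapLimit).ToList();
        string wanted = block[0].Title;

        // ...but the link rows for that block are ordered by page id.
        var rows = block.OrderBy(p => p.Id)
            .SelectMany(p => p.Links.Select(l => new { p.Title, Link = l }))
            .ToList();

        int requests = 0;
        for (int offset = 0; offset < rows.Count; offset += plLimit)
        {
            requests++; // one request per plcontinue step
            if (rows.Skip(offset).Take(plLimit).Any(r => r.Title == wanted))
                return requests;
        }
        return requests;
    }
}
```

With this made-up data, RequestsUntilFirstLink(3, 2) returns 4: page "A"
is first by title but last by id, so its links only arrive in the last
plcontinue step of the block. RequestsUntilFirstLink(1, 2) returns 1,
which is the gaplimit=1 case where the block holds a single page.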
On Tue, Dec 18, 2012 at 3:59 PM, Petr Onderka <gsvick(a)gmail.com> wrote:
Well, I can't tell you any use cases from my library users, because
there aren't any (like I said, I didn't actually publicize it yet).
And my library would solve most of those cases the way I explained before:
IEnumerable inside IEnumerable (the exact shape depends on the user).
In the case of more than one prop being used,
it always continues all props, even if the user iterates only one of them.
Petr Onderka
[[en:User:Svick]]
On Tue, Dec 18, 2012 at 9:00 PM, Yuri Astrakhan <yuriastrakhan(a)gmail.com>
wrote:
Petr, thanks, I will look closely at your library and post my thoughts.
Could you take a look at
http://www.mediawiki.org/wiki/Manual:Pywikipediabot/Recipes and see how
your library would solve these? Also, if you can think of other common use
cases from your library users (not your library internals, as it is just an
intermediary), please post them too. I posted cases I saw in interwiki &
casechecker bots.
Thanks!
On Tue, Dec 18, 2012 at 2:25 PM, Petr Onderka <gsvick(a)gmail.com> wrote:
>
> On Tue, Dec 18, 2012 at 5:39 PM, Yuri Astrakhan <yuriastrakhan(a)gmail.com>
> wrote:
> > Same goes for iterating through a collection - none of the programming
> > languages offering IEnumerable have stream control functionality - too
> > complicated without clear benefits.
>
> Actually in my C# library [1] (I plan to publicize it more later)
> a query like generator=allpages&prop=links might result in something
> like IEnumerable<IEnumerable<Link>> [2].
> And iterating the outer IEnumerable corresponds to iterating gapcontinue,
> while iterating the inner IEnumerable corresponds to plcontinue
> (of course it's not that simple, since I'm not using limit=1, but I
> hope you get the idea).
>
> And while this means some more work for the library writer (in this case,
> me) than your alternative, it also means the user has more control over
> what exactly is retrieved.
>
> Petr Onderka
> [[en:User:Svick]]
>
> [1] https://github.com/svick/LINQ-to-Wiki/
> [2] Or, more realistically, IEnumerable<Tuple<Page, IEnumerable<Link>>>,
> but I didn't want to complicate it with even more generics.
_______________________________________________
Mediawiki-api mailing list
Mediawiki-api(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api