Is it possible to find duplicates of an original image (scaled up) by
giving a program the smaller version as input? Is there a Python API
for this, something like hashlib?
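(For what it's worth, hashlib alone cannot do this, since rescaling
changes every byte; a perceptual hash can. A minimal sketch of the
average-hash idea, assuming Pillow is available; the file names are
placeholders:)

from PIL import Image

def average_hash(path, size=8):
    # Shrink to a size x size grayscale thumbnail, discarding the
    # resolution difference between the original and the copy.
    img = Image.open(path).convert('L').resize((size, size))
    pixels = list(img.getdata())
    avg = sum(pixels) / float(len(pixels))
    # One bit per pixel: brighter than the average or not.
    return sum(1 << i for i, p in enumerate(pixels) if p > avg)

def hamming(a, b):
    # Differing bits between two hashes; a small distance suggests
    # the same picture at a different scale.
    return bin(a ^ b).count('1')

# Hypothetical usage; a distance of roughly 0-5 bits usually means
# the same image, but the threshold depends on your data:
# print(hamming(average_hash('small.jpg'), average_hash('big.jpg')))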
Thanks
Y. Jenith
Hi,
I use replace.py, and the Euro sign appears as a gray question mark (?)
(that is, the default in transliteration.py, but not even yellow) instead
of a yellow *E*. I checked with copy and paste that it literally appears
on line 230 of transliteration.py. I also checked the Japanese yen, which
appears correctly (just two lines below the Euro in transliteration.py).
The Ukrainian hryvnia (₴) is not listed; it appears as a *yellow ?*. Can
anyone suggest where to begin debugging?
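(A hedged first step, assuming the console codec is the culprit rather
than transliteration.py itself; run this in the same console where
replace.py shows the gray ?:)

import sys

print(sys.stdout.encoding)   # which codec the console claims to use
print(repr(u'\u20ac'))       # the Euro sign as Python stores it
try:
    print(u'\u20ac')         # can the console render it at all?
except UnicodeEncodeError as e:
    print('console cannot encode it: %s' % e)

(If the last line fails, the gray ? comes from the console encoding,
not from the transliteration table.)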
--
Bináris
Hi,
I just want to share a new experience. Maybe this is trivial for everybody
except me, but it was new to me.
So I started a Python command-line interpreter from the Pywikipedia
directory, and it began to work immediately:
import wikipedia
site=wikipedia.getSite()
and you may begin to do anything interactively. It's great fun! :-))
My task was to delete every second section title from my subpage, and it
could be done by a few commands without saving any script.
I was never aware of how much information Page.put() returns, but in
the interpreter environment return values are displayed automatically
without print, so now I know. :-)
That's what I always wanted for quick one-time tasks. I don't know why I
never experimented with this earlier.
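(A hedged sketch of what such a session can look like; the subpage
name is a placeholder and the regex assumes plain level-2
"== Title ==" headings:)

import re
import wikipedia

site = wikipedia.getSite()
page = wikipedia.Page(site, u'User:Example/Sandbox')
text = page.get()

# Find the level-2 section titles, then drop every second one.
titles = re.findall(r'(?m)^==[^=].*==$', text)
for title in titles[1::2]:
    text = text.replace(title, u'', 1)

# In the interpreter, put()'s return value is displayed automatically.
page.put(text, comment=u'remove every second section title')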
--
Bináris
Good morning (to whom it may concern in their time zone),
Last night Merlijn had a maintenance day and closed so many bugs that I
could not even read all the mails in fewer than three sessions. :-) Thank you!
Merlijn copied "open in browser" from replace.py to add_text.py:
http://www.mediawiki.org/wiki/Special:Code/pywikipedia/10034
Just a day earlier I had been planning to do the same for
solve_disambiguation.py, but I hadn't had time yet. What about
introducing a browseropen() method on the Page class instead?
I think any error message during this process should come from
webbrowser, not from the pywiki script, am I right?
--
Bináris
Currently, we have some places in the code where we call
wikipedia.output("%s" % e), with e an exception. This breaks if the
exception's message (which is of type str) is printed through
.output() (which requires unicode).
See my comments on this issue below; is there a reason to print errors
through .output() instead of using the built-in Python functions?
---------- Forwarded message ----------
Subject: [Pywikipedia-bugs] [ pywikipediabot-Patches-3092870 ] non
ascii in system messages and max retry
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=3092870&group_…
>Comment By: Merlijn S. van Deen (valhallasw)
Date: 2012-03-21 09:49
Message:
I think we should either
a) skip the entire output() machinery and use traceback.print_exc()
instead
or
b) write a wrapper that does what you propose here (but which can also
be used for traceback.format_exc),
and replace all exception printing with one of those two options.
----------------------------------------------------------------------
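(A hedged sketch of option (b), assuming Python 2 semantics where
str(e) is a byte string that may carry non-ASCII system messages; the
function names are made up:)

import traceback
import wikipedia

def output_exception(e, encoding='utf-8'):
    # Hypothetical wrapper: normalise the message to unicode before
    # it reaches wikipedia.output().
    try:
        msg = unicode(e)
    except UnicodeDecodeError:
        # Byte-string message with non-ASCII content.
        msg = str(e).decode(encoding, 'replace')
    wikipedia.output(u'%s: %s' % (e.__class__.__name__, msg))

def output_traceback(encoding='utf-8'):
    # The same idea applied to a full traceback.
    wikipedia.output(traceback.format_exc().decode(encoding, 'replace'))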
I'm having trouble with this script, which I'm running on Appropedia.org...
it's not a huge deal if it doesn't work, but I'd appreciate it if anyone
has the patience to help me understand how to debug this, or *why* it
doesn't work.
I've narrowed it down to the \2 in the replace term, as the problem
disappears when I remove it:
python replace.py -regex '(?si)\b(WordPress)\b(.*$)'
'\1\2\n[[Category:Appropedia WordPress site]]'
-excepttext:'(?si)\[\[\s*Category:\s*Appropedia WordPress site'
-excepttext:'(?si)(\#redirect\s*\[\[)' -namespace:4 -namespace:12
-summary:'add [[Category:Appropedia WordPress site]] based on search and
manual check.' -log:CategoryAdd -xml:currentdump.xml
Output is:
Reading XML dump...
Traceback (most recent call last):
File "/home/cwg23/pwb/pagegenerators.py", line 1182, in __iter__
for page in self.wrapped_gen:
File "/home/cwg23/pwb/pagegenerators.py", line 1039, in
NamespaceFilterPageGenerator
for page in generator:
File "/home/cwg23/pwb/pagegenerators.py", line 1084, in
DuplicateFilterPageGenerator
for page in generator:
File "replace.py", line 217, in __iter__
new_text = pywikibot.replaceExcept(new_text, old, new, self.excsInside,
self.site)
File "/home/cwg23/pwb/pywikibot/textlib.py", line 175, in replaceExcept
match.group(groupID) + \
IndexError: no such group
no such group
0 pages were changed.
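(For what it's worth, the final exception is easy to reproduce outside
the bot: match.group() raises exactly this IndexError when asked for a
group the pattern does not define, which suggests replaceExcept ends up
requesting a group number the compiled pattern lacks:)

import re

m = re.search(r'(?si)\b(WordPress)\b', 'Our WordPress site')
print(m.group(1))   # 'WordPress' -- group 1 exists
try:
    m.group(2)      # this pattern has only one group
except IndexError as e:
    print(e)        # 'no such group'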
And then it gets interesting... to speed things up while debugging, I made
a modified replace script called replace2.py which only loads 2 pages at a
time (by setting "maxquerysize = 2" in that file). Funny thing - I can run
exactly the same command but with "replace2.py" and it works... up until it
gets to a particular page. Then I press n and get the error. (Btw, I've run
versions of this bot in the past with only the match & replace text
changed, with no problems, so it makes sense that the error only occurs in
specific conditions.)
The last page that it gives me is Appropedia:A Humourless Lot staging area
(http://www.appropedia.org/Appropedia:A_Humourless_Lot_staging_area) -
I assume the page where the problem occurs is one of the next 2 being
loaded, and I don't know how to tell which pages they are. I can't see how
the order of pages is determined, as it changed during my debugging/testing.
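(One hedged way to find the culprit: wrap whatever generator replace.py
builds so each title is printed before the page is processed;
echo_titles is a made-up helper:)

def echo_titles(generator):
    # Debugging aid: the last title printed before the traceback is
    # the failing page. wikipedia.output(page.title()) may be safer
    # than print for non-ASCII titles.
    for page in generator:
        print(page.title())
        yield page

# e.g. wrap the generator just before the main loop:
# gen = echo_titles(gen)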
Thanks for any ideas.
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
PyWikipedians,
Regardless of whether this talk gets accepted at Wikimania, we would like
to talk with anyone interested in building this kind of bot. Please
contact me off-list if this is your kind of thing.
-jrf
http://wikimania2012.wikimedia.org/wiki/Submissions/TREC-KBA-Mining-Content…
TREC KBA - Mining Content Streams to Recommend Page Updates to Editors
Abstract: We have organized a new session in NIST's Text Retrieval
Conference (TREC) called Knowledge Base Acceleration (KBA). TREC KBA
challenges computer science researchers to develop algorithms that mine
content streams, such as news and blogs, to recommend edits to knowledge
bases (KB), such as Wikipedia. We consider a KB to be "large" if the
number of entities described by the KB is larger than the number of humans
maintaining the KB. As entities change and evolve in the real world, large
KBs often lag behind by months or years. Such large KBs are an
increasingly important tool in several industries, including biomedical
research, law enforcement, and financial services. TREC KBA aims to
develop algorithms for helping KB editors stay abreast of changes to the
organizations, people, proteins, and other entities described by their
KBs. In this talk, we will give an overview of the TREC KBA data sets and
tasks for 2012 and future years. In addition to developing text analytics,
we are also working on a Wikipedia bot for connecting KBA-type systems to
users' talk pages in MediaWiki. After presenting the current state of our
bot development, we hope to engage the audience in an open discussion
about how such algorithms might be most fruitfully employed in the
Wikipedia community.
http://trec-kba.org/
(If you want this talk to get accepted in Wikimania this July, consider
putting your name on the "interested" list in the wiki linked above.)