Pywikipedia-svn
pywikipedia-svn@lists.wikimedia.org
5163 discussions
SVN: [11632] branches/rewrite/pywikibot/family.py
by xqt@svn.wikimedia.org
09 Jun '13
http://www.mediawiki.org/wiki/Special:Code/pywikipedia/11632
Revision: 11632
Author: xqt
Date: 2013-06-09 11:30:27 +0000 (Sun, 09 Jun 2013)

Log Message:
-----------
mw 1.22wmf5

Modified Paths:
--------------
    branches/rewrite/pywikibot/family.py

Modified: branches/rewrite/pywikibot/family.py
===================================================================
--- branches/rewrite/pywikibot/family.py	2013-06-08 22:38:42 UTC (rev 11631)
+++ branches/rewrite/pywikibot/family.py	2013-06-09 11:30:27 UTC (rev 11632)
@@ -937,7 +937,7 @@
         """Return Wikimedia projects version number as a string."""
         # Don't use this, use versionnumber() instead. This only exists
         # to not break family files.
-        return '1.22wmf4'
+        return '1.22wmf5'
 
     def shared_image_repository(self, code):
         return ('commons', 'commons')
SVN: [11631] branches/rewrite/tests/ui_tests.py
by valhallasw@svn.wikimedia.org
08 Jun '13
http://www.mediawiki.org/wiki/Special:Code/pywikipedia/11631
Revision: 11631
Author: valhallasw
Date: 2013-06-08 22:38:42 +0000 (Sat, 08 Jun 2013)

Log Message:
-----------
+ unix coloring tests

Modified Paths:
--------------
    branches/rewrite/tests/ui_tests.py

Modified: branches/rewrite/tests/ui_tests.py
===================================================================
--- branches/rewrite/tests/ui_tests.py	2013-06-08 22:25:52 UTC (rev 11630)
+++ branches/rewrite/tests/ui_tests.py	2013-06-08 22:38:42 UTC (rev 11631)
@@ -232,8 +232,40 @@
             self.assertIsInstance(returned, unicode)
             self.assertEqual(returned, "n")
 
+    class TestTerminalOutputColorUnix(unittest.TestCase):
+        def setUp(self):
+            patch()
+            newstdout.truncate(0)
+            newstderr.truncate(0)
+            newstdin.truncate(0)
+
+        def tearDown(self):
+            unpatch()
+
+        def testOutputColorizedText(self):
+            pywikibot.config.colorized_output = True
+            pywikibot.output(u"normal text \03{lightpurple}light purple text\03{default} normal text")
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "normal text \x1b[35;1mlight purple text\x1b[0m normal text\n\x1b[0m")
+
+        @unittest.expectedFailure
+        def testOutputNoncolorizedText(self):
+            pywikibot.config.colorized_output = False
+            pywikibot.output(u"normal text \03{lightpurple}light purple text\03{default} normal text")
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "normal text light purple text normal text ***")
+
+        @unittest.expectedFailure
+        def testOutputColorCascade(self):
+            pywikibot.config.colorized_output = True
+            pywikibot.output(u"normal text \03{lightpurple} light purple \03{lightblue} light blue \03{default} light purple \03{default} normal text")
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "normal text \x1b[35;1m light purple \x1b[94;1m light blue \x1b[35;1m light purple \x1b[0m normal text\n\x1b[0m")
+
+
+
+
+
     try:
         try:
             unittest.main()
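The \03{...} markers these tests feed through pywikibot.output() are Pywikibot's colour markup, which the Unix terminal UI expands into the ANSI escape sequences asserted above. A minimal illustrative sketch of that expansion follows; the two-entry colour table is mine and only covers the colours used in the tests, the real mapping lives in Pywikibot's terminal interface module:

    import re

    # illustrative table: just the two colours the tests exercise, plus reset
    ANSI = {'lightpurple': '\x1b[35;1m',   # bright magenta
            'lightblue':   '\x1b[94;1m',   # bright blue
            'default':     '\x1b[0m'}      # reset all attributes

    def colorize(text):
        # \x03 is the control byte that precedes each {colour} marker
        return re.sub(r'\x03\{(\w+)\}', lambda m: ANSI[m.group(1)], text)

    print(colorize('normal \x03{lightpurple}light purple\x03{default} normal'))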
SVN: [11630] branches/rewrite/tests/ui_tests.py
by valhallasw@svn.wikimedia.org
08 Jun '13
http://www.mediawiki.org/wiki/Special:Code/pywikipedia/11630
Revision: 11630
Author: valhallasw
Date: 2013-06-08 22:25:52 +0000 (Sat, 08 Jun 2013)

Log Message:
-----------
+ UI input tests

Modified Paths:
--------------
    branches/rewrite/tests/ui_tests.py

Modified: branches/rewrite/tests/ui_tests.py
===================================================================
--- branches/rewrite/tests/ui_tests.py	2013-06-08 22:20:26 UTC (rev 11629)
+++ branches/rewrite/tests/ui_tests.py	2013-06-08 22:25:52 UTC (rev 11630)
@@ -50,7 +50,7 @@
                       'caller_line': 0,
                       'newline': "\n"}
 
-    class TestTerminalUI(unittest.TestCase):
+    class TestTerminalOutput(unittest.TestCase):
         def setUp(self):
             patch()
             newstdout.truncate(0)
@@ -161,7 +161,79 @@
             self.assertNotEqual(stderrlines[-1], "\n")
 
+    class TestTerminalInput(unittest.TestCase):
+        def setUp(self):
+            patch()
+            newstdout.truncate(0)
+            newstderr.truncate(0)
+            newstdin.truncate(0)
+
+        def tearDown(self):
+            unpatch()
+
+        def testInput(self):
+            newstdin.write("input to read\n")
+            newstdin.seek(0)
+
+            returned = pywikibot.input("question")
+
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "question ")
+
+            self.assertIsInstance(returned, unicode)
+            self.assertEqual(returned, u"input to read")
+
+        @unittest.expectedFailure
+        def testInputChoiceDefault(self):
+            newstdin.write("\n")
+            newstdin.seek(0)
+
+            returned = pywikibot.inputChoice("question", ["answer 1", "answer 2", "answer 3"], ["A", "N", "S"], "A")
+
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "question ([A]nswer 1, a[N]swer 2, an[S]wer 3) ")
+
+            self.assertIsInstance(returned, unicode)
+            self.assertEqual(returned, "a")
+
+        def testInputChoiceCapital(self):
+            newstdin.write("N\n")
+            newstdin.seek(0)
+
+            returned = pywikibot.inputChoice("question", ["answer 1", "answer 2", "answer 3"], ["A", "N", "S"], "A")
+
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "question ([A]nswer 1, a[N]swer 2, an[S]wer 3) ")
+
+            self.assertIsInstance(returned, unicode)
+            self.assertEqual(returned, "n")
+
+        def testInputChoiceNonCapital(self):
+            newstdin.write("n\n")
+            newstdin.seek(0)
+
+            returned = pywikibot.inputChoice("question", ["answer 1", "answer 2", "answer 3"], ["A", "N", "S"], "A")
+
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "question ([A]nswer 1, a[N]swer 2, an[S]wer 3) ")
+
+            self.assertIsInstance(returned, unicode)
+            self.assertEqual(returned, "n")
+
+        def testInputChoiceIncorrectAnswer(self):
+            newstdin.write("X\nN\n")
+            newstdin.seek(0)
+
+            returned = pywikibot.inputChoice("question", ["answer 1", "answer 2", "answer 3"], ["A", "N", "S"], "A")
+
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "question ([A]nswer 1, a[N]swer 2, an[S]wer 3) "*2)
+
+            self.assertIsInstance(returned, unicode)
+            self.assertEqual(returned, "n")
+
+
     try:
         try:
             unittest.main()
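The prompt string these tests assert, "question ([A]nswer 1, a[N]swer 2, an[S]wer 3) ", is built by marking each hotkey inside its option text. A sketch of how such a prompt can be assembled; this is an illustrative re-implementation, not the actual code of pywikibot.inputChoice:

    def format_choices(question, options, hotkeys):
        marked = []
        for opt, key in zip(options, hotkeys):
            # bracket the first occurrence of the hotkey letter in the option,
            # e.g. 'answer 2' with hotkey 'N' becomes 'a[N]swer 2'
            pos = opt.lower().find(key.lower())
            marked.append(opt[:pos] + '[' + key + ']' + opt[pos + 1:])
        return '%s (%s) ' % (question, ', '.join(marked))

    assert format_choices("question",
                          ["answer 1", "answer 2", "answer 3"],
                          ["A", "N", "S"]) == \
        "question ([A]nswer 1, a[N]swer 2, an[S]wer 3) "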
SVN: [11629] trunk/pywikipedia/catimages.py
by drtrigon@svn.wikimedia.org
08 Jun '13
http://www.mediawiki.org/wiki/Special:Code/pywikipedia/11629
Revision: 11629
Author: drtrigon
Date: 2013-06-08 22:20:26 +0000 (Sat, 08 Jun 2013)

Log Message:
-----------
improvement; add 24 'Created with ...' categories (from 'Metadata')

Modified Paths:
--------------
    trunk/pywikipedia/catimages.py

Modified: trunk/pywikipedia/catimages.py
===================================================================
--- trunk/pywikipedia/catimages.py	2013-06-08 22:09:06 UTC (rev 11628)
+++ trunk/pywikipedia/catimages.py	2013-06-08 22:20:26 UTC (rev 11629)
@@ -189,159 +189,194 @@
 
     def _detect_HeaderAndMetadata(self):
         # check/look into the file by midnight commander (mc)
-        # https://pypi.python.org/pypi/hachoir-metadata
+        # use exif as first hint - in fact gives also image-size, streams, ...
 
-#### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ###
-#try:
-#    from hachoir_core.error import error, HachoirError
-#    from hachoir_core.cmd_line import unicodeFilename
-#    from hachoir_core.i18n import getTerminalCharset, _
-#    from hachoir_core.benchmark import Benchmark
-#    from hachoir_core.stream import InputStreamError
-#    from hachoir_core.tools import makePrintable
-#    from hachoir_parser import createParser, ParserList
-#    import hachoir_core.config as hachoir_config
-#    from hachoir_metadata import config
-#except ImportError, err:
-#    raise
-#    print >>sys.stderr, "Unable to import an Hachoir module: %s" % err
-#    sys.exit(1)
-#from optparse import OptionGroup, OptionParser
-#from hachoir_metadata import extractMetadata
-#from hachoir_metadata.metadata import extractors as metadata_extractors
+        exif = self._util_get_DataTags_EXIF()
+        #print exif
+        result = { 'Software':         exif['Software'] if 'Software' in exif else u'-',
+                   'Output_Extension': exif['Output_extension'] if 'Output_extension' in exif else u'-',
+                   'Desc':             exif['Desc'] if 'Desc' in exif else u'-',
+                   'DescProducer':     exif['DescProducer'] if 'DescProducer' in exif else u'-',
+                   'DescCreator':      exif['DescCreator'] if 'DescCreator' in exif else u'-',
+                   'Comment':          exif['Comment'] if 'Comment' in exif else u'-',
+                   'Producer':         exif['Producer'] if 'Producer' in exif else u'-',}
+                   #'Comments':        exif['Comments'] if 'Comments' in exif else u'-',
+                   #'WorkDesc':        exif['WorkDescription'] if 'WorkDescription' in exif else u'-',
+                   ##'Dimensions':     tuple(map(int, exif['ImageSize'].split(u'x'))),}
+                   #'Dimensions':      tuple(exif['ImageSize'].split(u'x')) if 'ImageSize' in exif else (None, None),}
+                   #'Mode':            exif['ColorType'], }
+
+# TODO: vvv
+#* metadata template in commons has to be worked out and code adopted
+#* like in 'Streams' a nice content listing of MIDI (exif or music21 - if needed at all?)
+#* docu all this stuff in commons
+#* docu and do all open things on "commons TODO list"
 #
 #
-#def parseOptions():
-#    parser = OptionParser(usage="%prog [options] files")
-#    parser.add_option("--type", help=_("Only display file type (description)"),
-#                      action="store_true", default=False)
-#    parser.add_option("--mime", help=_("Only display MIME type"),
-#                      action="store_true", default=False)
-#    parser.add_option("--level",
-#                      help=_("Quantity of information to display from 1 to 9 (9 is the maximum)"),
-#                      action="store", default="9", type="choice",
-#                      choices=[ str(choice) for choice in xrange(1,9+1) ])
-#    parser.add_option("--raw", help=_("Raw output"),
-#                      action="store_true", default=False)
-#    parser.add_option("--bench", help=_("Run benchmark"),
-#                      action="store_true", default=False)
-#    parser.add_option("--force-parser",help=_("List all parsers then exit"),
-#                      type="str")
-#    parser.add_option("--profiler", help=_("Run profiler"),
-#                      action="store_true", default=False)
-#    parser.add_option("--quality", help=_("Information quality (0.0=fastest, 1.0=best, and default is 0.5)"),
-#                      action="store", type="float", default="0.5")
-#    parser.add_option("--maxlen", help=_("Maximum string length in characters, 0 means unlimited (default: %s)" % config.MAX_STR_LENGTH),
-#                      type="int", default=config.MAX_STR_LENGTH)
-#    parser.add_option("--verbose", help=_("Verbose mode"),
-#                      default=False, action="store_true")
-#    parser.add_option("--debug", help=_("Debug mode"),
-#                      default=False, action="store_true")
-#
-#    values, filename = parser.parse_args()
-#    if len(filename) == 0:
-#        parser.print_help()
-#        sys.exit(1)
-#
-#    # Update limits
-#    config.MAX_STR_LENGTH = values.maxlen
-#    if values.raw:
-#        config.RAW_OUTPUT = True
-#
-#    return values, filename
-#
-#def processFile(values, filename,
-#display_filename=False, priority=None, human=True, display=True):
-#    charset = getTerminalCharset()
-#    filename, real_filename = unicodeFilename(filename, charset), filename
-#
-#    # Create parser
-#    try:
-#        if values.force_parser:
-#            tags = [ ("id", values.force_parser), None ]
-#        else:
-#            tags = None
-#        parser = createParser(filename, real_filename=real_filename, tags=tags)
-#        help(parser)
-#        print parser.getParserTags()
-#        print parser.PARSER_TAGS
-#        for i, item in enumerate(parser.createFields()):
-#            print item
-#            if i > 5:
-#                break
-#    except InputStreamError, err:
-#        error(unicode(err))
-#        return False
-#    if not parser:
-#        error(_("Unable to parse file: %s") % filename)
-#        return False
-#
-#    # Extract metadata
-#    extract_metadata = not(values.mime or values.type)
-#    if extract_metadata:
-#        try:
-#            metadata = extractMetadata(parser, values.quality)
-#        except HachoirError, err:
-#            error(unicode(err))
-#            metadata = None
-#        if not metadata:
-#            parser.error(_("Hachoir can't extract metadata, but is able to parse: %s")
-#                         % filename)
-#            return False
-#
-#    if display:
-#        # Display metadatas on stdout
-#        if extract_metadata:
-#            text = metadata.exportPlaintext(priority=priority, human=human)
-#            if not text:
-#                text = [_("(no metadata, priority may be too small)")]
-#            if display_filename:
-#                for line in text:
-#                    line = "%s: %s" % (filename, line)
-#                    print makePrintable(line, charset)
-#            else:
-#                for line in text:
-#                    print makePrintable(line, charset)
-#        else:
-#            if values.type:
-#                text = parser.description
-#            else:
-#                text = parser.mime_type
-#            if display_filename:
-#                text = "%s: %s" % (filename, text)
-#            print text
-#    return True
-#
-#### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ###
-#
-#    def processFiles(values, filenames, display=True):
-#        human = not(values.raw)
-#        ok = True
-#        priority = int(values.level)*100 + 99
-#        display_filename = (1 < len(filenames))
-#        for filename in filenames:
-#            ok &= processFile(values, filename, display_filename, priority, human, display)
-#        return ok
-#
-#    try:
-#        # Parser options and initialize Hachoir
-#        values, filenames = parseOptions()
-#
-#        ok = processFiles(values, filenames)
-#    except KeyboardInterrupt:
-#        print _("Program interrupted (CTRL+C).")
-#        ok = False
-#    sys.exit(int(not ok))
-#
-#### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ###
+#(* initial audio midi support (music21))
+#[TODO: docu on Commons ... / template ...]
 
-        pass
+# TODO: if '_detect_History' is not needed here, moveit back into _JpegFile !!!
+        #print "self._detect_History()"
+        #print self._detect_History()
+
+        # https://pypi.python.org/pypi/hachoir-metadata (needs 'core' and 'parser')
+        #
+        #from hachoir_core.error import HachoirError
+        #from hachoir_core.stream import InputStreamError
+        #from hachoir_parser import createParser
+        #import hachoir_core.config as hachoir_config
+        #
+        #from hachoir_metadata import extractMetadata
+        #
+        #hachoir_config.debug = True
+        #hachoir_config.verbose = True
+        #hachoir_config.quiet = True
+        #
+        ## Create parser
+        #try:
+        #    parser = createParser(self.file_name.decode('utf-8'),
+        #                          real_filename=self.file_name.encode('utf-8'),
+        #                          tags=None)
+        #    #print [val for val in enumerate(parser.createFields())]
+        #    desc  = parser.description
+        #    ptags = parser.getParserTags()
+        #except (InputStreamError, AttributeError):
+        #    desc  = u'-'
+        #    ptags = {}
+        #
+        ## Extract metadata
+        #try:
+        #    # quality: 0.0 fastest, 1.0 best, and default is 0.5
+        #    metadata = extractMetadata(parser, 0.5)
+        #    #mtags = dict([(key, metadata.getValues(key))
+        #    mtags = dict([(key, metadata.getValues(key))    # get, getItem, getItems, getText
+        #                  for key in metadata._Metadata__data.keys()#])
+        #                  if metadata.getValues(key)])
+        #except (HachoirError, AttributeError):
+        #    mtags = {}
+        #
+        ##result = {'parser_desc': desc, 'parserdata': ptags, 'metadata': mtags}
+        ##print result
+        #print {'parser_desc': desc, 'parserdata': ptags, 'metadata': mtags}
+        #
+        ### Display metadatas on stdout
+        ##text = metadata.exportPlaintext(priority=None, human=False)
+        ##if not text:
+        ##    text = [u"(no metadata, priority may be too small, try priority=999)"]
+        ##print u'\n'.join(text)
+
+        self._properties['Metadata'] = [result]
+        #print self._properties['Metadata']
+        return
 
     def _detect_Properties(self):
         # get mime-type file-size, ...
         pass
 
+    def _util_get_DataTags_EXIF(self):
+        # http://tilloy.net/dev/pyexiv2/tutorial.html
+        # (is UNFORTUNATELY NOT ABLE to handle all tags, e.g. 'FacesDetected', ...)
+
+        if hasattr(self, '_buffer_EXIF'):
+            return self._buffer_EXIF
+
+        res = {}
+        enable_recovery()   # enable recovery from hard crash
+        try:
+            if hasattr(pyexiv2, 'ImageMetadata'):
+                metadata = pyexiv2.ImageMetadata(self.file_name)
+                metadata.read()
+
+                for key in metadata.exif_keys:
+                    res[key] = metadata[key]
+
+                for key in metadata.iptc_keys:
+                    res[key] = metadata[key]
+
+                for key in metadata.xmp_keys:
+                    res[key] = metadata[key]
+            else:
+                image = pyexiv2.Image(self.file_name)
+                image.readMetadata()
+
+                for key in image.exifKeys():
+                    res[key] = image[key]
+
+                for key in image.iptcKeys():
+                    res[key] = image[key]
+
+                #for key in image.xmpKeys():
+                #    res[key] = image[key]
+        except IOError:
+            pass
+        except RuntimeError:
+            pass
+        disable_recovery()  # disable since everything worked out fine
+
+        # http://www.sno.phy.queensu.ca/~phil/exiftool/
+        # MIGHT BE BETTER TO USE AS PYTHON MODULE; either by wrapper or perlmodule:
+        # http://search.cpan.org/~gaas/pyperl-1.0/perlmodule.pod
+        # (or use C++ with embbedded perl to write a python module)
+        data = Popen("exiftool -j %s" % self.file_name,
+                     shell=True, stdout=PIPE).stdout.read()
+        if not data:
+            raise ImportError("exiftool not found!")
+        try:    # work-a-round for badly encoded exif data (from pywikibot/comms/http.py)
+            data = unicode(data, 'utf-8', errors = 'strict')
+        except UnicodeDecodeError:
+            data = unicode(data, 'utf-8', errors = 'replace')
+        #res  = {}
+        data = re.sub("(?<!\")\(Binary data (?P<size>\d*) bytes\)", "\"(Binary data \g<size> bytes)\"", data)  # work-a-round some issue
+        for item in json.loads(data):
+            res.update( item )
+        #print res
+        self._buffer_EXIF = res
+
+        return self._buffer_EXIF
+
+    def _detect_History(self):
+        res = self._util_get_DataTags_EXIF()
+
+        #a = []
+        #for k in res.keys():
+        #    if 'history' in k.lower():
+        #        a.append( k )
+        #for item in sorted(a):
+        #    print item
+        # http://tilloy.net/dev/pyexiv2/api.html#pyexiv2.xmp.XmpTag
+        #print [getattr(res['Xmp.xmpMM.History'], item) for item in ['key', 'type', 'name', 'title', 'description', 'raw_value', 'value', ]]
+        result = []
+        i = 1
+        while (('Xmp.xmpMM.History[%i]' % i) in res):
+            data = { 'ID':        i,
+                     'Software':  u'-',
+                     'Timestamp': u'-',
+                     'Action':    u'-',
+                     'Info':      u'-', }
+            if ('Xmp.xmpMM.History[%i]/stEvt:softwareAgent'%i) in res:
+                data['Software']  = res['Xmp.xmpMM.History[%i]/stEvt:softwareAgent'%i].value
+                data['Timestamp'] = res['Xmp.xmpMM.History[%i]/stEvt:when'%i].value
+                data['Action']    = res['Xmp.xmpMM.History[%i]/stEvt:action'%i].value
+                if ('Xmp.xmpMM.History[%i]/stEvt:changed'%i) in res:
+                    data['Info'] = res['Xmp.xmpMM.History[%i]/stEvt:changed'%i].value
+                #print res['Xmp.xmpMM.History[%i]/stEvt:instanceID'%i].value
+                result.append( data )
+            elif ('Xmp.xmpMM.History[%i]/stEvt:parameters'%i) in res:
+                data['Action'] = res['Xmp.xmpMM.History[%i]/stEvt:action'%i].value
+                data['Info']   = res['Xmp.xmpMM.History[%i]/stEvt:parameters'%i].value
+                #data['Action'] = data['Info'].split(' ')[0]
+                result.append( data )
+            else:
+                pass
+            i += 1
+
+        self._features['History'] = result
+        return
+
 
 class _JpegFile(_UnknownFile):
     # for '_detect_Trained'
     cascade_files = [(u'Legs', 'haarcascade_lowerbody.xml'),
@@ -1965,6 +2000,7 @@
             #self._util_drawAxes(mat, 250, 350, im)
             #self._util_drawAxes(mat, 50, 50, im)
 
+# TODO: compare face and chessboard pose estimations and unify them, then document everything (template in wiki, ...)
             pywikibot.output(u'result for calibrated camera:\n  rot=%s\n  perp=%s\n  perp2D=%s' % (rot.transpose()[0], perp[:,2], ortho))
             pywikibot.output(u'nice would be to do the same for uncalibrated/default cam settings')
 
@@ -2045,68 +2081,6 @@
 #            cv2.line(im, (x,y), (x+D2norm[0].astype(int),y+D2norm[1].astype(int)), color[i], 1)
 #            cv2.putText(im, label[i], (x+D2norm[0].astype(int),y+D2norm[1].astype(int)), cv2.FONT_HERSHEY_PLAIN, 1., color[i])
 
-    def _util_get_DataTags_EXIF(self):
-        # http://tilloy.net/dev/pyexiv2/tutorial.html
-        # (is UNFORTUNATELY NOT ABLE to handle all tags, e.g. 'FacesDetected', ...)
-
-        if hasattr(self, '_buffer_EXIF'):
-            return self._buffer_EXIF
-
-        res = {}
-        enable_recovery()   # enable recovery from hard crash
-        try:
-            if hasattr(pyexiv2, 'ImageMetadata'):
-                metadata = pyexiv2.ImageMetadata(self.image_path)
-                metadata.read()
-
-                for key in metadata.exif_keys:
-                    res[key] = metadata[key]
-
-                for key in metadata.iptc_keys:
-                    res[key] = metadata[key]
-
-                for key in metadata.xmp_keys:
-                    res[key] = metadata[key]
-            else:
-                image = pyexiv2.Image(self.image_path)
-                image.readMetadata()
-
-                for key in image.exifKeys():
-                    res[key] = image[key]
-
-                for key in image.iptcKeys():
-                    res[key] = image[key]
-
-                #for key in image.xmpKeys():
-                #    res[key] = image[key]
-        except IOError:
-            pass
-        except RuntimeError:
-            pass
-        disable_recovery()  # disable since everything worked out fine
-
-        # http://www.sno.phy.queensu.ca/~phil/exiftool/
-        # MIGHT BE BETTER TO USE AS PYTHON MODULE; either by wrapper or perlmodule:
-        # http://search.cpan.org/~gaas/pyperl-1.0/perlmodule.pod
-        # (or use C++ with embbedded perl to write a python module)
-        data = Popen("exiftool -j %s" % self.image_path,
-                     shell=True, stdout=PIPE).stdout.read()
-        if not data:
-            raise ImportError("exiftool not found!")
-        try:    # work-a-round for badly encoded exif data (from pywikibot/comms/http.py)
-            data = unicode(data, 'utf-8', errors = 'strict')
-        except UnicodeDecodeError:
-            data = unicode(data, 'utf-8', errors = 'replace')
-        #res  = {}
-        data = re.sub("(?<!\")\(Binary data (?P<size>\d*) bytes\)", "\"(Binary data \g<size> bytes)\"", data)  # work-a-round some issue
-        for item in json.loads(data):
-            res.update( item )
-        #print res
-        self._buffer_EXIF = res
-
-        return self._buffer_EXIF
-
     def _detect_Faces_EXIF(self):
         res = self._util_get_DataTags_EXIF()
 
@@ -2282,46 +2256,7 @@
             self._features['Faces'] += data
         return
 
-    def _detect_History(self):
-        res = self._util_get_DataTags_EXIF()
-
-        #a = []
-        #for k in res.keys():
-        #    if 'history' in k.lower():
-        #        a.append( k )
-        #for item in sorted(a):
-        #    print item
-        # http://tilloy.net/dev/pyexiv2/api.html#pyexiv2.xmp.XmpTag
-        #print [getattr(res['Xmp.xmpMM.History'], item) for item in ['key', 'type', 'name', 'title', 'description', 'raw_value', 'value', ]]
-        result = []
-        i = 1
-        while (('Xmp.xmpMM.History[%i]' % i) in res):
-            data = { 'ID':        i,
-                     'Software':  u'-',
-                     'Timestamp': u'-',
-                     'Action':    u'-',
-                     'Info':      u'-', }
-            if ('Xmp.xmpMM.History[%i]/stEvt:softwareAgent'%i) in res:
-                data['Software']  = res['Xmp.xmpMM.History[%i]/stEvt:softwareAgent'%i].value
-                data['Timestamp'] = res['Xmp.xmpMM.History[%i]/stEvt:when'%i].value
-                data['Action']    = res['Xmp.xmpMM.History[%i]/stEvt:action'%i].value
-                if ('Xmp.xmpMM.History[%i]/stEvt:changed'%i) in res:
-                    data['Info'] = res['Xmp.xmpMM.History[%i]/stEvt:changed'%i].value
-                #print res['Xmp.xmpMM.History[%i]/stEvt:instanceID'%i].value
-                result.append( data )
-            elif ('Xmp.xmpMM.History[%i]/stEvt:parameters'%i) in res:
-                data['Action'] = res['Xmp.xmpMM.History[%i]/stEvt:action'%i].value
-                data['Info']   = res['Xmp.xmpMM.History[%i]/stEvt:parameters'%i].value
-                #data['Action'] = data['Info'].split(' ')[0]
-                result.append( data )
-            else:
-                pass
-            i += 1
-
-        self._features['History'] = result
-        return
-
     def _util_merge_Regions(self, regs, sub=False, overlap=False, close=False):
         # sub=False,  overlap=False, close=False ; level 0 ; similar regions, similar position (default)
         # sub=True,   overlap=False, close=False ; level 1 ; region contained in other, any shape/size
@@ -2964,15 +2899,19 @@
         return self._features
 
     def _detect_HeaderAndMetadata(self):
-        result = {}
+        #_UnknownFile._detect_HeaderAndMetadata(self)
+        #result = {'Desc': self._properties['Metadata'][0]['Desc'].splitlines()}
+        result = {'Desc': []}
+
         # extract data from midi file
         # http://valentin.dasdeck.com/midi/midifile.htm
         # http://stackoverflow.com/questions/3943149/reading-and-interpreting-data-fr…
         ba = bytearray(open(self.file_name, 'rb').read())
         i = -1
-        for key, data in [('Text', '\x01'), ('Copyright', '\x02'), ('Lyrics', '\x05')]:
-            result[key] = []
+        for key, data in [('Text', '\x01'), ('Copyright', '\x02')]:#, ('Lyrics', '\x05')]:
+            key = 'Desc'
+            #result[key] = []
             while True:
                 i = ba.find('\xff%s' % data, i+1)
                 if i < 0:  # something found?
@@ -2981,7 +2920,10 @@
                 if ba[e] != 0:  # length match with string end (00)?
                     e = ba.find('\x00', (i+3+ba[i+2]))
                 result[key].append(ba[i+3:e].decode('latin-1').strip())
-            result[key] = u'\n'.join(result[key])
+            #result[key] = u'\n'.join(result[key])
+        result[key] = u'\n'.join(result[key])
+        if not result['Desc']:
+            result['Desc'] = u'-'
 
         ## find specific info in extracted data
         #print [item.strip() for item in re.findall('Generated .*?\n', result['Text'])]
@@ -3076,19 +3018,22 @@
         return
 
 
+# http://commons.wikimedia.org/wiki/File_formats
 _FILETYPES = {           '*': _UnknownFile,
               (    'image',     'jpeg'): _JpegFile,
              (    'image',      'png'): _PngFile,
              (    'image',      'gif'): _GifFile,
              (    'image',     'tiff'): _TiffFile,
              (    'image',    'x-xcf'): _XcfFile,
-              (    'image',  'svg+xml'): _SvgFile,
+              (    'image',  'svg+xml'): _SvgFile,    # unify/merge them?
+              ('application',    'xml'): _SvgFile,    #
              ('application',    'pdf'): _PdfFile,
              # djvu: python-djvulibre or python-djvu for djvu support
              # http://pypi.python.org/pypi/python-djvulibre/0.3.9
 #             (    'image', 'vnd.djvu'): DjvuFile,
-              ('application',    'ogg'): _OggFile,
-              (    'audio',     'midi'): _MidiFile,}
+              (    'audio',     'midi'): _MidiFile,
+              ('application',    'ogg'): _OggFile,}
+#             (        '?',        '?'): _WebMFile,}
 
 def GenericFile(file_name):
     # 'magic' (libmagic)
@@ -3327,11 +3272,257 @@
     # Category:MIDI files created with GNU LilyPond
     def _cat_meta_MIDIfilescreatedwithGNULilyPond(self):
         result = self._info_filter['Metadata']
-        relevance = (u"Generated automatically by: GNU LilyPond" in
-                     result[0]['Text'])
+        relevance = len(result) and ('Desc' in result[0]) and \
+                    (u"Generated automatically by: GNU LilyPond" in
+                     result[0]['Desc'])
 
         return (u'MIDI files created with GNU LilyPond', bool(relevance))
 
+    # Category:Bitmap_from_Inkscape (png)
+    def _cat_meta_BitmapfromInkscape(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Software' in result[0]) and \
+                    (u"www.inkscape.org" in
+                     result[0]['Software'].lower())
+
+        return (u'Bitmap from Inkscape', bool(relevance))
+
+    # Category:Created_with_Inkscape (svg)
+    def _cat_meta_CreatedwithInkscape(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Output_Extension' in result[0]) and \
+                    (u"org.inkscape.output.svg.inkscape" in
+                     result[0]['Output_Extension'].lower())
+
+        return (u'Created with Inkscape', bool(relevance))
+
+    # Category:Created_with_MATLAB (png)
+    # Category:Created_with_MATLAB (svg)
+    def _cat_meta_CreatedwithMATLAB(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and \
+                    ((('Software' in result[0]) and \
+                      (u"MATLAB, The Mathworks, Inc." in
+                       result[0]['Software'])) \
+                     or \
+                     (('Desc' in result[0]) and \
+                      (u"Matlab Figure" in
+                       result[0]['Desc'])) )
+
+        return (u'Created with MATLAB', bool(relevance))
+
+    # Category:Created_with_PLOT2SVG (svg) [new]
+    def _cat_meta_CreatedwithPLOT2SVG(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Desc' in result[0]) and \
+                    (u"Converted by PLOT2SVG" in
+                     result[0]['Desc'])
+
+        return (u'Created with PLOT2SVG', bool(relevance))
+
+    # Category:Created_with_ImageMagick (jpg)
+    def _cat_meta_CreatedwithImageMagick(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Software' in result[0]) and \
+                    (u"ImageMagick" in
+                     result[0]['Software'])
+
+        return (u'Created with ImageMagick', bool(relevance))
+
+    # Category:Created_with_Adobe_ImageReady (png)
+    def _cat_meta_CreatedwithAdobeImageReady(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Software' in result[0]) and \
+                    (u"Adobe ImageReady" in
+                     result[0]['Software'])
+
+        return (u'Created with Adobe ImageReady', bool(relevance))
+
+    # Category:Created_with_Adobe_Photoshop (jpg)
+    def _cat_meta_CreatedwithAdobePhotoshop(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Software' in result[0]) and \
+                    (u"Adobe Photoshop" in
+                     result[0]['Software'])
+
+        return (u'Created with Adobe Photoshop', bool(relevance))
+
+    # Category:Created_with_Picasa (jpg)
+    def _cat_meta_CreatedwithPicasa(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Software' in result[0]) and \
+                    (u"Picasa" in
+                     result[0]['Software'])
+
+        return (u'Created with Picasa', bool(relevance))
+
+    # Category:Created_with_Qtpfsgui (jpg)
+    def _cat_meta_CreatedwithQtpfsgui(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Software' in result[0]) and \
+                    (u"Created with opensource tool Qtpfsgui" in
+                     result[0]['Software'])
+
+        return (u'Created with Qtpfsgui', bool(relevance))
+
+    # Category:Created_with_Autopano (jpg)
+    def _cat_meta_CreatedwithAutopano(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Software' in result[0]) and \
+                    (u"Autopano" in
+                     result[0]['Software'])
+
+        return (u'Created with Autopano', bool(relevance))
+
+    # Category:Created_with_Xmgrace (png)
+    def _cat_meta_CreatedwithXmgrace(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Software' in result[0]) and \
+                    (u"Grace" in
+                     result[0]['Software'])
+
+        return (u'Created with Xmgrace', bool(relevance))
+
+    # Category:Created_with_darktable (jpg)
+    def _cat_meta_Createdwithdarktable(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Software' in result[0]) and \
+                    (u"darktable" in
+                     result[0]['Software'].lower())
+
+        return (u'Created with darktable', bool(relevance))
+
+    # Category:Created_with_easyHDR (jpg)
+    def _cat_meta_CreatedwitheasyHDR(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and \
+                    ((('Software' in result[0]) and \
+                      (u"easyHDR" in
+                       result[0]['Software'])) \
+                     or \
+                     (('Comment' in result[0]) and \
+                      (u"easyHDR" in
+                       result[0]['Comment'])) )
+
+        return (u'Created with easyHDR', bool(relevance))
+
+    # Category:Created_with_GIMP (jpg) [new]
+    def _cat_meta_CreatedwithGIMP(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and \
+                    ((('Software' in result[0]) and \
+                      (u"GIMP" in
+                       result[0]['Software'])) \
+                     or \
+                     (('Comment' in result[0]) and \
+                      (u"Created with GIMP" in
+                       result[0]['Comment'])) )
+
+        return (u'Created with GIMP', bool(relevance))
+
+    # Category:Created_with_R (svg)
+    def _cat_meta_CreatedwithR(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Desc' in result[0]) and \
+                    (u"R SVG" in
+                     result[0]['Desc'])
+
+        return (u'Created with R', bool(relevance))
+
+    # Category:Created_with_VectorFieldPlot (svg)
+    def _cat_meta_CreatedwithVectorFieldPlot(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Desc' in result[0]) and \
+                    (u"created with VectorFieldPlot" in
+                     result[0]['Desc'])
+
+        return (u'Created with VectorFieldPlot', bool(relevance))
+
+    # Category:Created_with_Chemtool (svg)
+    def _cat_meta_CreatedwithChemtool(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Desc' in result[0]) and \
+                    (u"Created with Chemtool" in
+                     result[0]['Desc'])
+
+        return (u'Created with Chemtool', bool(relevance))
+
+    # Category:Created_with_GNU_Octave (svg)
+    def _cat_meta_CreatedwithGNUOctave(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Desc' in result[0]) and \
+                    (u"Produced by GNUPLOT" in
+                     result[0]['Desc'])
+
+        return (u'Created with GNU Octave', bool(relevance))
+
+    # Category:Created_with_GeoGebra (svg)
+    def _cat_meta_CreatedwithGeoGebra(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('DescProducer' in result[0]) and \
+                    (u"geogebra.d.W" in
+                     result[0]['DescProducer']) #and \
+                    #(u"FreeHEP Graphics2D Driver" in
+                    # result[0]['DescCreator'])
+
+        return (u'Created with GeoGebra', bool(relevance))
+
+    # Category:Created_with_Stella (png)
+    def _cat_meta_CreatedwithStella(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Comment' in result[0]) and \
+                    (u"Created using Stella4D" in
+                     result[0]['Comment'])
+
+        return (u'Created with Stella', bool(relevance))
+
+    # Category:Created_with_PhotoStitch (jpg)
+    def _cat_meta_CreatedwithPhotoStitch(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Comment' in result[0]) and \
+                    (u"LEAD Technologies Inc." in
+                     result[0]['Comment'])
+
+        return (u'Created with PhotoStitch', bool(relevance))
+
+    # Category:Created_with_Scribus (pdf)
+    def _cat_meta_CreatedwithScribus(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Producer' in result[0]) and \
+                    (u"Scribus PDF Library" in
+                     result[0]['Producer'])
+
+        return (u'Created with Scribus', bool(relevance))
+
+    # Category:Created_with_OpenOffice.org (pdf)
+    def _cat_meta_CreatedwithOpenOfficeorg(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Producer' in result[0]) and \
+                    (u"OpenOffice.org" in
+                     result[0]['Producer'])
+
+        return (u'Created with OpenOffice.org', bool(relevance))
+
+    # Category:Created_with_Tux_Paint (pdf)
+    def _cat_meta_CreatedwithTuxPaint(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Software' in result[0]) and \
+                    (u"Tux Paint" in
+                     result[0]['Software'])
+
+        return (u'Created with Tux Paint', bool(relevance))
+
+    # Category:Created_with_Microsoft_Image_Composite_Editor (jpg)
+    def _cat_meta_CreatedwithMicrosoftImageCompositeEditor(self):
+        result = self._info_filter['Metadata']
+        relevance = len(result) and ('Software' in result[0]) and \
+                    (u"Microsoft ICE" in
+                     result[0]['Software'])
+
+        return (u'Created with Microsoft Image Composite Editor', bool(relevance))
+
+# TODO: make '_cat_meta_general(self)'
+
     # Category:Categorized by DrTrigonBot
     def _addcat_BOT(self):
         # - ALWAYS -
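The exiftool call added in this revision interpolates the file name into a shell string ("exiftool -j %s" with shell=True), which breaks on file names containing spaces or shell metacharacters. A sketch of the same JSON round-trip with list arguments instead; the function name is mine and it assumes exiftool is installed and on PATH, as catimages.py already does:

    import json
    import subprocess

    def exif_tags(file_name):
        """Return a dict of all tags exiftool reports for file_name."""
        # list arguments avoid shell quoting issues entirely
        out = subprocess.Popen(['exiftool', '-j', file_name],
                               stdout=subprocess.PIPE).communicate()[0]
        if not out:
            raise ImportError("exiftool not found!")
        tags = {}
        # exiftool -j prints a JSON array with one object per input file
        for item in json.loads(out):
            tags.update(item)
        return tags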
SVN: [11628] branches/rewrite/tests/ui_tests.py
by valhallasw@svn.wikimedia.org
08 Jun '13
http://www.mediawiki.org/wiki/Special:Code/pywikipedia/11628
Revision: 11628
Author: valhallasw
Date: 2013-06-08 22:09:06 +0000 (Sat, 08 Jun 2013)

Log Message:
-----------
Additional UI tests, now using pywikibot.* functions

Modified Paths:
--------------
    branches/rewrite/tests/ui_tests.py

Modified: branches/rewrite/tests/ui_tests.py
===================================================================
--- branches/rewrite/tests/ui_tests.py	2013-06-08 21:23:39 UTC (rev 11627)
+++ branches/rewrite/tests/ui_tests.py	2013-06-08 22:09:06 UTC (rev 11628)
@@ -100,7 +100,68 @@
             self.assertEqual(newstdout.getvalue(), "")
             self.assertEqual(newstderr.getvalue(), "CRITICAL: CRITICAL\n")
 
-
+        def test_output(self):
+            pywikibot.output("output", toStdout=False)
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "output\n")
+
+        def test_output(self):
+            pywikibot.output("output", toStdout=True)
+            self.assertEqual(newstdout.getvalue(), "output\n")
+            self.assertEqual(newstderr.getvalue(), "")
+
+        def test_warning(self):
+            pywikibot.warning("warning")
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "WARNING: warning\n")
+
+        def test_error(self):
+            pywikibot.error("error")
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "ERROR: error\n")
+
+        def test_log(self):
+            pywikibot.log("log")
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "")
+
+        def test_critical(self):
+            pywikibot.critical("critical")
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "CRITICAL: critical\n")
+
+        def test_debug(self):
+            pywikibot.debug("debug", "test")
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "")
+
+        def test_exception(self):
+            class TestException(Exception):
+                pass
+            try:
+                raise TestException("Testing Exception")
+            except TestException:
+                pywikibot.exception("exception")
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "ERROR: TestException: Testing Exception\n")
+
+        def test_exception(self):
+            class TestException(Exception):
+                pass
+            try:
+                raise TestException("Testing Exception")
+            except TestException:
+                pywikibot.exception("exception", tb=True)
+            self.assertEqual(newstdout.getvalue(), "")
+            stderrlines = newstderr.getvalue().split("\n")
+            self.assertEqual(stderrlines[0], "ERROR: TestException: Testing Exception")
+            self.assertEqual(stderrlines[1], "Traceback (most recent call last):")
+            self.assertEqual(stderrlines[3], """    raise TestException("Testing Exception")""")
+            self.assertEqual(stderrlines[4], "TestException: Testing Exception")
+
+            self.assertNotEqual(stderrlines[-1], "\n")
+
     try:
         try:
             unittest.main()
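Note a pitfall visible in this diff: test_output and test_exception are each defined twice in the same class, so the second definition silently replaces the first and only one variant of each test ever runs. A tiny demonstration of the shadowing (names here are illustrative, not from the diff):

    class Demo(object):
        def test_output(self):
            return 'stderr variant'

        def test_output(self):  # rebinds the name, shadowing the method above
            return 'stdout variant'

    assert Demo().test_output() == 'stdout variant'

Giving the two variants distinct names (e.g. test_output_stderr and test_output_stdout) would let unittest collect and run both.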
SVN: [11627] branches/rewrite/tests/ui_tests.py
by valhallasw@svn.wikimedia.org
08 Jun '13
http://www.mediawiki.org/wiki/Special:Code/pywikipedia/11627
Revision: 11627
Author: valhallasw
Date: 2013-06-08 21:23:39 +0000 (Sat, 08 Jun 2013)

Log Message:
-----------
Improved reporting when ui_tests are run as part of setup.py test

Modified Paths:
--------------
    branches/rewrite/tests/ui_tests.py

Modified: branches/rewrite/tests/ui_tests.py
===================================================================
--- branches/rewrite/tests/ui_tests.py	2013-06-08 21:00:06 UTC (rev 11626)
+++ branches/rewrite/tests/ui_tests.py	2013-06-08 21:23:39 UTC (rev 11627)
@@ -15,95 +15,92 @@
 import StringIO
 import logging
 
-if __name__ != "__main__":
-    raise Exception('This test can only be run as single file due to heavy monkey patching')
+if __name__ == "__main__":
+    import sys
 
-import sys
+    oldstderr = sys.stderr
+    oldstdout = sys.stdout
+    oldstdin = sys.stdin
 
-oldstderr = sys.stderr
-oldstdout = sys.stdout
-oldstdin = sys.stdin
+    newstdout = cStringIO.StringIO()
+    newstderr = cStringIO.StringIO()
+    newstdin = StringIO.StringIO()
 
-newstdout = cStringIO.StringIO()
-newstderr = cStringIO.StringIO()
-newstdin = StringIO.StringIO()
+    def patch():
+        sys.stdout = newstdout
+        sys.stderr = newstderr
+        sys.stdin = newstdin
 
-def patch():
-    sys.stdout = newstdout
-    sys.stderr = newstderr
-    sys.stdin = newstdin
+    def unpatch():
+        sys.stdout = oldstdout
+        sys.stderr = oldstderr
+        sys.stdin = oldstdin
 
-def unpatch():
-    sys.stdout = oldstdout
-    sys.stderr = oldstderr
-    sys.stdin = oldstdin
+    try:
+        patch()
+        import pywikibot
+    finally:
+        unpatch()
 
-try:
-    patch()
-    import pywikibot
-finally:
-    unpatch()
+    from pywikibot.bot import DEBUG, VERBOSE, INFO, STDOUT, INPUT, WARNING, ERROR, CRITICAL
 
-from pywikibot.bot import DEBUG, VERBOSE, INFO, STDOUT, INPUT, WARNING, ERROR, CRITICAL
+    logger = logging.getLogger('pywiki')
+    loggingcontext = {'caller_name': "ui_tests",
+                      'caller_file': "ui_tests",
+                      'caller_line': 0,
+                      'newline': "\n"}
 
-logger = logging.getLogger('pywiki')
-loggingcontext = {'caller_name': "ui_tests",
-                  'caller_file': "ui_tests",
-                  'caller_line': 0,
-                  'newline': "\n"}
+    class TestTerminalUI(unittest.TestCase):
+        def setUp(self):
+            patch()
+            newstdout.truncate(0)
+            newstderr.truncate(0)
+            newstdin.truncate(0)
 
-class TestTerminalUI(unittest.TestCase):
-    def setUp(self):
-        patch()
-        newstdout.truncate(0)
-        newstderr.truncate(0)
-        newstdin.truncate(0)
+        def tearDown(self):
+            unpatch()
 
-    def tearDown(self):
-        unpatch()
+        def testOutputLevels_logging_debug(self):
+            logger.log(DEBUG, 'debug', extra=loggingcontext)
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "")
 
-    def testOutputLevels_logging_debug(self):
-        logger.log(DEBUG, 'debug', extra=loggingcontext)
-        self.assertEqual(newstdout.getvalue(), "")
-        self.assertEqual(newstderr.getvalue(), "")
+        def testOutputLevels_logging_verbose(self):
+            logger.log(VERBOSE, 'verbose', extra=loggingcontext)
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "")
 
-    def testOutputLevels_logging_verbose(self):
-        logger.log(VERBOSE, 'verbose', extra=loggingcontext)
-        self.assertEqual(newstdout.getvalue(), "")
-        self.assertEqual(newstderr.getvalue(), "")
+        def testOutputLevels_logging_info(self):
+            logger.log(INFO, 'info', extra=loggingcontext)
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "info\n")
 
-    def testOutputLevels_logging_info(self):
-        logger.log(INFO, 'info', extra=loggingcontext)
-        self.assertEqual(newstdout.getvalue(), "")
-        self.assertEqual(newstderr.getvalue(), "info\n")
+        def testOutputLevels_logging_stdout(self):
+            logger.log(STDOUT, 'stdout', extra=loggingcontext)
+            self.assertEqual(newstdout.getvalue(), "stdout\n")
+            self.assertEqual(newstderr.getvalue(), "")
 
-    def testOutputLevels_logging_stdout(self):
-        logger.log(STDOUT, 'stdout', extra=loggingcontext)
-        self.assertEqual(newstdout.getvalue(), "stdout\n")
-        self.assertEqual(newstderr.getvalue(), "")
+        def testOutputLevels_logging_input(self):
+            logger.log(INPUT, 'input', extra=loggingcontext)
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "input\n")
 
-    def testOutputLevels_logging_input(self):
-        logger.log(INPUT, 'input', extra=loggingcontext)
-        self.assertEqual(newstdout.getvalue(), "")
-        self.assertEqual(newstderr.getvalue(), "input\n")
+        def testOutputLevels_logging_WARNING(self):
+            logger.log(WARNING, 'WARNING', extra=loggingcontext)
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "WARNING: WARNING\n")
 
-    def testOutputLevels_logging_WARNING(self):
-        logger.log(WARNING, 'WARNING', extra=loggingcontext)
-        self.assertEqual(newstdout.getvalue(), "")
-        self.assertEqual(newstderr.getvalue(), "WARNING: WARNING\n")
+        def testOutputLevels_logging_ERROR(self):
+            logger.log(ERROR, 'ERROR', extra=loggingcontext)
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "ERROR: ERROR\n")
 
-    def testOutputLevels_logging_ERROR(self):
-        logger.log(ERROR, 'ERROR', extra=loggingcontext)
-        self.assertEqual(newstdout.getvalue(), "")
-        self.assertEqual(newstderr.getvalue(), "ERROR: ERROR\n")
+        def testOutputLevels_logging_CRITICAL(self):
+            logger.log(CRITICAL, 'CRITICAL', extra=loggingcontext)
+            self.assertEqual(newstdout.getvalue(), "")
+            self.assertEqual(newstderr.getvalue(), "CRITICAL: CRITICAL\n")
 
-    def testOutputLevels_logging_CRITICAL(self):
-        logger.log(CRITICAL, 'CRITICAL', extra=loggingcontext)
-        self.assertEqual(newstdout.getvalue(), "")
-        self.assertEqual(newstderr.getvalue(), "CRITICAL: CRITICAL\n")
-
-
-if __name__ == '__main__':
+
     try:
         try:
             unittest.main()
@@ -112,3 +109,10 @@
     finally:
         unpatch()
         pywikibot.stopme()
+
+else:
+    class TestTerminalUI(unittest.TestCase):
+        @unittest.skip("Terminal UI tests can only be run by directly running tests/ui_tests.py")
+        def testCannotBeRun(self):
+            pass
+
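The patch()/unpatch() pair this revision reshuffles is the classic stdout/stderr swap for capturing terminal output in tests. The same idea expressed as a context manager (a sketch of the pattern, not what ui_tests.py actually does) keeps the restore in one place even when the body raises:

    import sys
    from contextlib import contextmanager
    from io import StringIO  # the Python 2 code above uses cStringIO

    @contextmanager
    def captured_streams():
        old = sys.stdout, sys.stderr
        sys.stdout, sys.stderr = StringIO(), StringIO()
        try:
            yield sys.stdout, sys.stderr
        finally:
            # always restore the real streams, even on exceptions
            sys.stdout, sys.stderr = old

    with captured_streams() as (out, err):
        print('hello')
    assert out.getvalue() == 'hello\n'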
SVN: [11626] branches/rewrite/tests/ui_tests.py
by valhallasw@svn.wikimedia.org
08 Jun '13
http://www.mediawiki.org/wiki/Special:Code/pywikipedia/11626
Revision: 11626
Author: valhallasw
Date: 2013-06-08 21:00:06 +0000 (Sat, 08 Jun 2013)

Log Message:
-----------
Initial test cases for user interface tests

Added Paths:
-----------
    branches/rewrite/tests/ui_tests.py

Added: branches/rewrite/tests/ui_tests.py
===================================================================
--- branches/rewrite/tests/ui_tests.py	                        (rev 0)
+++ branches/rewrite/tests/ui_tests.py	2013-06-08 21:00:06 UTC (rev 11626)
@@ -0,0 +1,114 @@
+# -*- coding: utf-8  -*-
+"""
+Tests for the page module.
+"""
+#
+# (C) Pywikipedia bot team, 2008
+#
+# Distributed under the terms of the MIT license.
+#
+__version__ = '$Id: page_tests.py 11625 2013-06-08 19:55:59Z valhallasw $'
+
+
+import unittest
+import cStringIO
+import StringIO
+import logging
+
+if __name__ != "__main__":
+    raise Exception('This test can only be run as single file due to heavy monkey patching')
+
+import sys
+
+oldstderr = sys.stderr
+oldstdout = sys.stdout
+oldstdin = sys.stdin
+
+newstdout = cStringIO.StringIO()
+newstderr = cStringIO.StringIO()
+newstdin = StringIO.StringIO()
+
+def patch():
+    sys.stdout = newstdout
+    sys.stderr = newstderr
+    sys.stdin = newstdin
+
+def unpatch():
+    sys.stdout = oldstdout
+    sys.stderr = oldstderr
+    sys.stdin = oldstdin
+
+try:
+    patch()
+    import pywikibot
+finally:
+    unpatch()
+
+from pywikibot.bot import DEBUG, VERBOSE, INFO, STDOUT, INPUT, WARNING, ERROR, CRITICAL
+
+logger = logging.getLogger('pywiki')
+loggingcontext = {'caller_name': "ui_tests",
+                  'caller_file': "ui_tests",
+                  'caller_line': 0,
+                  'newline': "\n"}
+
+class TestTerminalUI(unittest.TestCase):
+    def setUp(self):
+        patch()
+        newstdout.truncate(0)
+        newstderr.truncate(0)
+        newstdin.truncate(0)
+
+    def tearDown(self):
+        unpatch()
+
+    def testOutputLevels_logging_debug(self):
+        logger.log(DEBUG, 'debug', extra=loggingcontext)
+        self.assertEqual(newstdout.getvalue(), "")
+        self.assertEqual(newstderr.getvalue(), "")
+
+    def testOutputLevels_logging_verbose(self):
+        logger.log(VERBOSE, 'verbose', extra=loggingcontext)
+        self.assertEqual(newstdout.getvalue(), "")
+        self.assertEqual(newstderr.getvalue(), "")
+
+    def testOutputLevels_logging_info(self):
+        logger.log(INFO, 'info', extra=loggingcontext)
+        self.assertEqual(newstdout.getvalue(), "")
+        self.assertEqual(newstderr.getvalue(), "info\n")
+
+    def testOutputLevels_logging_stdout(self):
+        logger.log(STDOUT, 'stdout', extra=loggingcontext)
+        self.assertEqual(newstdout.getvalue(), "stdout\n")
+        self.assertEqual(newstderr.getvalue(), "")
+
+    def testOutputLevels_logging_input(self):
+        logger.log(INPUT, 'input', extra=loggingcontext)
+        self.assertEqual(newstdout.getvalue(), "")
+        self.assertEqual(newstderr.getvalue(), "input\n")
+
+    def testOutputLevels_logging_WARNING(self):
+        logger.log(WARNING, 'WARNING', extra=loggingcontext)
+        self.assertEqual(newstdout.getvalue(), "")
+        self.assertEqual(newstderr.getvalue(), "WARNING: WARNING\n")
+
+    def testOutputLevels_logging_ERROR(self):
+        logger.log(ERROR, 'ERROR', extra=loggingcontext)
+        self.assertEqual(newstdout.getvalue(), "")
+        self.assertEqual(newstderr.getvalue(), "ERROR: ERROR\n")
+
+    def testOutputLevels_logging_CRITICAL(self):
+        logger.log(CRITICAL, 'CRITICAL', extra=loggingcontext)
+        self.assertEqual(newstdout.getvalue(), "")
+        self.assertEqual(newstderr.getvalue(), "CRITICAL: CRITICAL\n")
+
+
+if __name__ == '__main__':
+    try:
+        try:
+            unittest.main()
+        except SystemExit:
+            pass
+    finally:
+        unpatch()
+        pywikibot.stopme()
SVN: [11625] branches/rewrite
by valhallasw@svn.wikimedia.org
08 Jun '13
http://www.mediawiki.org/wiki/Special:Code/pywikipedia/11625
Revision: 11625
Author: valhallasw
Date: 2013-06-08 19:55:59 +0000 (Sat, 08 Jun 2013)

Log Message:
-----------
Improved extensibility of ItemPage
 * fromPage now returns the most specialised form in the class hierarchy,
   e.g. MyItemPage.fromPage will return MyItemPage objects instead of
   ItemPage objects. (includes test)
 * instead of using ParentClass.function, use super(ThisClass, self).function

Modified Paths:
--------------
    branches/rewrite/pywikibot/page.py
    branches/rewrite/tests/page_tests.py

Modified: branches/rewrite/pywikibot/page.py
===================================================================
--- branches/rewrite/pywikibot/page.py	2013-06-07 23:51:26 UTC (rev 11624)
+++ branches/rewrite/pywikibot/page.py	2013-06-08 19:55:59 UTC (rev 11625)
@@ -2415,21 +2415,20 @@
                           site=pywikibot.DataSite & title=Q42
                           site=pywikibot.Site & title=Main Page
         """
-        WikibasePage.__init__(self, site, title, ns=0)
+        super(ItemPage, self).__init__(site, title, ns=0)
         self.id = title
 
-    @staticmethod
-    def fromPage(page):
+    @classmethod
+    def fromPage(cls, page):
         """
         Get the ItemPage based on a Page that links to it
         """
         repo = page.site.data_repository()
-        i = ItemPage(repo, 'null')
+        i = cls(repo, 'null')
         del i.id
         i._site = page.site
         i._title = page.title()
         return i
-        #return ItemPage(page.site, page.title())
 
     def __make_site(self, dbname):
         """
@@ -2447,7 +2446,7 @@
         args are the values of props
         """
         if force or not hasattr(self, '_content'):
-            WikibasePage.get(self, force=force, *args)
+            super(ItemPage, self).get(force=force, *args)
 
         #claims
         self.claims = {}

Modified: branches/rewrite/tests/page_tests.py
===================================================================
--- branches/rewrite/tests/page_tests.py	2013-06-07 23:51:26 UTC (rev 11624)
+++ branches/rewrite/tests/page_tests.py	2013-06-08 19:55:59 UTC (rev 11625)
@@ -311,8 +311,11 @@
         self.assertEqual(prop.getType(), 'wikibase-item')
         self.assertEqual(prop.namespace(), 120)
 
+    def testItemPageExtensionability(self):
+        class MyItemPage(pywikibot.ItemPage):
+            pass
+        self.assertIsInstance(MyItemPage.fromPage(mainpage), MyItemPage)
 
-
 # methods that still need tests implemented or expanded:
 
 ## def autoFormat(self):
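Why the @staticmethod to @classmethod switch matters: with a classmethod, the constructor call follows whatever class the method was invoked on, so subclasses get instances of themselves for free. A stripped-down illustration of the mechanism (these toy classes stand in for the real pywikibot ones):

    class ItemPage(object):
        def __init__(self, repo):
            self.repo = repo

        @classmethod
        def fromPage(cls, page):
            # cls is the class fromPage was called on, not hard-coded ItemPage
            return cls(page)

    class MyItemPage(ItemPage):
        pass

    assert isinstance(MyItemPage.fromPage('null'), MyItemPage)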
SVN: [11624] trunk/pywikipedia/catimages.py
by drtrigon@svn.wikimedia.org
07 Jun '13
http://www.mediawiki.org/wiki/Special:Code/pywikipedia/11624
Revision: 11624 Author: drtrigon Date: 2013-06-07 23:51:26 +0000 (Fri, 07 Jun 2013) Log Message: ----------- improvement; most methods renamed, wrapper GenericFile introduced Modified Paths: -------------- trunk/pywikipedia/catimages.py Modified: trunk/pywikipedia/catimages.py =================================================================== --- trunk/pywikipedia/catimages.py 2013-06-07 22:29:18 UTC (rev 11623) +++ trunk/pywikipedia/catimages.py 2013-06-07 23:51:26 UTC (rev 11624) @@ -144,9 +144,10 @@ # all detection and recognition methods - bindings to other classes, modules and libs -class UnknownFile(object): - def __init__(self, filename, *args, **kwargs): - self.filename = filename +class _UnknownFile(object): + def __init__(self, file_name, file_mime, *args, **kwargs): + self.file_name = file_name + self.file_mime = file_mime self.image_size = (None, None) # available file properties and metadata @@ -183,12 +184,157 @@ return self._properties def getFeatures(self): - pywikibot.warning(u"File format '%s/%s' not supported (yet)!" % tuple(self.image_mime[:2])) + pywikibot.warning(u"File format '%s/%s' not supported (yet)!" % tuple(self.file_mime[:2])) return self._features def _detect_HeaderAndMetadata(self): # check/look into the file by midnight commander (mc) #
https://pypi.python.org/pypi/hachoir-metadata
+
+#### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ###
+#try:
+#    from hachoir_core.error import error, HachoirError
+#    from hachoir_core.cmd_line import unicodeFilename
+#    from hachoir_core.i18n import getTerminalCharset, _
+#    from hachoir_core.benchmark import Benchmark
+#    from hachoir_core.stream import InputStreamError
+#    from hachoir_core.tools import makePrintable
+#    from hachoir_parser import createParser, ParserList
+#    import hachoir_core.config as hachoir_config
+#    from hachoir_metadata import config
+#except ImportError, err:
+#    raise
+#    print >>sys.stderr, "Unable to import an Hachoir module: %s" % err
+#    sys.exit(1)
+#from optparse import OptionGroup, OptionParser
+#from hachoir_metadata import extractMetadata
+#from hachoir_metadata.metadata import extractors as metadata_extractors
+#
+#
+#def parseOptions():
+#    parser = OptionParser(usage="%prog [options] files")
+#    parser.add_option("--type", help=_("Only display file type (description)"),
+#        action="store_true", default=False)
+#    parser.add_option("--mime", help=_("Only display MIME type"),
+#        action="store_true", default=False)
+#    parser.add_option("--level",
+#        help=_("Quantity of information to display from 1 to 9 (9 is the maximum)"),
+#        action="store", default="9", type="choice",
+#        choices=[ str(choice) for choice in xrange(1,9+1) ])
+#    parser.add_option("--raw", help=_("Raw output"),
+#        action="store_true", default=False)
+#    parser.add_option("--bench", help=_("Run benchmark"),
+#        action="store_true", default=False)
+#    parser.add_option("--force-parser",help=_("List all parsers then exit"),
+#        type="str")
+#    parser.add_option("--profiler", help=_("Run profiler"),
+#        action="store_true", default=False)
+#    parser.add_option("--quality", help=_("Information quality (0.0=fastest, 1.0=best, and default is 0.5)"),
+#        action="store", type="float", default="0.5")
+#    parser.add_option("--maxlen", help=_("Maximum string length in characters, 0 means unlimited (default: %s)" % config.MAX_STR_LENGTH),
+#        type="int", default=config.MAX_STR_LENGTH)
+#    parser.add_option("--verbose", help=_("Verbose mode"),
+#        default=False, action="store_true")
+#    parser.add_option("--debug", help=_("Debug mode"),
+#        default=False, action="store_true")
+#
+#    values, filename = parser.parse_args()
+#    if len(filename) == 0:
+#        parser.print_help()
+#        sys.exit(1)
+#
+#    # Update limits
+#    config.MAX_STR_LENGTH = values.maxlen
+#    if values.raw:
+#        config.RAW_OUTPUT = True
+#
+#    return values, filename
+#
+#def processFile(values, filename,
+#                display_filename=False, priority=None, human=True, display=True):
+#    charset = getTerminalCharset()
+#    filename, real_filename = unicodeFilename(filename, charset), filename
+#
+#    # Create parser
+#    try:
+#        if values.force_parser:
+#            tags = [ ("id", values.force_parser), None ]
+#        else:
+#            tags = None
+#        parser = createParser(filename, real_filename=real_filename, tags=tags)
+#        help(parser)
+#        print parser.getParserTags()
+#        print parser.PARSER_TAGS
+#        for i, item in enumerate(parser.createFields()):
+#            print item
+#            if i > 5:
+#                break
+#    except InputStreamError, err:
+#        error(unicode(err))
+#        return False
+#    if not parser:
+#        error(_("Unable to parse file: %s") % filename)
+#        return False
+#
+#    # Extract metadata
+#    extract_metadata = not(values.mime or values.type)
+#    if extract_metadata:
+#        try:
+#            metadata = extractMetadata(parser, values.quality)
+#        except HachoirError, err:
+#            error(unicode(err))
+#            metadata = None
+#        if not metadata:
+#            parser.error(_("Hachoir can't extract metadata, but is able to parse: %s")
+#                % filename)
+#            return False
+#
+#    if display:
+#        # Display metadatas on stdout
+#        if extract_metadata:
+#            text = metadata.exportPlaintext(priority=priority, human=human)
+#            if not text:
+#                text = [_("(no metadata, priority may be too small)")]
+#            if display_filename:
+#                for line in text:
+#                    line = "%s: %s" % (filename, line)
+#                    print makePrintable(line, charset)
+#            else:
+#                for line in text:
+#                    print makePrintable(line, charset)
+#        else:
+#            if values.type:
+#                text = parser.description
+#            else:
+#                text = parser.mime_type
+#            if display_filename:
+#                text = "%s: %s" % (filename, text)
+#            print text
+#    return True
+#
+#### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ###
+#
+#    def processFiles(values, filenames, display=True):
+#        human = not(values.raw)
+#        ok = True
+#        priority = int(values.level)*100 + 99
+#        display_filename = (1 < len(filenames))
+#        for filename in filenames:
+#            ok &= processFile(values, filename, display_filename, priority, human, display)
+#        return ok
+#
+#    try:
+#        # Parser options and initialize Hachoir
+#        values, filenames = parseOptions()
+#
+#        ok = processFiles(values, filenames)
+#    except KeyboardInterrupt:
+#        print _("Program interrupted (CTRL+C).")
+#        ok = False
+#    sys.exit(int(not ok))
+#
+#### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ###
+
         pass
 
     def _detect_Properties(self):
@@ -196,8 +342,8 @@
 
         pass
 
-class JpegFile(UnknownFile):
-    # for '_detect_Trained_CV'
+class _JpegFile(_UnknownFile):
+    # for '_detect_Trained'
     cascade_files = [(u'Legs', 'haarcascade_lowerbody.xml'),
                      (u'Torsos', 'haarcascade_upperbody.xml'),
                      (u'Ears', 'haarcascade_mcs_leftear.xml'),
@@ -212,12 +358,11 @@
     # ('Hands' does not behave very well, in fact it detects any kind of skin and other things...)
     #(u'Aeroplanes', 'haarcascade_aeroplane.xml'),]    # e.g. for 'Category:Unidentified aircraft'
 
-    def __init__(self, filename, *args, **kwargs):
-        UnknownFile.__init__(self, filename)
+    def __init__(self, file_name, file_mime, *args, **kwargs):
+        _UnknownFile.__init__(self, file_name, file_mime)
 
-        self.image_filename = os.path.split(self.filename)[-1]
-        self.image_fileext = os.path.splitext(self.image_filename)[1]
-        self.image_path = self.filename
+        self.image_filename = os.path.split(self.file_name)[-1]
+        self.image_path = self.file_name
         self.image_path_JPEG = self.image_path + '.jpg'
 
         self._convert()
@@ -235,7 +380,7 @@
         # Faces (extract EXIF data)
         self._detect_Faces_EXIF()
         # Faces and eyes (opencv pre-trained haar)
-        self._detect_Faces_CV()
+        self._detect_Faces()
# TODO: test and use or switch off
#        # Face via Landmark(s)
#        self._detect_FaceLandmark_xBOB()
@@ -245,36 +390,36 @@
             del self._features['Faces'][i]
 
         # Segments and colors
-        self._detect_SegmentColors_JSEGnPIL()
+        self._detect_SegmentColors()
         # Average color
-        self._detect_AverageColor_PILnCV()
+        self._detect_AverageColor()
         # People/Pedestrian (opencv pre-trained hog and haarcascade)
-        self._detect_People_CV()
+        self._detect_People()
         # Geometric object (opencv hough line, circle, edges, corner, ...)
-        self._detect_Geometry_CV()
+        self._detect_Geometry()
 
         # general (opencv pre-trained, third-party and self-trained haar
         # and cascade) classification
        # http://www.computer-vision-software.com/blog/2009/11/faq-opencv-haartrainin…
         for cf in self.cascade_files:
-            self._detect_Trained_CV(*cf)
+            self._detect_Trained(*cf)
 
         # barcode and Data Matrix recognition (libdmtx/pydmtx, zbar, gocr?)
-        self._recognize_OpticalCodes_dmtxNzbar()
+        self._recognize_OpticalCodes()
 
         # Chessboard (opencv reference detector)
-        self._detect_Chessboard_CV()
+        self._detect_Chessboard()
 
         # general (self-trained) detection WITH classification
        # BoW: uses feature detection (SIFT, SURF, ...) AND classification (SVM, ...)
-#        self._detectclassify_ObjectAll_CV()
+#        self._detectclassify_ObjectAll()
        # Wavelet: uses wavelet transformation AND classification (machine learning)
#        self._detectclassify_ObjectAll_PYWT()
 
         # general file EXIF history information
-        self._detect_History_EXIF()
+        self._detect_History()
 
         return self._features
@@ -320,7 +465,7 @@
         try:
             i = Image.open(self.image_path)
         except IOError:
-            pywikibot.warning(u'unknown file type [JpegFile]')
+            pywikibot.warning(u'unknown file type [_JpegFile]')
             return
 
        # http://mail.python.org/pipermail/image-sig/1999-May/000740.html
@@ -351,8 +496,8 @@
                         'Palette':    str(len(i.palette.palette)) if i.palette else u'-',
                         'Pages':      pc,
                         'Dimensions': self.image_size,
-                        'Filesize':   os.path.getsize(self.filename),
-                        'MIME':       u'%s/%s' % tuple(self.image_mime[:2]), })
+                        'Filesize':   os.path.getsize(self.file_name),
+                        'MIME':       u'%s/%s' % tuple(self.file_mime[:2]), })
 
         #self._properties['Properties'] = [result]
         self._properties['Properties'][0].update(result)
@@ -360,7 +505,7 @@
 
     # .../opencv/samples/c/facedetect.cpp
    # http://opencv.willowgarage.com/documentation/python/genindex.html
-    def _detect_Faces_CV(self):
+    def _detect_Faces(self):
         """Converts an image to grayscale and prints the locations of any
            faces found"""
        # http://python.pastebin.com/m76db1d6b
@@ -429,10 +574,10 @@
             # how small and how many features are detected as faces (or eyes)
             scale = max([1., np.average(np.array(img.shape)[0:2]/500.)])
         except IOError:
-            pywikibot.warning(u'unknown file type [_detect_Faces_CV]')
+            pywikibot.warning(u'unknown file type [_detect_Faces]')
             return
         except AttributeError:
-            pywikibot.warning(u'unknown file type [_detect_Faces_CV]')
+            pywikibot.warning(u'unknown file type [_detect_Faces]')
             return
 
         #detectAndDraw( image, cascade, nestedCascade, scale );
@@ -808,7 +953,7 @@
 
     # .../opencv/samples/cpp/peopledetect.cpp
     # + Haar/Cascade detection
-    def _detect_People_CV(self):
+    def _detect_People(self):
        # http://stackoverflow.com/questions/10231380/graphic-recognition-of-people
        # https://code.ros.org/trac/opencv/ticket/1298
        # http://opencv.itseez.com/modules/gpu/doc/object_detection.html
@@ -829,10 +974,10 @@
             scale = max([1., np.average(np.array(img.shape)[0:2]/400.)])
             #scale = max([1., np.average(np.array(img.shape)[0:2]/300.)])
         except IOError:
-            pywikibot.warning(u'unknown file type [_detect_People_CV]')
+            pywikibot.warning(u'unknown file type [_detect_People]')
             return
         except AttributeError:
-            pywikibot.warning(u'unknown file type [_detect_People_CV]')
+            pywikibot.warning(u'unknown file type [_detect_People]')
             return
 
         # similar to face detection
@@ -902,7 +1047,7 @@
         self._features['People'] = result
         return
 
-    def _detect_Geometry_CV(self):
+    def _detect_Geometry(self):
         result = self._util_get_Geometry_CVnSCIPY()
 
         self._features['Geometry'] = [{'Lines':   result['Lines'],
@@ -934,10 +1079,10 @@
             # how small and how many features are detected
             scale = max([1., np.average(np.array(img.shape)[0:2]/500.)])
         except IOError:
-            pywikibot.warning(u'unknown file type [_detect_Geometry_CV]')
+            pywikibot.warning(u'unknown file type [_detect_Geometry]')
             return self._buffer_Geometry
         except AttributeError:
-            pywikibot.warning(u'unknown file type [_detect_Geometry_CV]')
+            pywikibot.warning(u'unknown file type [_detect_Geometry]')
             return self._buffer_Geometry
 
         # similar to face or people detection
@@ -1065,7 +1210,7 @@
         return self._buffer_Geometry
 
     # .../opencv/samples/cpp/bagofwords_classification.cpp
-    def _detectclassify_ObjectAll_CV(self):
+    def _detectclassify_ObjectAll(self):
         """Uses the 'The Bag of Words model' for detection and classification"""
 
         # CAN ALSO BE USED FOR: TEXT, ...
@@ -1165,7 +1310,7 @@
    # http://library.wolfram.com/infocenter/Demos/5725/#downloads
    # http://code.google.com/p/pymeanshift/wiki/Examples
    # (http://pythonvision.org/basic-tutorial, http://luispedro.org/software/mahotas, http://packages.python.org/pymorph/)
-    def _detect_SegmentColors_JSEGnPIL(self):    # may be SLIC other other too...
+    def _detect_SegmentColors(self):    # may be SLIC other other too...
         try:
             #im = Image.open(self.image_path).convert(mode = 'RGB')
             im = Image.open(self.image_path_JPEG)
@@ -1179,7 +1324,7 @@
                 (l, t) = (0, 0)
                 i = im
         except IOError:
-            pywikibot.warning(u'unknown file type [_detect_SegmentColors_JSEGnPIL]')
+            pywikibot.warning(u'unknown file type [_detect_SegmentColors]')
             return
 
         result = []
@@ -1193,7 +1338,7 @@
             ##(pic, scale) = self._util_detect_ColorSegments_JSEG(pic)    # (final split)
             #hist = self._util_get_ColorSegmentsHist_PIL(i, pic, scale)   #
         except TypeError:
-            pywikibot.warning(u'unknown file type [_detect_SegmentColors_JSEGnPIL]')
+            pywikibot.warning(u'unknown file type [_detect_SegmentColors]')
             return
         i = 0
         # (may be do an additional region merge according to same color names...)
@@ -1221,14 +1366,14 @@
    # http://code.google.com/p/python-colormath/
    # http://en.wikipedia.org/wiki/Color_difference
    # http://www.farb-tabelle.de/en/table-of-color.htm
-    def _detect_AverageColor_PILnCV(self):
+    def _detect_AverageColor(self):
         try:
             # we need to have 3 channels (but e.g. grayscale 'P' has only 1)
             #i = Image.open(self.image_path).convert(mode = 'RGB')
             i = Image.open(self.image_path_JPEG)
             h = i.histogram()
         except IOError:
-            pywikibot.warning(u'unknown file type [_detect_AverageColor_PILnCV]')
+            pywikibot.warning(u'unknown file type [_detect_AverageColor]')
             return
 
         result = self._util_average_Color_colormath(h)
@@ -1321,7 +1466,7 @@
         tmpjpg = os.path.join(scriptdir, "cache/jseg_buf.jpg")
         tmpgif = os.path.join(scriptdir, "cache/jseg_buf.gif")
 
-        # same scale func as in '_detect_Faces_CV'
+        # same scale func as in '_detect_Faces'
         scale = max([1., np.average(np.array(im.size)[0:2]/200.)])
         #print np.array(im.size)/scale, scale
         try:
@@ -1482,7 +1627,7 @@
         return im
 
     # Category:... (several; look at self.gatherFeatures for more hints)
-    def _detect_Trained_CV(self, info_desc, cascade_file, maxdim=500.):
+    def _detect_Trained(self, info_desc, cascade_file, maxdim=500.):
         # general (self trained) classification (e.g. people, ...)
        # http://www.computer-vision-software.com/blog/2009/11/faq-opencv-haartrainin…
@@ -1511,10 +1656,10 @@
             # how small and how many features are detected
             scale = max([1., np.average(np.array(img.shape)[0:2]/maxdim)])
         except IOError:
-            pywikibot.warning(u'unknown file type [_detect_Trained_CV]')
+            pywikibot.warning(u'unknown file type [_detect_Trained]')
             return
         except AttributeError:
-            pywikibot.warning(u'unknown file type [_detect_Trained_CV]')
+            pywikibot.warning(u'unknown file type [_detect_Trained]')
             return
 
         # similar to face detection
@@ -1541,7 +1686,7 @@
         self._features[info_desc] = result
         return
 
-    def _recognize_OpticalCodes_dmtxNzbar(self):
+    def _recognize_OpticalCodes(self):
         # barcode and Data Matrix recognition (libdmtx/pydmtx, zbar, gocr?)
        # http://libdmtx.wikidot.com/libdmtx-python-wrapper
        # http://blog.globalstomp.com/2011/09/decoding-qr-code-code-128-code-39.html
@@ -1571,7 +1716,7 @@
             scale = max([1., np.average(np.array(img.size)/200.)])
         except IOError:
-            pywikibot.warning(u'unknown file type [_recognize_OpticalCodes_dmtxNzbar]')
+            pywikibot.warning(u'unknown file type [_recognize_OpticalCodes]')
             return
 
         smallImg = img.resize( (int(img.size[0]/scale), int(img.size[1]/scale)) )
@@ -1608,7 +1753,7 @@
             img = Image.open(self.image_path_JPEG).convert('L')
             width, height = img.size
         except IOError:
-            pywikibot.warning(u'unknown file type [_recognize_OpticalCodes_dmtxNzbar]')
+            pywikibot.warning(u'unknown file type [_recognize_OpticalCodes]')
             return
 
         scanner = zbar.ImageScanner()
@@ -1636,7 +1781,7 @@
         self._features['OpticalCodes'] = result
         return
 
-    def _detect_Chessboard_CV(self):
+    def _detect_Chessboard(self):
         # Chessboard (opencv reference detector)
        # http://www.c-plusplus.de/forum/273920-full
        # http://www.youtube.com/watch?v=bV-jAnQ-tvw
@@ -1656,10 +1801,10 @@
             #scale = max([1., np.average(np.array(im.shape)[0:2]/500.)])
             #scale = max([1., np.average(np.array(im.shape)[0:2]/450.)])
         except IOError:
-            pywikibot.warning(u'unknown file type [_detect_Chessboard_CV]')
+            pywikibot.warning(u'unknown file type [_detect_Chessboard]')
             return
         except AttributeError:
-            pywikibot.warning(u'unknown file type [_detect_Chessboard_CV]')
+            pywikibot.warning(u'unknown file type [_detect_Chessboard]')
             return
 
         smallImg = np.empty( (cv.Round(im.shape[1]/scale), cv.Round(im.shape[0]/scale)), dtype=np.uint8 )
@@ -1875,7 +2020,7 @@
         coords2D = np.dot((cm), coords)
         perp = coords - origin
         if hacky:
-            # for '_detect_Chessboard_CV' but looks a bit strange ... may be wrong?!
+            # for '_detect_Chessboard' but looks a bit strange ... may be wrong?!
             mat = coords2D - origin2D
             mat = mat/max([np.linalg.norm(mat[:,i]) for i in range(3)])
         else:
@@ -2138,7 +2283,7 @@
         self._features['Faces'] += data
         return
 
-    def _detect_History_EXIF(self):
+    def _detect_History(self):
         res = self._util_get_DataTags_EXIF()
 
         #a = []
@@ -2224,7 +2369,7 @@
                     drop.append( i1 )
                 elif (ar2 >= thsr) and (i1 not in drop):
                     drop.append( i2 )
-                # from '_detect_Faces_CV()'
+                # from '_detect_Faces()'
                 if overlap:
                     if (r2[0] <= c1[0] <= (r2[0] + r2[2])) and \
                        (r2[1] <= c1[1] <= (r2[1] + r2[3])) and (i2 not in drop):
@@ -2243,17 +2388,17 @@
         return (regs, drop)
 
-class PngFile(JpegFile):
+class _PngFile(_JpegFile):
     pass
 
-class GifFile(JpegFile):
+class _GifFile(_JpegFile):
     pass
 
-class TiffFile(JpegFile):
+class _TiffFile(_JpegFile):
     pass
 
-class XcfFile(JpegFile):
+class _XcfFile(_JpegFile):
     def _convert(self):
         # Very few programs other than GIMP read XCF files. This is by design
         # from the GIMP developers, the format is not really documented or
@@ -2282,19 +2427,19 @@
            as commons does in order to compare if those libraries (ImageMagick,
            ...) are buggy (thus explicitely use other software for independence)"""
-        result = { 'Format':     u'%s' % self.image_mime[1].upper(),
+        result = { 'Format':     u'%s' % self.file_mime[1].upper(),
                    # DO NOT use ImageMagick (identify) instead of PIL to get these info !!
                    'Pages':      0,
                    'Dimensions': self.image_size,
-                   'Filesize':   os.path.getsize(self.filename),
-                   'MIME':       u'%s/%s' % tuple(self.image_mime[:2]), }
+                   'Filesize':   os.path.getsize(self.file_name),
+                   'MIME':       u'%s/%s' % tuple(self.file_mime[:2]), }
 
         #self._properties['Properties'] = [result]
         self._properties['Properties'][0].update(result)
         return
 
-class SvgFile(JpegFile):
+class _SvgFile(_JpegFile):
     def _convert(self):
         # SVG: rasterize the SVG to bitmap (MAY BE GET FROM WIKI BY DOWNLOAD?...)
        # (Mediawiki uses librsvg too: http://commons.wikimedia.org/wiki/SVG#SVGs_in_MediaWiki)
@@ -2358,19 +2503,19 @@
                         # may be set {{validSVG}} also or do something in bot template to
                         # recognize 'Format=SVG (valid)' ...
                         'Dimensions': self.image_size,
-                        'Filesize':   os.path.getsize(self.filename),
-                        'MIME':       u'%s/%s' % tuple(self.image_mime[:2]), })
+                        'Filesize':   os.path.getsize(self.file_name),
+                        'MIME':       u'%s/%s' % tuple(self.file_mime[:2]), })
 
         #self._properties['Properties'] = [result]
         self._properties['Properties'][0].update(result)
         return
 
-class PdfFile(JpegFile):
+class _PdfFile(_JpegFile):
     def getFeatures(self):
         # optical and other text recognition (tesseract & ocropus, ...)
-        self._detect_EmbeddedText_poppler()
-#        self._recognize_OpticalText_ocropus()
+        self._detect_EmbeddedText()
+#        self._recognize_OpticalText()
         # (may be just classify as 'contains text', may be store text, e.g. to wikisource)
 
         return self._features
@@ -2383,7 +2528,7 @@
        # http://vermeulen.ca/python-pdf.html
        # http://code.activestate.com/recipes/511465-pure-python-pdf-to-text-converte…
        # http://stackoverflow.com/questions/25665/python-module-for-converting-pdf-t…
-        if self.image_fileext == u'.pdf':
+        if os.path.splitext(self.image_filename)[1].lower() == u'.pdf':
             pass
 
     # MIME: 'application/pdf; charset=binary'
@@ -2402,8 +2547,8 @@
                    'Palette':    u'-',
                    'Pages':      pc,
                    'Dimensions': self.image_size,
-                   'Filesize':   os.path.getsize(self.filename),
-                   'MIME':       u'%s/%s' % tuple(self.image_mime[:2]), }
+                   'Filesize':   os.path.getsize(self.file_name),
+                   'MIME':       u'%s/%s' % tuple(self.file_mime[:2]), }
 
         #self._properties['Properties'] = [result]
         self._properties['Properties'][0].update(result)
@@ -2411,7 +2556,7 @@
 
     # ./run-test (ocropus/ocropy)
     # (in fact all scripts/executables used here are pure python scripts!!!)
-    def _recognize_OpticalText_ocropus(self):
+    def _recognize_OpticalText(self):
         # optical text recognition (tesseract & ocropus, ...)
         # (no full recognition but - at least - just classify as 'contains text')
        # http://www.claraocr.org/de/ocr/ocr-software/open-source-ocr.html
@@ -2475,7 +2620,7 @@
         #print data
         pywikibot.output(data)
 
-    def _detect_EmbeddedText_poppler(self):
+    def _detect_EmbeddedText(self):
        # may be also: http://www.reportlab.com/software/opensource/rl-toolkit/
 
         # poppler pdftotext/pdfimages
@@ -2538,10 +2683,10 @@
         #    pdfinterp.process_pdf(rsrcmgr, device, fp, set(), maxpages=0, password='',
         #                          caching=True, check_extractable=False)
         #except AssertionError:
-        #    pywikibot.warning(u'pdfminer missed, may be corrupt [_detect_EmbeddedText_poppler]')
+        #    pywikibot.warning(u'pdfminer missed, may be corrupt [_detect_EmbeddedText]')
         #    return
         #except TypeError:
-        #    pywikibot.warning(u'pdfminer missed, may be corrupt [_detect_EmbeddedText_poppler]')
+        #    pywikibot.warning(u'pdfminer missed, may be corrupt [_detect_EmbeddedText]')
         #    return
         #fp.close()
         #device.close()
@@ -2561,17 +2706,17 @@
         return
 
-#class DjvuFile(JpegFile):
+#class DjvuFile(_JpegFile):
 #    pass
 
-class OggFile(JpegFile):
+class _OggFile(_JpegFile):
     def getFeatures(self):
         # general handling of all audio and video formats
-        self._detect_Streams_FFMPEG()
+        self._detect_Streams()
 
         # general audio feature extraction
-#        self._detect_AudioFeatures_YAAFE()
+#        self._detect_AudioFeatures()
 
         return self._features
@@ -2588,14 +2733,14 @@
         result = { 'Format':     u'%s' % d['format']['format_name'].upper(),
                    'Pages':      0,
                    'Dimensions': self.image_size,
-                   'Filesize':   os.path.getsize(self.filename),
-                   'MIME':       u'%s/%s' % tuple(self.image_mime[:2]), }
+                   'Filesize':   os.path.getsize(self.file_name),
+                   'MIME':       u'%s/%s' % tuple(self.file_mime[:2]), }
 
         #self._properties['Properties'] = [result]
         self._properties['Properties'][0].update(result)
         return
 
-    def _detect_Streams_FFMPEG(self):
+    def _detect_Streams(self):
         # audio and video streams files (ogv, oga, ...)
         d = self._util_get_DataStreams_FFMPEG()
         if not d:
@@ -2661,7 +2806,7 @@
 
         return self._buffer_FFMPEG
 
-    def _detect_AudioFeatures_YAAFE(self):
+    def _detect_AudioFeatures(self):
        # http://yaafe.sourceforge.net/manual/tools.html
        # http://yaafe.sourceforge.net/manual/quickstart.html - yaafe.py
        #   ( help: yaafe.py -h / features: yaafe.py -l )
@@ -2813,9 +2958,9 @@
         return
 
-class MidiFile(UnknownFile):
+class _MidiFile(_UnknownFile):
     def getFeatures(self):
-        self._detect_AudioFeatures_MUSIC21()    # Audio
+        self._detect_AudioFeatures()    # Audio
         return self._features
 
     def _detect_HeaderAndMetadata(self):
@@ -2824,7 +2969,7 @@
         # extract data from midi file
        # http://valentin.dasdeck.com/midi/midifile.htm
        # http://stackoverflow.com/questions/3943149/reading-and-interpreting-data-fr…
-        ba = bytearray(open(self.filename, 'rb').read())
+        ba = bytearray(open(self.file_name, 'rb').read())
         i = -1
         for key, data in [('Text', '\x01'), ('Copyright', '\x02'), ('Lyrics', '\x05')]:
             result[key] = []
@@ -2853,7 +2998,7 @@
 
         import _music21 as music21
         try:
-            s = music21.converter.parse(self.filename)
+            s = music21.converter.parse(self.file_name)
             if s.metadata:
                 pywikibot.output(unicode(s.metadata))
                 result.update(s.metadata)
@@ -2869,27 +3014,27 @@
            as commons does in order to compare if those libraries (ImageMagick,
            ...) are buggy (thus explicitely use other software for independence)"""
-        result = { 'Format':     u'%s' % self.image_mime[1].upper(),
+        result = { 'Format':     u'%s' % self.file_mime[1].upper(),
                    'Pages':      0,
                    'Dimensions': self.image_size,
-                   'Filesize':   os.path.getsize(self.filename),
-                   'MIME':       u'%s/%s' % tuple(self.image_mime[:2]), }
+                   'Filesize':   os.path.getsize(self.file_name),
+                   'MIME':       u'%s/%s' % tuple(self.file_mime[:2]), }
 
         #self._properties['Properties'] = [result]
         self._properties['Properties'][0].update(result)
         return
 
     # midi audio feature extraction
-    def _detect_AudioFeatures_MUSIC21(self):
+    def _detect_AudioFeatures(self):
         import _music21 as music21
 
         #music21.features.jSymbolic.getCompletionStats()
         try:
             #audiofile = '/home/ursin/Desktop/3_Ships.mid'
-            #s = music21.midi.translate.midiFilePathToStream(self.filename)
-            s = music21.converter.parse(self.filename)
+            #s = music21.midi.translate.midiFilePathToStream(self.file_name)
+            s = music21.converter.parse(self.file_name)
         except music21.midi.base.MidiException:
-            pywikibot.warning(u'unknown file type [_detect_AudioFeatures_MUSIC21]')
+            pywikibot.warning(u'unknown file type [_detect_AudioFeatures]')
             return
 
         #fs = music21.features.jSymbolic.extractorsById
@@ -2931,21 +3076,34 @@
         return
 
-FILETYPES = { '*':                     UnknownFile,
-              ( 'image',       'jpeg'): JpegFile,
-              ( 'image',        'png'): PngFile,
-              ( 'image',        'gif'): GifFile,
-              ( 'image',       'tiff'): TiffFile,
-              ( 'image',      'x-xcf'): XcfFile,
-              ( 'image',    'svg+xml'): SvgFile,
-              ('application',   'pdf'): PdfFile,
+_FILETYPES = { '*':                     _UnknownFile,
+               ( 'image',       'jpeg'): _JpegFile,
+               ( 'image',        'png'): _PngFile,
+               ( 'image',        'gif'): _GifFile,
+               ( 'image',       'tiff'): _TiffFile,
+               ( 'image',      'x-xcf'): _XcfFile,
+               ( 'image',    'svg+xml'): _SvgFile,
+               ('application',   'pdf'): _PdfFile,
               # djvu: python-djvulibre or python-djvu for djvu support
              # http://pypi.python.org/pypi/python-djvulibre/0.3.9
-#             ( 'image',   'vnd.djvu'): DjvuFile,
-              ('application',   'ogg'): OggFile,
-              ( 'audio',       'midi'): MidiFile,}
+#              ( 'image',   'vnd.djvu'): DjvuFile,
+               ('application',   'ogg'): _OggFile,
+               ( 'audio',       'midi'): _MidiFile,}
 
+def GenericFile(file_name):
+    # 'magic' (libmagic)
+    m = magic.open(magic.MAGIC_MIME)    # or 'magic.MAGIC_NONE'
+    m.load()
+    file_mime = re.split('[/;\s]', m.file(file_name))
+    mime = mimetypes.guess_all_extensions('%s/%s' % tuple(file_mime[0:2]))
+    if mime and (os.path.splitext(file_name)[1].lower() not in mime):
+        pywikibot.warning(u'File extension does not match MIME type! File extension should be %s.' % mime)
+
+    # split detection and extraction according to file types; _JpegFile, ...
+    GenericFile = _FILETYPES.get(tuple(file_mime[:2]), _FILETYPES['*'])
+    return GenericFile(file_name, file_mime)
+
+
 # all classification and categorization methods and definitions - default variation
 # use simplest classification I can think of (self-made) and do categorization
 # mostly based on filtered/reported features
@@ -2957,7 +3115,7 @@
     #_thrshld_guesses = 0.1
     _thrshld_default = 0.75
 
-    # for '_detect_Trained_CV'
+    # for '_detect_Trained'
     cascade_files = [(u'Legs', 'haarcascade_lowerbody.xml'),
                      (u'Torsos', 'haarcascade_upperbody.xml'),
                      (u'Ears', 'haarcascade_mcs_leftear.xml'),
@@ -3387,7 +3545,6 @@
         pywikibot.output(u'Processing media %s ...' % self.image.title(asLink=True))
 
         image_filename = os.path.split(self.image.fileUrl())[-1]
-        image_fileext = os.path.splitext(image_filename)[1]
         self.image_path = urllib2.quote(os.path.join(scriptdir, ('cache/' + image_filename[-128:])))
 
         self._wikidata = self.image._latestInfo  # all info wikimedia got from content (mime, sha1, ...)
@@ -3411,15 +3568,6 @@
             f.write( data )
             f.close()
 
-        # 'magic' (libmagic)
-        m = magic.open(magic.MAGIC_MIME)    # or 'magic.MAGIC_NONE'
-        m.load()
-        self.image_mime = re.split('[/;\s]', m.file(self.image_path))
-        #self.image_size = (None, None)
-        mime = mimetypes.guess_all_extensions('%s/%s' % tuple(self.image_mime[0:2]))
-        if mime and (image_fileext.lower() not in mime):
-            pywikibot.warning(u'File extension does not match MIME type! File extension should be %s.' % mime)
-
     # LOOK ALSO AT: checkimages.CatImagesBot.checkStep
     # (and category scripts/bots too...)
     def checkStep(self):
@@ -3673,7 +3821,7 @@
         return u" | %s = %s" % (key, self._output_format(value))
 
     def _make_markerblock(self, res, size, structure=['Position'], line='solid'):
-        # same as in '_detect_Faces_CV'
+        # same as in '_detect_Faces'
         colors = [ (0,0,255),
                    (0,128,255),
                    (0,255,255),
@@ -3739,17 +3887,13 @@
 
     # gather data from all information interfaces
     def gatherFeatures(self):
-        # split detection and extraction according to file types; JpegFile, ...
-        TypeFile = FILETYPES.get(tuple(self.image_mime[:2]), FILETYPES['*'])
-        with TypeFile(self.image_path) as tf:
-            tf.image_mime = self.image_mime
-            tf.image = self.image
+        # split detection and extraction according to file types; _JpegFile, ...
+        with GenericFile(self.image_path) as gf:
+            gf.image = self.image    # patch for _SvgFile needing url
             for func in ['getProperties', 'getFeatures']:
-                result = getattr(tf, func)()
+                result = getattr(gf, func)()
                 self._info.update(result)
-            print self._info
-            #print tf.__dict__
-            self.image_size = tf.image_size
+            self.image_size = gf.image_size
 
     def _existInformation(self, info, ignore = ['Properties', 'Metadata', 'ColorAverage']):
         result = []
@@ -4185,7 +4329,7 @@
         linear_svm = mlpy.LibSvm(kernel_type='linear')    # new linear SVM instance
         linear_svm.learn(z, y)                            # learn from principal components
 
-        # !!! train also BoW (bag-of-words) in '_detectclassify_ObjectAll_CV' resp. 'opencv.BoWclassify.main' !!!
+        # !!! train also BoW (bag-of-words) in '_detectclassify_ObjectAll' resp. 'opencv.BoWclassify.main' !!!
 
         xmin, xmax = z[:,0].min()-0.1, z[:,0].max()+0.1
         ymin, ymax = z[:,1].min()-0.1, z[:,1].max()+0.1
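The GenericFile factory introduced above is essentially a MIME-keyed dispatch table. A minimal, self-contained sketch of the same idea — using mimetypes.guess_type in place of the libmagic call, and stub classes standing in for the real _UnknownFile/_JpegFile, so only the dispatch logic is shown, not the bot's actual detection code:

import mimetypes

class _UnknownFile(object):
    def __init__(self, file_name, file_mime):
        self.file_name, self.file_mime = file_name, file_mime

class _JpegFile(_UnknownFile):
    pass

_FILETYPES = { '*':                _UnknownFile,
               ('image', 'jpeg'):  _JpegFile, }

def GenericFile(file_name):
    # guess ('image', 'jpeg') from the name; the real code asks libmagic instead
    file_mime = (mimetypes.guess_type(file_name)[0] or '/').split('/')
    cls = _FILETYPES.get(tuple(file_mime[:2]), _FILETYPES['*'])
    return cls(file_name, file_mime)

print type(GenericFile('Example.jpg')).__name__    # -> _JpegFile

Keying the table on (type, subtype) tuples means a new format only needs one table entry plus a subclass, which is what the class conversion in these revisions prepares for.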
SVN: [11623] trunk/pywikipedia/catimages.py
by drtrigon＠svn.wikimedia.org
07 Jun '13
http://www.mediawiki.org/wiki/Special:Code/pywikipedia/11623
Revision: 11623 Author: drtrigon Date: 2013-06-07 22:29:18 +0000 (Fri, 07 Jun 2013) Log Message: ----------- improvement; code cleanup and class conversion more or less finished (method renaming not done yet) new feature; initial audio midi support (music21) and own metadata part (Category:MIDI files created with GNU LilyPond) Modified Paths: -------------- trunk/pywikipedia/catimages.py Modified: trunk/pywikipedia/catimages.py =================================================================== --- trunk/pywikipedia/catimages.py 2013-06-07 18:21:31 UTC (rev 11622) +++ trunk/pywikipedia/catimages.py 2013-06-07 22:29:18 UTC (rev 11623) @@ -149,7 +149,27 @@ self.filename = filename self.image_size = (None, None) - self._info = {} + # available file properties and metadata + self._properties = { 'Properties': [{'Format': u'-', 'Pages': 0}], + 'Metadata': [], } + # available feature to extract + self._features = { 'ColorAverage': [], + 'ColorRegions': [], + 'Faces': [], + 'People': [], + 'OpticalCodes': [], + 'Chessboard': [], + 'History': [], + 'Text': [], + 'Streams': [], + 'Audio': [], + 'Legs': [], + 'Hands': [], + 'Torsos': [], + 'Ears': [], + 'Eyes': [], + 'Automobiles': [], + 'Classify': [], } def __enter__(self): return self @@ -158,22 +178,22 @@ pass def getProperties(self): - result = {} - result.update( self._detect_HeaderAndMetadata() ) # Metadata - result.update( self._detect_Properties_PIL() ) # Properties - return result + self._detect_HeaderAndMetadata() # Metadata + self._detect_Properties() # Properties + return self._properties def getFeatures(self): pywikibot.warning(u"File format '%s/%s' not supported (yet)!" % tuple(self.image_mime[:2])) + return self._features def _detect_HeaderAndMetadata(self): # check/look into the file by midnight commander (mc) #
https://pypi.python.org/pypi/hachoir-metadata
- return {} + pass - def _detect_Properties_PIL(self): + def _detect_Properties(self): # get mime-type file-size, ... - return {} + pass class JpegFile(UnknownFile): @@ -197,14 +217,14 @@ self.image_filename = os.path.split(self.filename)[-1] self.image_fileext = os.path.splitext(self.image_filename)[1] - self.image_path = os.path.join(scriptdir, ('cache/' + self.image_filename[-128:])) + self.image_path = self.filename self.image_path_JPEG = self.image_path + '.jpg' self._convert() def __exit__(self, type, value, traceback): - if os.path.exists(self.image_path): - os.remove( self.image_path ) + #if os.path.exists(self.image_path): + # os.remove( self.image_path ) if os.path.exists(self.image_path_JPEG): os.remove( self.image_path_JPEG ) #image_path_new = self.image_path_JPEG.replace(u"cache/", u"cache/0_DETECTED_") @@ -220,9 +240,9 @@ # Face via Landmark(s) # self._detect_FaceLandmark_xBOB() # exclude duplicates (CV and EXIF) - faces = [item['Position'] for item in self._info['Faces']] + faces = [item['Position'] for item in self._features['Faces']] for i in self._util_merge_Regions(faces)[1]: - del self._info['Faces'][i] + del self._features['Faces'][i] # Segments and colors self._detect_SegmentColors_JSEGnPIL() @@ -241,11 +261,6 @@ for cf in self.cascade_files: self._detect_Trained_CV(*cf) - # optical and other text recognition (tesseract & ocropus, ...) - self._detect_EmbeddedText_poppler() -# self._recognize_OpticalText_ocropus() - # (may be just classify as 'contains text', may be store text, e.g. to wikisource) - # barcode and Data Matrix recognition (libdmtx/pydmtx, zbar, gocr?) self._recognize_OpticalCodes_dmtxNzbar() @@ -260,6 +275,8 @@ # general file EXIF history information self._detect_History_EXIF() + + return self._features # supports a lot of different file types thanks to PIL def _convert(self): @@ -293,18 +310,18 @@ #
http://code.google.com/p/pylibtiff/
# MIME: 'image/jpeg; charset=binary', ... - def _detect_Properties_PIL(self): + def _detect_Properties(self): """Retrieve as much file property info possible, especially the same as commons does in order to compare if those libraries (ImageMagick, ...) are buggy (thus explicitely use other software for independence)""" - #self.image_size = (None, None) + result = {'Format': u'-', 'Pages': 0} try: i = Image.open(self.image_path) except IOError: pywikibot.warning(u'unknown file type [JpegFile]') - return {'Properties': [result]} + return #
http://mail.python.org/pipermail/image-sig/1999-May/000740.html
pc=0 # count number of pages @@ -325,23 +342,22 @@ #self.image_size = i.size - result = { #'bands': i.getbands(), - #'bbox': i.getbbox(), - 'Format': i.format, - 'Mode': i.mode, - #'info': i.info, - #'stat': os.stat(self.image_path), - 'Palette': str(len(i.palette.palette)) if i.palette else u'-', - 'Pages': pc, } + result.update({ #'bands': i.getbands(), + #'bbox': i.getbbox(), + 'Format': i.format, + 'Mode': i.mode, + #'info': i.info, + #'stat': os.stat(self.image_path), + 'Palette': str(len(i.palette.palette)) if i.palette else u'-', + 'Pages': pc, + 'Dimensions': self.image_size, + 'Filesize': os.path.getsize(self.filename), + 'MIME': u'%s/%s' % tuple(self.image_mime[:2]), }) - result['Dimensions'] = self.image_size - result['Filesize'] = os.path.getsize(self.image_path) - result['MIME'] = u'%s/%s' % tuple(self.image_mime[:2]) + #self._properties['Properties'] = [result] + self._properties['Properties'][0].update(result) + return - #self._info['Properties'] = [result] - self._info['Properties'][0].update(result) - return {'Properties': [result]} - # .../opencv/samples/c/facedetect.cpp #
http://opencv.willowgarage.com/documentation/python/genindex.html
def _detect_Faces_CV(self): @@ -354,11 +370,6 @@ #
http://blog.jozilla.net/2008/06/27/fun-with-python-opencv-and-face-detectio…
#
http://www.cognotics.com/opencv/servo_2007_series/part_4/index.html
- # skip file formats not supported (yet?) - if (self.image_mime[1] in ['ogg', 'pdf', 'vnd.djvu']) or \ - (self.image_mime[0] in ['audio']): - return - #
https://code.ros.org/trac/opencv/browser/trunk/opencv_extra/testdata/gpu/ha…
xml = os.path.join(scriptdir, 'externals/opencv/haarcascades/haarcascade_eye_tree_eyeglasses.xml') #xml = os.path.join(scriptdir, 'externals/opencv/haarcascades/haarcascade_eye.xml') @@ -402,7 +413,6 @@ raise IOError(u"No such file: '%s'" % xml) cascaderightear = cv2.CascadeClassifier(xml) - #self._info['Faces'] = [] scale = 1. # So, to find an object of an unknown size in the image the scan # procedure should be done several times at different scales. @@ -651,7 +661,7 @@ # cv2.imwrite( image_path_new, img ) #return faces.tolist() - self._info['Faces'] += result + self._features['Faces'] += result return def _util_get_Pose_solvePnP(self, D3points, D2points, shape): @@ -728,7 +738,6 @@ """Prints the locations of any face landmark(s) found, respective converts them to usual face position data""" - #self._info['Faces'] = [] scale = 1. try: #video = bob.io.VideoReader(self.image_path_JPEG.encode('utf-8')) @@ -794,7 +803,7 @@ #cv2.imshow("people detector", img) #cv2.waitKey() - self._info['Faces'] += result + self._features['Faces'] += result return # .../opencv/samples/cpp/peopledetect.cpp @@ -806,7 +815,6 @@ #
http://opencv.willowgarage.com/documentation/cpp/basic_structures.html
#
http://www.pygtk.org/docs/pygtk/class-gdkrectangle.html
- self._info['People'] = [] scale = 1. try: img = cv2.imread(self.image_path_JPEG, cv.CV_LOAD_IMAGE_COLOR) @@ -891,22 +899,15 @@ #cv2.imshow("people detector", img) #c = cv2.waitKey(0) & 255 - self._info['People'] = result + self._features['People'] = result return def _detect_Geometry_CV(self): - self._info['Geometry'] = [] - - # skip file formats not supported (yet?) - if (self.image_mime[1] in ['ogg', 'pdf', 'vnd.djvu']) or \ - (self.image_mime[0] in ['audio']): - return - result = self._util_get_Geometry_CVnSCIPY() - self._info['Geometry'] = [{'Lines': result['Lines'], - 'Circles': result['Circles'], - 'Corners': result['Corners'],}] + self._features['Geometry'] = [{'Lines': result['Lines'], + 'Circles': result['Circles'], + 'Corners': result['Corners'],}] return #
https://code.ros.org/trac/opencv/browser/trunk/opencv/samples/python/houghl…
@@ -1083,8 +1084,6 @@ # parts of code here should/have to be placed into e.g. a own # class in 'dtbext/opencv/__init__.py' script/module - self._info['Classify'] = [] - trained = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', @@ -1123,7 +1122,7 @@ #
http://www.xrce.xerox.com/layout/set/print/content/download/18763/134049/fi…
#
http://people.csail.mit.edu/torralba/shortCourseRLOC/index.html
- self._info['Classify'] = [dict([ (trained[i], r) for i, r in enumerate(result) ])] + self._features['Classify'] = [dict([ (trained[i], r) for i, r in enumerate(result) ])] return def _detectclassify_ObjectAll_PYWT(self): @@ -1167,13 +1166,6 @@ #
http://code.google.com/p/pymeanshift/wiki/Examples
# (
http://pythonvision.org/basic-tutorial
,
http://luispedro.org/software/mahotas
,
http://packages.python.org/pymorph/
) def _detect_SegmentColors_JSEGnPIL(self): # may be SLIC other other too... - self._info['ColorRegions'] = [] - - # skip file formats not supported (yet?) - if (self.image_mime[1] in ['ogg', 'pdf', 'vnd.djvu']) or \ - (self.image_mime[0] in ['audio']): - return - try: #im = Image.open(self.image_path).convert(mode = 'RGB') im = Image.open(self.image_path_JPEG) @@ -1220,7 +1212,7 @@ result.append( data ) i += 1 - self._info['ColorRegions'] = result + self._features['ColorRegions'] = result return #
http://stackoverflow.com/questions/2270874/image-color-detection-using-pyth…
@@ -1230,13 +1222,6 @@ #
http://en.wikipedia.org/wiki/Color_difference
#
http://www.farb-tabelle.de/en/table-of-color.htm
def _detect_AverageColor_PILnCV(self): - self._info['ColorAverage'] = [] - - # skip file formats not supported (yet?) - if (self.image_mime[1] in ['ogg', 'pdf', 'vnd.djvu']) or \ - (self.image_mime[0] in ['audio']): - return - try: # we need to have 3 channels (but e.g. grayscale 'P' has only 1) #i = Image.open(self.image_path).convert(mode = 'RGB') @@ -1249,7 +1234,7 @@ result = self._util_average_Color_colormath(h) result['Gradient'] = self._util_get_Geometry_CVnSCIPY().get('Edge_Ratio', None) or '-' result['FFT_Peaks'] = self._util_get_Geometry_CVnSCIPY().get('FFT_Peaks', None) or '-' - self._info['ColorAverage'] = [result] + self._features['ColorAverage'] = [result] return #
http://stackoverflow.com/questions/2270874/image-color-detection-using-pyth…
@@ -1509,13 +1494,6 @@ # analogue to face detection: - self._info[info_desc] = [] - - # skip file formats not supported (yet?) - if (self.image_mime[1] in ['ogg', 'pdf', 'vnd.djvu']) or \ - (self.image_mime[0] in ['audio']): - return - #
http://tutorial-haartraining.googlecode.com/svn/trunk/data/haarcascades/
# or own xml files trained onto specific file database/set xml = os.path.join(scriptdir, ('externals/opencv/haarcascades/' + cascade_file)) @@ -1560,170 +1538,9 @@ # generic detection ... - self._info[info_desc] = result + self._features[info_desc] = result return - # ./run-test (ocropus/ocropy) - # (in fact all scripts/executables used here are pure python scripts!!!) - def _recognize_OpticalText_ocropus(self): - # optical text recognition (tesseract & ocropus, ...) - # (no full recognition but - at least - just classify as 'contains text') - #
http://www.claraocr.org/de/ocr/ocr-software/open-source-ocr.html
- #
https://github.com/edsu/ocropy
- #
http://de.wikipedia.org/wiki/Benutzer:DrTrigonBot/Doku#Categorization
- # Usage:tesseract imagename outputbase [-l lang] [configfile [[+|-]varfile]...] - # tesseract imagename.tif output - - # (it's simpler to run the scripts/executables in own environment/interpreter...) - - # skip file formats not supported (yet?) - if (self.image_mime[1] in ['ogg', 'pdf', 'vnd.djvu']): - return - - path = os.path.join(scriptdir, 'dtbext/_ocropus/ocropy') - - curdir = os.path.abspath(os.curdir) - os.chdir(path) - - # binarization - if os.path.exists(os.path.join(path, "temp")): - shutil.rmtree(os.path.join(path, "temp")) - if os.system("ocropus-nlbin %s -o %s" % (self.image_path_JPEG, os.path.join(path, "temp"))): - raise ImportError("ocropus not found!") - - # page level segmentation - if os.system("ocropus-gpageseg --minscale 6.0 '%s'" % os.path.join(path, "temp/????.bin.png")): - # detection error - return - - # raw text line recognition - if os.system("ocropus-lattices --writebestpath '%s'" % os.path.join(path, "temp/????/??????.bin.png")): - # detection error - return - - # language model application - # (optional - improve the raw results by applying a pretrained model) - os.environ['OCROPUS_DATA'] = os.path.join(path, "models/") - if os.system("ocropus-ngraphs '%s'" % os.path.join(path, "temp/????/??????.lattice")): - # detection error - return - - # create hOCR output - if os.system("ocropus-hocr '%s' -o %s" % (os.path.join(path, "temp/????.bin.png"), os.path.join(path, "temp.html"))): - # detection error - return - - ## 'create HTML for debugging (use "firefox temp/index.html" to view)' - ## (optional - generate human readable debug output) - #if os.system("ocropus-visualize-results %s" % os.path.join(path, "temp")): - # # detection error - # return - - # "to see recognition results, type: firefox temp.html" - # "to see details on the recognition process, type: firefox temp/index.html" - tmpfile = open(os.path.join(path, "temp.html"), 'r') - data = tmpfile.read() - tmpfile.close() - - shutil.rmtree(os.path.join(path, "temp")) - os.remove(os.path.join(path, "temp.html")) - - os.chdir(curdir) - - #print data - pywikibot.output(data) - - def _detect_EmbeddedText_poppler(self): - # may be also:
http://www.reportlab.com/software/opensource/rl-toolkit/
- - self._info['Text'] = [] - - if not (self.image_mime[1] == 'pdf'): - return - - # poppler pdftotext/pdfimages - # (similar as in '_util_get_DataTags_EXIF' but with stderr and no json output) - #
http://poppler.freedesktop.org/
- #
http://www.izzycode.com/bash/how-to-install-pdf2text-on-centos-fedora-redha…
- # MIGHT BE BETTER TO USE AS PYTHON MODULE: - #
https://launchpad.net/poppler-python/
- #
http://stackoverflow.com/questions/2732178/extracting-text-from-pdf-with-po…
- #
http://stackoverflow.com/questions/25665/python-module-for-converting-pdf-t…
- #proc = Popen("pdftotext -layout %s %s" % (self.image_path, self.image_path+'.txt'), - proc = Popen("pdftotext %s %s" % (self.image_path, self.image_path+'.txt'), - shell=True, stderr=PIPE)#.stderr.readlines() - proc.wait() - if proc.returncode: - raise ImportError("pdftotext not found!") - data = open(self.image_path+'.txt', 'r').readlines() - os.remove( self.image_path+'.txt' ) - -# self._content_text = data - (s1, l1) = (len(u''.join(data)), len(data)) - - tmp_path = os.path.join(os.environ.get('TMP', '/tmp'), 'DrTrigonBot/') - os.mkdir( tmp_path ) -# switch this part off since 'pdfimages' (on toolserver) is too old; TS-1449 -# proc = Popen("pdfimages -p %s %s/" % (self.image_path, tmp_path), - proc = Popen("pdfimages %s %s/" % (self.image_path, tmp_path), - shell=True, stderr=PIPE)#.stderr.readlines() - proc.wait() - if proc.returncode: - raise ImportError("pdfimages not found!") - images = os.listdir( tmp_path ) -# pages = set() - for f in images: -# pages.add( int(f.split('-')[1]) ) - os.remove( os.path.join(tmp_path, f) ) - os.rmdir( tmp_path ) - - ## pdfminer (tools/pdf2txt.py) - ##
http://denis.papathanasiou.org/?p=343
(for layout and images) - #debug = 0 - #laparams = layout.LAParams() - ## - #pdfparser.PDFDocument.debug = debug - #pdfparser.PDFParser.debug = debug - #cmapdb.CMapDB.debug = debug - #pdfinterp.PDFResourceManager.debug = debug - #pdfinterp.PDFPageInterpreter.debug = debug - #pdfdevice.PDFDevice.debug = debug - ## - #rsrcmgr = pdfinterp.PDFResourceManager(caching=True) - #outfp = StringIO.StringIO() - #device = converter.TextConverter(rsrcmgr, outfp, codec='utf-8', laparams=laparams) - ##device = converter.XMLConverter(rsrcmgr, outfp, codec='utf-8', laparams=laparams, outdir=None) - ##device = converter.HTMLConverter(rsrcmgr, outfp, codec='utf-8', scale=1, - ## layoutmode='normal', laparams=laparams, outdir=None) - ##device = pdfdevice.TagExtractor(rsrcmgr, outfp, codec='utf-8') - #fp = file(self.image_path, 'rb') - #try: - # pdfinterp.process_pdf(rsrcmgr, device, fp, set(), maxpages=0, password='', - # caching=True, check_extractable=False) - #except AssertionError: - # pywikibot.warning(u'pdfminer missed, may be corrupt [_detect_EmbeddedText_poppler]') - # return - #except TypeError: - # pywikibot.warning(u'pdfminer missed, may be corrupt [_detect_EmbeddedText_poppler]') - # return - #fp.close() - #device.close() - #data = outfp.getvalue().splitlines(True) - # - #(s2, l2) = (len(u''.join(data)), len(data)) - - result = { 'Size': s1, - 'Lines': l1, - #'Data': data, - #'Position': pos, -# 'Images': u'%s (on %s page(s))' % (len(images), len(list(pages))), # pages containing images - 'Images': u'%s' % len(images), - 'Type': u'-', } # 'Type' could be u'OCR' above... - - self._info['Text'] = [result] - - return - def _recognize_OpticalCodes_dmtxNzbar(self): # barcode and Data Matrix recognition (libdmtx/pydmtx, zbar, gocr?) #
http://libdmtx.wikidot.com/libdmtx-python-wrapper
@@ -1731,13 +1548,6 @@ #
http://zbar.sourceforge.net/
#
http://pypi.python.org/pypi/zbar
- self._info['OpticalCodes'] = [] - - # skip file formats not supported (yet?) - if (self.image_mime[1] in ['ogg', 'pdf', 'vnd.djvu']) or \ - (self.image_mime[0] in ['audio']): - return - # DataMatrix from pydmtx import DataMatrix # linux distro package (fedora) / TS (debian) @@ -1786,7 +1596,7 @@ 'Type': u'DataMatrix', 'Quality': 10, }) - self._info['OpticalCodes'] = result + self._features['OpticalCodes'] = result # supports many popular symbologies try: @@ -1823,7 +1633,7 @@ # further detection ? - self._info['OpticalCodes'] = result + self._features['OpticalCodes'] = result return def _detect_Chessboard_CV(self): @@ -1832,13 +1642,6 @@ #
http://www.youtube.com/watch?v=bV-jAnQ-tvw
#
http://nullege.com/codes/show/src%40o%40p%40opencvpython-HEAD%40samples%40c…
- self._info['Chessboard'] = [] - - # skip file formats not supported (yet?) - if (self.image_mime[1] in ['ogg', 'pdf', 'vnd.djvu']) or \ - (self.image_mime[0] in ['audio']): - return - scale = 1. try: #im = cv.LoadImage(self.image_path_JPEG, cv.CV_LOAD_IMAGE_COLOR) @@ -1877,9 +1680,10 @@ ##cv2.imshow("win", im) ##cv2.waitKey() + result = {} if corners is not None: - self._info['Chessboard'] = [{ 'Corners': [tuple(item[0]) - for item in corners], }] + result = { 'Corners': [tuple(item[0]) for item in corners], } + self._features['Chessboard'] = [result] # TODO: improve chessboard detection (make it more tolerant) # ##
http://stackoverflow.com/questions/7624765/converting-an-opencv-image-to-bl…
@@ -2019,10 +1823,10 @@ pywikibot.output(u'result for calibrated camera:\n rot=%s\n perp=%s\n perp2D=%s' % (rot.transpose()[0], perp[:,2], ortho)) pywikibot.output(u'nice would be to do the same for uncalibrated/default cam settings') - self._info['Chessboard'][0].update({ - 'Rotation': tuple(rot.transpose()[0]), - 'Perp_Dir' : tuple(perp[:,2]), - 'Perp_Dir_2D': tuple(ortho), }) + result.update({ 'Rotation': tuple(rot.transpose()[0]), + 'Perp_Dir' : tuple(perp[:,2]), + 'Perp_Dir_2D': tuple(ortho), }) + self._features['Chessboard'] = [result] #cv2.imshow("win", im) #cv2.waitKey() @@ -2331,12 +2135,10 @@ # (exclusion of duplicates is done later by '_util_merge_Regions') - self._info['Faces'] += data + self._features['Faces'] += data return def _detect_History_EXIF(self): - self._info['History'] = [] - res = self._util_get_DataTags_EXIF() #a = [] @@ -2372,42 +2174,9 @@ pass i += 1 - self._info['History'] = result + self._features['History'] = result return - def _util_get_DataStreams_FFMPEG(self): - if hasattr(self, '_buffer_FFMPEG'): - return self._buffer_FFMPEG - - # (similar as in '_util_get_DataTags_EXIF') -# switch this part off since 'ffprobe' (on toolserver) is too old; TS-1449 -# data = Popen("ffprobe -v quiet -print_format json -show_format -show_streams %s" % self.image_path, - proc = Popen("ffprobe -v quiet -show_format -show_streams %s" % self.image_path,#.replace('%', '%%'), - shell=True, stdout=PIPE)#.stdout.read() - proc.wait() - if proc.returncode == 127: - raise ImportError("ffprobe (ffmpeg) not found!") - data = proc.stdout.read().strip() -# self._buffer_FFMPEG = json.loads(data) - res, key, cur = {}, '', {} - for item in data.splitlines(): - if (item[0] == '['): - if not (item[1] == '/'): - key = item[1:-1] - cur = {} - if key not in res: - res[key] = [] - else: - res[key].append( cur ) - else: - val = item.split('=') - cur[val[0].strip()] = val[1].strip() - if res: - res = { 'streams': res['STREAM'], 'format': res['FORMAT'][0] } - self._buffer_FFMPEG = res - - return self._buffer_FFMPEG - def _util_merge_Regions(self, regs, sub=False, overlap=False, close=False): # sub=False, overlap=False, close=False ; level 0 ; similar regions, similar position (default) # sub=True, overlap=False, close=False ; level 1 ; region contained in other, any shape/size @@ -2508,25 +2277,23 @@ self.image_size = Image.open(self.image_path_JPEG).size # MIME: 'image/x-xcf; charset=binary' - def _detect_Properties_PIL(self): + def _detect_Properties(self): """Retrieve as much file property info possible, especially the same as commons does in order to compare if those libraries (ImageMagick, ...) are buggy (thus explicitely use other software for independence)""" - #self.image_size = (None, None) - result = {'Format': u'-', 'Pages': 0} - result = { 'Format': u'%s' % self.image_mime[1].upper() } + result = { 'Format': u'%s' % self.image_mime[1].upper(), # DO NOT use ImageMagick (identify) instead of PIL to get these info !! 
+ 'Pages': 0, + 'Dimensions': self.image_size, + 'Filesize': os.path.getsize(self.filename), + 'MIME': u'%s/%s' % tuple(self.image_mime[:2]), } - result['Dimensions'] = self.image_size - result['Filesize'] = os.path.getsize(self.image_path) - result['MIME'] = u'%s/%s' % tuple(self.image_mime[:2]) + #self._properties['Properties'] = [result] + self._properties['Properties'][0].update(result) + return - #self._info['Properties'] = [result] - self._info['Properties'][0].update(result) - return {'Properties': [result]} - class SvgFile(JpegFile): def _convert(self): # SVG: rasterize the SVG to bitmap (MAY BE GET FROM WIKI BY DOWNLOAD?...) @@ -2556,11 +2323,11 @@ self.image_path_JPEG = self.image_path # MIME: 'application/xml; charset=utf-8' - def _detect_Properties_PIL(self): + def _detect_Properties(self): """Retrieve as much file property info possible, especially the same as commons does in order to compare if those libraries (ImageMagick, ...) are buggy (thus explicitely use other software for independence)""" - #self.image_size = (None, None) + result = {'Format': u'-', 'Pages': 0} # similar to PDF page count OR use BeautifulSoup @@ -2584,23 +2351,30 @@ #self.image_size = (svg.props.width, svg.props.height) - result = { 'Format': valid, - 'Mode': u'-', - 'Palette': u'-', - 'Pages': pc, } + result.update({ 'Format': valid, + 'Mode': u'-', + 'Palette': u'-', + 'Pages': pc, # may be set {{validSVG}} also or do something in bot template to # recognize 'Format=SVG (valid)' ... + 'Dimensions': self.image_size, + 'Filesize': os.path.getsize(self.filename), + 'MIME': u'%s/%s' % tuple(self.image_mime[:2]), }) - result['Dimensions'] = self.image_size - result['Filesize'] = os.path.getsize(self.image_path) - result['MIME'] = u'%s/%s' % tuple(self.image_mime[:2]) + #self._properties['Properties'] = [result] + self._properties['Properties'][0].update(result) + return - #self._info['Properties'] = [result] - self._info['Properties'][0].update(result) - return {'Properties': [result]} - class PdfFile(JpegFile): + def getFeatures(self): + # optical and other text recognition (tesseract & ocropus, ...) + self._detect_EmbeddedText_poppler() +# self._recognize_OpticalText_ocropus() + # (may be just classify as 'contains text', may be store text, e.g. to wikisource) + + return self._features + def _convert(self): # self._wikidata = self.image._latestInfo # all info wikimedia got from content (mime, sha1, ...) @@ -2613,32 +2387,180 @@ pass # MIME: 'application/pdf; charset=binary' - def _detect_Properties_PIL(self): + def _detect_Properties(self): """Retrieve as much file property info possible, especially the same as commons does in order to compare if those libraries (ImageMagick, ...) are buggy (thus explicitely use other software for independence)""" - #self.image_size = (None, None) - result = {'Format': u'-', 'Pages': 0} #
http://code.activestate.com/recipes/496837-count-pdf-pages/
#rxcountpages = re.compile(r"$\s*/Type\s*/Page[/\s]", re.MULTILINE|re.DOTALL) rxcountpages = re.compile(r"/Type\s*/Page([^s]|$)", re.MULTILINE|re.DOTALL) # PDF v. 1.3,1.4,1.5,1.6 pc = len(rxcountpages.findall( file(self.image_path,"rb").read() )) - result = { 'Format': u'PDF', - 'Mode': u'-', - 'Palette': u'-', - 'Pages': pc, } + result = { 'Format': u'PDF', + 'Mode': u'-', + 'Palette': u'-', + 'Pages': pc, + 'Dimensions': self.image_size, + 'Filesize': os.path.getsize(self.filename), + 'MIME': u'%s/%s' % tuple(self.image_mime[:2]), } - result['Dimensions'] = self.image_size - result['Filesize'] = os.path.getsize(self.image_path) - result['MIME'] = u'%s/%s' % tuple(self.image_mime[:2]) + #self._properties['Properties'] = [result] + self._properties['Properties'][0].update(result) + return - #self._info['Properties'] = [result] - self._info['Properties'][0].update(result) - return {'Properties': [result]} + # ./run-test (ocropus/ocropy) + # (in fact all scripts/executables used here are pure python scripts!!!) + def _recognize_OpticalText_ocropus(self): + # optical text recognition (tesseract & ocropus, ...) + # (no full recognition but - at least - just classify as 'contains text') + #
http://www.claraocr.org/de/ocr/ocr-software/open-source-ocr.html
+ #
https://github.com/edsu/ocropy
+ #
http://de.wikipedia.org/wiki/Benutzer:DrTrigonBot/Doku#Categorization
+ # Usage:tesseract imagename outputbase [-l lang] [configfile [[+|-]varfile]...] + # tesseract imagename.tif output + # (it's simpler to run the scripts/executables in own environment/interpreter...) + path = os.path.join(scriptdir, 'dtbext/_ocropus/ocropy') + + curdir = os.path.abspath(os.curdir) + os.chdir(path) + + # binarization + if os.path.exists(os.path.join(path, "temp")): + shutil.rmtree(os.path.join(path, "temp")) + if os.system("ocropus-nlbin %s -o %s" % (self.image_path_JPEG, os.path.join(path, "temp"))): + raise ImportError("ocropus not found!") + + # page level segmentation + if os.system("ocropus-gpageseg --minscale 6.0 '%s'" % os.path.join(path, "temp/????.bin.png")): + # detection error + return + + # raw text line recognition + if os.system("ocropus-lattices --writebestpath '%s'" % os.path.join(path, "temp/????/??????.bin.png")): + # detection error + return + + # language model application + # (optional - improve the raw results by applying a pretrained model) + os.environ['OCROPUS_DATA'] = os.path.join(path, "models/") + if os.system("ocropus-ngraphs '%s'" % os.path.join(path, "temp/????/??????.lattice")): + # detection error + return + + # create hOCR output + if os.system("ocropus-hocr '%s' -o %s" % (os.path.join(path, "temp/????.bin.png"), os.path.join(path, "temp.html"))): + # detection error + return + + ## 'create HTML for debugging (use "firefox temp/index.html" to view)' + ## (optional - generate human readable debug output) + #if os.system("ocropus-visualize-results %s" % os.path.join(path, "temp")): + # # detection error + # return + + # "to see recognition results, type: firefox temp.html" + # "to see details on the recognition process, type: firefox temp/index.html" + tmpfile = open(os.path.join(path, "temp.html"), 'r') + data = tmpfile.read() + tmpfile.close() + + shutil.rmtree(os.path.join(path, "temp")) + os.remove(os.path.join(path, "temp.html")) + + os.chdir(curdir) + + #print data + pywikibot.output(data) + + def _detect_EmbeddedText_poppler(self): + # may be also:
http://www.reportlab.com/software/opensource/rl-toolkit/
+ + # poppler pdftotext/pdfimages + # (similar as in '_util_get_DataTags_EXIF' but with stderr and no json output) + #
http://poppler.freedesktop.org/
+ #
http://www.izzycode.com/bash/how-to-install-pdf2text-on-centos-fedora-redha…
+ # MIGHT BE BETTER TO USE AS PYTHON MODULE: + #
https://launchpad.net/poppler-python/
+ #
http://stackoverflow.com/questions/2732178/extracting-text-from-pdf-with-po…
+ #
http://stackoverflow.com/questions/25665/python-module-for-converting-pdf-t…
+ #proc = Popen("pdftotext -layout %s %s" % (self.image_path, self.image_path+'.txt'), + proc = Popen("pdftotext %s %s" % (self.image_path, self.image_path+'.txt'), + shell=True, stderr=PIPE)#.stderr.readlines() + proc.wait() + if proc.returncode: + raise ImportError("pdftotext not found!") + data = open(self.image_path+'.txt', 'r').readlines() + os.remove( self.image_path+'.txt' ) + +# self._content_text = data + (s1, l1) = (len(u''.join(data)), len(data)) + + tmp_path = os.path.join(os.environ.get('TMP', '/tmp'), 'DrTrigonBot/') + os.mkdir( tmp_path ) +# switch this part off since 'pdfimages' (on toolserver) is too old; TS-1449 +# proc = Popen("pdfimages -p %s %s/" % (self.image_path, tmp_path), + proc = Popen("pdfimages %s %s/" % (self.image_path, tmp_path), + shell=True, stderr=PIPE)#.stderr.readlines() + proc.wait() + if proc.returncode: + raise ImportError("pdfimages not found!") + images = os.listdir( tmp_path ) +# pages = set() + for f in images: +# pages.add( int(f.split('-')[1]) ) + os.remove( os.path.join(tmp_path, f) ) + os.rmdir( tmp_path ) + + ## pdfminer (tools/pdf2txt.py) + ##
http://denis.papathanasiou.org/?p=343
(for layout and images) + #debug = 0 + #laparams = layout.LAParams() + ## + #pdfparser.PDFDocument.debug = debug + #pdfparser.PDFParser.debug = debug + #cmapdb.CMapDB.debug = debug + #pdfinterp.PDFResourceManager.debug = debug + #pdfinterp.PDFPageInterpreter.debug = debug + #pdfdevice.PDFDevice.debug = debug + ## + #rsrcmgr = pdfinterp.PDFResourceManager(caching=True) + #outfp = StringIO.StringIO() + #device = converter.TextConverter(rsrcmgr, outfp, codec='utf-8', laparams=laparams) + ##device = converter.XMLConverter(rsrcmgr, outfp, codec='utf-8', laparams=laparams, outdir=None) + ##device = converter.HTMLConverter(rsrcmgr, outfp, codec='utf-8', scale=1, + ## layoutmode='normal', laparams=laparams, outdir=None) + ##device = pdfdevice.TagExtractor(rsrcmgr, outfp, codec='utf-8') + #fp = file(self.image_path, 'rb') + #try: + # pdfinterp.process_pdf(rsrcmgr, device, fp, set(), maxpages=0, password='', + # caching=True, check_extractable=False) + #except AssertionError: + # pywikibot.warning(u'pdfminer missed, may be corrupt [_detect_EmbeddedText_poppler]') + # return + #except TypeError: + # pywikibot.warning(u'pdfminer missed, may be corrupt [_detect_EmbeddedText_poppler]') + # return + #fp.close() + #device.close() + #data = outfp.getvalue().splitlines(True) + # + #(s2, l2) = (len(u''.join(data)), len(data)) + + result = { 'Size': s1, + 'Lines': l1, + #'Data': data, + #'Position': pos, +# 'Images': u'%s (on %s page(s))' % (len(images), len(list(pages))), # pages containing images + 'Images': u'%s' % len(images), + 'Type': u'-', } # 'Type' could be u'OCR' above... + + self._features['Text'] = [result] + return + + #class DjvuFile(JpegFile): # pass @@ -2651,35 +2573,29 @@ # general audio feature extraction # self._detect_AudioFeatures_YAAFE() + return self._features + # MIME: 'application/ogg; charset=binary' - def _detect_Properties_PIL(self): + def _detect_Properties(self): """Retrieve as much file property info possible, especially the same as commons does in order to compare if those libraries (ImageMagick, ...) are buggy (thus explicitely use other software for independence)""" - #self.image_size = (None, None) - result = {'Format': u'-', 'Pages': 0} # 'ffprobe' (ffmpeg); audio and video streams files (ogv, oga, ...) d = self._util_get_DataStreams_FFMPEG() #print d - result['Format'] = u'%s' % d['format']['format_name'].upper() - result['Dimensions'] = self.image_size - result['Filesize'] = os.path.getsize(self.image_path) - result['MIME'] = u'%s/%s' % tuple(self.image_mime[:2]) + result = { 'Format': u'%s' % d['format']['format_name'].upper(), + 'Pages': 0, + 'Dimensions': self.image_size, + 'Filesize': os.path.getsize(self.filename), + 'MIME': u'%s/%s' % tuple(self.image_mime[:2]), } - #self._info['Properties'] = [result] - self._info['Properties'][0].update(result) - return {'Properties': [result]} + #self._properties['Properties'] = [result] + self._properties['Properties'][0].update(result) + return def _detect_Streams_FFMPEG(self): - self._info['Streams'] = [] - - # skip file formats that interfere and can cause strange results (pdf is oga?!) - # or file formats not supported (yet?) - if (self.image_mime[1] in ['pdf']) or (self.image_fileext in [u'.svg']): - return - # audio and video streams files (ogv, oga, ...) 
 #class DjvuFile(JpegFile):
 #    pass
@@ -2651,35 +2573,29 @@
         # general audio feature extraction
 #        self._detect_AudioFeatures_YAAFE()
 
+        return self._features
+
     # MIME: 'application/ogg; charset=binary'
-    def _detect_Properties_PIL(self):
+    def _detect_Properties(self):
         """Retrieve as much file property info possible, especially the same
            as commons does in order to compare if those libraries (ImageMagick,
            ...) are buggy (thus explicitely use other software for independence)"""
-        #self.image_size = (None, None)
-        result = {'Format': u'-', 'Pages': 0}
         # 'ffprobe' (ffmpeg); audio and video streams files (ogv, oga, ...)
         d = self._util_get_DataStreams_FFMPEG()
         #print d
-        result['Format'] = u'%s' % d['format']['format_name'].upper()
-        result['Dimensions'] = self.image_size
-        result['Filesize'] = os.path.getsize(self.image_path)
-        result['MIME'] = u'%s/%s' % tuple(self.image_mime[:2])
+        result = { 'Format':     u'%s' % d['format']['format_name'].upper(),
+                   'Pages':      0,
+                   'Dimensions': self.image_size,
+                   'Filesize':   os.path.getsize(self.filename),
+                   'MIME':       u'%s/%s' % tuple(self.image_mime[:2]), }
 
-        #self._info['Properties'] = [result]
-        self._info['Properties'][0].update(result)
-        return {'Properties': [result]}
+        #self._properties['Properties'] = [result]
+        self._properties['Properties'][0].update(result)
+        return
 
     def _detect_Streams_FFMPEG(self):
-        self._info['Streams'] = []
-
-        # skip file formats that interfere and can cause strange results (pdf is oga?!)
-        # or file formats not supported (yet?)
-        if (self.image_mime[1] in ['pdf']) or (self.image_fileext in [u'.svg']):
-            return
-
         # audio and video streams files (ogv, oga, ...)
         d = self._util_get_DataStreams_FFMPEG()
         if not d:
@@ -2707,10 +2623,44 @@
                 'Dimensions': dim or (None, None), })
 
-        if 'image' not in d["format"]["format_name"]:
-            self._info['Streams'] = result
+        if 'image' in d["format"]["format_name"]:
+            result = []
+        self._features['Streams'] = result
         return
 
+    def _util_get_DataStreams_FFMPEG(self):
+        if hasattr(self, '_buffer_FFMPEG'):
+            return self._buffer_FFMPEG
+
+        # (similar as in '_util_get_DataTags_EXIF')
+# switch this part off since 'ffprobe' (on toolserver) is too old; TS-1449
+#        data = Popen("ffprobe -v quiet -print_format json -show_format -show_streams %s" % self.image_path,
+        proc = Popen("ffprobe -v quiet -show_format -show_streams %s" % self.image_path,#.replace('%', '%%'),
+                     shell=True, stdout=PIPE)#.stdout.read()
+        proc.wait()
+        if proc.returncode == 127:
+            raise ImportError("ffprobe (ffmpeg) not found!")
+        data = proc.stdout.read().strip()
+#        self._buffer_FFMPEG = json.loads(data)
+        res, key, cur = {}, '', {}
+        for item in data.splitlines():
+            if (item[0] == '['):
+                if not (item[1] == '/'):
+                    key = item[1:-1]
+                    cur = {}
+                    if key not in res:
+                        res[key] = []
+                else:
+                    res[key].append( cur )
+            else:
+                val = item.split('=')
+                cur[val[0].strip()] = val[1].strip()
+        if res:
+            res = { 'streams': res['STREAM'], 'format': res['FORMAT'][0] }
+        self._buffer_FFMPEG = res
+
+        return self._buffer_FFMPEG
+
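
(The hand-written [STREAM]/[FORMAT] parser above exists only because the
toolserver's ffprobe predates '-print_format json' (TS-1449); with a
current ffmpeg the commented-out variant reduces to roughly this sketch,
yielding the same {'streams': ..., 'format': ...} layout:

    import json
    from subprocess import Popen, PIPE

    def probe_ffmpeg(path):
        # recent ffprobe emits JSON directly, so no manual parsing is needed
        proc = Popen(['ffprobe', '-v', 'quiet', '-print_format', 'json',
                      '-show_format', '-show_streams', path], stdout=PIPE)
        out = proc.communicate()[0]
        return json.loads(out) if out.strip() else {}

)
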
     def _detect_AudioFeatures_YAAFE(self):
         # http://yaafe.sourceforge.net/manual/tools.html
         # http://yaafe.sourceforge.net/manual/quickstart.html - yaafe.py
@@ -2763,12 +2713,6 @@
         # $ export YAAFE_PATH=/home/ursin/Desktop/yaafe-v0.64/src_python/
         # $ export PYTHONPATH=/home/ursin/Desktop/yaafe-v0.64/src_python
 
-        # skip file formats not supported (yet?)
-        if (self.image_mime[1] in ['ogg']):#, 'midi']):
-            return
-
-        self._info['Audio'] = []
-
         import yaafelib as yaafe
 
         # use WAV, OGG, MP3 (and others) audio file formats
@@ -2865,15 +2809,14 @@
             os.remove(fn)
         # remove folder too...
 
-        self._info['Audio'] = [data]
+        self._features['Audio'] = [data]
         return
 
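
(For orientation, the yaafe quickstart linked above boils down to loading
a feature plan into an engine and streaming the file through it; a rough
sketch along those lines, with an illustrative feature set, sample rate
and file name:

    import yaafelib as yaafe

    fp = yaafe.FeaturePlan(sample_rate=44100)
    fp.addFeature('mfcc: MFCC blockSize=512 stepSize=256')  # illustrative
    engine = yaafe.Engine()
    engine.load(fp.getDataFlow())
    afp = yaafe.AudioFileProcessor()
    afp.processFile(engine, 'some_audio.wav')   # WAV, OGG, MP3, ...
    feats = engine.readAllOutputs()             # dict of numpy arrays

)
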
 
 class MidiFile(UnknownFile):
     def getFeatures(self):
-        result = {}
-        result.update( self._detect_AudioFeatures_MUSIC21() )    # Audio
-        return result
+        self._detect_AudioFeatures_MUSIC21()    # Audio
+        return self._features
 
     def _detect_HeaderAndMetadata(self):
         result = {}
@@ -2895,51 +2838,56 @@
                 result[key].append(ba[i+3:e].decode('latin-1').strip())
             result[key] = u'\n'.join(result[key])
 
+        ## find specific info in extracted data
+        #print [item.strip() for item in re.findall('Generated .*?\n', result['Text'])]
+        ##u"Cr'eateur: GNU LilyPond 2.0.1"
+        #import dateutil.parser
+        #dates = []
+        #for line in result['Text'].splitlines():
+        #    # http://stackoverflow.com/questions/3276180/extracting-date-from-a-string-in…
+        #    try:
+        #        dates.append(dateutil.parser.parse(line, fuzzy=True).isoformat(' ').decode('utf-8'))
+        #    except ValueError:
+        #        pass
+        #print dates
+
         import _music21 as music21
         try:
             s = music21.converter.parse(self.filename)
             if s.metadata:
                 pywikibot.output(unicode(s.metadata))
-                result['Metadata'].update(s.metadata)
+                result.update(s.metadata)
         except music21.midi.base.MidiException:
             pass
 
-        self._info['Metadata'] = [result]
-        return {'Metadata': [result]}
+        self._properties['Metadata'] = [result]
+        return
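
(The commented-out block above keeps the date-recovery idea for later:
dateutil can pull a timestamp out of free-form header text via fuzzy
parsing. A minimal sketch; the sample line is made up:

    import dateutil.parser

    def fuzzy_dates(text):
        dates = []
        for line in text.splitlines():
            try:
                # fuzzy=True skips tokens that are not part of a date
                dates.append(dateutil.parser.parse(line, fuzzy=True))
            except ValueError:
                pass
        return dates

    # fuzzy_dates(u'Generated automatically by LilyPond on 2003-07-13')
    # -> [datetime.datetime(2003, 7, 13, 0, 0)]

)
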
 
     # MIME: 'audio/midi; charset=binary'
-    def _detect_Properties_PIL(self):
+    def _detect_Properties(self):
         """Retrieve as much file property info possible, especially the same
            as commons does in order to compare if those libraries (ImageMagick,
            ...) are buggy (thus explicitely use other software for independence)"""
-        #self.image_size = (None, None)
-        result = {'Format': u'-', 'Pages': 0}
-        result['Format'] = u'%s' % self.image_mime[1].upper()
+        result = { 'Format':     u'%s' % self.image_mime[1].upper(),
+                   'Pages':      0,
+                   'Dimensions': self.image_size,
+                   'Filesize':   os.path.getsize(self.filename),
+                   'MIME':       u'%s/%s' % tuple(self.image_mime[:2]), }
 
-        result['Dimensions'] = self.image_size
-        result['Filesize'] = os.path.getsize(self.image_path)
-        result['MIME'] = u'%s/%s' % tuple(self.image_mime[:2])
+        #self._properties['Properties'] = [result]
+        self._properties['Properties'][0].update(result)
+        return
 
-        #self._info['Properties'] = [result]
-        self._info['Properties'][0].update(result)
-        return {'Properties': [result]}
-
     # midi audio feature extraction
     def _detect_AudioFeatures_MUSIC21(self):
-        # skip file formats not supported
-        if (self.image_mime[1] not in ['midi']):
-            return
-
         import _music21 as music21
 
-        #audiofile = '/home/ursin/Desktop/3_Ships.mid'
-        audiofile = self.image_path
-        #music21.features.jSymbolic.getCompletionStats()
         try:
-            #s = music21.midi.translate.midiFilePathToStream(audiofile)
-            s = music21.converter.parse(audiofile)
+            #audiofile = '/home/ursin/Desktop/3_Ships.mid'
+            #s = music21.midi.translate.midiFilePathToStream(self.filename)
+            s = music21.converter.parse(self.filename)
         except music21.midi.base.MidiException:
             pywikibot.warning(u'unknown file type [_detect_AudioFeatures_MUSIC21]')
             return
@@ -2979,8 +2927,8 @@
         #print s.seconds
         #print s.secondsMap
 
-        self._info['Audio'] += [data]
-        return {'Audio': [data]}
+        self._features['Audio'] = [data]
+        return
 
 
 FILETYPES = { '*': UnknownFile,
@@ -3218,21 +3166,14 @@
         return (u'Graphics', bool(relevance))
 
-#    # Category:MIDI files created with GNU LilyPond
-#    def _cat_audio_MIDIfilescreatedwithGNULilyPond(self):
-#        # find metadata in extracted data
-#        print [item.strip() for item in re.findall('Generated .*?\n', u'\n'.join(result['Text']))]
-#        #u"Cr'eateur: GNU LilyPond 2.0.1"
-#        import dateutil.parser
-#        dates = []
-#        for line in result['Text']:
-#            # http://stackoverflow.com/questions/3276180/extracting-date-from-a-string-in…
-#            try:
-#                dates.append(dateutil.parser.parse(line, fuzzy=True).isoformat(' ').decode('utf-8'))
-#            except ValueError:
-#                pass
-#        print dates
+    # Category:MIDI files created with GNU LilyPond
+    def _cat_meta_MIDIfilescreatedwithGNULilyPond(self):
+        result    = self._info_filter['Metadata']
+        relevance = (u"Generated automatically by: GNU LilyPond" in
+                     result[0]['Text'])
 
+        return (u'MIDI files created with GNU LilyPond', bool(relevance))
+
     # Category:Categorized by DrTrigonBot
     def _addcat_BOT(self):
         # - ALWAYS -
@@ -3611,7 +3552,7 @@
     def log_output(self):
         # ColorRegions always applies here since there is at least 1 (THE average) color...
-        ignore = ['Properties', 'ColorAverage', 'ColorRegions', 'Geometry']
+        ignore = ['Properties', 'Metadata', 'ColorAverage', 'ColorRegions', 'Geometry']
         #if not self._existInformation(self._info):    # information available?
         # information available? AND/OR category available?
         if not (self._existInformation(self._info, ignore = ignore) or self._result_check):
@@ -3664,14 +3605,13 @@
         return u"\n".join( ret )
 
     def clean_cache(self):
-#        if os.path.exists(self.image_path):
-#            os.remove( self.image_path )
-#        if os.path.exists(self.image_path_JPEG):
-#            os.remove( self.image_path_JPEG )
-#        #image_path_new = self.image_path_JPEG.replace(u"cache/", u"cache/0_DETECTED_")
-#        #if os.path.exists(image_path_new):
-#        #    os.remove( image_path_new )
-        pass
+        if os.path.exists(self.image_path):
+            os.remove( self.image_path )
+        #if os.path.exists(self.image_path_JPEG):
+        #    os.remove( self.image_path_JPEG )
+        ##image_path_new = self.image_path_JPEG.replace(u"cache/", u"cache/0_DETECTED_")
+        ##if os.path.exists(image_path_new):
+        ##    os.remove( image_path_new )
 
     # LOOK ALSO AT: checkimages.CatImagesBot.report
     def report(self):
@@ -3799,40 +3739,19 @@
 
     # gather data from all information interfaces
     def gatherFeatures(self):
-        self._info['Properties'] = [{'Format': u'-', 'Pages': 0}]
-        self._info['Metadata'] = []
-        self._info['ColorAverage'] = []
-        self._info['ColorRegions'] = []
-        self._info['Faces'] = []
-        self._info['OpticalCodes'] = []
-        self._info['People'] = []
-        self._info['Chessboard'] = []
-        self._info['Text'] = []
-        self._info['Streams'] = []
-        self._info['Audio'] = []
-        self._info['Legs'] = []
-        self._info['Hands'] = []
-        self._info['Torsos'] = []
-        self._info['Ears'] = []
-        self._info['Eyes'] = []
-        self._info['Automobiles'] = []
-
         # split detection and extraction according to file types; JpegFile, ...
         TypeFile = FILETYPES.get(tuple(self.image_mime[:2]), FILETYPES['*'])
-        with TypeFile(self.image_path, self.image_mime) as tf:
-            import copy
-            tf.__dict__.update(copy.deepcopy(self.__dict__))
+        with TypeFile(self.image_path) as tf:
+            tf.image_mime = self.image_mime
+            tf.image      = self.image
             for func in ['getProperties', 'getFeatures']:
                 result = getattr(tf, func)()
-                if result:
-                    self._info.update(result)
+                self._info.update(result)
                 print self._info
-            print tf._info
             #print tf.__dict__
-            self._info = tf._info
         self.image_size = tf.image_size
 
-    def _existInformation(self, info, ignore = ['Properties', 'ColorAverage']):
+    def _existInformation(self, info, ignore = ['Properties', 'Metadata', 'ColorAverage']):
         result = []
         for item in info:
             if item in ignore:
@@ -3846,6 +3765,11 @@
         result = self._info['Properties']
         return {'Properties': result}
 
+    def _filter_Metadata(self):
+        # >>> never drop <<<
+        result = self._info['Metadata']
+        return {'Metadata': result}
+
     def _filter_Faces(self):
         result = self._info['Faces']
         if (len(result) < self._thrhld_group_size):
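
(For context on the dispatch in gatherFeatures above: FILETYPES maps the
first two MIME components onto the handler classes defined in this file,
with UnknownFile as fallback. Schematically, with illustrative entries:

    FILETYPES = { '*':               UnknownFile,   # fallback
                  ('image', 'jpeg'): JpegFile,      # entries illustrative
                  ('audio', 'midi'): MidiFile, }

    # e.g. image_mime[:2] == ['audio', 'midi']  ->  MidiFile
    TypeFile = FILETYPES.get(tuple(image_mime[:2]), FILETYPES['*'])

)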