Re: [Wikitech-l] Proposal for new table image_metadata

6 Dec 2011


On Thu, Dec 1, 2011 at 8:49 PM, bawolff <bawolff+wn@gmail.com> wrote:
...
Thus, just storing a table of key/value pairs is kind of problematic:
how do you store an "array" value? Additionally, you have to consider
finding info. You probably want to be able to efficiently search
through lang values in a specific language, or for a specific property
regardless of language.
The two easiest approaches, based on my previous experience:
1) separate values with \x00, making them easy to split after
   extracting a row
2) store multiple entries with an index field, making it easy to query
   for potentially multiple values
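A minimal sketch of both approaches, assuming SQLite as a stand-in for the real database. The table image_metadata_sketch, its columns (name, meta_key, idx, lang, value), and the helpers pack, unpack, get_array and find are all hypothetical and are not MediaWiki's schema. The single (meta_key, lang) index covers both lookups mentioned above: a property in one language, and (through its leftmost column) a property in any language.

```python
import sqlite3

SEP = "\x00"  # approach 1: NUL byte separates the items of a multi-valued field


def pack(values):
    """Join a list of strings into one NUL-separated string."""
    # Assumes no individual value itself contains "\x00".
    return SEP.join(values)


def unpack(stored):
    """Split a NUL-separated string back into a list of strings."""
    # "" is treated as an empty list, so [""] and [] cannot be told apart.
    return stored.split(SEP) if stored else []


# Approach 2: one row per array element, ordered by an index column.
# Table and column names are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE image_metadata_sketch (
           name     TEXT NOT NULL,     -- file name
           meta_key TEXT NOT NULL,     -- property name
           idx      INTEGER NOT NULL,  -- position within an array value
           lang     TEXT,              -- language code, NULL if not language-specific
           value    BLOB NOT NULL)"""
)
# Serves "property in a given language" and "property in any language".
conn.execute("CREATE INDEX ims_key_lang ON image_metadata_sketch (meta_key, lang)")
conn.executemany(
    "INSERT INTO image_metadata_sketch VALUES (?, ?, ?, ?, ?)",
    [
        ("Foo.jpg", "Keywords", 0, None, "bridge"),
        ("Foo.jpg", "Keywords", 1, None, "river"),
        ("Foo.jpg", "ImageDescription", 0, "en", "A bridge"),
        ("Foo.jpg", "ImageDescription", 0, "de", "Eine Brücke"),
    ],
)


def get_array(conn, name, key):
    """Reassemble an array value from its indexed rows."""
    cur = conn.execute(
        "SELECT value FROM image_metadata_sketch"
        " WHERE name = ? AND meta_key = ? ORDER BY idx",
        (name, key),
    )
    return [row[0] for row in cur]


def find(conn, key, lang=None):
    """Search one property, optionally restricted to one language."""
    sql = "SELECT name, lang, value FROM image_metadata_sketch WHERE meta_key = ?"
    params = [key]
    if lang is not None:
        sql += " AND lang = ?"
        params.append(lang)
    return conn.execute(sql + " ORDER BY name, lang, idx", params).fetchall()
```

The trade-off in this sketch: packing keeps one row per property but leaves splitting (and any per-element search) to application code, while the index field lets SQL address individual elements at the cost of more rows.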
...
Also consider how big a metadata field can get. Theoretically it's not
really limited; while I don't expect it to be huge, more than 255 bytes
of UTF-8 seems a totally reasonable size for a metadata value.
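To illustrate why a 255-byte cap is easy to hit: UTF-8 uses one to four bytes per character, so a value well under 255 characters can exceed 255 bytes in a byte-counted column (for example a MySQL VARBINARY(255)). A small sketch; the helper names are hypothetical, and truncate_utf8 only shows what a hard cap would force, not something the message proposes.

```python
LIMIT = 255  # byte cap discussed above, e.g. a byte-counted VARBINARY(255) column


def utf8_len(value):
    """Bytes the string occupies when encoded as UTF-8."""
    return len(value.encode("utf-8"))


def fits(value, limit=LIMIT):
    """True if the UTF-8 encoding of value is at most `limit` bytes."""
    return utf8_len(value) <= limit


def truncate_utf8(value, limit=LIMIT):
    """What a hard byte cap would force: cut to at most `limit` bytes
    without leaving half of a multi-byte character at the end."""
    cut = value.encode("utf-8")[:limit]
    # The prefix is valid UTF-8 except possibly for one incomplete trailing
    # sequence, which errors="ignore" drops.
    return cut.decode("utf-8", errors="ignore")
```

For example, 100 CJK characters already take 300 bytes, and a cap would silently keep only the first 85 of them.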
Last of all, you have to keep in mind that all sorts of stuff is stored
in img_metadata. This includes things like the text layer of DjVu
files (although arguably that shouldn't be stored there...) and other
handler-specific things (OggHandler stores some very complex
structures in img_metadata). Of course, we could just keep the
img_metadata blob there and simply stop using it for "exif-like"
data, but continue using it for handler-specific ugly metadata that's
generally invisible to the user [probably a good idea; the two types of
data are actually quite different].
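A minimal sketch of that split, assuming the handler's metadata arrives as a Python dict. The whitelist EXIF_LIKE_KEYS and the function split_metadata are hypothetical, and JSON is used for the leftover blob purely for illustration; it is not a claim about how img_metadata is actually encoded.

```python
import json

# Hypothetical whitelist of user-visible, "exif-like" properties.
EXIF_LIKE_KEYS = {"Make", "Model", "DateTimeOriginal", "Artist", "ImageDescription"}


def split_metadata(metadata):
    """Return (rows, blob).

    rows: (key, idx, value) tuples for a key/value table; a list value
          becomes one row per element, numbered by idx.
    blob: all remaining handler-specific data, serialized as JSON
          (an illustrative encoding only).
    """
    rows = []
    handler_specific = {}
    for key, value in metadata.items():
        if key in EXIF_LIKE_KEYS:
            values = value if isinstance(value, list) else [value]
            rows.extend((key, i, v) for i, v in enumerate(values))
        else:
            handler_specific[key] = value
    blob = json.dumps(handler_specific, sort_keys=True)
    return rows, blob
```

With input such as {"Make": "Canon", "Artist": ["A", "B"], "streams": {"audio": 1}}, where "streams" is a made-up nested handler structure, the two exif-like properties become three table rows and only "streams" stays in the blob.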
On text: DjVu and PDF files can optionally contain flattened searchable
text, which we extract so it can be used for things like
Extension:ProofreadPage and, potentially, search indexing:
https://bugzilla.wikimedia.org/showdependencytree.cgi?id=21061&hide_reso...
Currently this gets stuffed into the metadata blob along with the EXIF
data, etc., and can make metadata blobs *very* large if there are
hundreds of pages of text.
If extracted page text is stored in a better key-value store, we should
make sure it doesn't get pulled into backwards-compatible metadata blobs
(if we keep them around as they are now) -- but it should be accessible
through some API.
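One way to picture this, as a sketch under stated assumptions: extracted text is stored one row per page, and the backwards-compatible blob is built from everything except the text layer. The table page_text_sketch, the metadata key "text", and the functions store_extracted and get_page_text are hypothetical; get_page_text merely stands in for "some API".

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_text_sketch ("
    " name TEXT NOT NULL, page INTEGER NOT NULL, text TEXT NOT NULL,"
    " PRIMARY KEY (name, page))"
)


def store_extracted(conn, name, metadata):
    """Move per-page text out of metadata; return the legacy blob without it."""
    pages = metadata.get("text", [])
    conn.executemany(
        "INSERT INTO page_text_sketch (name, page, text) VALUES (?, ?, ?)",
        [(name, i + 1, t) for i, t in enumerate(pages)],  # pages numbered from 1
    )
    legacy = {k: v for k, v in metadata.items() if k != "text"}
    return json.dumps(legacy, sort_keys=True)


def get_page_text(conn, name, page):
    """Fetch one page's text (stand-in for an API call); None if absent."""
    row = conn.execute(
        "SELECT text FROM page_text_sketch WHERE name = ? AND page = ?",
        (name, page),
    ).fetchone()
    return row[0] if row else None
```

Keying by (name, page) means a consumer such as ProofreadPage could load a single page's text without deserializing hundreds of pages, and the legacy blob stays small.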
...
One issue to consider is the file archive. Should we replicate the
metadata table for file archive? Or serialize the data and store it in
a new table (something like fa_metadata)?
Honestly, I wouldn't worry about that, especially in the beginning. As
far as I know, the only place fa_metadata/oi_metadata is used is that
you can request it via the API (I suppose it's copied over during file
reverts as well). I don't think anyone really uses that field on
archived images. (Maybe one day bug 26741 will be fixed and this will
be less of a concern.)
That reminds me: ForeignAPIRepo (InstantCommons) wants to be able to
transfer the metadata, at least for current versions; API formats should
remain compatible if possible so that data can continue to be
transferred to clients running old versions.
-- brion
