I'm a developer at Wikia. We have a use case for searching through a file's metadata. This is currently difficult because the Image.img_metadata field is a blob.
We propose expanding the metadata field into a new table named image_metadata, with three columns: img_name, attribute (varchar), and value (varchar). It can be joined with Image on img_name.
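Below is a minimal sketch of the proposed table and the kind of search it enables. It uses Python's built-in sqlite3 module as a stand-in for MySQL. Several details are assumptions for illustration and not part of the proposal: the composite primary key on (img_name, attribute), the attribute/value index and its name, the helper find_files, and the sample files.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the image table is cut down
# to the one column this sketch needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE image (
    img_name TEXT PRIMARY KEY
);
CREATE TABLE image_metadata (
    img_name  VARCHAR(255) NOT NULL,    -- joins to image.img_name
    attribute VARCHAR(255) NOT NULL,
    value     VARCHAR(255),
    PRIMARY KEY (img_name, attribute)   -- assumption: one value per attribute
);
-- Assumed index so that searches by attribute/value avoid a full scan.
CREATE INDEX imgmeta_attribute_value ON image_metadata (attribute, value);
""")

conn.executemany("INSERT INTO image (img_name) VALUES (?)",
                 [("Sunset.jpg",), ("Cat.png",)])
conn.executemany(
    "INSERT INTO image_metadata (img_name, attribute, value) VALUES (?, ?, ?)",
    [("Sunset.jpg", "Make", "Canon"),
     ("Sunset.jpg", "Model", "EOS 5D"),
     ("Cat.png", "Make", "Nikon")])


def find_files(conn, attribute, value):
    """Return the names of files whose metadata has attribute = value."""
    rows = conn.execute(
        "SELECT image.img_name FROM image "
        "JOIN image_metadata ON image.img_name = image_metadata.img_name "
        "WHERE attribute = ? AND value = ? "
        "ORDER BY image.img_name",
        (attribute, value))
    return [name for (name,) in rows]


print(find_files(conn, "Make", "Canon"))  # ['Sunset.jpg']
```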
On the application side, LocalFile's load* and decodeRow methods will have to be changed to support the new table.
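The sketch below shows one way the loading side could look. It is written in Python rather than PHP. A hypothetical load_metadata helper fetches the rows for several files in a single query and regroups them into one attribute-to-value dictionary per file, which is the flat counterpart of an unserialized img_metadata blob. The function name and the batching are assumptions.

```python
import sqlite3


def load_metadata(conn, img_names):
    """Rebuild {img_name: {attribute: value}} for several files in one query.

    Hypothetical helper; files that have no metadata rows map to {}.
    """
    result = {name: {} for name in img_names}
    if not img_names:
        return result
    placeholders = ", ".join("?" for _ in img_names)
    rows = conn.execute(
        "SELECT img_name, attribute, value FROM image_metadata "
        f"WHERE img_name IN ({placeholders}) "
        "ORDER BY attribute",
        list(img_names))
    for name, attribute, value in rows:
        result[name][attribute] = value
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image_metadata (img_name TEXT, attribute TEXT, value TEXT)")
conn.executemany("INSERT INTO image_metadata VALUES (?, ?, ?)",
                 [("Sunset.jpg", "Make", "Canon"), ("Sunset.jpg", "Model", "EOS 5D")])
print(load_metadata(conn, ["Sunset.jpg", "Missing.png"]))
# {'Sunset.jpg': {'Make': 'Canon', 'Model': 'EOS 5D'}, 'Missing.png': {}}
```

Batching with IN (...) keeps a page that lists many files from issuing one metadata query per file.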
One issue to consider is the file archive. Should we replicate the metadata table for the file archive, or serialize the data and store it in a new table (something like fa_metadata)?
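The two archive options are sketched side by side below, under the same assumptions as before: SQLite stands in for MySQL, and JSON stands in for PHP's serialize(). The filearchive table is cut down to three columns. The name filearchive_metadata is hypothetical and stands for option A. The archive_file helper writes both forms only so they can be compared.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE image_metadata (img_name TEXT, attribute TEXT, value TEXT);
-- Cut-down archive table; fa_metadata holds option B's serialized blob.
CREATE TABLE filearchive (fa_id INTEGER PRIMARY KEY, fa_name TEXT, fa_metadata BLOB);
-- Option A: a key-value table mirroring image_metadata (hypothetical name).
CREATE TABLE filearchive_metadata (fa_id INTEGER, attribute TEXT, value TEXT);
""")


def archive_file(conn, img_name):
    """Move a deleted file's metadata into the archive and return its fa_id.

    Writes both representations so they can be compared; a real schema
    would pick one.
    """
    rows = conn.execute(
        "SELECT attribute, value FROM image_metadata WHERE img_name = ?",
        (img_name,)).fetchall()
    # Option B: one serialized blob per archived file.
    blob = json.dumps(dict(rows), sort_keys=True)
    fa_id = conn.execute(
        "INSERT INTO filearchive (fa_name, fa_metadata) VALUES (?, ?)",
        (img_name, blob)).lastrowid
    # Option A: copy the rows, re-keyed by the archive row's id.
    conn.executemany(
        "INSERT INTO filearchive_metadata (fa_id, attribute, value) VALUES (?, ?, ?)",
        [(fa_id, attribute, value) for attribute, value in rows])
    conn.execute("DELETE FROM image_metadata WHERE img_name = ?", (img_name,))
    return fa_id


conn.executemany("INSERT INTO image_metadata VALUES (?, ?, ?)",
                 [("Old.jpg", "Make", "Canon"), ("Old.jpg", "Model", "EOS 5D")])
fa_id = archive_file(conn, "Old.jpg")
print(conn.execute("SELECT fa_metadata FROM filearchive WHERE fa_id = ?",
                   (fa_id,)).fetchone()[0])
# {"Make": "Canon", "Model": "EOS 5D"}
```

Option A keeps archived metadata searchable with the same queries as live files. Option B stores a single value per archived file, the way img_metadata works today, but that value can only be searched after it is decoded.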
Please let us know if you see any issues with this plan. We hope that this will be useful to the MediaWiki project, and a candidate to merge back.
Thanks, Will
On Thu, Dec 1, 2011 at 12:34 PM, William Lee wlee@wikia-inc.com wrote:
> We propose expanding the metadata field into a new table. We propose the name image_metadata. It will have three columns: img_name, attribute (varchar) and value (varchar). It can be joined with Image on img_name.
That was part of bawolff's plan last summer for GSoC when he overhauled our metadata support. He got a lot of his project done, but never quite got to this point. Something we'd definitely like to see though!
-Chad
On 12/01/2011 12:36 PM, Chad wrote:
> That was part of bawolff's plan last summer for GSoC when he overhauled our metadata support. He got a lot of his project done, but never quite got to this point. Something we'd definitely like to see though!
William, https://www.mediawiki.org/wiki/Summer_of_Code_Past_Projects#Improve_metadata... points me to https://www.mediawiki.org/wiki/Special:Code/MediaWiki/86169 and its followups, in case you want to take a look at that.
And I am heartily in favor of merging stuff so that eventually MediaWiki trunk gets to benefit from all your improvements and you don't have to spend as much work on your own maintenance! :-) You've already seen https://www.mediawiki.org/wiki/Wikia_code right?
Thanks!
On 1 December 2011 17:34, William Lee wlee@wikia-inc.com wrote:
> I'm a developer at Wikia. We have a use case for searching through a file's metadata. This task is challenging now, because the field Image.img_metadata is a blob.
This sounds like a natural fit for Commons, too.
- d.
On Thu, 01 Dec 2011 09:34:03 -0800, William Lee wlee@wikia-inc.com wrote:
> We propose expanding the metadata field into a new table. We propose the name image_metadata. It will have three columns: img_name, attribute (varchar) and value (varchar). It can be joined with Image on img_name.
> One issue to consider is the file archive. Should we replicate the metadata table for file archive? Or serialize the data and store it in a new table (something like fa_metadata)?
imgmeta_name, imgmeta_attribute, imgmeta_value would fit our standards for column naming better.
Why isn't our image table primary key an integer anyways?
We already have columns for the metadata blobs, and I'm assuming that you don't need to query old image versions for metadata. If that's the case, how about leaving the oi_metadata column in use?
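The hybrid suggested above is sketched below. The current version's metadata lives in searchable rows, while older versions keep a serialized blob in oi_metadata. As before, SQLite stands in for MySQL and JSON stands in for PHP's serialize(). The oldimage table is cut down to three columns, get_metadata and upload_new_version are hypothetical helpers, and the timestamps are sample values.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE image_metadata (img_name TEXT, attribute TEXT, value TEXT);
-- Cut-down oldimage table; oi_metadata keeps its serialized blob.
CREATE TABLE oldimage (oi_name TEXT, oi_timestamp TEXT, oi_metadata BLOB);
""")


def get_metadata(conn, name, timestamp=None):
    """Current version: read searchable rows. Old version: decode the blob."""
    if timestamp is None:
        rows = conn.execute(
            "SELECT attribute, value FROM image_metadata WHERE img_name = ?",
            (name,))
        return dict(rows)
    row = conn.execute(
        "SELECT oi_metadata FROM oldimage WHERE oi_name = ? AND oi_timestamp = ?",
        (name, timestamp)).fetchone()
    return json.loads(row[0]) if row else {}


def upload_new_version(conn, name, prev_timestamp, new_metadata):
    """Demote the current rows to an oldimage blob, then store the new rows."""
    conn.execute(
        "INSERT INTO oldimage (oi_name, oi_timestamp, oi_metadata) VALUES (?, ?, ?)",
        (name, prev_timestamp, json.dumps(get_metadata(conn, name))))
    conn.execute("DELETE FROM image_metadata WHERE img_name = ?", (name,))
    conn.executemany(
        "INSERT INTO image_metadata (img_name, attribute, value) VALUES (?, ?, ?)",
        [(name, attribute, value) for attribute, value in new_metadata.items()])


conn.execute("INSERT INTO image_metadata VALUES ('Sunset.jpg', 'Make', 'Canon')")
upload_new_version(conn, "Sunset.jpg", "20111130120000", {"Make": "Nikon"})
print(get_metadata(conn, "Sunset.jpg"))                    # {'Make': 'Nikon'}
print(get_metadata(conn, "Sunset.jpg", "20111130120000"))  # {'Make': 'Canon'}
```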
On Thu, Dec 1, 2011 at 3:36 PM, Daniel Friesen lists@nadir-seen-fire.com wrote:
> Why isn't our image table primary key an integer anyways?
In part, legacy foolishness. :)
Also, the physical storage of images is still tied to the title, so anything that renames already has to run around renaming things. :(
-- brion
Sounds like a good idea to me.
What things are you interested in searching? I'd like to clean up metadata a bit. Except for latitude and longitude, we don't have any notion of what the image metadata means. For example, we could use a standard machine-readable notion of creation date, author, or license.
Also, the current metadata scheme is just serialized PHP, which allows rich data structures in values, so a flat key-value store may not be able to hold everything.
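To make that concern concrete, here is one hypothetical way to flatten a nested metadata value into attribute/value rows, using dotted paths for dictionary keys and list positions. The naming scheme and the sample metadata are assumptions. The point is to show what the flat form cannot express.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts/lists into (attribute, value) string pairs.

    Dict keys are joined with '.', and list positions become numeric
    path segments, e.g. {"GPS": {"lat": 51.5}} -> [("GPS.lat", "51.5")].
    """
    pairs = []
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            pairs.extend(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            path = f"{prefix}.{index}" if prefix else str(index)
            pairs.extend(flatten(child, path))
    else:
        # Leaf: everything is stored as a string in a varchar column.
        pairs.append((prefix, str(value)))
    return pairs


metadata = {
    "Make": "Canon",
    "Keywords": ["sunset", "beach"],
    "GPS": {"lat": 51.5, "lon": -0.12},
}
for attribute, value in flatten(metadata):
    print(attribute, "=", value)
# Make = Canon
# Keywords.0 = sunset
# Keywords.1 = beach
# GPS.lat = 51.5
# GPS.lon = -0.12
```

The flat form loses information in several ways. Numbers come back as strings. An empty list or dictionary produces no rows at all. A key that itself contains a dot becomes ambiguous. A real scheme would need a rule for each of these cases, or a separate blob for values that do not flatten cleanly.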
On 12/1/11 9:34 AM, William Lee wrote:
> I'm a developer at Wikia. We have a use case for searching through a file's metadata. This task is challenging now, because the field Image.img_metadata is a blob.
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Thu, Dec 1, 2011 at 6:34 PM, William Lee wlee@wikia-inc.com wrote:
> We propose expanding the metadata field into a new table. We propose the name image_metadata. It will have three columns: img_name, attribute (varchar) and value (varchar). It can be joined with Image on img_name.
Per convention, this should probably read "file" instead of "image" (as is already done with namespaces and the "filearchive" table). Anyway, that's just naming.
A major problem, as mentioned earlier in this thread, is the lack of a key. Right now files (both files as abstract entities and their individual versions) have no unique key. All they have is a page title and a timestamp.
This is related to the License-integration project[1] (the name is a bit outdated: it started with license information, but it is basically aimed at storing all kinds of file properties).
The first blocker bug would be https://bugzilla.wikimedia.org/show_bug.cgi?id=26741 (image/oldimage to filerevision).
Another would be to make the file system even more like page/revision, by implementing file IDs and filerevision IDs.
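The sketch below shows a page/revision-style layout. Files and file revisions get integer IDs, and metadata is keyed by revision ID, so a rename touches only one row. The table and column names (file, filerevision, file_metadata and their prefixes) are illustrative and not taken from the linked bug. SQLite stands in for MySQL, and the helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE file (
    file_id     INTEGER PRIMARY KEY,
    file_name   TEXT UNIQUE NOT NULL,
    file_latest INTEGER                 -- id of the current filerevision
);
CREATE TABLE filerevision (
    fr_id        INTEGER PRIMARY KEY,
    fr_file      INTEGER NOT NULL REFERENCES file (file_id),
    fr_timestamp TEXT NOT NULL
);
CREATE TABLE file_metadata (
    fm_revision  INTEGER NOT NULL REFERENCES filerevision (fr_id),
    fm_attribute TEXT NOT NULL,
    fm_value     TEXT,
    PRIMARY KEY (fm_revision, fm_attribute)
);
""")


def add_revision(conn, name, timestamp, metadata):
    """Record a new upload of `name` and return its filerevision id."""
    conn.execute("INSERT OR IGNORE INTO file (file_name) VALUES (?)", (name,))
    (file_id,) = conn.execute(
        "SELECT file_id FROM file WHERE file_name = ?", (name,)).fetchone()
    fr_id = conn.execute(
        "INSERT INTO filerevision (fr_file, fr_timestamp) VALUES (?, ?)",
        (file_id, timestamp)).lastrowid
    conn.executemany(
        "INSERT INTO file_metadata (fm_revision, fm_attribute, fm_value) VALUES (?, ?, ?)",
        [(fr_id, attribute, value) for attribute, value in metadata.items()])
    conn.execute("UPDATE file SET file_latest = ? WHERE file_id = ?", (fr_id, file_id))
    return fr_id


def rename(conn, old, new):
    """Renaming updates one row; metadata is keyed by id and stays put."""
    conn.execute("UPDATE file SET file_name = ? WHERE file_name = ?", (new, old))


def current_metadata(conn, name):
    """Metadata of the latest revision of the file currently called `name`."""
    rows = conn.execute(
        "SELECT fm_attribute, fm_value FROM file "
        "JOIN file_metadata ON fm_revision = file_latest "
        "WHERE file_name = ?", (name,))
    return dict(rows)


r1 = add_revision(conn, "Sunset.jpg", "20111201000000", {"Make": "Canon"})
r2 = add_revision(conn, "Sunset.jpg", "20111202000000", {"Make": "Nikon"})
rename(conn, "Sunset.jpg", "Beach_sunset.jpg")
print(current_metadata(conn, "Beach_sunset.jpg"))  # {'Make': 'Nikon'}
```

This covers only the database side. The physical files are still stored by title, so a rename would still have to move them.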
- Krinkle
wikitech-l@lists.wikimedia.org