Hi all,
as you may know I've been involved in the structured data community for a few years (through the original "Wikidata" proposal in 2004 as well as architecting and developing OmegaWiki, together with the OpenProgress team and others from 2005-2007). I've been following Semantic MediaWiki, Freebase and other projects from the beginning. You don't need to sell me on the value or importance of structured data.
The problem space is very complex, especially when taking into account that Wikimedia is a fully multilingual system. There is still low-hanging fruit, especially for a project like Wikimedia Commons, but I agree with Michael that a more holistic approach to accessing and managing data in WMF projects is much preferable to, for example, throwing SMW into some wikis and not others.
When I joined WMF, I couldn't justify arguing that data tech projects deserved higher priority than, for example, the 2009-10 usability initiative and continuing efforts in that area, especially given that we still have only a tiny engineering staff. I don't believe that structured data is going to be the principal driver of participation -- that problem space is more about social and technical barriers to entry, interaction with new users, mentoring, etc. And we're continuing to fall behind the rest of the web in terms of usability.
That being said, it's clear that it's a key enabling technology (including for _some_ usability improvements, although many of them can be made without a full-fledged structured data support system). I particularly think it has huge potential in bootstrapping small languages by more closely interconnecting useful and translatable bits of information (start a page about "Germany" in a new language and immediately pull all relevant data, possibly including translations of labels if available).
Danese and I have been working on a "Data Summit" this year to bring together both the key players in the structured data field (DBpedia, SMW, etc.), as well as some of the research and analytics community. Unfortunately, we've had to reschedule it, but it'll happen in Q1 2011. We're not going to be able to dedicate lots of resources to engineering in this area in the near future, but since there are already so many disparate efforts that focus on making WP data usable, we do hope that we can partner up with others to move things forward.
In a nutshell, I think we should aim to establish a “Wikidata Commons” project at data.wikimedia.org which serves all Wikimedia projects with structured data in a language-neutral fashion, analogous to “Wikimedia Commons” for multimedia files, and which becomes the central location to curate, maintain and discuss such data. Wikidata Commons should provide standard interfaces for querying, importing, and exporting data. This project could be built incrementally (starting with clunky but reasonably future-proof ways to manage and retrieve data).
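To make "language-neutral" a bit more concrete, here is a rough Python sketch, not a design proposal. It assumes concepts and properties are stored under opaque IDs, with labels attached per language, so that a page about "Germany" in a new language can pull the same data and show whatever translated labels exist. Everything in it (the Item class, label_for, render_infobox, the IDs C1/C2/P1, the in-memory dict standing in for the repository) is made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A language-neutral concept or property; the ID, not any label, is the key."""
    item_id: str
    labels: dict[str, str] = field(default_factory=dict)      # language code -> label
    statements: dict[str, str] = field(default_factory=dict)  # property ID -> item ID or literal


def label_for(item, lang, fallbacks=()):
    """Return the label in `lang`, then try the caller's fallback chain, then the bare ID."""
    for code in (lang, *fallbacks):
        if code in item.labels:
            return item.labels[code]
    return item.item_id


def render_infobox(store, item_id, lang, fallbacks=()):
    """Resolve an item's statements into (property label, value label) rows."""
    rows = []
    for prop_id, value in store[item_id].statements.items():
        prop_label = label_for(store[prop_id], lang, fallbacks)
        # Values that are themselves items get a translated label; literals pass through.
        if value in store:
            value = label_for(store[value], lang, fallbacks)
        rows.append((prop_label, value))
    return rows


# Hypothetical sample data: the IDs are arbitrary placeholders.
store = {
    "C1": Item("C1", {"en": "Germany", "de": "Deutschland", "fr": "Allemagne"}, {"P1": "C2"}),
    "C2": Item("C2", {"en": "Berlin", "de": "Berlin"}),
    "P1": Item("P1", {"en": "capital", "de": "Hauptstadt"}),
}

print(render_infobox(store, "C1", "de"))
# A language with no labels yet still gets the data, using the fallback the caller picks.
print(render_infobox(store, "C1", "sw", fallbacks=("en",)))
```

The fallback chain is passed in by the caller rather than hard-coded, so no single language is privileged by the data layer itself.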
The key challenges as I see them continue to be, as ever:
1) maintaining predictable and reasonable system performance as the DB scales, more and increasingly complex queries are performed, etc.,
2) consistently improving rather than degrading user experience,
3) handling multilingual representations of all translatable content well without giving undue prominence to any one language,
4) effectively caching and purging data wherever it's used,
5) versioning/transactioning relational data to be maximally useful and conducive to collaboration.
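As an illustration of challenge 4 only, here is a minimal Python sketch of one possible approach: record which pages used which data items at render time, and purge exactly those pages when an item changes. This is an assumption about how such purging could work, not a description of any existing MediaWiki or SMW mechanism; RenderCache, item_changed and the render callback are hypothetical names.

```python
from collections import defaultdict


class RenderCache:
    """Caches rendered pages and remembers which data items each page used."""

    def __init__(self, render):
        # render: callable taking a page title and returning
        # (rendered output, set of item IDs the page pulled in).
        self._render = render
        self._output = {}                   # page title -> cached output
        self._used_by = defaultdict(set)    # item ID -> page titles that used it

    def get(self, page):
        """Return cached output, rendering and recording item usage on a miss."""
        if page not in self._output:
            output, item_ids = self._render(page)
            self._output[page] = output
            for item_id in item_ids:
                self._used_by[item_id].add(page)
        return self._output[page]

    def item_changed(self, item_id):
        """Purge every cached page that used the item; return the purged titles."""
        pages = self._used_by.pop(item_id, set())
        for page in pages:
            # A page may already be gone if another item purged it; that is harmless.
            self._output.pop(page, None)
        return pages


# Hypothetical usage: one page depends on one data item.
data = {"C1": "v1"}
cache = RenderCache(lambda page: (f"{page} shows {data['C1']}", {"C1"}))
print(cache.get("Germany"))         # rendered once, then served from cache
data["C1"] = "v2"
print(cache.item_changed("C1"))     # pages that must be re-rendered
print(cache.get("Germany"))         # picks up the new value
```

The hard part in practice is that the same item would be used across many wikis and languages, so the reverse index and the purge would need to work across projects, which this in-memory sketch does not attempt.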
Earlier this week, Danese and I met with Denny Vrandecic from SMW, who's recently put together a prototype called "Shortipedia" that allows language-independent (using multilingual labels) annotation of concepts with SMW-style properties through a minimal form-based interface, interfacing with whichever triple store is configured for SMW. It's still very much a hack, and he's aiming to clean it up for the summit. But it looks potentially very interesting, and like a concept we could rally energy behind. The data from such a repository could then be pulled into WP templates, accessed through "wizards" that auto-generate template data for new articles, etc.
Anyone who wants to advance the thinking in this space should also consider what can be done today with Wikimedia Commons and SMW. Since Wikimedia Commons is an intrinsically multilingual database with focus on annotating individual files, its operational requirements are somewhat different from those of most other projects. It would be useful to have an instance of SMW running using a copy of the Wikimedia Commons database and possibly Semantic Forms to see what such annotation could look like in practice. Anyone with time and technical skills can put together prototypes like this that'll help us move forward.
Again, I think the likely path forward here is for us to ally effectively with the key players in the space, rather than doing all the work ourselves.