-----Original Message-----
Date: Sat, 13 Sep 2008 07:58:23 +0100
From: River Tarnell river@wikimedia.org
Subject: [Toolserver-l] collaborative report generator?
To: toolserver-l@lists.wikimedia.org
Message-ID: 20080913065821.GO10237@harmony.local
so, it's quite common that people want various kinds of reports on wikis, which are usually generated by a standard SQL query. in fact, this is so common that we have the "query service" to handle these requests.
this isn't the most efficient way to do it, though; a toolserver user has to run the actual query, and if it's meant to be a regular report, they need to crontab it to run as needed. there's a lot of duplication of effort.
instead, what about a collaborative 'report' tool? i would envisage this working as follows: each query (for example, '100 users with most edits') is described in a file. the exact way isn't that important, but for the sake of example, the file might look like this:
    name=Top 100 editors
    # For a slow query, run it once a day from crontab. Faster queries could be
    # done on demand (for example, 'when=immediately'), or cached for a certain
    # period of time (for example, 'cache=1h').
    when=nightly
    query=SELECT ...
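A minimal sketch of how a report file in this key=value form could be read, assuming one setting per line and '#' comment lines. The names ReportSpec, parse_report and parse_duration are hypothetical, and the default of 'immediately' for a missing 'when' is an assumption; the message itself says the exact file format isn't that important.

```python
import re
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class ReportSpec:
    """One report definition (hypothetical structure, not from the proposal)."""
    name: str
    query: str
    when: str = "immediately"          # assumed default when 'when' is absent
    cache: Optional[timedelta] = None  # None means "do not cache"


# Assumed duration suffixes; the proposal only shows '1h'.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Turn a string such as '1h' or '30m' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"bad duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def parse_report(text: str) -> ReportSpec:
    """Parse key=value lines, skipping blank lines and '#' comments."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so SQL containing '=' stays intact.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {raw!r}")
        fields[key.strip()] = value.strip()

    missing = {"name", "query"} - set(fields)
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(sorted(missing))}")

    cache = parse_duration(fields["cache"]) if "cache" in fields else None
    return ReportSpec(
        name=fields["name"],
        query=fields["query"],
        when=fields.get("when", "immediately"),
        cache=cache,
    )
```

A scheduler could then group the parsed reports by their 'when' value, leaving only the nightly ones to crontab.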
then for users, there's a web interface where they can select the wiki and the report they want to run. for queries that need parameters (e.g. those that report on a particular article), they could select the article (preferably with a nice ajaxy input box). then the SQL might look like:
SELECT ... WHERE page_namespace=<NAMESPACE> AND page_title=<TITLE>
<NAMESPACE> and <TITLE> would be filled in by the report generator.
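One way the placeholders could be filled in without pasting user input into the SQL text is to turn each <NAME> into a bound parameter. This is only a sketch: bind_placeholders and run_report are hypothetical names, and it uses '?' markers with Python's built-in sqlite3 module as a stand-in for the real MySQL replicas, whose driver would likely use a different marker.

```python
import re
import sqlite3

# Matches placeholders such as <NAMESPACE> or <TITLE>: upper-case letters and
# underscores only, so ordinary SQL comparisons like "a < b" are left alone.
_PLACEHOLDER = re.compile(r"<([A-Z_]+)>")


def bind_placeholders(template, values):
    """Replace each <NAME> with '?' and collect its value in order.

    Returns (sql, args), suitable for cursor.execute(sql, args).
    """
    ordered = []

    def swap(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for <{name}>")
        ordered.append(values[name])
        return "?"

    return _PLACEHOLDER.sub(swap, template), ordered


def run_report(conn, template, values):
    """Execute a templated query; the values travel as bound parameters."""
    sql, args = bind_placeholders(template, values)
    return conn.execute(sql, args).fetchall()
```

Because the values are never concatenated into the query string, a title containing quotes is matched literally instead of altering the SQL.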
the web interface would display the result of the query in a nice, easy-to-read table (and probably with some kind of XML or CSV export feature). any project developer would be able to add new queries, which users could request in JIRA.
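The CSV export mentioned above could be as small as the following sketch, which uses Python's standard csv module. The function name rows_to_csv is hypothetical, and it assumes the column names and result rows have already been fetched from the database.

```python
import csv
import io


def rows_to_csv(columns, rows):
    """Serialise a result set to CSV text: one header row, then the data."""
    buffer = io.StringIO()
    # The csv module quotes fields containing commas, quotes or newlines,
    # so page titles and user names survive the round trip.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()
```

The same rows could feed the HTML table, so the export would need no second query.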
as the project would have many developers, i envisage it running on the stable server. if people are actually interested in doing this, i'd be willing to create at least the basic framework (hopefully the interface would have several maintainers who would add nice features).
opinions?
- river.
This sounds like a really neat and useful idea. Building it sounds like fun for its own sake, too. But when I read it I was struck by the thought that the specification and generation of reports is something that many commercial tools do. (In my real job, one of the things that the software I work with does is feed data warehouses that you can run Business Objects, Cognos, Crystal Reports, and the like against).
Perhaps it might be worth doing an investigation to see if there is a freeware alternative to one of those commercial tools that could be deployed? (let me apologise in advance if this was done already and I missed it!) And if there is none, consider if this project could be built in a way that results in reusable components to make such a thing?
Here is an example:
http://www.jaspersoft.com/downloads.html
I have not evaluated this to see if it's truly "free" enough, or a technical fit, or whatever; I just found it with a quick search. But it suggests that free report generators/designers might be out there if we look.
Here is another one:
Same caveats... (in particular this one is not open source... so presumably a non starter. it's just an example to consider.)
Now of course, we may not want to give the ability to create reports of whatever sort, schedule them, and have them run, consuming who knows how much resource, etc to "just anyone"... maybe we'd want to see some sort of a "design/review/proposal/approval" process in place before a report went live (but we'd want that in any case I'm guessing, even with something done ourselves)
I hope this is helpful. I will say that being able to generate reports that I design would be exceedingly useful functionality regardless of how it is realised.
Larry Pieniazek Hobby mail: Lar at Miltontrainworks dot com
Jaspersoft is one example; most of their software is open source, though some of the more user-friendly parts aren't. There's a fairly interesting sales WebEx they did a month or so ago demonstrating what you can do. I've been meaning to try it out with some of the edit data for a while.
There's another open source BI company, Pentaho, most of whose platform is open source. In both cases, running the software on our data will probably require some translation steps from the base data, though.
Finne/henna
Toolserver-l mailing list Toolserver-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Larry Pieniazek:
> I was struck by the thought that the specification and generation of reports is something that many commercial tools do.
while i imagine there are several tools that could be adapted to our use, i think it makes more sense to NIH this, for the usual reasons:
- it's very simple; maybe even easier than learning and adapting an existing system
- we can make it work exactly how we want (in particular, being easy for users to use)
- the MediaWiki schema is kind of weird and teaching external software to use it is a pain
- it's fun ;)
- river.
-----Original Message-----
From: River Tarnell [mailto:river@wikimedia.org]
Sent: Saturday, September 13, 2008 4:10 PM
To: lar@miltontrainworks.com; toolserver-l@lists.wikimedia.org
Subject: Re: [Toolserver-l] collaborative report generator?
> Larry Pieniazek:
> > I was struck by the thought that the specification and generation of reports is something that many commercial tools do.
> while i imagine there are several tools that could be adapted to our use, i think it makes more sense to NIH this
As a consultant in real life, I would be remiss not to raise this as a possibility... why reinvent wheels unless one needs to? But on the other hand... if it IS needful, an examination of why often flushes out additional requirements and good stuff.
> for the usual reasons:
> - it's very simple; maybe even easier than learning and adapting an existing system
It depends on how powerful of a report generator/designer you want... a simple one is easy enough to do but one as robust as Cognos isn't.... it would be a major development effort. What are the requirements here? If simplicity is all that's needed, then yes.
> - we can make it work exactly how we want (in particular, being easy for users to use)
This is a strong plus for doing it ourselves, especially since we don't necessarily want random end users running random queries.
> - the MediaWiki schema is kind of weird and teaching external software to use it is a pain
Most of these tools run against MySQL, not against anything else. So would any query tool we did ourselves, presumably, so this isn't a factor. MW has a weird schema, but that's going to be the same either way.
> - it's fun ;)
This is a strong plus for doing it ourselves too!
In the end, I have no idea yet what the right answer is but talking about it should help.
Larry Pieniazek Hobby mail: Lar at Miltontrainworks dot com