so, it's quite common that people want various kinds of reports on wikis, which are usually generated by a standard SQL query. in fact, this is so common that we have the "query service" to handle these requests.
this isn't the most efficient way to do it, though; a toolserver user has to run the actual query, and if it's meant to be a regular report, they need to crontab it to run as needed. there's a lot of duplication of effort.
instead, what about a collaborative 'report' tool? i would envisage this working as follows: each query (for example, '100 users with most edits') is described in a file. the exact way isn't that important, but for the sake of example, the file might look like this:
name=Top 100 editors
# For a slow query, run it once a day from crontab. Faster queries could be
# done on demand (for example, 'when=immediately'), or cached for a certain
# period of time (for example, 'cache=1h').
when=nightly
query=SELECT ...
then for users, there's a web interface where they can select the wiki and the report they want to run. for queries that need parameters (e.g. those that report on a particular article), they could select the article (preferably with a nice ajaxy input box). then the SQL might look like:
SELECT ... WHERE page_namespace=<NAMESPACE> AND page_title=<TITLE>
<NAMESPACE> and <TITLE> would be filled in by the report generator.
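(purely as an illustrative sketch -- not an actual implementation, and all names below are made up -- the generator could rewrite each <PLACEHOLDER> into a database-driver parameter and let the driver do the escaping, rather than pasting strings into the SQL:)

import os
import re
import MySQLdb

def run_report(sql, params, db):
    # turn '<TITLE>' into the DB-API placeholder '%(title)s' so the driver
    # escapes the value itself, instead of substituting strings into the SQL
    sql = re.sub(r'<([A-Z_]+)>', lambda m: '%%(%s)s' % m.group(1).lower(), sql)
    conn = MySQLdb.connect(host='sql', db=db,
                           read_default_file=os.path.expanduser('~/.my.cnf'))
    try:
        cur = conn.cursor()
        cur.execute(sql, dict((k.lower(), v) for k, v in params.items()))
        return cur.fetchall()
    finally:
        conn.close()

rows = run_report("SELECT page_id FROM page "
                  "WHERE page_namespace=<NAMESPACE> AND page_title=<TITLE>",
                  {'NAMESPACE': 0, 'TITLE': 'Main_Page'}, 'enwiki_p')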
the web interface would display the result of the query in a nice, easy-to-read table (and probably with some kind of XML or CSV export feature). any project developer would be able to add new queries, which users could request in JIRA.
as the project would have many developers, i envisage it running on the stable server. if people are actually interested in doing this, i'd be willing to create at least the basic framework (hopefully the interface would have several maintainers who would add nice features).
opinions?
- river.
On Sat, Sep 13, 2008 at 14:58, River Tarnell river@wikimedia.org wrote:
the web interface would display the result of the query in a nice, easy-to-read table (and probably with some kind of XML or CSV export feature).
"Easy-to-read" is commonly hard or sometimes even impossible to do with pure SQL, so it should have a place for code to transform the data to display it more efficiently. Top100 is a simple example, but what about queries that, say, will have a "type" column that is percepted much better if it was a color line, or a "status" column with sort of a progress bar? (I can't think of such a task right now, but I'm certain that the issue would arise at some point anyway.)
— Kalan
Kalan:
"Easy-to-read" is commonly hard or sometimes even impossible to do with pure SQL, so it should have a place for code to transform the data to display it more efficiently.
this would probably be needed even for simple queries, e.g. to translate page_namespace+page_title into an actual title.
- river.
On 13 Sep 2008, at 09:35, River Tarnell wrote:
Kalan:
"Easy-to-read" is commonly hard or sometimes even impossible to do with pure SQL, so it should have a place for code to transform the data to display it more efficiently.
this would probably be needed even for simple queries, e.g. to translate page_namespace+page_title into an actual title.
I have written some code for that, in my intersectContribs.php. Here it is (generalized):

function getNamespacesForHost($host) {
    $serData = file_get_contents("http://$host/w/api.php?action=query&meta=siteinfo&siprop=namespaces&format=php");
    $nsData = unserialize($serData);
    return $nsData;
}

$nsArray = getNamespacesForHost($wikihost); // The host is defined somewhere

$curPageName = $MY_page_title;           // Insert here the page title
$curPageNamespace = $MY_page_namespace;  // Insert here the namespace number
$curPageNamespaceName = $nsArray['query']['namespaces'][$curPageNamespace]['*'];
if ($curPageNamespaceName) // If the namespace is not 0 (main)...
    $pageTitle = $curPageNamespaceName . ":" . $curPageName;
else
    $pageTitle = $curPageName;
print "<li><a href=\"http://$wikihost/wiki/$pageTitle\">$pageTitle</a></li>";
For this I use the API, because namespaces in different wikis are different.
Pietrodn powerpdn@gmail.com
On Sat, Sep 13, 2008 at 16:35, Pietrodn powerpdn@gmail.com wrote:
For this I use the API, because namespaces in different wikis are different.
You'd better use the toolserver.namespace table from "sql" server.
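For example (I'm writing the column names from memory, so treat dbname/ns_id/ns_name as assumptions and check the real schema; the connection details are also only illustrative):

import os
import MySQLdb

conn = MySQLdb.connect(host='sql',
                       read_default_file=os.path.expanduser('~/.my.cnf'))
cur = conn.cursor()
# look up the local name of namespace 4 on dewiki; column names are assumed
cur.execute("SELECT ns_name FROM toolserver.namespace "
            "WHERE dbname=%s AND ns_id=%s", ('dewiki_p', 4))
row = cur.fetchone()
if row:
    print row[0]    # e.g. 'Wikipedia'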
— Kalan
I personally would prefer a choice of CSV format, XML format and a custom format handled with http://www.php.net/vprintf. This method wouldn't limit the query's SELECT, and looping through the result set would allow any formatting, like "# [[User:%2s|%2s]] ({{formatnum:%3s}} edits)", for those who don't have OpenOffice Calc, Microsoft Excel or similar for parsing CSV data. The output should also always be served as "text/plain" (because of bots), and the tool should allow GET parameters. I'm not sure how much time I have, but I would certainly be interested in at least participating in creating the tool (and perhaps maintaining it too).
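As a rough sketch of what I mean (written here in Python only as an illustration -- the original suggestion is PHP's vprintf -- and the format string and rows are made up):

import sys

def output(rows, fmt):
    # always plain text, so bots and copy-paste into a wiki both work
    sys.stdout.write("Content-Type: text/plain; charset=utf-8\n\n")
    for row in rows:
        sys.stdout.write(fmt % row + "\n")

output([('Example', 'Example', 12345), ('SomeUser', 'SomeUser', 9876)],
       "# [[User:%s|%s]] ({{formatnum:%s}} edits)")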
- Agony
2008/9/13 Kalan kalan.001@gmail.com
On Sat, Sep 13, 2008 at 16:35, Pietrodn powerpdn@gmail.com wrote:
For this I use the API, because namespaces in different wikis are different.
You'd better use the toolserver.namespace table from "sql" server.
— Kalan
Thank you for the hint! I will use it.
Pietrodn powerpdn@gmail.com
That seems cool. Are you going to write it in PHP?
One problem is the data visualisation. We can simply create a wiki (un)ordered list of links (with * or #). We can define the format in the SELECT clause with a CONCAT instruction (see the sketch below). I think we shouldn't give the output in CSV/XML, because the most frequent uses of lists are:
* Manual use by wiki users
* Automatic use by bots (which also require a list of links)
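As an illustration of the CONCAT idea (the connection details and the wiki chosen are just placeholders; page is the standard MediaWiki table):

import os
import MySQLdb

conn = MySQLdb.connect(host='sql', db='itwiki_p',
                       read_default_file=os.path.expanduser('~/.my.cnf'))
cur = conn.cursor()
# let the SELECT itself produce one wiki list item per row
cur.execute("SELECT CONCAT('# [[', page_title, ']]') FROM page "
            "WHERE page_namespace=0 ORDER BY page_id DESC LIMIT 10")
for (line,) in cur.fetchall():
    print line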
I'm sorry I can't help with writing the code; my PHP/CGI experience is very limited.
Regards
On 13 Sep 2008, at 08:58, River Tarnell wrote:
<snip>
Pietrodn powerpdn@gmail.com
On Saturday 13 September 2008 08:58:23 River Tarnell wrote:
instead, what about a collaborative 'report' tool? i would envisage this working as follows: each query (for example, '100 users with most edits') is described in a file. the exact way isn't that important, but for the sake of example, the file might look like this:
Yes, this is an excellent idea! :) Most of my tools (http://toolserver.org/~nikola/ - articlesby, grep, namestats, whatlinks) are built basically in the way you describe. And I have ideas for more, but they would surely be easier to make and use with the report tool.
the web interface would display the result of the query in a nice, easy-to-read table (and probably with some kind of XML or CSV export feature). any project developer would be able to add new queries, which users could request in JIRA.
Add wiki syntax to the list of exports :)
Have some practical ideas, will write about them later.
Hi there,
I think it would be best to generate the query results in PHP. It would be best if the result files and the query files were written in the same language. The easiest way in PHP to create such files would be to write them in the style of php.ini and parse them with parse_ini_file().
Slow queries, as river wrote, would run nightly from a separate script, with the result cached for 24 hours in the toolserver database. Other queries would be generated on demand but stored in the database for their cache time (e.g. 1 hour). The smallest ones would simply be generated on demand, as river wrote.
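To make the caching part concrete, here is a rough sketch (in Python to match river's prototype, though it could equally be done in PHP; the u_reports_cache table and its columns are invented for the example, and 'conn' is assumed to be an open MySQLdb connection with access to a user database):

import time
import cPickle

def cached_result(conn, report, wiki, ttl, run_query):
    # run_query() actually executes the report's SQL and returns the rows
    cur = conn.cursor()
    cur.execute("SELECT generated, result FROM u_reports_cache "
                "WHERE report=%s AND wiki=%s", (report, wiki))
    row = cur.fetchone()
    if row is not None and time.time() - row[0] < ttl:
        return cPickle.loads(row[1])      # cache is still fresh, reuse it
    rows = run_query()                    # stale or missing: re-run the SQL
    cur.execute("REPLACE INTO u_reports_cache (report, wiki, generated, result) "
                "VALUES (%s, %s, %s, %s)",
                (report, wiki, int(time.time()), cPickle.dumps(rows)))
    conn.commit()
    return rows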
Are you looking for interface designers? As I mostly do user interfaces at work and not much scripting, I could join the team.
okay, so i've created a basic version of this that you can find at http://toolserver.org/~river/cgi-bin/reports. the sample reports (not very useful, just for examples) are in ~river/reports/*.query. see 'README' for an example of the report format. (this is still very preliminary, i imagine there are more user-friendly ways to specify fields and variables.)
the source is in ~river/reports/python/.
this is just a prototype, so there's probably not much point reporting bugs (unless you happen to find me on IRC, in which case you might as well).
- river.
On Saturday 13 September 2008 17:09:58 River Tarnell wrote:
okay, so i've created a basic version of this that you can find at http://toolserver.org/~river/cgi-bin/reports. the sample reports (not very useful, just for examples) are in ~river/reports/*.query. see 'README' for an example of the report format. (this is still very preliminary, i imagine there are more user-friendly ways to specify fields and variables.)
the source is in ~river/reports/python/.
I don't see anything in ~river/reports
Nikola Smolenski:
I don't see anything in ~river/reports
you should be able to:
river@hemlock:~> ls ~river/reports/
first100pages.query  first100users.query  last100edits.query  python  README  top100users.query
river@hemlock:~> ls -ld ~ ~/reports
drwx--x--x+ 67 river users 191 2008-09-13 16:10 /home/river
drwxr-xr-x+  3 river users   8 2008-09-13 15:24 /home/river/reports
- river.
On Saturday 13 September 2008 18:46:50 River Tarnell wrote:
Nikola Smolenski:
I don't see anything in ~river/reports
you should be able to:
river@hemlock:~> ls ~river/reports/
first100pages.query  first100users.query  last100edits.query  python  README  top100users.query
river@hemlock:~> ls -ld ~ ~/reports
drwx--x--x+ 67 river users 191 2008-09-13 16:10 /home/river
drwxr-xr-x+  3 river users   8 2008-09-13 15:24 /home/river/reports
Yet I don't:
nikola@kompjuter:/ram> wget http://toolserver.org/~river/reports/
--23:13:12--  http://toolserver.org/~river/reports/
           => `index.html'
Resolving toolserver.org... 91.198.174.203, 2620:0:862:101:91:198:174:203
Connecting to toolserver.org|91.198.174.203|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 439 [text/html]
100%[=============================>] 439 --.--K/s
23:13:13 (17.63 MB/s) - `index.html' saved [439/439]
index.html attached.
Perhaps the files are not world-readable?
Nikola Smolenski:
nikola@kompjuter:/ram> wget http://toolserver.org/~river/reports/
look at the local directory ~river/reports, not the URL.
(i.e.: /home/river/reports/)
- river.
On Saturday 13 September 2008 23:15:23 River Tarnell wrote:
Nikola Smolenski:
nikola@kompjuter:/ram> wget http://toolserver.org/~river/reports/
look at the local directory ~river/reports, not the URL.
(i.e.: /home/river/reports/)
Ah, OK. (BTW, now the tool itself doesn't work...)
I imagined a different syntax, similar to wikisyntax, but this is fine too. To comment on it:
# First, one should be able to include comments, for example like this :)
%name Last 100 edits
%description Shows the last 100 edits for a user
%category Users
# Probably, this should be changed into 'categories', as one tool can
# realistically be in several categories, as is the case with Interiot's tstoc

%query
SELECT page_namespace, page_title, rev_timestamp
FROM page, revision
WHERE rev_user_text=%(username)s AND rev_page=page_id
ORDER BY rev_timestamp DESC
LIMIT 100
%end
# I don't understand, in %(username)s what is the 's' at the end doing?
# Also, there is some potential for clash here, as % is the modulo operator
# in MySQL. It will be rarely used, but still...

%variable username, username, Username
# The final Username is the name displayed in the form, but what are the first
# two?
# Also, what is needed is the ability to specify the variable type. Username
# could be of type 'text', but there is also need for booleans (checkbox),
# arrays (radio buttons), maybe more.

%fields
page, Page, page_namespace, page_title
timestamp, Time, rev_timestamp
%end
# I don't understand this, as I can't look at the tool's output.
# I imagined that here, one would be able to specify patterns that would
# shape the output, similar to %query. For example:
# %report
# html: <td><a href="http://%(wiki_url)/wiki/%(page_title)">%(page_title)</a></td>
# wiki: * [[%(page_title)]]
# %end
# Variable names here would be column names from the query, + variable names
# from the form, given in %variable, + "magic" variables like this %(wiki_url)

%variable username, username, Username
# Don't get why this is repeated.
One more thing is that the list of wikis should be handled better, and should be on the same page as other form elements.
Also, how are new reports made by different users going to be handled? Everyone would just put them in reports directory in his home dir?
Nikola Smolenski:
Ah, OK. (BTW, now the tool itself doesn't work...)
(fixed)
I imagined a different syntax, similar to wikisyntax, but this is fine too. To comment on it:
i think a lot of your comments are answered in the README (/home/river/reports/README). nonetheless, i will answer again here with some rationale, since it's not obvious why something might be done a particular way.
# First, one should be able to include comments, for example like this :)
in fact, everything in the file that's not a %directive is a comment.
%name Last 100 edits
%description Shows the last 100 edits for a user
%category Users
# Probably, this should be changed into 'categories', as one tool can
# realistically be in several categories, as is the case with Interiot's tstoc
well, the category is used to sort the reports in the web interface: http://toolserver.org/~river/cgi-bin/reports?wiki=dewiki_p having multiple categories would mean a report appeared twice in the list. (which might not be a bad thing when there are many reports, perhaps...)
# I don't understand, in %(username)s what is the 's' at the end doing?
that indicates it's a string (Python database query format syntax). MySQLdb only supports strings, so it's somewhat redundant, but still required. (it might be possible to implement a custom format syntax, but it would have to be done very carefully to avoid SQL injection.)
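a tiny example (connection setup omitted; 'cur' is an open MySQLdb cursor and the username is made up):

# the trailing 's' marks the placeholder as a string conversion; MySQLdb
# quotes and escapes the value from the dict itself
cur.execute("SELECT rev_id FROM revision WHERE rev_user_text=%(username)s "
            "LIMIT 5", {'username': 'Example'})
print cur.fetchall()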
# Also, there is some potential for clash here, as % is the modulo operator
# in MySQL. It will be rarely used, but still...
unfortunate, but any syntax is going to have that problem at some point. (btw, you can write '%%' for a literal %.)
%variable username, username, Username
# The final Username is the name displayed in the form, but what are the first
# two?
# Also, what is needed is the ability to specify the variable type. Username
# could be of type 'text', but there is also need for booleans (checkbox),
# arrays (radio buttons), maybe more.
%variable <name>, <type>, <display name>
the only type right now is username, which is just a text box. but in the future, i imagine this would have some kind of auto-complete for usernames.
%fields
page, Page, page_namespace, page_title
timestamp, Time, rev_timestamp
%end
# I don't understand this, as I can't look at the tool's output.
it's explained in the README; basically, each field is <type>, <title>, <params...>. for example, 'page' takes two params, the SQL namespace and title columns, and constructs a title from them.
# I imagined that here, one would be able to specify patterns that would
# shape the output, similar to %query. For example:
# %report
# html: <td><a href="http://%(wiki_url)/wiki/%(page_title)">%(page_title)</a></td>
# wiki: * [[%(page_title)]]
# %end
# Variable names here would be column names from the query, + variable names
# from the form, given in %variable, + "magic" variables like this %(wiki_url)
i don't really like the idea of embedding markup in every query, since it means any change to the markup will require editing every query (and hoping there aren't any mistakes). in general i think most reports can be done using the different %field formatters (even more complicated ones, like 'thumb' to embed a thumbnail of an image). the aim isn't to replace _every_ report tool; very complicated reports can still remain as separate tools.
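(just to illustrate the idea -- this is not the actual code from ~river/reports/python/, only a rough sketch of what a 'page' formatter might do; the namespace map would come from somewhere like toolserver.namespace:)

def format_page(ns_id, title, namespaces):
    # namespaces maps namespace id -> local namespace name ('' for the main
    # namespace); title comes straight from the SQL result
    title = title.replace('_', ' ')
    name = namespaces.get(ns_id, '')
    if name:
        return name + ':' + title
    return title

print format_page(4, 'Village_pump', {0: '', 4: 'Wikipedia'})
# -> Wikipedia:Village pump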
One more thing is that the list of wikis should be handled better, and should be on the same page as other form elements.
it can't easily be, because the list of reports depends on which wiki you select (some reports might only make sense on one wiki).
the list of wikis could be done better; i haven't done any work on the interface yet, until the logic is fairly complete.
Also, how are new reports made by different users going to be handled? Everyone would just put them in reports directory in his home dir?
when the tool moves to stable, reports will be in (for example) /projects/reports/reports/*.query. all tool developers will have access to that directory.
- river.
2008/9/13 River Tarnell river@wikimedia.org
<snip>
opinions?
- river.
One slightly offshoot thought here...
A while ago I was looking for open source reporting/BI frameworks (as opposed to commercial ones like Business Objects, Hyperion, etc.) and failed to turn up any satisfactory results (everything I found was either only semi-open or in an early development stage).
So I figured there's possibly a niche that a determined team of developers could fill. For starters, I have been toying with the idea of employing J2EE for this, though that may just be because I've recently been into Enterprise JavaBeans and the like (still, the EJB approach provides a good basis for abstraction and for separating the model (data access and queries), the view (presentation of results: XML, wikitext, etc.) and the controller (user queries, cron triggers)). As far as I remember, we have the GlassFish app server installed on stable.ts?
In essence, such a framework would be a useful "byproduct" of this project. ;-)
Just some thoughts...
Regards, Misza