Hi everyone,
I just noticed https://secure.wikimedia.org/wikipedia/commons/wiki/Commons:Village_pump#Use... so I wondered where the data comes from. I checked the commonswiki_p database and it looks like user_properties is visible now. I always considered the user_properties information privacy sensitive information. It contains the gender, language and timezone of every user. Information I'm sure a lot of people don't want to have exposed. Please remove this table from the view.
Maarten
gender is already public if set.
John
On Sat, Jun 25, 2011 at 8:05 PM, Maarten Dammers maarten@mdammers.nlwrote:
Hi everyone,
I just noticed
https://secure.wikimedia.org/wikipedia/commons/wiki/Commons:Village_pump#Use... so I wondered where the data comes from. I checked the commonswiki_p database and it looks like user_properties is visible now. I always considered the user_properties information privacy sensitive information. It contains the gender, language and timezone of every user. Information I'm sure a lot of people don't want to have exposed. Please remove this table from the view.
Maarten
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
Maarten Dammers wrote:
Hi everyone,
I just noticed https://secure.wikimedia.org/wikipedia/commons/wiki/Commons:Village_pump #User_preferences so I wondered where the data comes from. I checked the commonswiki_p database and it looks like user_properties is visible now. I always considered the user_properties information privacy sensitive information. It contains the gender, language and timezone of every user. Information I'm sure a lot of people don't want to have exposed. Please remove this table from the view.
Maarten
Please look at the table contents.
The public view of user_properties on the Toolserver only contains gender, nothing else.
In addition to user_properties, Toolserver also features a custom view, named "user_properties_anonym".
That view contains all (if not, most) preferneces, but without the user-id. This is were the commons-page got it's information from.
-- Timo
Hi Timo,
Op 26-6-2011 2:23, Krinkle schreef:
The public view of user_properties on the Toolserver only contains gender, nothing else.
Sure? mysql> select distinct(up_property) from user_properties limit 10; +----------------+ | up_property | +----------------+ | gender | | language | | nickname | | skin | | timecorrection | +----------------+ 5 rows in set (19.56 sec)
Let's take a not so random user:
select * from user_properties JOIN user ON up_user=user_id WHERE user_name='Multichill' limit 10; +---------+----------------+----------------------------------------------------------------+---------+------------+-------------------+----------------+ | up_user | up_property | up_value | user_id | user_name | user_registration | user_editcount | +---------+----------------+----------------------------------------------------------------+---------+------------+-------------------+----------------+ | 115072 | gender | male | 115072 | Multichill | 20070207171241 | 46493 | | 115072 | language | nl | 115072 | Multichill | 20070207171241 | 46493 | | 115072 | nickname | [[User:Multichill|Multichill]] ([[User talk:Multichill|talk]]) | 115072 | Multichill | 20070207171241 | 46493 | | 115072 | timecorrection | ZoneInfo|60|Europe/Amsterdam | 115072 | Multichill | 20070207171241 | 46493 | +---------+----------------+----------------------------------------------------------------+---------+------------+-------------------+----------------+ 4 rows in set (0.02 sec)
Maarten
In addition to user_properties, Toolserver also features a custom view, named "user_properties_anonym".
That view contains all (if not, most) preferneces, but without the user-id. This is were the commons-page got it's information from.
-- Timo
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
Hello, At Sunday 26 June 2011 16:51:29 Maarten wrote:
Sure? mysql> select distinct(up_property) from user_properties limit 10; +----------------+
| up_property |
+----------------+
| gender | | language | | nickname | | skin | | timecorrection |
+----------------+ 5 rows in set (19.56 sec)
the table works with a white-list. At the moment only the above fields are visable, which are (in my opinion) non-critical. The only may-be-critical-field is the gender – but the gender is displayed by mediawiki itself so there is no point to hide it on the toolserver. If you think that one of the fields is a problem, I would like to hear your arguments :-).
Sincerly, DaB.
Actually, this is a boon for delivery bots. The bots can get the signatures automatically without having to rely on the user entering it.
-ManishEarth
On Sun, Jun 26, 2011 at 8:28 PM, DaB. WP@daniel.baur4.info wrote:
Hello, At Sunday 26 June 2011 16:51:29 Maarten wrote:
Sure? mysql> select distinct(up_property) from user_properties limit 10; +----------------+
| up_property |
+----------------+
| gender | | language | | nickname | | skin | | timecorrection |
+----------------+ 5 rows in set (19.56 sec)
the table works with a white-list. At the moment only the above fields are visable, which are (in my opinion) non-critical. The only may-be-critical-field is the gender – but the gender is displayed by mediawiki itself so there is no point to hide it on the toolserver. If you think that one of the fields is a problem, I would like to hear your arguments :-).
Sincerly, DaB. -- Userpage: [[:w:de:User:DaB.]] — PGP: 2B255885
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
Hi everyone.
I try to help to prepare statistics for a report to Wikimania, so we use selective list of `page_id`s and try to analyze revisions, logs and others info from ruwiki_p database. But there are some strange problems in using db.
For example, by http://www.mediawiki.org/wiki/Manual:Logging_table in table `logging` should be field `log_page` similar to `page_id`, but there is no such...
Also by http://www.mediawiki.org/wiki/Manual:Archive_table in table `archive` should be field `ar_comment`, but there is no such...
Maybe there are other bugs in db, but I found just these two, and they create difficulties in our work. Can anybody fix them? or what can you advice to do?
Dmitry.
In addition (dbname = ruwiki_p):
mysql> DESCRIBE logging; +---------------+---------------------+------+-----+----------------+------- + | Field | Type | Null | Key | Default | Extra | +---------------+---------------------+------+-----+----------------+------- + | log_id | int(10) unsigned | NO | | 0 | | | log_type | varchar(32) | NO | | | | | log_action | varchar(32) | NO | | | | | log_timestamp | varchar(14) | NO | | 19700101000000 | | | log_user | int(10) unsigned | NO | | 0 | | | log_namespace | int(11) | NO | | 0 | | | log_deleted | tinyint(3) unsigned | NO | | 0 | | | log_user_text | varchar(255) | NO | | | | | log_title | varchar(255) | NO | | | | | log_comment | varchar(255) | NO | | | | | log_params | longblob | NO | | NULL | | +---------------+---------------------+------+-----+----------------+------- + 11 rows in set (0.00 sec)
mysql> describe archive; +---------------+------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +---------------+------------------+------+-----+---------+-------+ | ar_namespace | int(11) | NO | | 0 | | | ar_title | varchar(255) | NO | | | | | ar_user | int(5) unsigned | NO | | 0 | | | ar_user_text | varchar(255) | NO | | | | | ar_timestamp | varchar(14) | NO | | | | | ar_minor_edit | tinyint(1) | NO | | 0 | | | ar_flags | tinyblob | NO | | NULL | | | ar_rev_id | int(8) unsigned | YES | | NULL | | | ar_len | int(8) unsigned | YES | | NULL | | | ar_page_id | int(10) unsigned | YES | | NULL | | | ar_parent_id | int(10) unsigned | YES | | NULL | | +---------------+------------------+------+-----+---------+-------+ 11 rows in set (0.00 sec)
Hello, At Friday 15 July 2011 17:33:54 DaB. wrote:
Also by http://www.mediawiki.org/wiki/Manual:Archive_table in table `archive` should be field `ar_comment`, but there is no such...
the field is not visable because it may contain critical data (like insults, personal data, copyviolations, etc. pp.) which was deleted long before (most times: Before there was Oversignting). For example: The word "asshole" is found in 1463 comments. So I'm sorry but we can't show this field to normal users.
Sincerly, DaB.
What about logging.log_page? I don't see it in enwiki_p either.
-Aaron
On Fri, Jul 15, 2011 at 8:37 AM, DaB. WP@daniel.baur4.info wrote:
Hello, At Friday 15 July 2011 17:33:54 DaB. wrote:
Also by http://www.mediawiki.org/wiki/Manual:Archive_table in table `archive` should be field `ar_comment`, but there is no such...
the field is not visable because it may contain critical data (like insults, personal data, copyviolations, etc. pp.) which was deleted long before (most times: Before there was Oversignting). For example: The word "asshole" is found in 1463 comments. So I'm sorry but we can't show this field to normal users.
Sincerly, DaB.
-- Userpage: [[:w:de:User:DaB.]] — PGP: 2B255885
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
Aaron Halfaker wrote:
What about logging.log_page? I don't see it in enwiki_p either.
Huh. The reference to logging.log_page required me to go to mediawiki.org and figure out what you were talking about. I guess at some point logging.log_page was added as a reference to page.page_id. It looks like there's an index on it as well (CREATE INDEX /*i*/log_page_id_time ON /*_*/logging (log_page,log_timestamp);).
It's not part of the logging view currently on the Toolserver. You can file a ticket in JIRA if you'd like to see the column exposed (https://jira.toolserver.org). Even if the logging view is modified to expose that column, I can't imagine it'd be very helpful. I don't know if Wikimedia has that index (or that column, for that matter) and I don't imagine it's been back-populated. I don't imagine it'd even be possible to back-populate in most cases.
When joining against the logging table, use the log_namespace/log_title pair. That's the reliable way to get old log entries. If you're concerned about speed, there's an alternate view to the logging table called "logging_ts_alternative" that you can use (it simply omits certain redacted rows rather than checking their permissions, or something).
There should be some decent info about MySQL/queries/database access on the wiki: https://wiki.toolserver.org.
MZMcBride
toolserver-l@lists.wikimedia.org