I would like some feedback on the issue of how to allow API users to
prove who they are without using a cookie (some clients simply do not
support them), but instead pass all relevant information in the
URL/POST.
The login API module returns userID, userName, and userToken - all the
values needed to build the cookie. The client should be able to pass
those values in the URL, which should override the browser cookie (or
lack thereof) and resume the specified session.
The $_SESSION object gets initialized from the cookie before the PHP
code runs. In order to resume the session, I could set
$_SESSION['wsUserID'], $_SESSION['wsUserName'], and $_SESSION['wsToken']
to the URL values, and set $wgUser = User::newFromSession() before any
other operations.
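A minimal sketch of that idea (assuming the PHP session has already been
started, and that the client passes the wsUserID / wsUserName / wsToken
values returned by the login module as request parameters of the same
name - those parameter names are my assumption):

if ( isset( $_REQUEST['wsUserID'], $_REQUEST['wsUserName'], $_REQUEST['wsToken'] ) ) {
    // Overwrite whatever the session cookie (or lack thereof) provided:
    $_SESSION['wsUserID']   = intval( $_REQUEST['wsUserID'] );
    $_SESSION['wsUserName'] = $_REQUEST['wsUserName'];
    $_SESSION['wsToken']    = $_REQUEST['wsToken'];
    // Resume the specified session before anything else uses $wgUser:
    $wgUser = User::newFromSession();
}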
Does this introduce any security risks? Is there another way to solve this?
Thanks!
@jra:
In principle, I agree with you. Matching personal interests (formulated,
for example, in queries to search engines) with, say, insurance data
would indeed be an issue for interested parties.
But it is not the type of business we are interested in. Some day we
might charge commercial users for online information management. Some
other day we might add value-adding services like delivery of full-text
articles etc. Maybe we will try to sell context-related books through the
Amazon APIs (this feature was already active and could be enabled again
in the near future). Basic searching and the storing of search results and
bookmarks (personal Web Collections) will remain forever a free feature
for private persons.
We are not logging IPs and we are not creating hidden personal user
profiles. On the contrary, MESHine lets users query Google and the other
search engines privately. The only IP that is transmitted to the search
engine providers is MESHine's own.
And:
1. It is not necessary to register at all to use MESHine just for searching.
2. To be able to store and/or re-publish searches and/or results, it is
necessary to register with a valid email address only. No additional
personal data is required.
3. Thus, personal privacy is preserved. Users can deliberately decide what
personal information (if any) will be disclosed to which public or
private users or user groups.
Up to this point in time (and I say: unfortunately) it seems to me that
this big-brother issue is not an issue at all, since MESHine is used
by / known to only very few people... (yes, I know: too many colours,
buggy/ugly styles and CSS, too slow, low quantity and quality of
user-generated content, much too small an audience, etc. etc.)
Cheers,
Alex Hoelzel, http://www.meshine.info
Alexander Hölzel
CEO EUTROPA AG
===========================
EUTROPA Aktiengesellschaft
Oelmüllerstrasse 9, D-82166 Gräfelfing,
Tel 089 87130900, Fax 089 87130902
===========================
Hi,
many thanks for your suggestions.
Very slow "copy to tmp_table" seems to be a known MySQL issue. See: http://forums.mysql.com/read.php?21,82045,82045#msg-82045
MySQL does seem to have problems with (fulltext) index creation
I found a solution (or better: a workaround) at:
http://peter-zaitsev.livejournal.com/11772.html :
quot from http://peter-zaitsev.livejournal.com/11772.html:
...
Create table of the same structure without keys, load data into it to get correct .MYD,
Create table with all keys defined and copy over .frm and .MYI files from it, followed by
FLUSH TABLES. Now you can use
REPAIR TABLE to rebuild all keys by sort, including UNIQUE keys
...
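Expressed as a small PHP script, for concreteness, that recipe might look
roughly like this (paths, credentials and table names are all assumptions
about my setup, not something prescribed by the workaround):

$db = new mysqli( 'localhost', 'root', '', 'dewiki' );
$dataDir = 'C:/mysql/data/dewiki/';   // assumed MySQL data directory

// Step 1 (done beforehand): the dump was loaded into 'textns0', created
// WITHOUT any keys, so its .MYD data file is correct.
// Step 2: 'textns0_keyed' is an empty table of identical structure WITH
// all keys (including the FULLTEXT index on old_text) defined. Copy its
// .frm and .MYI files over those of the loaded table:
copy( $dataDir . 'textns0_keyed.frm', $dataDir . 'textns0.frm' );
copy( $dataDir . 'textns0_keyed.MYI', $dataDir . 'textns0.MYI' );

// Step 3: make MySQL re-read the table definitions, then rebuild all
// keys by sort:
$db->query( 'FLUSH TABLES' );
$db->query( 'REPAIR LOCAL TABLE textns0 QUICK' );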
I did exactly that, and it worked !!
Issuing REPAIR LOCAL TABLE <xx>wiki.text<xxx> QUICK; after creating and
renaming the (empty) .frm and .MYI files gave the following results:
frwiki:
frwiki.textns0 | repair | warning | Number of rows changed from 0 to 794523
frwiki.textns0 | repair | status | OK
2 rows in set (31 min 3.00 sec)
dewiki:
dewiki.textns0 | repair | warning | Number of rows changed from 0 to 1050514
dewiki.textns0 | repair | status | OK
2 rows in set (45 min 16.31 sec)
enwiki:
enwiki.textns0 | repair | warning | Number of rows changed from 0 to 3940175
enwiki.textns0 | repair | status | OK
2 rows in set (2 hours 27 min 2.13 sec)
(The number of rows differs from (is smaller than) that of the full dumps,
as only pages of namespace 0 were processed. For my project I only need
to analyse pages with ns=0.)
Thus, it seems that, at least up to this point, a small computer is enough
to achieve acceptable indexing performance. (My development server is only
a 1.7 GHz Pentium M with 512 MB RAM.)
@Simetrical:
>Make sure there's more than enough room in your temp folder
There was plenty of room in my temp folder. So, that does not seem to be
the problem.
Alex Hoelzel, http://www.meshine.info
Alexander Hölzel
CEO EUTROPA AG
===========================
EUTROPA Aktiengesellschaft
Oelmüllerstrasse 9, D-82166 Gräfelfing,
Tel 089 87130900, Fax 089 87130902
===========================
yurik(a)svn.wikimedia.org wrote:
> Revision: 24673
> Author: yurik
> Date: 2007-08-08 15:12:08 +0000 (Wed, 08 Aug 2007)
>
> Log Message:
> -----------
> Revert r24668; needed to prevent accidental API exposure by novice administrators.
In theory, read-only mode should now be safe for all standard
installations (e.g., requiring authentication when in restricted read mode).
What's the remaining issue that requires this?
-- brion
> In the interest of not duplicating similar efforts, might I ask what
> your project is?
I am a medical doctor with a long-term hobbyist's penchant for information
technology and its application to the improvement of medical products
and services. I don't know of any business other than health care where it
can be of such vital importance to have valuable information at hand in a
timely manner. Google Health and other approaches do not really satisfy
me. So I decided to do something on my own.
I designed a multi-language terminology-enhanced search engine plus
online information management system for (professionals in) Medicine and
Health Care which I called MESHine. You can find an early alpha-version
at: http://www.meshine.info.
MESHine lets you search Google, Yahoo and/or PubMed/Medline (Google
Scholar, Scirus) and harvest results directly into your personal
infosphere, where you can store, collect and annotate personal information
collections. Metatags (MeSH terms, keywords) can be reused
instantaneously to search again using the above search engines. If you
like, you can republish your Web Collections to the public or to selected
user groups. Moreover, while browsing the web, the MESHine bookmarklet lets
you add more links to existing or new Web Collections (just like del.icio.us
or other bookmarking services).
MESHine is also an alternative browser for the MeSH thesaurus. It helps you
to choose the right search words from MeSH in your language (currently
English, German, French and Italian). Basically, you can browse the MeSH
thesaurus, harvest appropriate search terms (with Boolean operators) into
the search box by simply checking boxes, and send your query to the Web APIs
of Google, Yahoo and PubMed/Medline, which will return links to valuable
information in the medical field. In principle, you don't have to type a
single word (of medical terminology) by hand; you build up your complex
searches simply by clicking them together... (sorry for my English..)
Now, while browsing MeSH, MESHine is designed to display links and
definitions from Wikipedia which relate to the current MeSH term.
Every time a MeSH term is clicked, content is retrieved from Wikipedia
via a rather low-performing, simple scraping mechanism for Wikipedia
pages. Thus, Web Collections of related Wikipedia articles are
continuously created and updated. More (always 10) related Wikipedia
links can be retrieved and added to the current Wikipedia Collection via
a 1-click mechanism which displays the next Google results that relate to
the current MeSH term (Web Collection).
You will have noticed that scraping Wikipedia pages in real time results
in prolonged waiting periods and user annoyance... For that reason I
decided to get the Wikipedia dumps and do the whole
MeSH-to-Wikipedia mapping offline. This approach has many advantages: a
fully MeSH-interlinked Wikipedia will be available at once; relevance
ranking can be done based on transparent algorithms; Medical Wikipedia
becomes semantic; professional information from bibliographic databases
meets / is explained or abridged by Wikipedia content; etc. etc.
So, what I am going to do now is the following:
1. Get the current Wikipedia dumps (done)
2. Create full local Wikipedia content images by importing the dumps into
MediaWiki (done)
3. Modify the existing MediaWiki tables so that they fit my needs,
creating additional tables if necessary (that's what I am currently doing)
4. Map the MeSH thesaurus entirely onto Wikipedia, getting all articles
that contain MeSH terms and/or their term families (broader/narrower/
neighbouring terms) in title, text, category, wikilink or link
5. Thus, get an entirely new view of Wikipedia content, structured by a
hierarchical controlled medical vocabulary (MeSH) and ordered by
relevance. If you will: a Medical Wikipedia is created.
(Help!!! Who is going to help me? ;-) ) Step 4 will take me some
weeks of continuous computer processing, especially if step 3 is not done
successfully, i.e. if the indexes (!) are not set up properly (and in
time). A rough sketch of the kind of matching query I have in mind for
step 4 follows below.
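As a purely hypothetical illustration of step 4 (table and column names
follow the textns0 tables discussed above; the example term, connection
details and ranking are assumptions, not an existing implementation):

$db = new mysqli( 'localhost', 'root', '', 'dewiki' );
$meshTerm = 'Myocardial Infarction';   // example MeSH term

// Find namespace-0 page texts that mention the term, ranked by MySQL's
// fulltext relevance score (this is where the FULLTEXT index on
// old_text matters):
$stmt = $db->prepare(
    'SELECT old_id, MATCH(old_text) AGAINST(?) AS score
       FROM textns0
      WHERE MATCH(old_text) AGAINST(?)
      ORDER BY score DESC
      LIMIT 50'
);
$stmt->bind_param( 'ss', $meshTerm, $meshTerm );
$stmt->execute();
$stmt->bind_result( $oldId, $score );
while ( $stmt->fetch() ) {
    // record ($oldId, $score) as a candidate article for this MeSH term
}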
Any suggestions? Any offers of cooperation?
Alex Hoelzel, http://www.meshine.info
Alexander Hölzel
CEO EUTROPA AG
=========================
EUTROPA Aktiengesellschaft
Oelmüllerstrasse 9, D-82166 Gräfelfing,
Tel 089 87130900, Fax 089 87130902
=========================
On 08/08/07, bugzilla-daemon(a)mail.wikimedia.org
<bugzilla-daemon(a)mail.wikimedia.org> wrote:
> Duesentrieb was suggesting last time he saw sources the stats were checking
> only for "[[" string presence
Indeed; Article::isGoodArticle() or whatever it's called checks for
the presence of "[[", whereas the stats initialisers are more
explicit, and check for at least one incoming internal link.
Rob Church
Hello to this list!
I downloaded the enwiki, dewiki, frwiki and itwiki dumps and imported them
correctly into MediaWiki. The text table is the table that contains the
actual Wikipedia page text (field old_text); it was created as MyISAM in
order to add fulltext search capabilities. dewiki.text contains 1,248,933
rows totalling > 2.8 GB.
I issued this command on mysql (v5.0.45) command line: ALTER TABLE
dewiki.text ADD FULLTEXT (old_text);
This command has now been running for approx. 20 hours (!) without showing
any errors. Apparently the MySQL server is up and running. (Windows 2000
SP4; MySQL v5.0.45; Intel Pentium M 1.7 GHz, 512 MB RAM; mysqld-nt.exe is
using 152,760 KB constantly.)
There is only 1 active thread: Command: Query; Time 74059; State: copy
to tmp_table; Info: ALTER TABLE dewiki.text ADD FULLTEXT (old_text);
If interested, see MySQL variables set in this recent post to this list:
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/3287…
I know this is not a wikitech question in the strict sense, but I cannot
find sufficient answers on optimizing MySQL for files as large as
Wikipedia's in the MySQL forums or elsewhere...
Any suggestions welcome!
Alex Hoelzel, http://www.meshine.info
Alexander Hölzel
CEO EUTROPA AG
===========================
EUTROPA Aktiengesellschaft
Oelmüllerstrasse 9, D-82166 Gräfelfing,
Tel 089 87130900, Fax 089 87130902
===========================
Hi Rob,
>If you only need to use MyISAM for full-text matching, then use it for
the "searchindex" table, and use InnoDB for the other tables.
Why should I use InnoDB in this case? I do not plan to use
transactions. The Wikipedia tables are only processed / matched on a single
computer with just very few (if any) concurrent connections.
Leaving transactions aside: what are the main advantages of InnoDB? Is
it faster, is it more stable, is it more space-efficient, is it better for
large files/databases like the Wikipedia dumps...??
Alex, http://www.meshine.info
(very interested in your answers..)
Alexander Hölzel
CEO EUTROPA AG
============================
EUTROPA Aktiengesellschaft
Oelmüllerstrasse 9, D-82166 Gräfelfing,
Tel 089 87130900, Fax 089 87130902
============================
Hello,
I have been maintaining my company's wiki for some time now. The number
one concern that has been expressed throughout the wiki's existence is
that numbered lists cannot be resumed once they have been interrupted.
For example, numbered lists are a perfect solution for procedure
documentation. However, it is often necessary to insert a <pre>,
<div>, or even a wiki-syntax table between two consecutive steps. In the
current implementation, Parser.php restarts the numbered list back
at #1 after such an interjection. Am I missing something? Is there a
mechanism to embed these (and possibly other) non-numbered-list items
in a numbered list?
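To make the problem concrete, a minimal (made-up) example in wikitext:

# Prepare the environment.
# Run the command shown below.
<pre>
some-command --verbose
</pre>
# Verify the result.

Because the <pre> block terminates the ordered list, "Verify the result."
is rendered as item 1 of a new list rather than as item 3.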
If there currently isn't a way to do so, then I'd like to find out whether
there is any effort underway to hack/redesign the Parser to accept some new
(backwards-compatible, of course) syntax that allows starting a
numbered list at any user-defined value. And if there isn't such an
effort already, then I'd like to give it a try myself (I've looked at
the code already and have a pretty good idea of how to do it).
Thanks in advance!