I would like some feedback on the issue of how to allow API users to
prove who they are without using a cookie (some clients simply do not
support them), but instead pass all relevant information in the
URL/POST.
The login API module returns userID, userName, and userToken - all the
values needed to build the cookie. The client should be able to pass
those values in the URL, which should override the browser cookie (or
lack thereof) and resume the specified session.
The $_SESSION object gets initialized from the cookie before the PHP
code runs. In order to resume the session, I could set
$_SESSION['wsUserID'], $_SESSION['wsUserName'], and $_SESSION['wsToken']
to the URL values, and set $wgUser = User::newFromSession() before any
other operations.
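A minimal sketch of that idea (assuming the PHP session has already been
started, and that the client passes the wsUserID / wsUserName / wsToken
values returned by the login module as request parameters of the same
name - those parameter names are my assumption):

if ( isset( $_REQUEST['wsUserID'], $_REQUEST['wsUserName'], $_REQUEST['wsToken'] ) ) {
    // Overwrite whatever the session cookie (or lack thereof) provided:
    $_SESSION['wsUserID']   = intval( $_REQUEST['wsUserID'] );
    $_SESSION['wsUserName'] = $_REQUEST['wsUserName'];
    $_SESSION['wsToken']    = $_REQUEST['wsToken'];
    // Resume the specified session before anything else uses $wgUser:
    $wgUser = User::newFromSession();
}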
Does this introduce any security risks? Is there another way to solve this?
Thanks!
@jra:
In principle, I agree with you. Matching personal interests (formulated,
for example, in queries to search engines) with, say, insurance data
would indeed be an issue for interested parties.
But it is not the type of business we are interested in. Some day we
might charge commercial users for online information management. Some
other day we might add value-adding services like delivery of full-text
articles etc. Maybe we will try to sell context-related books through the
Amazon APIs (this feature was already active and could be enabled again
in the near future). Basic searching and the storing of search results and
bookmarks (personal Web Collections) will remain forever a free feature
for private persons.
We are not logging IPs and we are not creating hidden personal user
profiles. On the contrary, MESHine lets users query Google and the other
search engines privately. The only IP that is transmitted to the search
engine providers is MESHine's own.
And:
1. It is not necessary to register at all to use MESHine just for searching.
2. To be able to store and/or re-publish searches and/or results, it is
necessary to register with a valid email address only. No additional
personal data is required.
3. Thus, personal privacy is preserved. Users can deliberately decide what
personal information (if any) will be disclosed to which public or
private users or user groups.
Up to this point in time (and I say: unfortunately) it seems to me that
this big-brother issue is not an issue at all, since MESHine is used
by / known to only very few people... (yes, I know: too many colours,
buggy/ugly styles and CSS, too slow, low quantity and quality of
user-generated content, much too small an audience, etc. etc.)
Cheers,
Alex Hoelzel, http://www.meshine.info
Alexander Hölzel
CEO EUTROPA AG
===========================
EUTROPA Aktiengesellschaft
Oelmüllerstrasse 9, D-82166 Gräfelfing,
Tel 089 87130900, Fax 089 87130902
===========================
Hi,
many thanks for your suggestions.
Very slow "copy to tmp_table" seems to be a known MySQL issue. See: http://forums.mysql.com/read.php?21,82045,82045#msg-82045
MySQL does seem to have problems with (fulltext) index creation
I found a solution (or better: a workaround) at:
http://peter-zaitsev.livejournal.com/11772.html :
quot from http://peter-zaitsev.livejournal.com/11772.html:
...
Create table of the same structure without keys, load data into it to get correct .MYD,
Create table with all keys defined and copy over .frm and .MYI files from it, followed by
FLUSH TABLES. Now you can use
REPAIR TABLE to rebuild all keys by sort, including UNIQUE keys
...
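Expressed as a small PHP script, for concreteness, that recipe might look
roughly like this (paths, credentials and table names are all assumptions
about my setup, not something prescribed by the workaround):

$db = new mysqli( 'localhost', 'root', '', 'dewiki' );
$dataDir = 'C:/mysql/data/dewiki/';   // assumed MySQL data directory

// Step 1 (done beforehand): the dump was loaded into 'textns0', created
// WITHOUT any keys, so its .MYD data file is correct.
// Step 2: 'textns0_keyed' is an empty table of identical structure WITH
// all keys (including the FULLTEXT index on old_text) defined. Copy its
// .frm and .MYI files over those of the loaded table:
copy( $dataDir . 'textns0_keyed.frm', $dataDir . 'textns0.frm' );
copy( $dataDir . 'textns0_keyed.MYI', $dataDir . 'textns0.MYI' );

// Step 3: make MySQL re-read the table definitions, then rebuild all
// keys by sort:
$db->query( 'FLUSH TABLES' );
$db->query( 'REPAIR LOCAL TABLE textns0 QUICK' );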
I did exactly that, and it worked !!
Issuing REPAIR LOCAL TABLE <xx>wiki.text<xxx> QUICK; after creating and
renaming the (empty) .frm and .MYI files gave the following results:
frwiki:
frwiki.textns0 | repair | warning | Number of rows changed from 0 to 794523
frwiki.textns0 | repair | status | OK
2 rows in set (31 min 3.00 sec)
dewiki:
dewiki.textns0 | repair | warning | Number of rows changed from 0 to 1050514
dewiki.textns0 | repair | status | OK
2 rows in set (45 min 16.31 sec)
enwiki:
enwiki.textns0 | repair | warning | Number of rows changed from 0 to 3940175
enwiki.textns0 | repair | status | OK
2 rows in set (2 hours 27 min 2.13 sec)
(The number of rows differs from (is smaller than) that of the full dumps,
as only pages of namespace 0 were processed. For my project I only need
to analyse pages with ns=0.)
Thus, it seems that, at least up to this point, a small computer is enough
to achieve acceptable indexing performance. (My development server is only
a 1.7 GHz Pentium M with 512 MB RAM.)
@Simetrical:
>Make sure there's more than enough room in your temp folder
There was plenty of room in my temp folder. So, that does not seem to be
the problem.
Alex Hoelzel, http://www.meshine.info
Alexander Hölzel
CEO EUTROPA AG
===========================
EUTROPA Aktiengesellschaft
Oelmüllerstrasse 9, D-82166 Gräfelfing,
Tel 089 87130900, Fax 089 87130902
===========================
yurik(a)svn.wikimedia.org wrote:
> Revision: 24673
> Author: yurik
> Date: 2007-08-08 15:12:08 +0000 (Wed, 08 Aug 2007)
>
> Log Message:
> -----------
> Revert r24668; needed to prevent accidental API exposure by novice administrators.
In theory, read-only mode should now be safe for all standard
installations (e.g., requiring authentication when in restricted read mode).
What's the remaining issue that requires this?
-- brion
> In the interest of not duplicating similar efforts, might I ask what
> your project is?
I am a medical doctor with a long-term hobbyist's penchant for information
technology and its application to the improvement of medical products
and services. I don't know of any business other than health care where it
can be of such vital importance to have valuable information at hand in a
timely manner. Google Health and other approaches do not really satisfy
me. So I decided to do something on my own.
I designed a multi-language terminology-enhanced search engine plus
online information management system for (professionals in) Medicine and
Health Care which I called MESHine. You can find an early alpha-version
at: http://www.meshine.info.
MESHine lets you search Google, Yahoo and/or PubMed/Medline (Google
Scholar, Scirus) and harvest results directly into your personal
infosphere, where you can store, collect and annotate personal information
collections. Metatags (MeSH terms, keywords) can be reused
instantaneously to search again using the above search engines. If you
like, you can republish your Web Collections to the public or to selected
user groups. Moreover, while browsing the web, the MESHine bookmarklet lets
you add more links to existing or new Web Collections (just like del.icio.us
or other bookmarking services).
MESHine is also an alternative browser for the MeSH thesaurus. It helps you
to choose the right search words from MeSH in your language (currently
English, German, French and Italian). Basically, you can browse the MeSH
thesaurus, harvest appropriate search terms (with Boolean operators) into
the search box by simply checking boxes, and send your query to the Web APIs
of Google, Yahoo and PubMed/Medline, which will return links to valuable
information in the medical field. In principle, you don't have to type a
single word (of medical terminology) by hand; you build up your complex
searches simply by clicking them together... (sorry for my English..)
Now, while browsing MeSH, MESHine is designed to display links and
definitions from Wikipedia which relate to the current MeSH term.
Every time a MeSH term is clicked, content is retrieved from Wikipedia
via a rather low-performing, simple scraping mechanism for Wikipedia
pages. Thus, Web Collections of related Wikipedia articles are
continuously created and updated. More (always 10) related Wikipedia
links can be retrieved and added to the current Wikipedia Collection via
a 1-click mechanism which displays the next Google results that relate to
the current MeSH term (Web Collection).
You will have noticed that scraping Wikipedia pages in real time results
in prolonged waiting periods and user annoyance... For that reason I
decided to get the Wikipedia dumps and do the whole
MeSH-to-Wikipedia mapping offline. This approach has many advantages: a
fully MeSH-interlinked Wikipedia will be available at once; relevance
ranking can be done based on transparent algorithms; Medical Wikipedia
becomes semantic; professional information from bibliographic databases
meets / is explained or abridged by Wikipedia content; etc. etc.
So, what I am going to do now is the following:
1. Get the current Wikipedia dumps (done)
2. Create full local Wikipedia content images by importing the dumps into
MediaWiki (done)
3. Modify the existing MediaWiki tables so that they fit my needs,
creating additional tables if necessary (that's what I am currently doing)
4. Map the MeSH thesaurus entirely onto Wikipedia, getting all articles
that contain MeSH terms and/or their term families (broader/narrower/
neighbouring terms) in title, text, category, wikilink or link
5. Thus, get an entirely new view of Wikipedia content, structured by a
hierarchical controlled medical vocabulary (MeSH) and ordered by
relevance. If you will: a Medical Wikipedia is created.
(Help!!! Who is going to help me? ;-) ) Step 4 will take me some
weeks of continuous computer processing, especially if step 3 is not done
successfully, i.e. if the indexes (!) are not set up properly (and in
time). A rough sketch of the kind of matching query I have in mind for
step 4 follows below.
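As a purely hypothetical illustration of step 4 (table and column names
follow the textns0 tables discussed above; the example term, connection
details and ranking are assumptions, not an existing implementation):

$db = new mysqli( 'localhost', 'root', '', 'dewiki' );
$meshTerm = 'Myocardial Infarction';   // example MeSH term

// Find namespace-0 page texts that mention the term, ranked by MySQL's
// fulltext relevance score (this is where the FULLTEXT index on
// old_text matters):
$stmt = $db->prepare(
    'SELECT old_id, MATCH(old_text) AGAINST(?) AS score
       FROM textns0
      WHERE MATCH(old_text) AGAINST(?)
      ORDER BY score DESC
      LIMIT 50'
);
$stmt->bind_param( 'ss', $meshTerm, $meshTerm );
$stmt->execute();
$stmt->bind_result( $oldId, $score );
while ( $stmt->fetch() ) {
    // record ($oldId, $score) as a candidate article for this MeSH term
}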
Any suggestions? Any offers of cooperation?
Alex Hoelzel, http://www.meshine.info
Alexander Hölzel
CEO EUTROPA AG
=========================
EUTROPA Aktiengesellschaft
Oelmüllerstrasse 9, D-82166 Gräfelfing,
Tel 089 87130900, Fax 089 87130902
=========================
On 08/08/07, bugzilla-daemon(a)mail.wikimedia.org
<bugzilla-daemon(a)mail.wikimedia.org> wrote:
> Duesentrieb was suggesting last time he saw sources the stats were checking
> only for "[[" string presence
Indeed; Article::isGoodArticle() or whatever it's called checks for
the presence of "[[", whereas the stats initialisers are more
explicit, and check for at least one incoming internal link.
Rob Church
Hello to this list!
I downloaded the enwiki, dewiki, frwiki and itwiki dumps and imported them
correctly into MediaWiki. The text table is the table that contains the
actual Wikipedia page text (field old_text); it was created as MyISAM in
order to add fulltext search capabilities. dewiki.text contains 1,248,933
rows totalling > 2.8 GB.
I issued this command on mysql (v5.0.45) command line: ALTER TABLE
dewiki.text ADD FULLTEXT (old_text);
This command has now been running for approx. 20 hours (!) without showing
any errors. Apparently the MySQL server is up and running. (Windows 2000
SP4; MySQL v5.0.45; Intel Pentium M 1.7 GHz, 512 MB RAM; mysqld-nt.exe is
using 152,760 KB constantly.)
There is only 1 active thread: Command: Query; Time 74059; State: copy
to tmp_table; Info: ALTER TABLE dewiki.text ADD FULLTEXT (old_text);
If interested, see MySQL variables set in this recent post to this list:
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/3287…
I know this is not a wikitech question in the strict sense, but I cannot
find sufficient answers on optimizing MySQL for files as large as
Wikipedia's in the MySQL forums or elsewhere...
Any suggestions welcome!
Alex Hoelzel, http://www.meshine.info
Alexander Hölzel
CEO EUTROPA AG
===========================
EUTROPA Aktiengesellschaft
Oelmüllerstrasse 9, D-82166 Gräfelfing,
Tel 089 87130900, Fax 089 87130902
===========================
Hi Rob,
>If you only need to use MyISAM for full-text matching, then use it for
the "searchindex" table, and use InnoDB for the other tables.
Why should I use InnoDB in this case? I do not plan to use
transactions. The Wikipedia tables are only processed / matched on a single
computer with just very few (if any) concurrent connections.
Leaving transactions aside: what are the main advantages of InnoDB? Is
it faster, is it more stable, is it more space-efficient, is it better for
large files/databases like the Wikipedia dumps...??
Alex, http://www.meshine.info
(very interested in your answers..)
Alexander Hölzel
CEO EUTROPA AG
============================
EUTROPA Aktiengesellschaft
Oelmüllerstrasse 9, D-82166 Gräfelfing,
Tel 089 87130900, Fax 089 87130902
============================
Hello,
I have been maintaining my company's wiki for some time now. The number
one concern that has been expressed throughout the wiki's existence is
that numbered lists cannot be resumed once they have been interrupted.
For example, numbered lists are a perfect solution for procedure
documentation. However, it is often necessary to insert a <pre>,
<div>, or even a wiki-syntax table between two consecutive steps. In the
current implementation, Parser.php restarts the numbered list back
at #1 after such an interjection. Am I missing something? Is there a
mechanism to embed these (and possibly other) non-numbered-list items
in a numbered list?
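To make the problem concrete, a minimal (made-up) example in wikitext:

# Prepare the environment.
# Run the command shown below.
<pre>
some-command --verbose
</pre>
# Verify the result.

Because the <pre> block terminates the ordered list, "Verify the result."
is rendered as item 1 of a new list rather than as item 3.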
If there currently isn't a way to do so, then I'd like to find out whether
there is any effort underway to hack/redesign the Parser to accept some new
(backwards-compatible, of course) syntax that allows starting a
numbered list at any user-defined value. And if there isn't such an
effort already, then I'd like to give it a try myself (I've looked at
the code already and have a pretty good idea of how to do it).
Thanks in advance!