Hello,
Is there a known, easy way to find all HTML elements with a particular
attribute that appear in the text of a given wiki after it's parsed?
Here's an example of something that I need: find all elements that have the
HTML lang attribute, with any value. This would be useful for me for
collecting information about the multilingualism of Wikipedia - which
foreign languages we incorporate in pages, how often we do it, for which
of them we may have font problems, etc. This, again, must be
checked after the page is parsed - this attribute is very often inserted by
templates.
Of course, this would rely on the editors actually using this attribute,
but this is fairly common, at least in the English Wikipedia. (Among other
things we could compare its usage between projects.)
I could do this by analyzing a dump, but I've got a hunch that something
like this was already done as part of the research for Parsoid. Does
anybody know?
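To make the example concrete, here is roughly the check I have in mind, as
a quick and untested Python sketch (the page title is only an illustration):
fetch the parsed HTML of one page with action=parse and collect every
element that carries a lang attribute. Doing this wiki-wide would of course
mean iterating over a dump or over all pages instead.

    import requests
    from bs4 import BeautifulSoup

    # Fetch the parsed (post-template-expansion) HTML of a single page
    # through the action API.
    API = "https://en.wikipedia.org/w/api.php"
    params = {
        "action": "parse",
        "page": "Israel",   # example title only
        "prop": "text",
        "format": "json",
    }
    html = requests.get(API, params=params).json()["parse"]["text"]["*"]

    # Collect every element that carries a lang attribute, whatever its value.
    soup = BeautifulSoup(html, "html.parser")
    for el in soup.find_all(attrs={"lang": True}):
        print(el.name, el["lang"], el.get_text()[:40])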
Thanks!
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
Hi everyone,
I'm Bhagya Kandage from Colombo, Sri Lanka. I'm currently reading for a
Master's in Computer Science at the University of Colombo.
I wish to participate in OPW through my contribution to the project
"Welcoming new contributors to Wikimedia Labs and Wikimedia Tool
Labs".
I'm attaching the following links for your perusal.
[1] My user page
[2] Microtask about feedback review on the Main Page and Getting Started Page
[3] Project proposal for enhancement of tracking and documenting new
and current projects in Wikimedia Labs and Wikimedia Tool Labs
[1] https://www.mediawiki.org/wiki/User:Dilshak
[2] https://www.mediawiki.org/wiki/Welcome_to_Labs_-Demonstration_Work
[3] https://www.mediawiki.org/wiki/User:Dilshak/Welcoming_new_contributors_to_W…
Your comments, suggestions, and improvements are highly welcome. Thanks!
Regards,
Bhagya.
That was "fun".
Hello, everyone. The hard part of the migration is over for all but two
tools[1].
If you had not already migrated your tools manually during the migration
period that ended this Monday, the data for all your tools has now been
copied to the new datacenter, and your tools are ready to finish migrating.
There were more issues than expected during the automated migration,
however, so there may yet be kinks to work out for some tools. I am
available on #wikimedia-labs to help anyone running into difficulty.
== What you need to do ==
If you already migrated your tools by hand (using migrate-tool), you're
already done and there is nothing else to do. Yay you!
Otherwise:
(a) log into tools-login.wmflabs.org, which now points at the new
datacenter (your SSH client will certainly complain about the host key
having changed -- this is normal and expected, since the host /has/ changed).
(b) run "finish-migration <tool>" from your user account for any tool
you had not yet migrated. If your tool had any crontab(s) or databases
in the previous datacenter, this will restore them.
(c) check that everything is where you expect it to be. As mentioned
in the previous emails, the user names of your tools will have changed,
including the ones they use to connect to the databases. If you had any of
those hardcoded in your code, you may need to revise them
(replica.my.cnf has been automatically corrected, however, so if your
code reads the credentials from that file then you're all set -- see
the sketch after this list).
(d) if your tool had cron entries, they will currently be commented out.
Edit your crontab and uncomment them as appropriate.
(e) if your tool had a web interface, you will need to start your
webservice. This usually just requires running "webservice start" from the
tool account, unless it relied on .htaccess files -- in which case you'll
need to tweak the configuration a bit, as outlined at:
https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help/NewWeb
(f) All done.
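A side note on step (c): here is a minimal, untested Python sketch of what
I mean by "reads the credentials from that" -- parse ~/replica.my.cnf and
take the user and password from its [client] section. The database host and
the pymysql call in the final comment are only illustrative.

    import configparser
    import os

    # replica.my.cnf has been rewritten with the tool's new user name,
    # so reading it at run time picks up the correct credentials.
    cfg = configparser.ConfigParser()
    cfg.read(os.path.expanduser("~/replica.my.cnf"))

    db_user = cfg["client"]["user"].strip("'\"")      # values are quoted in the file
    db_pass = cfg["client"]["password"].strip("'\"")

    # e.g. (illustrative host name; requires pymysql):
    # conn = pymysql.connect(host="enwiki.labsdb", user=db_user, password=db_pass)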
== What next? ==
Things should settle back to normal during the week. If you run
into issues at any point, don't hesitate to ask for help on labs-l or
on IRC.
There were a number of bug reports filed before and during migration.
My priority is to work through those over this week and the next. If you
have already reported issues, they'll get attention soon.
For those of you who were surprised by the migration: you should
subscribe to, and /read/, the labs-l mailing list! It's low volume, and
every important announcement about the Labs infrastructure goes
there.
Thank you all for your patience and collaboration during migration.
-- Marc
[1] wikidata-analysis and wikitrends, which are still being copied over
and may take up to another day to complete given the large amount of
data to copy.
Hello,
My name is Orestis Ioannou and I am a first-year graduate student in the
Master's program of the University Claude Bernard Lyon 1 in Lyon, France.
I have a fair knowledge of PHP and MySQL, acquired from various projects
at university as well as from some web development jobs. My latest
projects are a multilingual website for a company in Cyprus,
www.anosis.com.cy, and a personal website for a statistician,
www.atoumazi.com . Right now I am finishing another website for a
company in Cyprus; its development can be reviewed here:
www.ancora-services.com . You can find more about me on my personal
website: www.oioannou.com
I am interested in the project "A system for reviewing funding
requests". I am looking for more details so I can start my proposal.
Does the project include any front-end development? How are the reviewers
going to score each application?
Best regards,
Orestis
Hi all,
I'm Jack Phoenix [1], a MediaWiki developer whose primary interests are
social tools [2] and skinning [3]. Skinning, especially when related to
core MediaWiki, is a rather esoteric area where we don't have much
developer capacity; the skinning system is obscure and not well documented,
and as a direct result, MediaWiki has considerably fewer
third-party skins than, for example, phpBB or WordPress.
Although most of us can agree that the skin system needs an
overhaul, that's quite a huge project, which cannot be undertaken lightly
or without prior discussion and planning. Instead, I'm proposing a few
steps in the right direction in my Google Summer of Code (GSoC) 2014
proposal "A modern, scalable and attractive skin for MediaWiki" [4].
[1] https://www.mediawiki.org/wiki/User:Jack_Phoenix
[2] https://www.mediawiki.org/wiki/Social_tools
[3] https://www.mediawiki.org/wiki/Manual:Skinning
[4]
https://www.mediawiki.org/wiki/User:Jack_Phoenix/A_modern,_scalable_and_att…
Looking forward to hearing your thoughts,
--
Jack Phoenix
MediaWiki developer
Hi! I am Deepali Jain. I am an undergraduate student at the Indian
Institute of Technology Roorkee. I wish to participate in OPW and GSoC
2014 through Wikimedia. I am planning to work on Book management in
Wikibooks/Wikisource as my project. Here is the link to my proposal:
www.mediawiki.org/wiki/User:Jaindeepali/Proposal . Please have a look
at it and provide feedback. I will be more than happy to provide any
clarification needed for the same. Thank you.
Hi,
My name is Santosh Reddy. I am a final-year Computer Science student
from IIT Ropar, India. More info about me can be found on my user page
<https://www.mediawiki.org/wiki/User:Santosh2201>.
I intend to work on a GSoC project under this organization.
I have made a proposal for the following project: "Collaborative
spelling dictionary building tool". You can find my GSoC proposal here
<https://www.mediawiki.org/wiki/User:Santosh2201/GSoC14>.
I would be happy to answer any queries or suggestions on the project
here or on my GSoC proposal
<https://www.mediawiki.org/wiki/User:Santosh2201/GSoC14>.
Thanks,
Santosh Reddy
IIT Ropar
All,
We've just finished our second sprint on the new PDF renderer. A
significant chunk of renderer development time this cycle went to non-Latin
script support, as well as puppetization and packaging for deployment. We
have a work-in-progress pipeline up and running in Labs, which I encourage
everyone to go try and break. You can use the following featured articles
just to see what our current output is:
* http://ocg-collection-alpha.wmflabs.org/index.php/Alexis_Bachelot
*
http://ocg-collection-alpha.wmflabs.org/index.php/Atlantis:_The_Lost_Empire
Some other articles imported on that test wiki:
* http://ur1.ca/gg0bw
Please note that some of these will fail due to known issues noted below.
You can render any page in the new renderer by clicking the sidebar link
"Download as WMF PDF"; if you "Download as PDF" you'll be using the old
renderer (useful for comparison.) Additionally, you can create full books
via Special:Book -- our renderer is "RDF to Latex (PDF)" and the old
renderer is "e-book (PDF)". You can also try out the "RDF to Text (TXT)"
renderer, but that's not on the critical path. As of right now we do not
have a bugzilla project entry so reply to this email, or email me directly
-- we'll need one of: the name of the page, the name of the collection, or
the collection_id parameter from the URL to debug.
There are some code bits that we know are still missing that we will have
to address in the coming weeks or in another sprint.
* Attribution for images and text. The APIs are done, but we still need
to massage that information into the document.
* Message translation -- right now all internal messages are in English,
which is not so helpful to non-English speakers.
* Things using the <cite> tag and the Cite extension are not currently
supported (meaning you won't get nice references.)
* Tables may not render at all, or may break the renderer.
* Caching needs to be greatly improved.
Looking longer term at deployment on wiki, my plans right now are to get
this into beta labs for general testing and to connect test.wikipedia.org
to our QA hardware for load testing. The major blocker there is acceptance
of the Node.js 0.10 and TeX Live 2012 packages into reprap, our internal
aptitude package source. This is not quite as easy as it sounds: we already
use TeX Live 2009 in production for the Math extension, and we must apply
thorough tests to ensure we do not introduce any regressions when we update
to the 2012 package. I'm not sure what the actual dates for those migrations
and testing will be, because that greatly depends on when Ops has time. In the
meantime, our existing PDF cluster based on mwlib will continue to serve
our offline needs. Once our solution is deployed and tested, mwlib
(pdf[1-3]) will be retired here at the WMF, and print-on-demand services
will be provided directly by PediaPress servers.
For the technically curious: we're approximately following the Parsoid
deployment model -- using Trebuchet to push out a source repository
(services/ocg-collection) that has the configuration and node dependencies
built on tin, along with git submodules containing the actual service code.
It may not look like it on the surface, but we've come a long way and it
wouldn't have been possible without the (probably exasperated) help from
Jeff Green, Faidon, and Ori. Also big thanks to Brad and Max for their
work, and Gabriel for some head thunking. C. Scott and I are not quite off
the hook yet, as indicated by the list above, but hopefully soon enough
we'll be enjoying the cake and cookies from another new product launch.
(And yes, even if you're remote, if I promised you cookies as bribes I'll
ship them to you :p)
~Matt Walker
Hi folks,
I've added my proposal for the OPW. I'm still fleshing out the details at the moment, so I welcome any input. This is a great opportunity for me to apply my experience with data import to something I'm becoming very interested in: Linked Data and the Semantic Web.
Project page: https://www.mediawiki.org/wiki/User:Zahara/OPW_Proposal_Round_8
My page: https://www.mediawiki.org/wiki/User:Zahara
I still need to find a suitable microtask, so any suggestions for that are also welcome!
Cheers,
Ali