Comments on Raw HTML Output from a Parser Extension
Hi Peleg,
It sounds like the execution of the decoding step is being skipped. I have a few questions about your setup, which may help:
Thanks in advance - I hope we can figure this out!
--Jimbojw 08:17, 14 May 2007 (MST)
Heya,
I also had a problem running this, but found out that:
To get this last point sorted, I suppose some way to switch off the cache (at least for the page being viewed) is needed.
--Joris Hilhorst 06:01, 22 May 2007 (MST)
Small update (feel free to edit) from the MediaWiki FAQ; this worked for me:
global $wgTitle;
wfPurgeSquidServers( array( $wgTitle->getInternalURL() ) ); // purge Squid's copy of this page
$wgTitle->invalidateCache(); // mark MediaWiki's cached rendering of the page as stale
--Joris Hilhorst 06:17, 22 May 2007 (MST)
Hi Joris,
Thanks for taking the time to comment! I'd like to address your concerns:
> the parser doesn't use the tidy hook unless you define $wgUseTidy = true; in LocalSettings.php
I believe $wgUseTidy is set to true by default; I've never had to enable it as you suggest. However, I see your point: having it set to false would cause the Parser to skip over the 'ParserAfterTidy' hook.
A better solution would probably be:
> On just refreshing a page, the page will be cached, so edit/save and then the page should have reloaded/ your output should be visible.
That's correct. The Raw HTML is being stored in the parsed page cache along with everything else.
If you require the extension to be "run" on every page view, there's a better solution than simply disabling the cache.
In the article Doing more with MediaWiki parser extensions, I discuss a technique which uses the 'OutputPageBeforeHTML' hook to execute code on every page view, without violating the cache.
An admitted drawback of this solution is that it is still affected by Squid caching (or so I understand). If you're operating in a Squid cached environment, the only way to get true dynamic content may be to invalidate the Squid cache for the desired page on every page view - or pursue a JavaScript/Ajax solution where content is built up after the browser has received and rendered the initial page.
Good luck - I hope this helps!
--Jimbojw 07:21, 22 May 2007 (MST)
This didn't work when I tried it. It encoded the content properly but didn't decode it. However, I was able to get it to work by changing the search pattern in the preg_replace call from
'/<!-- ENCODED_CONTENT ([0-9a-zA-Z\\+]+=*) -->/esm',
to
'/<!-- ENCODED_CONTENT ([^ ]+) -->/esm',
I suppose there's some theoretical possibility that my simpler search pattern would find a false positive somewhere, but it's pretty unlikely.
--Sheldon Rampton 11:41, 13 July 2007 (MST)
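One likely reason the original pattern misses some markers: base64 output uses A-Z, a-z, 0-9, '+' and '/', plus '=' padding, and the character class [0-9a-zA-Z\\+] omits '/'. Any encoded block containing a '/' then never matches, so it stays encoded. Below is a minimal standalone sketch of the encode/decode round trip with the full alphabet. The function names encodeRawHtml() and decodeRawHtml() are hypothetical, not the article's code. It uses preg_replace_callback() because the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.

```php
<?php
// Minimal standalone sketch; encodeRawHtml() and decodeRawHtml() are
// hypothetical names, not functions from the article's extension.

// Wrap raw HTML in an HTML comment so later parser stages leave it alone.
function encodeRawHtml(string $html): string
{
    return '<!-- ENCODED_CONTENT ' . base64_encode($html) . ' -->';
}

// Replace every marker with the HTML it encodes. The character class holds
// the full base64 alphabet (A-Z, a-z, 0-9, '+', '/') followed by '=' padding.
// '#' is the regex delimiter, so '/' needs no escaping.
function decodeRawHtml(string $text): string
{
    return preg_replace_callback(
        '#<!-- ENCODED_CONTENT ([0-9A-Za-z+/]+=*) -->#',
        function (array $m) {
            return base64_decode($m[1]);
        },
        $text
    );
}

$wrapped = encodeRawHtml('<p>???</p>');
echo $wrapped, "\n";                               // <!-- ENCODED_CONTENT PHA+Pz8/PC9wPg== -->
echo decodeRawHtml("before $wrapped after"), "\n"; // before <p>???</p> after
```

The encoded form of '<p>???</p>' is PHA+Pz8/PC9wPg==, which contains a '/'. The narrower class would leave that marker untouched. The broader ([^ ]+) pattern would also catch it, at the cost of accepting non-base64 text.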
After thinking a little further about this technique, it seems to me that it opens up a security hole that would let a technically savvy hacker insert arbitrary HTML or JavaScript into wiki pages.
Suppose, for example, that I want to insert malicious code into a page where this parser extension is being used. All I'd have to do is run base64_encode on my malicious code and then paste the encoded result into a page, surrounded by your "<!-- ENCODED_CONTENT" comment tags. When the preg_replace executes, it would translate the malicious code back into HTML at the same time that it translates back the HTML generated by the extension.
In order to do this, of course, the hacker would need to know that your extension is using your technique.
I think you can make this code secure, though, by salting things with a secret key that end users can't see. This could be accomplished by simply replacing the "ENCODED_CONTENT" string at the beginning of your HTML comment with "ENCODED_CONTENT_{key}", where {key} is your secret key.
--Sheldon Rampton 15:06, 14 July 2007 (MST)
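The keyed-marker suggestion can be sketched as follows, outside MediaWiki. RAW_HTML_SECRET, encodeWithKey() and decodeWithKey() are hypothetical names. The key works as a shared secret embedded in the marker, not as encryption: a marker typed without the key is simply left alone.

```php
<?php
// Sketch of a keyed marker. RAW_HTML_SECRET, encodeWithKey() and
// decodeWithKey() are hypothetical; a real wiki would keep the key in its
// configuration and never show it to users.
const RAW_HTML_SECRET = 'change-me-3f9a1c';

// Build a marker that only code knowing the key can produce.
function encodeWithKey(string $html): string
{
    return '<!-- ENCODED_CONTENT_' . RAW_HTML_SECRET . ' ' . base64_encode($html) . ' -->';
}

// Decode only markers carrying the key; preg_quote() escapes any regex
// metacharacters the key might contain.
function decodeWithKey(string $text): string
{
    $pattern = '#<!-- ENCODED_CONTENT_' . preg_quote(RAW_HTML_SECRET, '#')
        . ' ([0-9A-Za-z+/]+=*) -->#';
    return preg_replace_callback($pattern, function (array $m) {
        return base64_decode($m[1]);
    }, $text);
}

// A forged marker without the key is left as-is; the keyed one is decoded.
$forged = '<!-- ENCODED_CONTENT ' . base64_encode('<script>alert(1)</script>') . ' -->';
echo decodeWithKey($forged . encodeWithKey('<b>ok</b>')), "\n";
```

The key stays private only as long as every marker is decoded before the page is sent. A marker left undecoded in the page source would reveal it.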
Hi Sheldon,
If HTML comments in wikitext were fed through the Parser to the point where the decoding takes place, then yes, this would be a vulnerability.
However, HTML comments are stripped from wikitext and do not appear in the resulting page. You can try this yourself by putting something like "<!-- whatever -->" in a wiki page, saving the page, and then viewing the source of the rendered page. You'll find that the comment is gone.
In any case, even if it were passed through, this isn't a significant vulnerability since the example extension is meant to faithfully render Raw HTML anyway (which is itself the very vulnerability you found).
Hope this helps - and thanks for commenting!
--Jimbojw 08:10, 16 July 2007 (MST)
Hey - first of all, THANKS!
As for me, I have tried both this idea and the previous one (with OutputPageBeforeHTML), but my text remains encoded...
No matter what I do, I still get something like "<!-- ENCODED_CONTENT 15TXp9ep15HXqteZINec16jXkNeZ15XXnyDX..." in my HTML source (after saving and viewing...)
Anyway, if you or anyone else knows what the problem could be, I'd be glad to hear it.
And again - thanks!
Peleg.
--Peleg 23:50, 12 May 2007 (MST)