Raw HTML Output from a MediaWiki Parser Function

From Jimbojw.com

In a previous article I described how to get raw HTML output from an extension tag. Here, I'll describe how to do the same for a parser function.


Example Extension

Consider this simple parser function which provides a way to inject arbitrary raw HTML into a page.

$wgExtensionFunctions[] = 'setupRawHTML';
$wgHooks['LanguageGetMagic'][] = 'setupRawHTMLMagic';

// Register the parser function with the parser.
function setupRawHTML() {
    global $wgParser;
    $wgParser->setFunctionHook( 'rawhtml', 'renderRawHTML' );
}

// Declare the magic word that names the function.
function setupRawHTMLMagic( &$magicWords, $langCode = 'en' ) {
    $magicWords['rawhtml'] = array( 0, 'rawhtml' );
    return true;
}

// Implementing method: hand the input straight back.
function renderRawHTML( &$parser, $input = '' ) {
    return $input;
}

It is intended to be called like this: {{#rawhtml:HTML}} where HTML is any markup to be passed straight through.

A seasoned extension developer will immediately notice that this won't work as written. When the implementing method of a parser function returns a String, the resulting text is treated as wikitext and is subject to the same restrictions as regular page text. Since renderRawHTML() returns a String, its output will not be rendered as raw HTML.

First Attempt

Fortunately, the MediaWiki Parser has a workaround. If the return value of the implementing method is an Array, it can specify that the output is raw HTML and shouldn't be rendered further. Here's the updated method, returning an Array with the appropriate options:

function renderRawHTML( &$parser, $input = '' ) {
    // Array keys must be quoted strings, not bare words.
    return array( $input, 'noparse' => true, 'isHTML' => true );
}

We can test this new version with the text '{{#rawhtml:<a href="http://mahalo.com/">Mahalo</a>}}', which renders:

<a href="http://mahalo.com/">Mahalo</a>

Exactly as expected!

New Problem

However, due to a hardcoded "\n\n" which is prepended to the HTML output of parser functions, using this option will break lists (which are processed later during the parse).

To demonstrate, consider this wiki markup:

# Check out this site: {{#rawhtml:<a href="http://mahalo.com/">Mahalo</a>}}
# It's really cool.

When rendered, the newlines force the above markup to be interpreted as this:

# Check out this site: 
<a href="http://mahalo.com/">Mahalo</a>
# It's really cool.

Which means that instead of getting a single list with two elements, we get two lists with one element each, separated by a link. Clearly this is not what was intended.
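The effect of those extra newlines can be reproduced outside MediaWiki. What follows is only a toy model, assuming a list pass that treats each line starting with # as a list item and starts a new list whenever an item follows a non-item line. The helper countListBlocks() is a hypothetical name made up for this illustration, not a MediaWiki function.

```php
<?php
// Toy model only: counts how many separate lists a line-based pass would
// open. countListBlocks() is a hypothetical helper, not part of MediaWiki.
function countListBlocks( $wikitext ) {
    $blocks = 0;
    $inList = false;
    foreach ( explode( "\n", $wikitext ) as $line ) {
        $isItem = ( strpos( $line, '#' ) === 0 );
        if ( $isItem && !$inList ) {
            $blocks++; // an item after a non-item line starts a new list
        }
        $inList = $isItem;
    }
    return $blocks;
}

$link = '<a href="http://mahalo.com/">Mahalo</a>';

// What the author wrote: two adjacent list items.
$intended = "# Check out this site: $link\n# It's really cool.";

// What the list pass sees once "\n\n" is prepended to the function output.
$actual = "# Check out this site: \n\n$link\n# It's really cool.";

echo countListBlocks( $intended ), "\n"; // prints 1
echo countListBlocks( $actual ), "\n";   // prints 2
```

The link ends up on a line of its own between the two items, so the second item can no longer join the first list.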

The Solution

Fortunately, since MediaWiki's codebase is so open and flexible, it is possible to bypass this newline insertion by doing some of the Parser's work ourselves.

Consider this new version of the renderRawHTML() method:

function renderRawHTML( &$parser, $input = '' ) {
    // Store the raw HTML in the strip state and return its placeholder.
    return $parser->insertStripItem( $input, $parser->mStripState );
}

The Parser's insertStripItem() method adds a new raw HTML string ($input) to the strip state, then returns a unique string which will be replaced with the HTML during a later parsing step.

Now that the implementing method is returning a String again instead of an Array, the output is treated as wikitext, but it doesn't matter. The unique string is guaranteed to not contain any reserved wikitext characters, and thus will sail through subsequent parsing steps untouched.
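To make the placeholder idea concrete, here is a minimal, self-contained sketch of how a strip state could behave. It is an assumption-laden toy rather than MediaWiki's implementation: the class ToyStripState, its insert() and unstrip() methods, and the marker format are all invented for this illustration.

```php
<?php
// Toy model of a strip state. ToyStripState, insert(), unstrip() and the
// marker format are hypothetical; they are not MediaWiki's classes or API.
class ToyStripState {
    private $items = array();

    // Store raw HTML and hand back an opaque, purely alphanumeric marker.
    public function insert( $html ) {
        $marker = 'UNIQ' . count( $this->items ) . 'QINU';
        $this->items[$marker] = $html;
        return $marker;
    }

    // Swap every marker back for the HTML it stands for.
    public function unstrip( $text ) {
        return strtr( $text, $this->items );
    }
}

$state = new ToyStripState();
$marker = $state->insert( '<a href="http://mahalo.com/">Mahalo</a>' );

// The marker sits inline, so the list item stays on a single line.
$wikitext = "# Check out this site: $marker\n# It's really cool.";

// ... list processing and the other wikitext passes would run here ...

// Only at the end is the marker replaced with the stored HTML.
echo $state->unstrip( $wikitext ), "\n";
```

Because the marker contains no newline and no wikitext syntax, the line-based passes in between have nothing to react to, which is the property the real strip markers rely on.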

Final Thoughts

In the final analysis, it was MediaWiki's hardcoding of the newline characters that caused the problem. In a less flexible architecture, this would have been an insurmountable failing-point, the only resolution to which would have been hacking the core (a very Bad Thing).

However, because MediaWiki passes many of its internal objects around by reference, and because the members and methods of those objects are public, many things that would otherwise be stopping points become merely inconveniences.

This is the true beauty of the MediaWiki architecture: not that it's a monolithic testament to software design principles, but that it strikes a healthy balance between structure and pragmatism. It may not do exactly what you want out of the box, but it's often easy to get it to do so (with enough time and cleverness, of course).

Enjoy! As always, I will be happy to answer any questions.
