In a recent discussion on the #mediawiki IRC channel, a user came on with a common problem - there were leading empty lines in the recent changes RSS feed. Something like this (I've put underscores where the newlines are for readability):
_ _ _ <?xml version="1.0" encoding="utf-8"?>
When this happens, it's nearly always due to an extension with trailing whitespace after the closing tag at the end of a PHP file - like this (using underscores again for emphasis):
?> _
Since this is a common problem, I crafted a solution the quickest way possible - using Ruby:
#!/usr/bin/ruby
Dir.glob( '**/*.php' ) do |file|
  puts file if IO.read(file).match( /\?>\s{2,}\Z/m)
end
To run this as a one-liner, cd to your MediaWiki (or other PHP) directory and paste this on the command line:
ruby -e 'Dir.glob( "**/*.php" ) { |file| puts file if IO.read(file).match( /\?>\s{2,}\Z/m) }'
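For anyone wondering why the pattern insists on two or more whitespace characters: PHP swallows a single newline directly after the closing tag, so only extra whitespace reaches the output. Here's a minimal sketch of how the pattern treats a few sample file endings. The helper name trailing_whitespace? and the sample strings are made up for illustration.

```ruby
# Same pattern as the script above: "?>" followed by two or more
# whitespace characters at the very end of the file.
TRAILING_WS = /\?>\s{2,}\Z/m

# Hypothetical helper name, introduced only for this illustration.
def trailing_whitespace?(source)
  source.match?(TRAILING_WS)
end

samples = [
  "<?php echo 1; ?>",      # nothing after the tag
  "<?php echo 1; ?>\n",    # one newline, which PHP swallows
  "<?php echo 1; ?>\n\n",  # extra blank line reaches the output
  "<?php echo 1; ?> \n",   # space plus newline reaches the output
]

samples.each do |source|
  verdict = trailing_whitespace?(source) ? 'flagged' : 'ok'
  puts "#{source.inspect} => #{verdict}"
end
```

One caveat: a lone trailing space or tab with no newline after it is not flagged by this pattern, even though PHP would still send it to the browser.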
When I mentioned this solution in IRC, I was scoffed at to the tune of "not everyone has Ruby". So here's an alternative version using PHP's command line interface (CLI):
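<?php
// PHP's glob() treats '**' like '*', so it would only look one directory
// deep. Walk the whole tree with the SPL iterators instead.
foreach (new RecursiveIteratorIterator(new RecursiveDirectoryIterator('.')) as $file) {
    if (substr($file, -4) !== '.php') continue;
    if (preg_match('/\\?'.'>\\s\\s+\\Z/m', file_get_contents($file)))
        echo("$file\n");
}
?>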
Again, here's the same script as a one-liner, suitable for pasting into your shell.
I'm tempted to throw together a patch which adds this in a new PHP file in MediaWiki's maintenance directory. That way admins have a resource at hand for finding out what's broken in their distribution.
Note: I only have my Ubuntu Linux box available right now, so I haven't been able to test on Windows. If someone out there can confirm that either or both of these work in a Windows Command Prompt, please leave me a comment - I'd love to hear about it!
As always, I'll be happy to answer any questions. Enjoy!
Update!
I've created a file suitable for inclusion in maintenance - however it appears this will never be included in the core as there are plans to implement a comparable check during normal page-load operation. Oh well. Teaches me for being proactive :(
maintenance/findBadFiles.php:
<?php
/**
 * Simple script to identify 'bad' PHP files - those with trailing
 * whitespace which often break functionality.
 *
 * @addtogroup Maintenance
 *
 * @author Jim R. Wilson (wilson.jim.r<at>gmail.com)
 * @copyright Copyright © Jim R. Wilson
 * @license http://www.opensource.org/licenses/mit-license.php MIT
 */

# This is a command line script
include('commandLine.inc');

# Find all those bad files and call them out!
# (glob() treats '**' like '*', so walk the tree with the SPL iterators instead)
foreach (new RecursiveIteratorIterator(new RecursiveDirectoryIterator($IP)) as $file) {
    if (substr($file, -4) !== '.php') continue;
    if (preg_match('/\\?'.'>\\s\\s+\\Z/m', file_get_contents($file)))
        echo("$file\n");
}
?>
Note: Comments are now closed.
Ack - MediaWiki has garbled the code - here's the corrected version:
# remove trailing blank lines at the end of a file
find . -name '*.php' -exec sed -i -e :a -e '/^\n*$/{$d;N;ba' -e '}' '{}' \;
# remove trailing spaces from individual lines
find . -name '*.php' -exec sed -i 's/[[:blank:]]*$//g' '{}' \;
Thanks to the many friendly people on Freenode who have helped me with this.
PS: I would share this on MW.org, but I don't quite know where to put it...
--Ace_NoOne 14:39, 30 June 2007 (MST)
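For readers without sed, the two passes above can be approximated in Ruby. This is only a sketch under a few assumptions: the helper name tidy_whitespace is hypothetical, it works on a string rather than editing files in place, it keeps one final newline, and it does not handle CRLF line endings.

```ruby
# Hypothetical helper: returns a cleaned copy of the text and leaves
# files on disk untouched.
def tidy_whitespace(text)
  # Drop spaces and tabs at the end of every line.
  without_blanks = text.gsub(/[ \t]+$/, '')
  # Collapse the run of newlines at the end of the text to a single one.
  without_blanks.sub(/\n+\z/, "\n")
end

before = "<?php\necho 1;  \n?> \n\t\n\n"
puts tidy_whitespace(before).inspect
# prints "<?php\necho 1;\n?>\n"
```

Stripping the line-end blanks first means that whitespace-only lines are already empty by the time the trailing newlines are collapsed, so a single pass should be enough here.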
Another very common problem is extra whitespace at the *start* of PHP files - especially editors invisibly inserting UTF BOM bytes. If you want to make a maintenance tool, it would be good to check for this issue too.
--Duesentrieb 16:49, 30 June 2007 (MST)
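A check along those lines could look roughly like the sketch below. It assumes the UTF-8 byte order mark (the bytes EF BB BF) is the one of interest and that any whitespace before the opening tag counts as a problem. The helper name leading_junk? is hypothetical.

```ruby
UTF8_BOM = "\xEF\xBB\xBF".b  # the three bytes of a UTF-8 byte order mark

# Hypothetical helper: true when the raw bytes start with a UTF-8 BOM or
# with whitespace instead of the opening tag.
def leading_junk?(bytes)
  bytes = bytes.b  # compare as raw bytes, whatever encoding was passed in
  bytes.start_with?(UTF8_BOM) || bytes.match?(/\A\s/)
end

# Pass one or more directories as arguments; with none, nothing is scanned.
ARGV.each do |dir|
  Dir.glob(File.join(dir, '**', '*.php')) do |file|
    puts file if leading_junk?(File.binread(file))
  end
end
```

Saved under a name of your choosing, it would be run with the MediaWiki directory as its argument and print one suspect file per line.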
No worries, Ace - glad to help!
Dues, yeah I can see how that would also be an issue. I'd be happy to create such a script, but the idea was already shot down by robchurch and Simetrical.
I learned my lesson though - no writing of patches without assurance that they're likely to be committed!
--Jimbojw 08:02, 2 July 2007 (MST)
What should we do about files that lack a trailing '?>' altogether? Should we strip them from our files for consistency?
--Dan 09:47, 1 October 2007 (MST)
Hi Dan,
Regarding PHP files that lack a trailing '?>': this is perfectly fine. The closing tag is optional and has been for a while now.
That said, it's usually a good idea to omit them where possible since this minimizes risk of accidental whitespace injection.
--Jimbojw 23:17, 1 October 2007 (MST)
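To see which files still carry a closing tag at the very end, a small report like the following might help. It is a sketch only: the helper name closing_tag_at_eof? is hypothetical, and it lists files without changing them.

```ruby
# Hypothetical helper: true when the last non-whitespace characters of
# the file are "?>".
def closing_tag_at_eof?(source)
  source.match?(/\?>\s*\z/)
end

# Pass one or more directories as arguments; with none, nothing is scanned.
ARGV.each do |dir|
  Dir.glob(File.join(dir, '**', '*.php')) do |file|
    puts file if closing_tag_at_eof?(File.binread(file))
  end
end
```

It only reports on purpose. A closing tag also ends the last statement, so a file ending in something like `<?php echo $x ?>` would break if the tag were removed blindly, and each hit is best reviewed by hand.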
Hey Jim,
Thanks for this!
Not knowing you were dealing with this as well, I've come up with an sed-based solution:
# remove trailing blank lines at the end of a file
find . -name '*.php' -exec sed -i -e :a -e '/^\n*$/{$d;N;ba' -e '}' '{}' \;
# remove trailing spaces from individual lines
find . -name '*.php' -exec sed -i 's/[[:blank:]]*$//g' '{}' \;
This is sub-optimal, of course (both commands could certainly be merged into a single line), but it worked for me (needed two iterations to be safe though).
/Ace
--Ace_NoOne 14:22, 30 June 2007 (MST)