How to find PHP files with trailing whitespace

From Jimbojw.com

In a recent discussion on the #mediawiki IRC channel, a user reported a common problem: leading empty lines in the recent changes RSS feed. Something like this (I've put underscores where the newlines are, for readability):

_
_
_
<?xml version="1.0" encoding="utf-8"?>

When this happens, it's nearly always due to an extension whose PHP file has trailing whitespace after the closing tag, like this (using underscores again for emphasis):

?>
_

Since this is a common problem, I crafted a solution the quickest way possible - using Ruby:

#!/usr/bin/ruby
 
Dir.glob( '**/*.php' ) do |file|
  puts file if IO.read(file).match( /\?>\s{2,}\Z/m)
end

To run this as a one-liner, cd to your MediaWiki (or other PHP) directory and paste this on the command line:

ruby -e 'Dir.glob( "**/*.php" ) { |file| puts file if IO.read(file).match( /\?>\s{2,}\Z/m) }'
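
To see what the pattern actually flags, here's a minimal sketch that runs it against a few in-memory strings instead of files. The helper name trailing_whitespace? and the sample strings are made up for illustration. PHP discards a single newline directly after the closing tag, which is presumably why the pattern asks for two or more whitespace characters:

```ruby
# The same pattern the scripts above use.
TRAILING_WS = /\?>\s{2,}\Z/m

# Hypothetical helper: true when the source ends with a closing tag
# followed by two or more whitespace characters.
def trailing_whitespace?(source)
  !!(source =~ TRAILING_WS)
end

samples = {
  "<?php echo 1; ?>"     => "closing tag, nothing after it",
  "<?php echo 1; ?>\n"   => "closing tag plus a single newline",
  "<?php echo 1; ?>\n\n" => "closing tag plus two newlines",
  "<?php echo 1; ?> \n"  => "closing tag, space, newline",
  "<?php echo 1;\n\n"    => "no closing tag at all",
}

samples.each do |source, label|
  status = trailing_whitespace?(source) ? "flagged" : "ok"
  puts "#{status.ljust(8)}#{label}"
end
```

Only the third and fourth samples should come back as flagged. A lone space after the closing tag (with no newline) would slip through, so treat the pattern as a quick heuristic rather than a complete check.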

When I mentioned this solution in IRC, I was scoffed at to the tune of "not everyone has Ruby". So here's an alternative version using PHP's command line interface (CLI):

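<?php
// glob() does not treat '**' as recursive, so walk the tree with SPL iterators.
$files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator('.'));
foreach ($files as $file){
  if (!$file->isFile() || $file->getExtension() !== 'php') continue;
  if (preg_match('/\\?'.'>\\s\\s+\\Z/m',file_get_contents($file->getPathname())))
    echo("$file\n");
}
?>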

Again, here's the same script as a one-liner, suitable for pasting into your shell.

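echo '<?php foreach (new RecursiveIteratorIterator(new RecursiveDirectoryIterator(".")) as $file){if ($file->isFile() && $file->getExtension() === "php" && preg_match( "/\\?".">\\s\\s+\\Z/m", file_get_contents($file->getPathname()))) echo("$file\n");} ?>' | php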

I'm tempted to throw together a patch which adds this in a new PHP file in MediaWiki's maintenance directory. That way admins have a resource at hand for finding out what's broken in their distribution.

Note: I only have my Ubuntu Linux box available right now, so I haven't been able to test on Windows. If someone out there can confirm that either or both of these work in a Windows Command Prompt, please leave me a comment - I'd love to hear about it!

As always, I'll be happy to answer any questions. Enjoy!

Update!

I've created a file suitable for inclusion in maintenance - however it appears this will never be included in the core as there are plans to implement a comparable check during normal page-load operation. Oh well. Teaches me for being proactive :(

maintenance/findBadFiles.php:

<?php
/**
 * Simple script to identify 'bad' PHP files - those with trailing
 * whitespace which often break functionality.
 * 
 * @addtogroup Maintenance
 *
 * @author Jim R. Wilson (wilson.jim.r<at>gmail.com)
 * @copyright Copyright © Jim R. Wilson
 * @license http://www.opensource.org/licenses/mit-license.php MIT
 */
 
# This is a command line script
include('commandLine.inc');
 
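# Find all those bad files and call them out!
# glob() does not recurse on '**', so walk $IP with SPL iterators instead.
$files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($IP));
foreach ($files as $file){
    if (!$file->isFile() || $file->getExtension() !== 'php') continue;
    if (preg_match('/\\?'.'>\\s\\s+\\Z/m',file_get_contents($file->getPathname()))) echo("$file\n");
}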
?>

Comments

Note: Comments are now closed.

Ace_NoOne said ...

Hey Jim,

Thanks for this!

Not knowing you were dealing with this as well, I've come up with an sed-based solution:

# remove trailing blank lines at the end of a file
find . -name '*.php' -exec sed -i -e :a -e '/^\n*$/{$d;N;ba' -e '}' '{}' \;
# remove trailing spaces from individual lines
find . -name '*.php' -exec sed -i 's/[[:blank:]]*$//g' '{}' \;

This is sub-optimal, of course (both commands could certainly be merged into a single line), but it worked for me (needed two iterations to be safe though).


/Ace

--Ace_NoOne 14:22, 30 June 2007 (MST)

Ace_NoOne said ...

Ack - MediaWiki has garbled the code - here's the corrected version:

# remove trailing blank lines at the end of a file
find . -name '*.php' -exec sed -i -e :a -e '/^\n*$/{$d;N;ba' -e '}' '{}' \;
# remove trailing spaces from individual lines
find . -name '*.php' -exec sed -i 's/[[:blank:]]*$//g' '{}' \;

Thanks to the many friendly people on Freenode who have helped me with this.

PS: I would share this on MW.org, but I don't quite know where to put it...

--Ace_NoOne 14:39, 30 June 2007 (MST)

Duesentrieb said ...

Another very common problem is extra whitespace at the *start* of PHP files - especially editors that invisibly insert UTF BOM bytes. If you want to make a maintenance tool, it would be good to check for this issue too.

--Duesentrieb 16:49, 30 June 2007 (MST)
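
A rough sketch of the check described in this comment, in the same Ruby style as the script in the post. It assumes the BOM in question is the three-byte UTF-8 one; the helper name leading_junk and the report format are made up for illustration:

```ruby
# UTF-8 byte order mark as raw bytes (assumption: this is the BOM meant above).
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: describes what precedes the real content, or returns
# nil when the file starts cleanly.
def leading_junk(bytes)
  bytes = bytes.b # compare as raw bytes regardless of source encoding
  return "UTF-8 BOM" if bytes.start_with?(UTF8_BOM)
  return "leading whitespace" if bytes =~ /\A\s+/
  nil
end

# Report every PHP file under the current directory that starts badly.
Dir.glob("**/*.php") do |file|
  problem = leading_junk(File.binread(file))
  puts "#{file}: #{problem}" if problem
end
```

Reading the files in binary mode keeps the comparison byte-for-byte, so a BOM is reported as such rather than being hidden by encoding handling.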

Jimbojw said ...

No worries, Ace - glad to help!

Dues, yeah I can see how that would also be an issue. I'd be happy to create such a script, but the idea was already shot down by robchurch and Simetrical.

I learned my lesson though - no writing of patches without assurance that they're likely to be committed!

--Jimbojw 08:02, 2 July 2007 (MST)

Dan said ...

What should we do about files that lack a trailing '?>' altogether? Should we strip them from our files for consistency?

--Dan 09:47, 1 October 2007 (MST)

Jimbojw said ...

Hi Dan,

Regarding files that lack a trailing '?>': this is perfectly fine. The closing tag is optional and has been for a while now.

That said, it's usually a good idea to omit them where possible since this minimizes risk of accidental whitespace injection.

--Jimbojw 23:17, 1 October 2007 (MST)