Using HTML Purifier in CodeIgniter to clean user-generated content

21st May 2009


Accepting user-generated content is one of the riskiest parts of web application development.

Apart from the possibility of user input containing HTML tags that break your design and layout, permitting HTML input also opens the door to a wide range of security exploits, most often delivered via injected JavaScript.
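To make the risk concrete, here is a minimal sketch of why a naive tag filter is not enough. It uses PHP's built-in strip_tags() with a one-tag allow list; the sample input string is made up for illustration and is not from any real application.

```php
<?php
// Hypothetical user input: an allowed <a> tag carrying an event handler,
// followed by a <script> block.
$dirty = '<a href="#" onmouseover="alert(document.cookie)">hover me</a>'
       . '<script>alert(1)</script>';

// strip_tags() filters by tag name only; it does not inspect attributes.
$naive = strip_tags($dirty, '<a>');

// The <script> tags are gone, but the onmouseover attribute on the
// allowed <a> tag survives, so the JavaScript can still run.
echo $naive, PHP_EOL;
```

Filtering by tag name alone leaves attribute-based attacks in place, which is the gap a whitelist of both elements and attributes is meant to close.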

Luckily for PHP users, HTMLPurifier is a class which goes a long way toward solving this issue.

HTMLPurifier is a mature, open-source PHP class library designed to clean up and sanitize HTML input. It is continually developed and tested so that newly discovered exploits and vulnerabilities are guarded against as well.

It uses a whitelist approach: it removes XSS vulnerabilities and returns standards-compliant, 'safe' HTML output. It also has a huge array of configuration options, so it can be used in a variety of situations.

As many of you already know, CodeIgniter is a high-performance, flexible PHP framework which aids rapid application development. To keep things simple, CodeIgniter requires class libraries to be in a specific format. This means HTMLPurifier must be modified to work with CI.

Part 1: Adding HTML Purifier as a library to CodeIgniter

This section follows the blog post by Ortz, "making html purifier work in CodeIgniter".

Download the latest version of the HTML Purifier library. Put the contents of the HTMLPurifier library folder into the libraries folder in your CodeIgniter application folder.

Now open HTMLPurifier.includes.php and comment out the line `require 'HTMLPurifier.php';` so that it reads `//require 'HTMLPurifier.php';`

Then go to the file called HTMLPurifier.php and add this snippet on line 2, just under the 'load->library(‘HTMLPurifier’);

Now that we have the HTMLPurifier library working in our CodeIgniter installation we can implement it in our application.

Part 2: Using HTML Purifier in a CodeIgniter project

We then need to follow the guidance of Rdjs to implement comment sanitizing in CodeIgniter.

Rdjs's article on validating comments in CodeIgniter is excellent. He shows how to add the code to CodeIgniter and use it to clean comments submitted by users.

`function cleanComment($dirtyHtml)
{
    // load the default config and override defaults as necessary
    $config = HTMLPurifier_Config::createDefault();
    $config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');
    $config->set('HTML', 'AllowedElements', 'a,em,blockquote,p,strong,pre,code');
    $config->set('HTML', 'AllowedAttributes', 'a.href,a.title');
    $config->set('HTML', 'TidyLevel', 'light');

    // run the submitted html through the purifier
    $cleanHtml = $this->htmlpurifier->purify($dirtyHtml, $config);
    return $cleanHtml;
}`

The cleanComment function returns the cleaned user input as a string. Tags listed in the 'AllowedElements' config option are kept, while all other tags are removed.
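If you want to try the same whitelist outside CodeIgniter first, the following is a rough sketch rather than a definitive implementation. It assumes HTML Purifier 4.x, where directives are written as 'Namespace.Directive' in a single string instead of the two separate arguments used above. The ./htmlpurifier path, the cleanCommentStandalone name and the sample comment are all hypothetical.

```php
<?php
// Assumption: HTML Purifier 4.x is unpacked under ./htmlpurifier (hypothetical path).
$autoload = __DIR__ . '/htmlpurifier/library/HTMLPurifier.auto.php';
if (is_file($autoload)) {
    require_once $autoload;
}

/**
 * Standalone variant of cleanComment() (hypothetical name), same whitelist.
 */
function cleanCommentStandalone($dirtyHtml)
{
    $config = HTMLPurifier_Config::createDefault();
    // HTML Purifier 4.x directive names: "Namespace.Directive".
    $config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
    $config->set('HTML.AllowedElements', 'a,em,blockquote,p,strong,pre,code');
    $config->set('HTML.AllowedAttributes', 'a.href,a.title');
    $config->set('HTML.TidyLevel', 'light');

    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirtyHtml);
}

// Only run the demo when the library was actually found.
if (class_exists('HTMLPurifier')) {
    $dirty = '<p onclick="steal()">Nice post! <strong>Thanks</strong>'
           . '<script>alert(1)</script> '
           . '<a href="http://example.com" title="me">me</a></p>';
    echo cleanCommentStandalone($dirty), PHP_EOL;
}
```

With this configuration the p, strong and a tags should come through, while the script element and the onclick attribute should be dropped, since neither is on the whitelist. Check the exact output against your own HTML Purifier version.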