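Extract H1, H2, H3 heading tags from an HTML page using a regular expression

To extract heading tags from a string, you can use the following regular expression. The third subexpression captures the closing tag, so the heading text itself is the second subexpression; the example below reads the text between the tags.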

<cfset myString = "<h1>Welcome to my site</h1>">
<cfset regex = "(<h1.*?>)(.*?)(</h1>)">
<cfset tx = reFindNoCase(regex, mystring, 1, true)>
<cfif tx.pos[1] gt 0>
     <cfset theResult = mid(myString, tx.pos[3], tx.len[3])>
     <cfoutput>H1 is: #theResult#</cfoutput>
</cfif>

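Alternatively, you could use <cfset regex = "(<h[1-6].*?>)(.*?)(</h[1-6]>)">, which would match any heading from H1 to H6, and you could then loop over the complete page to extract them all. Note that this pattern does not check that the closing tag has the same level as the opening tag.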
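
As a rough sketch of the loop mentioned above, the code below keeps calling reFindNoCase, each time starting just after the previous match, until nothing more is found. The variable names pageContent, headings and startPos are made up for this example, and the pattern is narrowed to h1 through h6 on the assumption that only real heading levels are wanted.

<cfset pageContent = "<h1>Welcome</h1><p>Intro</p><h2>News</h2><h3>Today</h3>">
<cfset regex = "(<h[1-6].*?>)(.*?)(</h[1-6]>)">
<cfset headings = arrayNew(1)>
<cfset startPos = 1>
<cfloop condition="startPos lte len(pageContent)">
     <cfset tx = reFindNoCase(regex, pageContent, startPos, true)>
     <!--- pos[1] is 0 when there are no more matches, so stop looping --->
     <cfif tx.pos[1] eq 0>
          <cfbreak>
     </cfif>
     <!--- The third subexpression holds the text between the tags --->
     <cfset arrayAppend(headings, mid(pageContent, tx.pos[3], tx.len[3]))>
     <!--- Carry on searching just after the heading that was found --->
     <cfset startPos = tx.pos[1] + tx.len[1]>
</cfloop>
<cfoutput>
<cfloop from="1" to="#arrayLen(headings)#" index="i">
     Heading #i#: #headings[i]#<br>
</cfloop>
</cfoutput>

Collecting the headings into an array first keeps the matching separate from the output, so the same loop could feed a table of contents or a log instead.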

Validate an email address in ColdFusion using regular expressions (regexp)

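To validate an email address in ColdFusion, we use the regular expression shown below. Note that it only accepts top-level domains of two or three letters, plus aero, coop, info, museum and name, so addresses on other longer TLDs will be rejected.

You could easily add this to a CFC if you wanted to return true or false depending on whether the email address validates.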

<cfset myEmail = "test@test.com">
<cfif reFindNoCase("^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*\.(([a-z]{2,3})|(aero|coop|info|museum|name))$", trim(myEmail))>
Valid email address
<cfelse>
Oops, email address does not validate
</cfif>
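
The CFC idea mentioned above might look something like the function below. This is only a sketch: the function name isValidEmail and the argument name email are invented for the example, and the function would sit between the cfcomponent tags of whichever CFC you choose to put it in.

<cffunction name="isValidEmail" returntype="boolean" output="false">
     <cfargument name="email" type="string" required="true">
     <!--- Same pattern as above, held in a local variable --->
     <cfset var pattern = "^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*\.(([a-z]{2,3})|(aero|coop|info|museum|name))$">
     <!--- reFindNoCase returns 0 when nothing matches, so compare against 0 --->
     <cfreturn reFindNoCase(pattern, trim(arguments.email)) gt 0>
</cffunction>

<cfif isValidEmail("test@test.com")>
Valid email address
<cfelse>
Oops, email address does not validate
</cfif>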
 

Regular Expression (regexp) to remove all HTML tags from a string

To remove all HTML tags from a string, you can use the following regular expression.

This could be used to stop HTML being injected into URL values, or simply to create plain text content from an HTML page.

<cfset myString = "<a href='/link.cfm'>Link to my page</a>">
<cfset myString = reReplaceNoCase(myString, "</?\w+(((\s|\n)+\w+((\s|\n)*=(\s|\n)*(?:#chr(34)#.*?#chr(34)#|'.*?'|[^'#chr(34)#>\s]+))?)+(\s|\n)*|(\s|\n)*)/?>", "", "All")>
<cfoutput>There should be no hyperlink: #myString#</cfoutput>

Remove hyperlinks ( A tags ) from HTML - ColdFusion regular expression

If you ever want to remove hyperlinks from an HTML page, here you go. Only the A tags are removed; the link text is kept.

This code assumes a variable called myFile has been created, holding content that has been pulled from CFHTTP, a stored local file or the ColdFusion page output:-

<cfset myFile = "<a href='www.mysite.com'>Click here for my site</a>">
<cfset myFile = ReReplaceNoCase(myFile, "(<a\b[^>]*>)(.*?)(</a>)", "\2", "all")>
<cfoutput>Replaced content: #myFile#</cfoutput>

Extract HTML page title - ColdFusion regular expression

The following code extracts the page title from an HTML page held in the variable myFile, whether it was pulled from CFHTTP, a stored local file or the ColdFusion page output:-

<cfset pageTitle = "">
<cfset RegExp = REFindNoCase("(<title[^>]*>)(.*?)(</title>)", myFile, 1, True)>
<cfif RegExp.len[1] gt 0>
  <cfset pageTitle = mid(myFile, RegExp.pos[3], RegExp.len[3])>
</cfif>
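
For the CFHTTP case mentioned above, a minimal sketch might look like this. The URL is a placeholder and the result variable name httpResult is made up for the example; the page content comes back in the fileContent key of the CFHTTP result.

<!--- Hypothetical URL, used only for illustration --->
<cfhttp url="http://www.example.com/" method="get" result="httpResult">
<cfset myFile = httpResult.fileContent>
<cfset pageTitle = "">
<cfset RegExp = REFindNoCase("(<title[^>]*>)(.*?)(</title>)", myFile, 1, True)>
<!--- pos[1] is 0 when no title tag was found --->
<cfif RegExp.pos[1] gt 0>
  <cfset pageTitle = trim(mid(myFile, RegExp.pos[3], RegExp.len[3]))>
</cfif>
<!--- Escape the title before output in case it contains markup --->
<cfoutput>Page title: #htmlEditFormat(pageTitle)#</cfoutput>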