Extract H1, H2, H3 heading tags from an HTML page using a regular expression
To extract heading tags from a string, you can use the following regular expression.
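<!--- Groups: (1) opening tag, (2) heading text, (3) closing tag --->
<cfset regex = "(<h1.*?>)(.*?)(</h1>)">
<!--- With the fourth argument set to true, pos[1]/len[1] describe the whole match, so group 2 sits at index 3 --->
<cfset tx = reFindNoCase(regex, myString, 1, true)>
<cfif tx.pos[1] gt 0>
    <cfset theResult = mid(myString, tx.pos[3], tx.len[3])>
    <cfoutput>H1 is: #theResult#</cfoutput>
</cfif>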
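Alternatively, you could use <cfset regex = "(<h[1-6].*?>)(.*?)(</h[1-6]>)"> which would match any heading from H1 to H6. Because reFindNoCase only returns the first match, you would then loop over the complete page, starting each search just past the end of the previous match, to extract them all.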
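
The loop described above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the sample value of myString and the variable names headings, heading and start are assumptions made for the example, and the pattern differs slightly from the one above. It captures the heading level in its own group and uses the backreference \1 so that the closing tag has to be the same level as the opening one. encodeForHTML assumes ColdFusion 10 or later (or Lucee).

```cfml
<!--- Hypothetical sample input; in practice myString would hold the page's HTML --->
<cfset myString = '<h1>Title</h1><p>Intro</p><h2 class="sub">Section</h2>'>
<!--- Groups: (1) heading level, (2) heading text; \1 requires a matching closing level --->
<cfset regex = "<h([1-6])[^>]*>(.*?)</h\1>">
<cfset headings = []>
<cfset start = 1>
<cfloop condition="true">
    <cfset tx = reFindNoCase(regex, myString, start, true)>
    <!--- pos[1] is 0 when no further match is found --->
    <cfif tx.pos[1] eq 0>
        <cfbreak>
    </cfif>
    <cfset heading = { level = mid(myString, tx.pos[2], tx.len[2]), text = "" }>
    <!--- Guard against empty headings such as <h2></h2> --->
    <cfif tx.len[3] gt 0>
        <cfset heading.text = mid(myString, tx.pos[3], tx.len[3])>
    </cfif>
    <cfset arrayAppend(headings, heading)>
    <!--- Resume searching just past the end of this match --->
    <cfset start = tx.pos[1] + tx.len[1]>
</cfloop>
<cfoutput>
<cfloop array="#headings#" index="heading">H#heading.level# is: #encodeForHTML(heading.text)#<br></cfloop>
</cfoutput>
```

Advancing start by the length of the whole match (tx.len[1]) is what keeps the loop from finding the same heading again. A regular expression like this only suits reasonably tidy markup; nested or malformed headings would likely need a real HTML parser.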
