Extract H1, H2, H3 heading tags from an HTML page using a regular expression

To extract heading tags from a string, you can use the following regular expression.

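<cfset myString = "<h1>Welcome to my site</h1>">
<!--- Group 1: opening tag, group 2: heading text, group 3: closing tag --->
<cfset regex = "(<h1.*?>)(.*?)(</h1>)">
<!--- With returnsubexpressions=true, pos[1]/len[1] describe the whole match and pos[3]/len[3] the second group --->
<cfset tx = reFindNoCase(regex, myString, 1, true)>
<cfif tx.pos[1] gt 0>
     <cfset theResult = mid(myString, tx.pos[3], tx.len[3])>
     <cfoutput>H1 is: #theResult#</cfoutput>
</cfif>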

Alternatively, you could use <cfset regex = "(<h[1-6].*?>)(.*?)(</h[1-6]>)">, which matches any heading from H1 to H6 (HTML defines no other heading levels). You could then loop over the complete page to extract them all.
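
As a rough sketch of that loop, assuming the page content is already in a string: the variable names page, headings, start and heading are made up for this example, and the regex is a variation that uses a backreference (\1) so the closing tag has to match the opening level. Treat it as one possible approach rather than a definitive implementation.

<cfset page = "<h1>Welcome</h1><p>Intro</p><h2 class='sub'>About</h2><h3>Contact</h3>">
<!--- Group 1: heading level (1-6), group 2: heading text --->
<cfset regex = "<h([1-6])[^>]*>(.*?)</h\1>">
<cfset headings = arrayNew(1)>
<cfset start = 1>
<cfloop condition="start lte len(page)">
     <cfset tx = reFindNoCase(regex, page, start, true)>
     <!--- pos[1] is 0 when no further heading is found --->
     <cfif tx.pos[1] eq 0>
          <cfbreak>
     </cfif>
     <cfset heading = structNew()>
     <cfset heading.level = mid(page, tx.pos[2], tx.len[2])>
     <cfset heading.text = "">
     <!--- Skip mid() for an empty heading such as <h2></h2> --->
     <cfif tx.len[3] gt 0>
          <cfset heading.text = mid(page, tx.pos[3], tx.len[3])>
     </cfif>
     <cfset arrayAppend(headings, heading)>
     <!--- Continue searching just past the end of this match --->
     <cfset start = tx.pos[1] + tx.len[1]>
</cfloop>
<cfoutput>
<cfloop from="1" to="#arrayLen(headings)#" index="i">
     H#headings[i].level# is: #headings[i].text#<br>
</cfloop>
</cfoutput>

Moving start past each match is what keeps the loop from finding the same heading twice. Like any regex-based approach, this will miss nested or malformed markup, so a real HTML parser is the safer choice for untrusted pages.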
