Strip HTML tags with ASP

This code quickly strips the HTML tags out of a string. It does NOT require a regular expression and so runs quite a bit faster, especially on shorter strings. It works by first replacing every "<" (assumed to appear only at the start of an HTML tag) with "><". This gives a single, consistent character to split the string on, while still leaving the "<" in place to identify the sections that are HTML. It then splits the string on ">", which cuts each section just before an HTML tag (as indicated by the "<") and at the end of each tag (as indicated by the closing ">"). It then filters the resulting array, removing any element that contains a "<"; these are exactly the elements that were HTML tags. The final operation is simply to re-join the remaining elements, which are the text.

Example: "<i>this is <b>a <a href='test.html'>test</b></a>" (note that the HTML need not be correct or have properly nested closing tags).

><i>this is ><b>a ><a href='test.html'>test></b>></a> (after replacing all "<" with "><")

|<i|this is |<b|a |<a href='test.html'|test|</b||</a| (after splitting on ">"; the | character is used to show the elements of the array)

|this is |a |test|| (after filtering out all the elements with a "<")

this is a test (after joining the remaining elements)

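Function StripHTML(ByRef asHTML)
	' Mark each tag start, split on ">", drop the elements that contain "<",
	' then join the remaining text. Join defaults to a space delimiter, so
	' pass "" to avoid inserting extra spaces between the pieces.
	StripHTML = Join(Filter(Split(Replace(asHTML, "<", "><"), ">"), "<", False), "")
End Function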
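
The four stages of the worked example can also be written out one statement at a time, which makes each intermediate value easy to inspect while debugging. This is only a sketch: the name StripHTMLVerbose and its local variable names are made up for illustration and are not part of the original code. It passes "" to Join explicitly, since Join would otherwise put a space between the elements.

```vbscript
' Hypothetical step-by-step variant of StripHTML, for illustration only.
Function StripHTMLVerbose(ByVal asHTML)
	Dim sMarked, aParts, aText
	sMarked = Replace(asHTML, "<", "><")   ' put a ">" in front of every tag start
	aParts = Split(sMarked, ">")           ' cut just before and just after each tag
	aText = Filter(aParts, "<", False)     ' keep only the elements without a "<"
	StripHTMLVerbose = Join(aText, "")     ' glue the text pieces back together
End Function
```

In an ASP page, Response.Write StripHTMLVerbose("<i>this is <b>a <a href='test.html'>test</b></a>") should write "this is a test".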

You may also want to remove excessive whitespace with:

	Set regex = New RegExp
	regex.Pattern = "\s+"    ' one or more whitespace characters
	regex.Global = True      ' replace every match, not just the first
	asHTML = regex.Replace(asHTML, " ")
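
If the whitespace cleanup is needed in more than one place, it can be wrapped in a function. A minimal sketch follows; the name CollapseWhitespace is hypothetical, and the final Trim (which drops a leading or trailing space left over from the replacement) is an addition not present in the snippet above.

```vbscript
' Hypothetical helper: collapse each run of whitespace to a single space.
Function CollapseWhitespace(ByVal asText)
	Dim oRegex
	Set oRegex = New RegExp
	oRegex.Pattern = "\s+"     ' spaces, tabs, and line breaks
	oRegex.Global = True       ' replace every run, not just the first
	' Trim is an extra step (an assumption about what is wanted here).
	CollapseWhitespace = Trim(oRegex.Replace(asText, " "))
End Function
```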

And possibly replace common character entities such as:

	asHTML=replace(asHTML,"&nbsp;"," ")
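
The same idea extends to a few other common entities. The sketch below is an assumption about which entities matter, not a complete decoder, and the name DecodeCommonEntities is made up for illustration. It is meant to run after the tags have been stripped, since it can put "<" and ">" back into the text. Replace is case-sensitive by default, so only the lowercase spellings are handled.

```vbscript
' Hypothetical helper: decode a handful of common character entities.
Function DecodeCommonEntities(ByVal asText)
	Dim s
	s = Replace(asText, "&nbsp;", " ")
	s = Replace(s, "&lt;", "<")
	s = Replace(s, "&gt;", ">")
	s = Replace(s, "&quot;", Chr(34))
	' Decode "&amp;" last so that "&amp;lt;" becomes "&lt;" rather than "<".
	s = Replace(s, "&amp;", "&")
	DecodeCommonEntities = s
End Function
```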

file: /Techref/language/asp/striphtml.htm, 2KB, updated: 2008/12/25 13:47