Scalabium Software

SMExport advertising
Knowledge for your independence'.
Home Delphi and C++Builder tips


#162: How can I extract the plain text from html-formatted string?

Today I want to publish small procedure that extract the plain text from html-formatted string

function StripHTMLTags(const strHTML: string): string;
var
  P: PChar;
  InTag: Boolean;
  i, intResultLength: Integer;
begin
  P := PChar(strHTML);
  Result := '';

  InTag := False;
  repeat
    case P^ of
      '<': InTag := True;
      '>': InTag := False;
      #13, #10: ; {do nothing}
      else
        if not InTag then
        begin
          if (P^ in [#9, #32]) and ((P+1)^ in [#10, #13, #32, #9, '<']) then
          else
            Result := Result + P^;
        end;
    end;
    Inc(P);
  until (P^ = #0);

  {convert system characters}
  Result := StringReplace(Result, '&quot;', '"',  [rfReplaceAll]);
  Result := StringReplace(Result, '&apos;', '''', [rfReplaceAll]);
  Result := StringReplace(Result, '&gt;',   '>',  [rfReplaceAll]);
  Result := StringReplace(Result, '&lt;',   '<',  [rfReplaceAll]);
  Result := StringReplace(Result, '&amp;',  '&',  [rfReplaceAll]);
  {here you may add another symbols from RFC if you need}
end;


Published: April 6, 2004

See also
 
ABA Picture Convert
Excel Reader (dll)
ABA Database Convert
SMImport suite
Clarion Viewer
Paradox to Text converter
SMDBGrid
DBExport tools
dBase Viewer
Mail parser (ActiveX)
 
 


Contact to webmaster

 

Borland Software Code Gear Scalabium Delphi tips

Copyright© 1998-2015, Scalabium Software. All rights reserved.
webmaster@scalabium.com

SMExport advertising