Scalabium Software


#162: How can I extract the plain text from an HTML-formatted string?

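Today I want to publish a small function that extracts the plain text from an HTML-formatted string. It skips everything between '<' and '>', drops line breaks and redundant whitespace, and then decodes the basic character entities: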

function StripHTMLTags(const strHTML: string): string;
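var
  P: PChar;
  InTag: Boolean;
begin
  P := PChar(strHTML);
  Result := '';

  InTag := False;
  {a while loop instead of repeat..until, so an empty string is never read past its terminating #0}
  while P^ <> #0 do
  begin
    case P^ of
      '<': InTag := True;
      '>': InTag := False;
      #13, #10: ; {skip line breaks}
      else
        {outside a tag: copy the character, but drop a tab or space that is
         followed by more whitespace, a line break or the start of a tag}
        if not InTag and
           not ((P^ in [#9, #32]) and ((P+1)^ in [#10, #13, #32, #9, '<'])) then
          Result := Result + P^;
    end;
    Inc(P);
  end;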

  {decode the basic character entities; &amp; is replaced last so that a sequence like &amp;lt; is not decoded twice}
  Result := StringReplace(Result, '&quot;', '"',  [rfReplaceAll]);
  Result := StringReplace(Result, '&apos;', '''', [rfReplaceAll]);
  Result := StringReplace(Result, '&gt;',   '>',  [rfReplaceAll]);
  Result := StringReplace(Result, '&lt;',   '<',  [rfReplaceAll]);
  Result := StringReplace(Result, '&amp;',  '&',  [rfReplaceAll]);
  {if you need other character entities (for example &nbsp;), add them above the &amp; line}
end;
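
The function above can be tried with a short console program. This is only a sketch: it assumes StripHTMLTags has been placed in a unit called HTMLUtils, a hypothetical name chosen for this example, and the sample strings are made up for illustration.

```pascal
program StripDemo;
{$APPTYPE CONSOLE}

uses
  HTMLUtils; { hypothetical unit that declares StripHTMLTags }

begin
  { tags are removed and &amp; is decoded }
  Writeln(StripHTMLTags('<p>Tom &amp; Jerry</p>'));     { prints: Tom & Jerry }

  { &lt; is decoded only after the tags have been stripped }
  Writeln(StripHTMLTags('<b>1 &lt; 2</b>'));            { prints: 1 < 2 }

  { a space directly in front of a tag is dropped }
  Writeln(StripHTMLTags('<p>Hello <b>world</b></p>'));  { prints: Helloworld }
end.
```

The last call shows a side effect of the whitespace rule: a space or tab that stands immediately before '<' is removed, so words separated only by inline tags can run together. If that matters for your data, you may want to relax that check.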


Published: April 6, 2004


Copyright© 1998-2014, Scalabium Software. All rights reserved.