Scalabium Software

#162: How can I extract the plain text from an HTML-formatted string?

Today I want to publish a small function that extracts the plain text from an HTML-formatted string:

function StripHTMLTags(const strHTML: string): string;
var
  P: PChar;
  InTag: Boolean;
begin
  Result := '';
  P := PChar(strHTML);
  InTag := False;

  {scan up to the terminating #0; an empty string is skipped entirely}
  while P^ <> #0 do
  begin
    case P^ of
      '<': InTag := True;
      '>': InTag := False;
      #13, #10: ; {line breaks are dropped}
    else
      {outside a tag, skip a space or tab that is followed by more
       whitespace or by a tag; copy every other character}
      if (not InTag) and
         not ((P^ in [#9, #32]) and ((P+1)^ in [#10, #13, #32, #9, '<'])) then
        Result := Result + P^;
    end;
    Inc(P);
  end;

  {convert character entities; '&amp;' goes last so that text such as
   '&amp;lt;' is decoded to '&lt;' and not to '<'}
  Result := StringReplace(Result, '&quot;', '"',  [rfReplaceAll]);
  Result := StringReplace(Result, '&apos;', '''', [rfReplaceAll]);
  Result := StringReplace(Result, '&gt;',   '>',  [rfReplaceAll]);
  Result := StringReplace(Result, '&lt;',   '<',  [rfReplaceAll]);
  Result := StringReplace(Result, '&amp;',  '&',  [rfReplaceAll]);
  {add other character entities here if you need them}
end;
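
If you need more entities than the five above, one possible way is to keep them in a table and run StringReplace over it in a loop. The sketch below is only an illustration and is not part of the original tip: the names Entities and DecodeEntities are hypothetical, and mapping '&nbsp;' to a plain space is an assumption that suits plain-text output. Note that '&amp;' stays last in the table, for the same reason as in the function above.

```pascal
program EntityTableDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  { Hypothetical lookup table: entity in column 0, replacement in column 1.
    '&amp;' is kept last so that '&amp;lt;' decodes to '&lt;', not '<'. }
  Entities: array[0..5, 0..1] of string = (
    ('&nbsp;', ' '),   { assumption: non-breaking space becomes a plain space }
    ('&quot;', '"'),
    ('&apos;', ''''),
    ('&gt;',   '>'),
    ('&lt;',   '<'),
    ('&amp;',  '&'));

{ Replaces every entity listed in the table with its plain-text character. }
function DecodeEntities(const S: string): string;
var
  i: Integer;
begin
  Result := S;
  for i := Low(Entities) to High(Entities) do
    Result := StringReplace(Result, Entities[i, 0], Entities[i, 1],
      [rfReplaceAll]);
end;

begin
  { prints: Tom & Jerry <3 "cheese" }
  WriteLn(DecodeEntities('Tom&nbsp;&amp;&nbsp;Jerry &lt;3 &quot;cheese&quot;'));
end.
```

With such a helper, the five StringReplace calls at the end of StripHTMLTags could be replaced by a single call, and a new entity costs one extra row in the table (remember to adjust the array bounds).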


Published: April 6, 2004

Copyright© 1998-2015, Scalabium Software. All rights reserved.