Extract text from a HTML file

JanGerrit
Moderator
Tue Sep 22, 2009 5:29 pm
Hello,
I want to extract the text a user actually sees from an HTML file. For example:
Code:
<b>hello</b> and nice to see you
<!-- COMMENT -->

The first line should be extracted in full, and the second (a comment) should be dropped. The script should also ignore JavaScript and similar non-visible content. I only need the visible text together with its HTML formatting tags. Ideally this would be a bash script that takes the file name as an argument and writes the extracted text to a new file.

I need some hints on how to do the extraction. Could anyone help me, please?
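
For anyone landing on this thread later, here is a minimal sketch of one way to approach the extraction described above, using Python's standard html.parser module rather than pure bash. It is not from the original posters. The set of "format tags" to keep (FORMAT_TAGS), the set of hidden elements (HIDDEN_TAGS), and the names VisibleTextExtractor and extract_visible are assumptions made for illustration, so adjust them to your needs.

```python
import sys
from html import escape
from html.parser import HTMLParser

# Assumption: the thread does not say which tags count as "format tags";
# this set is only illustrative.
FORMAT_TAGS = {"b", "i", "u", "em", "strong"}
# Assumption: elements whose content is never shown to the reader.
HIDDEN_TAGS = {"script", "style"}


class VisibleTextExtractor(HTMLParser):
    """Collects visible text plus a small set of formatting tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.hidden_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self.hidden_depth += 1
        elif tag in FORMAT_TAGS and self.hidden_depth == 0:
            self.parts.append(f"<{tag}>")  # attributes are dropped

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS:
            self.hidden_depth = max(0, self.hidden_depth - 1)
        elif tag in FORMAT_TAGS and self.hidden_depth == 0:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.hidden_depth == 0:
            # Re-escape so characters like "&" stay valid in the output.
            self.parts.append(escape(data, quote=False))

    # Comments are dropped: the inherited handle_comment does nothing.


def extract_visible(html_text):
    parser = VisibleTextExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


def main(in_path, out_path):
    with open(in_path, encoding="utf-8", errors="replace") as src:
        text = extract_visible(src.read())
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write(text)


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Saved under a hypothetical name such as extract_visible.py, it could be called from a bash script as `python3 extract_visible.py page.html page.txt`. All other tags are removed while their text is kept, so block-level structure such as paragraphs is lost in this sketch.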


Alec
Registered Member
Tue Sep 22, 2009 7:22 pm
You could try one of these:

http://www.google.com/search?q=html2text


JanGerrit
Moderator
Mon Sep 28, 2009 5:55 pm
Thanks Alec, it isn't necessary anymore. But thanks for the help :)

