Package fr.inria.edelweiss.extractor.webpage

Package implementing the web page extractor as a very simple set of methods to extract text and metadata from web pages.


Class Summary
Anchor represents the HTML element <a> and provides the getters for its attributes.
ContentBlock is the parent of all the HTML element classes and provides a getter for the default text value / content.
Embedded represents the HTML elements <object> and <embed> and provides the getters for their attributes.
Header represents the HTML elements <h1> ...
Image represents the HTML element <img> and provides the getters for its attributes.
Link represents the HTML element <link> and provides the getters for its attributes.
Meta represents the HTML element <meta> and provides the getters for its attributes.
Paragraph represents the HTML elements <p> and <div> and any piece of text.
WebPageExtractor provides methods to extract and access the metadata (HTTP header and HTML meta and link tags) and the text content of web pages.
 

Package fr.inria.edelweiss.extractor.webpage Description

Package implementing the web page extractor as a very simple set of methods to extract text and metadata from web pages. Here is a simple example of how to use it.
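
The package's own usage example is not reproduced on this page, and the method signatures of WebPageExtractor are not listed here. As an illustration only, the following self-contained sketch mimics the model described in the class summary: a ContentBlock parent with a getter for its text content, and an Anchor subclass with a getter for one attribute. Everything in it is an assumption made for the sketch: the class ExtractorSketch, the method extractAnchors, and the getter names getContent and getHref are hypothetical and are not the real API of fr.inria.edelweiss.extractor.webpage. It uses a regular expression rather than a real HTML parser, so it only handles double-quoted href attributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in; NOT part of fr.inria.edelweiss.extractor.webpage. */
public class ExtractorSketch {

    /** Plays the role described for ContentBlock: holds the default text content. */
    public static class ContentBlock {
        private final String content;

        public ContentBlock(String content) {
            this.content = content;
        }

        /** Assumed getter name; the real class may name it differently. */
        public String getContent() {
            return content;
        }
    }

    /** Plays the role described for Anchor: an <a> element with an attribute getter. */
    public static class Anchor extends ContentBlock {
        private final String href;

        public Anchor(String href, String content) {
            super(content);
            this.href = href;
        }

        /** Assumed getter name for the href attribute. */
        public String getHref() {
            return href;
        }
    }

    // Matches <a ... href="..." ...>text</a>; group 1 is the href, group 2 the inner text.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\s[^>]*?href=\"([^\"]*)\"[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Hypothetical helper: collects every matching anchor found in the HTML string. */
    public static List<Anchor> extractAnchors(String html) {
        List<Anchor> anchors = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            anchors.add(new Anchor(m.group(1), m.group(2).trim()));
        }
        return anchors;
    }

    public static void main(String[] args) {
        String html = "<p>See <a href=\"http://example.org/\">the site</a>.</p>";
        for (Anchor a : extractAnchors(html)) {
            System.out.println(a.getHref() + " -> " + a.getContent());
        }
    }
}
```

The inheritance from ContentBlock reflects the summary's statement that it is the parent of all the HTML element classes, so every element type can expose its text through the same getter while adding getters for its own attributes.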