| Packages | |
|---|---|
| fr.inria.edelweiss.extractor.webpage | Package implementing the web page extractor, a very simple set of methods for extracting text and metadata from web pages. A simple usage example is given below. |
The web page extractor is a very simple set of methods to extract text and metadata from web pages.
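    import java.io.IOException;
    import java.net.MalformedURLException;
    import java.net.URL;

    // assumes WebPageExtractor lives in the package listed above
    import fr.inria.edelweiss.extractor.webpage.WebPageExtractor;

    // the class name WebPageExtractorExample is only illustrative
    public class WebPageExtractorExample {
        public static void main(String[] args) {
            try {
                // create an extractor for the page to process
                WebPageExtractor extractor = new WebPageExtractor(new URL("http://www.inria.fr/recherche/equipes/edelweiss.en.html"));
                // run the extraction
                extractor.extract();
                // show internal structure
                System.out.println(extractor.toString());
                // show extracted text
                System.out.println("========================\n" + extractor.fullText());
            } catch (MalformedURLException e) {
                // the string passed to URL is not a well-formed URL
                e.printStackTrace();
            } catch (IOException e) {
                // an I/O error occurred while processing the page
                e.printStackTrace();
            }
        }
    }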