- Parse HTML from a URL, file, or string;
- Use DOM or CSS selectors to find and retrieve data;
- Manipulate HTML elements, attributes, and text.
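import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Parse a local file; the third argument is the base URI used to resolve relative links.
File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

// DOM-style lookup: getElementById returns null if no element has this id.
Element content = doc.getElementById("content");
Elements links = content.getElementsByTag("a");
for (Element link : links) {
    String linkHref = link.attr("href"); // raw value of the href attribute
    String linkText = link.text();       // visible text of the link
}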
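
The list above also mentions CSS selectors, while the snippet uses only DOM-style methods. As a minimal sketch of the selector route, the following parses HTML from a string and collects absolute link URLs. The class name `SelectorSketch`, the method `contentLinks`, and the sample HTML are made up for illustration; `Jsoup.parse(String, String)`, `select`, and the `abs:` attribute prefix are part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class SelectorSketch {
    // Hypothetical helper: returns the absolute URL of every link
    // found inside the element whose id is "content".
    static List<String> contentLinks(String html, String baseUri) {
        // Parse from a string; baseUri is used to resolve relative hrefs.
        Document doc = Jsoup.parse(html, baseUri);

        // CSS selector: <a> elements with an href attribute under #content.
        Elements links = doc.select("#content a[href]");

        List<String> urls = new ArrayList<>();
        for (Element link : links) {
            // The "abs:" prefix asks jsoup to resolve the value against baseUri.
            urls.add(link.attr("abs:href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample markup (an assumption for this sketch): one link inside
        // #content and one outside it.
        String html = "<div id=\"content\"><a href=\"/about\">About</a></div>"
                    + "<a href=\"/outside\">Outside</a>";
        System.out.println(contentLinks(html, "http://example.com/"));
    }
}
```

Compared with chaining `getElementById` and `getElementsByTag`, a single selector keeps the query in one place, and an empty result is returned instead of a null when `#content` is missing.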