Jsoup won the "Best Popularity Project" award in the 2021 OSC China Open Source Project Selection.
License: MIT
Development language: Java
Operating system: Cross-platform
Software type: Open source software
Open source organization: None
Region: Unknown
Submitted by: sweet potato
Intended audience: Unknown
Date added: 2010-01-31

Software Introduction

Jsoup is a Java HTML parser that can parse HTML directly from a URL or from HTML text. It provides a very convenient, labor-saving API for retrieving and manipulating data using DOM methods, CSS selectors, and jQuery-like methods.

This site uses jsoup to parse HTML.

The main functions of jsoup are as follows:

  1. Parse HTML from a URL, file or string;

  2. Use DOM or CSS selectors to find and retrieve data;

  3. Manipulate HTML elements, attributes, and text.
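As a minimal sketch of the three functions listed above, the following parses HTML from a string, finds an element with a CSS selector, and then changes its text and attributes. The class name `BasicUsage` and the helper `editIntro` are hypothetical and only for illustration; the sketch assumes a current jsoup release is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BasicUsage {
    // Hypothetical helper: parses HTML from a string, finds the first
    // <p class="intro"> with a CSS selector, then edits its text and adds
    // an attribute. Returns null when no matching element exists.
    static Element editIntro(String html) {
        Document doc = Jsoup.parse(html);            // 1. parse from a string
        Element intro = doc.selectFirst("p.intro");  // 2. find with a CSS selector
        if (intro == null) {
            return null;
        }
        intro.text("Edited text");                   // 3. replace the text content
        intro.attr("data-edited", "true");           //    and set an attribute
        return intro;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Demo</title></head>"
                + "<body><p class=\"intro\">Original text</p></body></html>";
        System.out.println(editIntro(html).outerHtml());
    }
}
```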

Jsoup is released under the MIT License and can be safely used in commercial projects.

Sample code:

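```java
import java.io.File;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Parse a local file as UTF-8; the base URI is used to resolve relative links.
// Jsoup.parse(File, String, String) throws IOException, so call this from a
// method that declares or handles it.
File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

// Find the element with id="content" (null if absent), then every <a> inside it.
Element content = doc.getElementById("content");
Elements links = content.getElementsByTag("a");
for (Element link : links) {
    String linkHref = link.attr("href"); // raw value of the href attribute
    String linkText = link.text();       // the link's combined text
}
```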

Online Javadoc: http://tool.oschina.net/apidocs/apidoc?api=jsoup-1.6.3

News
2023/12/30 10:14

Jsoup 1.17.2 released, Java HTML parser

Jsoup 1.17.2 has been released. Jsoup is a Java library for working with real-world HTML. It uses the best of HTML5 DOM methods and CSS selectors to provide a very convenient API for extracting and manipulating data. Download: https://jsoup.org/download. Changes include: Improvements: Attribute object accessors: added Element.attribute(String) and Attributes.attribute(String) to make it easier to obtain Attribute objects (#2069). Attribute source tracking: if source tracking is turned on...
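A minimal sketch of the new accessor, assuming jsoup 1.17.2 or later and assuming (as the Javadoc describes) that `Element.attribute(String)` returns null when the attribute is not set. The class `AttributeAccess` and its `describe` helper are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AttributeAccess {
    // Hypothetical helper: returns "key=value" for the named attribute of the
    // first element matching cssQuery, or null if the element or attribute
    // is missing. Element.attribute(String) requires jsoup 1.17.2+.
    static String describe(String html, String cssQuery, String attrName) {
        Document doc = Jsoup.parse(html);
        Element el = doc.selectFirst(cssQuery);
        if (el == null) {
            return null;
        }
        Attribute attr = el.attribute(attrName); // null when not present
        if (attr == null) {
            return null;
        }
        return attr.getKey() + "=" + attr.getValue();
    }

    public static void main(String[] args) {
        System.out.println(describe("<a href=\"/docs\">Docs</a>", "a", "href"));
    }
}
```

Unlike `attr(String)`, which returns a plain string, the returned `Attribute` object can be inspected or modified as a unit.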

2023/11/27 14:11

Jsoup 1.17.1 released, Java HTML parser

Jsoup 1.17.1 has been released, adding request-level authentication, source ranges for attribute names and values, stream() iteration support, and many other improvements and bug fixes. Jsoup is a Java library for working with real-world HTML. It uses the best of HTML5 DOM methods and CSS selectors to provide a very convenient API for extracting and manipulating data. Download: https://jsoup.org/download. Changes include: Improvements: Request-level authentication: added request-level authentication to Jsoup.connect()...
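The stream() support can be sketched as follows, assuming jsoup 1.17.1 or later, where `Element.stream()` yields the element itself and all of its descendant elements in document order. The class `StreamExample` and its `hrefs` helper are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StreamExample {
    // Hypothetical helper: collect the href of every <a> element using the
    // java.util.stream API instead of a CSS selector.
    static List<String> hrefs(String html) {
        Document doc = Jsoup.parse(html);
        return doc.stream()                              // document and all descendants
                .filter(el -> el.tagName().equals("a"))  // keep only <a> elements
                .map(el -> el.attr("href"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(hrefs("<a href=\"a1\">x</a><p><a href=\"a2\">y</a></p>"));
    }
}
```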

2023/10/21 10:19

Jsoup 1.16.2 released, Java HTML parser

Jsoup 1.16.2 has been released. Jsoup is a Java library for working with real-world HTML. It uses the best of HTML5 DOM methods and CSS selectors to provide a very convenient API for extracting and manipulating data. Download: https://jsoup.org/download. Changes include: Improvements: optimized the performance of complex CSS selectors by adding a cost-based query planner. Evaluators are sorted by their relative execution cost and run in order from lowest to highest cost. This is done by ensuring that more complex evaluations (such as attribute regular expressions, or using :has...
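The idea behind a cost-based planner can be illustrated with a small, self-contained sketch. This is not jsoup's internal code: the `CostedPredicate` class, the cost numbers, and the `and` method are hypothetical. For an AND of checks, running cheap checks first means an expensive one, such as a regular expression, only runs when every cheaper check has already passed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class CostOrderedAnd {
    // Hypothetical evaluator: a test plus an estimated relative cost.
    static final class CostedPredicate<T> {
        final int cost;
        final Predicate<T> test;

        CostedPredicate(int cost, Predicate<T> test) {
            this.cost = cost;
            this.test = test;
        }
    }

    // Combines predicates with AND. They are sorted once by ascending cost, and
    // evaluation stops at the first failure, so costly checks run last.
    static <T> Predicate<T> and(List<CostedPredicate<T>> parts) {
        List<CostedPredicate<T>> sorted = new ArrayList<>(parts);
        sorted.sort(Comparator.comparingInt(p -> p.cost));
        return item -> {
            for (CostedPredicate<T> p : sorted) {
                if (!p.test.test(item)) {
                    return false;
                }
            }
            return true;
        };
    }

    public static void main(String[] args) {
        AtomicInteger regexCalls = new AtomicInteger();
        Pattern pdfLink = Pattern.compile("^https?://.*\\.pdf$");
        List<CostedPredicate<String>> parts = new ArrayList<>();
        // Added expensive-first on purpose; and() reorders them by cost.
        parts.add(new CostedPredicate<String>(10, s -> {
            regexCalls.incrementAndGet();
            return pdfLink.matcher(s).matches();
        }));
        parts.add(new CostedPredicate<String>(1, s -> s.startsWith("https")));

        Predicate<String> query = and(parts);
        System.out.println(query.test("ftp://host/file.pdf"));   // false
        System.out.println(query.test("https://host/file.pdf")); // true
        System.out.println(regexCalls.get());                    // 1
    }
}
```

The first input fails the cheap prefix check, so the regular expression is never evaluated for it.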

2023/05/06 07:31

Jsoup 1.16.1 released, Java HTML parser

Jsoup 1.16.1 has been released. Jsoup is a Java library for working with real-world HTML. It uses the best of HTML5 DOM methods and CSS selectors to provide a very convenient API for extracting and manipulating data. Download: https://jsoup.org/download. Changes include: Improvements: Jsoup.connect(String url) now natively supports URLs with Unicode characters in the path or query string, without the caller needing to escape them (#1914). Calling Node.remove() on a node that has no parent is now a no-op...

2023/02/21 07:02

Jsoup 1.15.4 released, Java HTML parser

Jsoup 1.15.4 has been released, with several improvements, particularly to pretty-printing HTML, and a number of bug fixes. Jsoup is a Java library for working with real-world HTML. It uses the best of HTML5 DOM methods and CSS selectors to provide a very convenient API for extracting and manipulating data. Download: https://jsoup.org/download. Changes include: Improvements: added the ability to escape CSS selectors (tags, IDs, classes) to match elements that do not follow regular CSS syntax. For example, to match by class name <...
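A minimal sketch of selector escaping, assuming jsoup 1.15.4 or later: a backslash before the dot makes `one.two` a single class name rather than two chained class selectors. In Java source the backslash itself must be escaped, hence `"\\."`. The class `EscapedSelector` and its helper are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class EscapedSelector {
    // Hypothetical helper: count <p> elements that carry the literal class
    // name "one.two". Without the escape, "p.one.two" would instead match
    // elements having both class "one" and class "two".
    static int countDottedClass(String html) {
        Document doc = Jsoup.parse(html);
        Elements matches = doc.select("p.one\\.two");
        return matches.size();
    }

    public static void main(String[] args) {
        String html = "<p class=\"one.two\">a</p><p class=\"one two\">b</p>";
        System.out.println(countDottedClass(html));
    }
}
```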

2022/08/25 07:02

Jsoup 1.15.3 released, Java HTML parser

Jsoup 1.15.3 has been released, with a security fix for potential XSS attacks and other improvements, including more descriptive validation error messages. Jsoup is a Java library for working with real-world HTML. It uses the best of HTML5 DOM methods and CSS selectors to provide a very convenient API for extracting and manipulating data. Download: https://jsoup.org/download. Changes include: Security: fixed an issue where, if Safelist.preserveRelativeLinks is enabled, the jsoup cleaner may incorrectly sanitize carefully crafted XSS attempts...
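For context, this is roughly how the cleaner is used with relative links preserved. It is a sketch assuming jsoup 1.14.1 or later (where the `Safelist` class exists); the base URI `https://example.com/` and the class `CleanExample` are placeholders. Because this option was affected by the XSS issue above, it should be used with 1.15.3 or later.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class CleanExample {
    // Hypothetical helper: sanitize untrusted HTML with the built-in basic
    // safelist. preserveRelativeLinks(true) keeps relative hrefs such as
    // "/docs"; a base URI is required so jsoup can validate them.
    static String sanitize(String untrusted) {
        Safelist safelist = Safelist.basic().preserveRelativeLinks(true);
        return Jsoup.clean(untrusted, "https://example.com/", safelist);
    }

    public static void main(String[] args) {
        // The <script> element and the onclick handler are not in the safelist.
        System.out.println(sanitize("<p>Hi<script>alert(1)</script></p>"));
        System.out.println(sanitize("<a href=\"/docs\" onclick=\"x()\">docs</a>"));
    }
}
```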

2020/03/03 07:27

The Java HTML parser jsoup 1.13.1 released, with significantly faster parsing

Jsoup 1.13.1 has been released. Notable improvements include significantly faster parsing compared with 1.12.x, new selector features, a fix for the Mark Invalid exception, and many other improvements. Jsoup is the best Java HTML parser (as certified by sweet potato). It uses the best of HTML5 DOM methods and CSS selectors, providing a very convenient API for extracting and processing data. A taste of the code: Document doc = Jsoup.connect("https://en.wikipedia.org/").get(); log(doc.title()); Elements newsHeadlines ...

2019/05/13 10:40

Jsoup 1.12.1 released: the best Java HTML parser

Jsoup 1.12.1 has been released. This version contains many usability improvements, improves parsing speed and memory efficiency, and fixes many bugs. Jsoup is a Java HTML parser that can parse HTML directly from a URL or from HTML text. It provides a very convenient API for fetching and manipulating data using DOM methods, CSS selectors, and jQuery-like methods. Download: available from the download page. The complete list of changes is as follows: Changes: removed the deprecated method for disabling TLS certificate checking, Connection.validateTLSCertificates...

2018/04/16 07:50

Jsoup 1.11.3, Java HTML parser

Jsoup is a Java HTML parser that can parse HTML directly from a URL or from HTML text. It provides a very convenient API for fetching and manipulating data using DOM methods, CSS selectors, and jQuery-like methods. The main functions of jsoup are: parse HTML from a URL, file, or string; use DOM or CSS selectors to find and retrieve data; manipulate HTML elements, attributes, and text. Jsoup is released under the MIT License and can be safely used in commercial projects. This update: Improvements: CDATA sections are now treated as whitespace pre...

2017/11/20 14:29

Jsoup 1.11.2, Java HTML parser

Jsoup is a Java HTML parser that can parse HTML directly from a URL or from HTML text. It provides a very convenient API for fetching and manipulating data using DOM methods, CSS selectors, and jQuery-like methods. The main functions of jsoup are: parse HTML from a URL, file, or string; use DOM or CSS selectors to find and retrieve data; manipulate HTML elements, attributes, and text. Jsoup is released under the MIT License and can be safely used in commercial projects. This update: Improvements: added a new pseudo selector, :matchText, which al...

2017/11/06 09:17

Jsoup 1.11.1 released, the most powerful Java HTML parser

Jsoup 1.11.1 has been released, with 30% lower DOM memory usage, streaming parsing of HTML from the network, faster HTML generation, and many other improvements and bug fixes. Download: https://jsoup.org/download. Improvements: when loading content from a URL or a file, the content is now parsed as it streams in from the network or disk, rather than being fully buffered before parsing. This substantially reduces memory consumption & large garbage objects...

2017/06/12 12:01

Jsoup 1.10.3 released, Java HTML parser

Jsoup 1.10.3 has been released. This version brings better CSS selector performance, improvements to Jsoup.Connection, and other bug fixes. Details include: Improvements: Added Elements.eachText() and Elements.eachAttr(), which return a list of each Element's text or attribute values, respectively. This makes it simple, for example, to get a list of each URL on a page: List<String> urls = doc.select("a").eachAttr("abs:href"); Improved selector va...
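A slightly fuller sketch of the `eachAttr` and `eachText` methods described above, assuming jsoup 1.10.3 or later. The `abs:` prefix resolves each `href` against the base URI passed to `Jsoup.parse`; the base URI and the class `EachAttrExample` are placeholders.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class EachAttrExample {
    // Hypothetical helper: the absolute URL of every link. Elements without
    // an href contribute no value to the list.
    static List<String> absoluteLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        return doc.select("a").eachAttr("abs:href");
    }

    // Hypothetical helper: the text of every link. Links with no text are skipped.
    static List<String> linkTexts(String html) {
        return Jsoup.parse(html).select("a").eachText();
    }

    public static void main(String[] args) {
        String html = "<a href=\"/one\">One</a><a href=\"two\">Two</a>";
        System.out.println(absoluteLinks(html, "https://example.com/dir/"));
        System.out.println(linkTexts(html));
    }
}
```

Here `/one` resolves against the host root, while the relative `two` resolves against the `/dir/` path.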

2017/01/05 09:46

Jsoup 1.10.2 released, Java HTML parser

Jsoup 1.10.2 has been released. This version brings faster startup, extended DOM tree traversal, improved HTTP compatibility, and several bug fixes. Details include: Improvements: Improved startup time, particularly on Android, by reducing garbage generation and CPU execution time when loading the HTML entity files; about 1.72x faster in this area. Added Element.is(query) to check if an element matches a CSS query. Added new methods to El...
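A minimal sketch of `Element.is(String)`, which tests an element you already hold against a CSS query instead of searching the document. The class `IsExample` and its helper are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class IsExample {
    // Hypothetical helper: find the first element matching `find`, then
    // report whether that same element also matches the query `test`.
    static boolean firstMatches(String html, String find, String test) {
        Element el = Jsoup.parse(html).selectFirst(find);
        return el != null && el.is(test);
    }

    public static void main(String[] args) {
        String html = "<div><p class=\"note\">x</p></div>";
        System.out.println(firstMatches(html, "p", "p.note"));
        System.out.println(firstMatches(html, "p", ".warning"));
    }
}
```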

2016/10/24 00:00

Jsoup 1.10.1 released, Java HTML parser

Jsoup 1.10.1 was released. Jsoup is a Java HTML parser that can parse HTML directly from a URL or from HTML text. It provides a very convenient API for fetching and manipulating data using DOM methods, CSS selectors, and jQuery-like methods. The changes are as follows: Improvements: improved support for extended HTML entities, including supplemental characters and multiple character references. Also reduced memory consumption of the entity tables. Added support for *|E wildcard n...

2016/05/18 00:00

Jsoup 1.9.2 released, Java HTML parser

Jsoup 1.9.2 was released. Changes include: Improvements: 1. In XML documents, detect the charset from the XML prolog (<?xml encoding="UTF-8"?>). Bug fixes: 1. Fixed an issue where tag names that contained non-ASCII characters but started with an ASCII character would cause the parser to get stuck in an infinite loop. 2. Fixed an issue where API-created XML documents would have an incorrect prolog. 3. Fixed an iss...

2016/04/18 00:00

Jsoup 1.9.1 released, HTML parser

Jsoup 1.9.1 has been released. Changelog: Improvements: Added support for HTTP and SOCKS request proxies, specified per connection. See Connection.proxy(String, int). Added support for sending plain HTTP request bodies in POST and PUT requests, with Connection.requestBody(String). Added support in Jsoup.connect() for HEAD, OPTIONS, and TRACE. Added support for HTTP 307 Temporary Redirect (replays posts, if appl...
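The proxy and request-body options can be combined on one `Connection`. This sketch only configures the request and never sends it; the URL, JSON body, proxy address, and the class `ConnectionConfig` are placeholders, not values from the release notes.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class ConnectionConfig {
    // Hypothetical helper: build (but do not execute) a POST request with a
    // raw body, sent through a per-connection HTTP proxy.
    static Connection buildRequest() {
        return Jsoup.connect("https://example.com/api")
                .method(Connection.Method.POST)
                .requestBody("{\"query\":\"jsoup\"}")   // raw body instead of form data
                .header("Content-Type", "application/json")
                .proxy("127.0.0.1", 8080);              // placeholder proxy address
    }

    public static void main(String[] args) {
        Connection conn = buildRequest();
        // conn.execute() would send the request through the proxy; omitted here.
        System.out.println(conn.request().method());
    }
}
```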

2015/08/03 00:00

Jsoup 1.8.3 released, HTML parser

Jsoup 1.8.3 was released. The main improvements in this version include better performance when parsing large HTML files, automatic switching to the XML parser when fetching XML documents, and important bug fixes. Changes: Improvements: Performance improvement on parsing larger HTML pages. On Android KitKat, around 1.7x faster. On Android Lollipop, ~1.3x faster. Improvements largely from re-ordering the HtmlTreeBuilder methods based on analysis of various websites; also from furt...

2015/04/15 00:00

Jsoup 1.8.2 released, HTML parser

Jsoup 1.8.2 was released. This version improves performance on Android for HTML parsing, HTML generation, queries, and more. It also adds file uploads, W3C DOM interoperability, and other features, along with other improvements and bug fixes. Changes: improved HTML parsing performance on Android; improved HTML serialization performance on Android; sped up character set encoding on Android; improved selector performance on Android; added support for file uploads; add a meta charset element to documents when setting the character set; added ability to disable...
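The W3C DOM interoperability can be sketched with the `org.jsoup.helper.W3CDom` helper, which converts a jsoup document into a standard `org.w3c.dom.Document` so it can be passed to APIs such as XPath or XSLT. The class `W3cInterop` and its helper are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;

public class W3cInterop {
    // Hypothetical helper: convert a jsoup Document to a W3C DOM Document,
    // then count <p> elements using the standard W3C DOM API.
    static int countParagraphs(String html) {
        org.jsoup.nodes.Document jsoupDoc = Jsoup.parse(html);
        org.w3c.dom.Document w3cDoc = new W3CDom().fromJsoup(jsoupDoc);
        return w3cDoc.getElementsByTagName("p").getLength();
    }

    public static void main(String[] args) {
        System.out.println(countParagraphs("<p>a</p><div><p>b</p></div>"));
    }
}
```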

2014/09/28 00:00

Jsoup 1.8.1 released with greatly improved performance!

Jsoup 1.8.1 has been released! It significantly improves the performance of text and tree serialization, lets you choose HTML or XML output, and includes many functional improvements and bug fixes. This version is now available for download. The changes are as follows: Improvements: HTML or XML output can be selected; the default is HTML. Element.text() performance improvement. Element.html() performance improvement. Reduced the time to read files, and improved the file parser. Element.cssSelector() is about 10% faster. Added Element.cssSelector(). Tightened the scope of what characters are escaped i...
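Choosing between HTML and XML output is done through the document's output settings. A minimal sketch: with `Syntax.xml`, void elements such as `<br>` are serialized in self-closed form (`<br />`), while the default HTML syntax writes `<br>`. The class `OutputSyntax` and its helper are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class OutputSyntax {
    // Hypothetical helper: serialize the <body> contents either with the
    // default HTML syntax or with XML syntax.
    static String bodyHtml(String html, boolean asXml) {
        Document doc = Jsoup.parse(html);
        if (asXml) {
            doc.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        System.out.println(bodyHtml("<p>a<br>b</p>", false));
        System.out.println(bodyHtml("<p>a<br>b</p>", true));
    }
}
```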

2013/11/11 00:00

Jsoup 1.7.3 released, a super-strong HTML parser

Jsoup has just released version 1.7.3, which improves form handling, makes character set detection more reliable, improves the performance of CSS selectors and parsing, optimizes memory use, and fixes some bugs. Jsoup is a Java HTML parser that can parse HTML directly from a URL or from HTML text. It provides a very convenient API for fetching and manipulating data using DOM methods, CSS selectors, and jQuery-like methods. The main functions of jsoup are: parse HTML from a URL, file, or string; use DOM or CSS selectors to find and retrieve data; manipulate HTML elements, attributes, text...
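Form handling in current jsoup versions can be sketched with `FormElement`: when HTML is parsed, `<form>` elements become `FormElement` objects whose `formData()` returns the key/value pairs the form would submit. Exactly which release introduced each of these methods is not stated above, so treat this as a sketch against a recent jsoup. The class `FormDataExample` is hypothetical.

```java
import java.util.Collections;
import java.util.List;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.FormElement;

public class FormDataExample {
    // Hypothetical helper: the key/value pairs the first form would submit.
    // Unchecked checkboxes and unnamed controls are not included.
    static List<Connection.KeyVal> formData(String html) {
        Document doc = Jsoup.parse(html);
        FormElement form = (FormElement) doc.selectFirst("form");
        if (form == null) {
            return Collections.emptyList();
        }
        return form.formData();
    }

    public static void main(String[] args) {
        String html = "<form><input name=\"q\" value=\"jsoup\">"
                + "<input type=\"checkbox\" name=\"c\"></form>";
        for (Connection.KeyVal kv : formData(html)) {
            System.out.println(kv.key() + "=" + kv.value());
        }
    }
}
```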
