Semantic Web

The Semantic Web is a vision of the future Web associated with the concept of Web 3.0, and it is regarded as one of the defining characteristics of the Web 3.0 era. In short, the Semantic Web is a kind of intelligent network: it can understand not only words and concepts but also the logical relationships between them, which makes communication more efficient and valuable.
The core idea of the Semantic Web is to add machine-understandable semantic metadata (foreign language: metadata) to documents on the Web (e.g., HTML and XML documents), so that the entire Internet becomes a universal medium for information exchange. [1]
Chinese name: Semantic Web (语义网)
Foreign name: Semantic Web
Proposed by: World Wide Web Consortium (W3C)
Presenter: Director Tim Berners-Lee
Proposed in: 1998

Concept

The concept of the Semantic Web was put forward in 1998 by Tim Berners-Lee of the World Wide Web Consortium (W3C). It actually builds on many existing technologies and also depends on the later integration of text, markup, and knowledge representation. Its origins can be traced back to the research of Collins, Quillian, Loftus and others in the late 1960s, and to theoretical results proposed by Simon, Schank, Minsky and others in the early 1970s. Simon proposed the semantic network (Semantic Network, not to be confused with the Semantic Web), and in that period the logic-based programming language Prolog was invented.
In a 2006 speech at Princeton University, and in later interviews with the media, Tim Berners-Lee said publicly that "Semantic Web" might not have been the most appropriate name for this intelligent network, and that a more accurate name might be the Data Web (foreign language: Data Web).
The Semantic Web is an intelligent network that can make judgments based on semantics, enabling barrier-free communication between people and computers. Like a giant brain, it would be highly intelligent and strongly coordinated. Every computer connected to the Semantic Web could understand not only words and concepts but also the logical relationships between them, and could do work that people do today. It would free people from the heavy labor of searching through relevant web pages and make users feel all but omnipotent. Computers on the Semantic Web could use their own intelligent software to find the information you need among the massive resources of the World Wide Web, thereby turning today's isolated islands of information into one huge database.
The construction of the Semantic Web draws heavily on the field of artificial intelligence and coincides with the Web 3.0 concept of an intelligent network, so the initial realization of the Semantic Web is also regarded as one of the important features of Web 3.0. Becoming a "super brain" of the network, however, requires long-term research, which means that work toward the Semantic Web will occupy an important part of the Web's development, continuing across several eras of the network as it gradually evolves into a true "intelligent network". [2]

Basic features

Just as Web 2.0 took the AJAX concept as its opportunity, if Web 3.0 takes the Semantic Web concept as its opportunity, a technology analogous to AJAX will emerge and become a network standard, markup language, or processing tool that extends the World Wide Web and opens the Semantic Web era. Enterprises that master this technology will be the trendsetters of that era.
The Semantic Web differs from the existing WWW: the existing WWW is document-oriented, while the Semantic Web is oriented toward the data those documents contain. The Semantic Web pays more attention to what computers can "understand and process", and has certain judgment and reasoning capabilities.
Realizing the Semantic Web means that a large number of intelligent individuals (programs), interdependent with the Semantic Web, will exist widely in computers, communication tools, appliances, and other items, combining to form a first-stage intelligent network surrounding human life.
The Semantic Web is an expansion and extension of the WWW. It points to a bright future for the WWW and the Internet revolution it would bring. However, its implementation still faces enormous challenges:
  • Availability of content, i.e., content based on ontologies ("noumenon"; the same below);
  • Ontology development and evolution, including the development of core ontologies used across fields, methods and technical support for the development process, and the problems of ontology evolution and version control of annotations;
  • Content scalability, that is, how to manage Semantic Web content in a scalable way, including how to organize, store, and find it;
  • Multilingual support;
  • Standardization of ontologies.

Differences

How does the Semantic Web understand and judge?
The Semantic Web differs from the existing World Wide Web, whose data is mainly intended for human use: "the new generation WWW will provide data that can also be processed by computers, which will make a large number of intelligent services possible." The goal of Semantic Web research is to "develop a series of languages and technologies able to express semantic information that computers can understand and process, so as to support extensive and effective automated reasoning in the network environment."
The World Wide Web we use today is essentially a medium for storing and sharing images and text. All a computer sees is a pile of words or images; it cannot recognize their content. If information on the World Wide Web is to be processed by a computer, it must first be converted into raw information the computer can understand, which is quite troublesome. The Semantic Web would make this much easier.
For example, suppose one morning you suddenly decide to travel to Hoh Xil, so you turn on the computer, connect to the Semantic Web, and enter "book a flight to Hoh Xil leaving any time between 2:00 and 6:00 this afternoon". Your computer's agent will first contact the agent of the airlines serving your city to obtain flight information that meets your requirements, then contact the airline's booking agent to complete the order. You no longer need to check schedules online, copy and paste them, and then phone or book tickets and hotels yourself; the software on your computer completes these steps for you automatically. All you have to do is click a few buttons, then wait for the ticket to be delivered, or simply go to the airport and board.
When you browse news, the Semantic Web will have labeled each report, describing in detail which sentence gives the author, which is the lead, and which is the headline. So if you type "Lao She" into a search engine, you can easily find works by Lao She himself rather than articles that merely mention him.
In a word, the Semantic Web is a richer and more personalized network. You can place a high degree of trust in it and let it filter out content you dislike, making the network feel like your own. Its main differences from the ordinary World Wide Web are as follows:
1. Different objects
The World Wide Web mainly uses HTML to express web content. Because web pages written with HTML tags can express some information, such as controlling the page's display format, people may feel that computers really "understand" our intentions. In fact, HTML only concerns the form in which text is presented, such as font color, size, and type, without considering the text's content and meaning. Although there are automated scripts on the World Wide Web that help people realize certain functions, they cannot support computer-to-computer interaction in an open network environment. The World Wide Web we use is therefore mainly for "people" to read and use. The Semantic Web adds, on top of the World Wide Web, semantic information that computers can "understand"; it is convenient not only for people to read and use, but also for computers to communicate and cooperate with each other. In short, the World Wide Web is addressed mainly to "people", while the Semantic Web is addressed mainly to "machines".
2. Different ways of organizing information
Because they face different objects, the two naturally differ greatly in how they organize information. The World Wide Web organizes information resources mainly around "people", following human habits of thought and convenience. The Semantic Web, when organizing information resources, must also take into account the computer's "understanding" of textual content and the exchange and communication between computers.
3. Different emphases
The World Wide Web focuses on the display format and style of information rather than on the content being displayed. For example, more important information might simply be displayed in a larger font or a brighter color. The Semantic Web focuses more on the semantic content of information: text with a specific meaning must be annotated or interpreted.
4. Different main tasks
The World Wide Web is mainly for people to read, communicate, and use; its main task is to publish and obtain information, with sharing and communication happening through publishing or retrieving information on the network. The main task of the Semantic Web is communication and sharing among computers, so that computers can take over part of people's work, making network applications more intelligent, automated, and humane.
5. Different ways of working
Since the Semantic Web and the World Wide Web serve different objects, their ways of working naturally differ. The World Wide Web is mainly for "people", so most of the work, including collecting, retrieving, sorting, organizing, and analyzing information, is done by people. By adding semantic information that computers can "understand", the Semantic Web can free people from this tedious work and use "intelligent agents" to complete most of it. A typical example is information retrieval: using intelligent search agents, the Semantic Web would provide people with the information they really need, instead of outputting tens of thousands of useless results the way today's search engines do. [3]

Implementation

Although the Semantic Web promises a better network, implementing it is a complex and enormous project. Its architecture is still under construction and mainly needs support from the following two aspects:
1. Implementation of the data web
That is, through a set of unified and complete data standards, web information is marked up more thoroughly and in more detail, so that the Semantic Web can accurately identify information and distinguish its role and meaning.
To make Semantic Web search more accurate and thorough, and to make it easier to judge whether information is true or false, so as to reach the goal of practicality, we first need a standard that allows users to add metadata to web content (i.e., explanatory detailed tags) and lets users point out precisely what they are looking for; then we need a way to ensure that different programs can share the content of different websites; finally, users need to be able to add other functions, such as application software.
The implementation of the Semantic Web is based on the Extensible Markup Language (XML, a subset of the Standard Generalized Markup Language) and the Resource Description Framework (RDF). XML is a tool for defining markup languages; its content includes the XML declaration, a DTD (document type definition) defining the grammar of the language, detailed descriptions of the tags, and the document itself, which contains tags and content. RDF is used to express the meaning of web page content.
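As a concrete sketch of this division of labor, the fragment below uses only Python's standard library to parse a minimal RDF/XML description and recover its (subject, predicate, object) triples. The page URI, the Dublin Core property names, and the values are illustrative assumptions, not taken from the source.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DC = "{http://purl.org/dc/elements/1.1/}"

# A minimal RDF/XML fragment describing one (hypothetical) web page.
rdf_xml = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/page">
    <dc:title>Semantic Web Overview</dc:title>
    <dc:creator>Alice</dc:creator>
  </rdf:Description>
</rdf:RDF>"""

def extract_triples(doc: str):
    """Return the (subject, predicate, object) triples of a flat RDF/XML doc."""
    root = ET.fromstring(doc)
    triples = []
    for desc in root.findall(RDF + "Description"):
        subject = desc.get(RDF + "about")
        for prop in desc:  # each child element is one property of the subject
            triples.append((subject, prop.tag, prop.text))
    return triples

for t in extract_triples(rdf_xml):
    print(t)
```

XML supplies only the syntax here; it is the RDF vocabulary that says the page *has a title* and *has a creator*, which is exactly the machine-readable meaning the paragraph above describes.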
2. Search engines with semantic analysis capability
If the data web can be realized in a short time by hundreds of millions of individuals, the semantic intelligence of the network will be achieved through the efforts of humanity's leading-edge intelligence groups. Developing an information search engine with semantic analysis capability will be the most important step toward the Semantic Web: such an engine can understand human natural language and has certain reasoning and judgment capabilities.
A semantic search engine (foreign language: semantic search engine) and a search engine with semantic analysis capability (foreign language: semantically enabled search engine) are two different things. The former merely uses the semantic network as a way of searching for information, whereas a semantically enabled search engine is one that can understand natural language and, by means of computer reasoning, provide answers closer to what the user actually has in mind. [4]

Current status

We know that most scientific and technological innovations and breakthroughs are recombinations and updates of existing knowledge. The Semantic Web, with its ability to intelligently evaluate the data stored in cyberspace, will inevitably provide inexhaustible resources for new innovations. Once this technology is widely applied, its benefits will be immeasurable. The Semantic Web has therefore been a hot field of computer research since its birth.
The W3C is the main driver and standard-setter of the Semantic Web, and under its stewardship Semantic Web technology has matured. On July 30, 2001, Stanford University held an academic conference entitled "Semantic Web Infrastructure and Applications", the first international conference on the Semantic Web. On July 9, 2002, the first International Semantic Web Conference was held in Italy; since then, the conference has been held once a year, forming a convention. Meanwhile, large companies such as HP, IBM, Microsoft, and Fujitsu, and educational institutions such as Stanford University, the University of Maryland, the University of Karlsruhe in Germany, and the Victoria University of Manchester in the UK, have conducted extensive and in-depth research on Semantic Web technology, producing a series of development and application platforms such as Jena, KAON, Racer, and Pellet, along with systems for information integration, query, reasoning, and ontology editing based on Semantic Web technology.
Research Status of Semantic Web in China
China also attaches great importance to Semantic Web research. As early as 2002, Semantic Web technology was listed as a key project of the National 863 Program. Tsinghua University, Southeast University, Shanghai Jiao Tong University, Beijing University of Aeronautics and Astronautics, and Renmin University of China are all domestic research centers for the Semantic Web and related technologies. Southeast University's research on Semantic Web ontology mapping has a certain international influence, and Tsinghua University's Semantic-Web-assisted ontology mining system SWARMS and Shanghai Jiao Tong University's ontology engineering platform ORIENT represent the level of domestic Semantic Web research and development. The popular human-computer interaction tools are all specific applications of the Semantic Web, but their quality is uneven, and some simple tests reveal their differences. (As shown in the figure Comparison of Chinese Semantic Software.)

Prospects

The architecture of the Semantic Web is still under construction, and international research on this architecture has not yet produced a fully satisfactory, rigorous logical description and theoretical system. Chinese scholars have so far only briefly introduced the architecture on the basis of foreign research and have not yet produced a systematic exposition.
The implementation of the Semantic Web requires the support of three key technologies: XML, RDF, and ontology. XML (Extensible Markup Language) lets information providers define tags and attribute names as needed, so the structure of an XML file can be arbitrarily complex. It offers a good data storage format, scalability, a high degree of structure, and ease of network transmission. In addition, its namespace (NS) mechanism and the multiple data types and validation mechanisms supported by XML Schema make it one of the key technologies of the Semantic Web. Discussion of the Semantic Web's key technologies therefore focuses mainly on RDF and ontology.
RDF is a language specification recommended by the W3C for describing resources and their relationships. It is simple, easily extensible, open, easy to exchange, and easy to integrate. Notably, RDF only defines how resources are described; it does not define the data used to describe them. RDF consists of three parts: the RDF Data Model, RDF Schema, and RDF Syntax.

Architecture

Semantic Web Hierarchy
Berners-Lee proposed the architecture of the Semantic Web in 2000 and gave a brief introduction to it. The architecture has seven layers, whose functions are progressively enhanced from bottom to top.

First layer

First layer: the character set layer.
Unicode and URI. Unicode is a character set; in its original two-byte form it could represent 65,536 characters, covering most of the world's written languages. Using Unicode as the data format allows all major languages to be mixed in one document and retrieved together. A URI (Uniform Resource Identifier) uniquely identifies a concept or resource on the network; the familiar URL (Uniform Resource Locator) is one kind of URI. In the Semantic Web architecture this layer is the foundation of the whole stack: Unicode handles the encoding of resources, and URIs identify them.
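A small standard-library sketch of this layer's two ingredients; the URI and the mixed-script text below are illustrative assumptions. `urlparse` decomposes a URI identifying a concept, and UTF-8 encoding shows how Unicode lets different scripts coexist in one document:

```python
from urllib.parse import urlparse

# A (hypothetical) URI identifying one concept in an ontology.
uri = "https://example.org/ontology#Person"
parts = urlparse(uri)
print(parts.netloc, parts.path, parts.fragment)  # host, path, and fragment

# Unicode allows mixed scripts; UTF-8 maps each code point to 1-4 bytes.
text = "Semantic Web / 语义网"
data = text.encode("utf-8")
assert data.decode("utf-8") == text  # the encoding round-trips losslessly
```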

Second layer

Second layer: the markup language layer.
XML + NS + XML Schema. XML is a streamlined subset of the Standard Generalized Markup Language (SGML) that combines the richness of SGML with the ease of use of HTML. It allows users to add arbitrary structure to documents without saying what those structures mean. NS (Namespace), indexed by URI, exists to prevent different applications from using the same names to describe different things. XML Schema is a document-type-definition language that uses XML syntax; it is more flexible than DTD, provides more data types, and better serves valid XML documents by providing a data-validation mechanism. It is precisely because of XML's flexible structure, the unambiguous naming provided by URI-indexed namespaces, and the data types and validation mechanisms of XML Schema that XML has become an important part of the Semantic Web architecture. This layer is responsible for representing the content and structure of data syntactically, using a standard language to separate the presentation, structure, and content of network information.
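The namespace mechanism can be seen with Python's standard `xml.etree` module; the document and the two vocabulary URIs below are invented for illustration. Both vocabularies define a `title` element, and the URI-indexed namespaces keep the two meanings apart:

```python
import xml.etree.ElementTree as ET

# Two (hypothetical) vocabularies both define a "title" element;
# the URI-indexed namespaces keep the two meanings distinct.
doc = """<book xmlns:bib="http://example.org/bib"
              xmlns:job="http://example.org/job">
  <bib:title>War and Peace</bib:title>
  <job:title>Senior Editor</job:title>
</book>"""

root = ET.fromstring(doc)
bib_title = root.find("{http://example.org/bib}title").text  # the book's title
job_title = root.find("{http://example.org/job}title").text  # a person's job title
print(bib_title, "|", job_title)
```

Without namespaces, a program could not tell which `title` means which; with them, each tag is globally unambiguous.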

Third layer

Third layer: the Resource Description Framework (RDF) layer.
RDF + RDF Schema. RDF is a language for describing information resources on the WWW. Its goal is to create a framework in which multiple metadata standards can coexist, making full use of various kinds of metadata for Web-based data exchange and reuse. RDF solves the problem of describing resource objects unambiguously with standard XML syntax, so that the metadata of the described resources becomes machine-understandable information. If XML is regarded as a standard syntax specification for metadata, RDF can be regarded as a standard semantic specification for metadata. RDF Schema uses a machine-understandable type system to define the vocabulary for describing resources. Its purpose is to provide a mechanism or framework into which vocabularies can be embedded; under this framework, multiple vocabularies can be integrated to describe Web resources.

Fourth layer

Fourth layer: the ontology vocabulary layer.
Ontology vocabulary (foreign language: ontology vocabulary). This layer is an abstract description, defined on the basis of RDF(S), of concepts and the relationships between them. It is used to describe the knowledge of an application domain, to describe resources and the relationships between them, and to extend the vocabulary. At this level, users can define not only concepts but also rich relationships between concepts.

Fifth to seventh layers

Fifth to seventh layers: Logic, Proof, and Trust. The Logic layer provides axioms and inference rules. Once the logic is established, resources, their relationships, and the results of reasoning can be verified by logical inference to prove their validity. Through the exchange of proofs and digital signatures, trust relationships are established, demonstrating that the Semantic Web's output is reliable and meets the user's requirements.

Model definition

The RDF data model (foreign language: RDF Data Model) provides a simple but powerful model that describes specific resources through resources, properties, and their corresponding values. The model is defined as follows:
It contains a set of nodes N;
It contains a set of property classes P;
Each property takes a certain value V;
The model is a set of triples: {node, property class, node or literal value V}.
Every data model (foreign language: Data Model) can be regarded as a directed graph composed of nodes and arcs.
All the resources described in the model, and the property values used to describe them, can be regarded as nodes. A triple consisting of a resource node, a property class, and a value is called an RDF statement. In the model, a value may itself be a resource node, so a model sometimes contains more than one layer of nodes; in that case the value node describing a resource node has property classes and values of its own and can be refined further.
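The triple model above can be sketched directly in Python; the resources and properties below are invented examples. The set of triples is the directed graph, and a value node such as `ex:AcmeCorp` is itself a resource node with further properties:

```python
# An RDF model as a set of (subject, property class, value) triples.
# Resource names are hypothetical; the quoted string stands for a literal value.
triples = {
    ("ex:Alice", "ex:worksFor", "ex:AcmeCorp"),
    ("ex:Alice", "ex:name", '"Alice"'),
    ("ex:AcmeCorp", "ex:locatedIn", "ex:Beijing"),  # the value node, refined further
}

def objects_of(subject, prop):
    """Follow the arcs labeled `prop` that leave node `subject`."""
    return {o for s, p, o in triples if s == subject and p == prop}

print(objects_of("ex:Alice", "ex:worksFor"))
```

Each triple is one arc of the graph, so statements about `ex:AcmeCorp` extend the very node that appeared as a value in a statement about `ex:Alice`.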
RDF Schema uses a machine-understandable type system to define the vocabulary for describing resources. It functions like a dictionary and can be understood as an outline or specification. RDF Schema is used to:
Define the classes of resources and properties;
Define the resource classes to which a property applies and the types of property values;
Define the syntax of the above class declarations;
Declare property classes defined by the metadata standards of other institutions or organizations.
RDF Schema defines:
Three core classes: rdfs:Resource, rdf:Property, rdfs:Class;
Five core properties: rdf:type, rdfs:subClassOf, rdfs:seeAlso, rdfs:subPropertyOf, rdfs:isDefinedBy;
Four core constraints: rdfs:ConstraintResource, rdfs:range, rdfs:ConstraintProperty, rdfs:domain.
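The kind of inference that rdfs:subClassOf licenses can be sketched in a few lines of Python; the classes and the instance are invented examples. Because subClassOf is transitive, an instance of a subclass belongs to every superclass as well:

```python
# rdfs:subClassOf facts (hypothetical classes) and one rdf:type fact.
subclass_of = {
    ("ex:Novelist", "ex:Writer"),
    ("ex:Writer", "ex:Person"),
}
rdf_type = {("ex:LaoShe", "ex:Novelist")}

def all_types(instance):
    """Every class the instance belongs to, closing over subClassOf."""
    types = {c for i, c in rdf_type if i == instance}
    changed = True
    while changed:  # repeat until no new superclass can be added
        changed = False
        for sub, sup in subclass_of:
            if sub in types and sup not in types:
                types.add(sup)
                changed = True
    return types

print(all_types("ex:LaoShe"))
```

A query for all instances of `ex:Person` can thus find `ex:LaoShe` even though the data only states that he is a novelist.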
RDF Syntax constructs a complete syntax system to facilitate automatic processing by computers. It uses XML as its host language and integrates various kinds of metadata through XML syntax.
Ontology ("noumenon" or "ontology") is originally a philosophical concept used to study the essence of the objective world. It has since been widely applied in many fields, including computer science, electronic engineering, distance education, e-commerce, intelligent retrieval, and data mining. An ontology is a document or file that formally defines the relationships among terms. An ontology for the general Web comprises a classification and a set of inference rules: the classification defines the classes of objects and the relationships among them, while the inference rules provide the further capabilities needed to reach the Semantic Web's key goal of being "machine-understandable". The ultimate aim of an ontology is to "accurately represent implicit (or ambiguous) information".
There is not yet a unified definition of ontology. For example: an ontology is a formal specification of a shared conceptual model, describing the semantics of concepts through the relationships between them; an ontology is an explicit representation and description of conceptualized objects; an ontology is an explicit, formal, shared specification of a domain conceptualization. The definition given by Gruber of Stanford University has been accepted by many peers: "an ontology is an explicit specification of a conceptualization." A conceptualization (foreign language: Conceptualization) is defined as C = ⟨D, W, Rc⟩, where C is the conceptualized object, D is a domain, W is the set of relevant states of affairs in the domain, and Rc is the set of conceptual relations on the domain space. A specification (foreign language: Specification) forms a unified understanding of the concepts, knowledge, and relationships between concepts in the field, for sharing and reuse.
An ontology needs some language in which to describe its conceptualization. By degree of formality, ontologies can be divided into completely informal, semi-informal, semi-formal, and strictly formal ontologies. Many languages can be used to represent ontologies. Some are based on XML syntax and intended for the Semantic Web, such as XOL (XML-based Ontology exchange Language), SHOE (Simple HTML Ontology Extensions), OML (Ontology Markup Language), and the W3C's RDF and RDF Schema (RDFS). There are also more complete ontology languages built on RDF and RDFS: DAML (DARPA Agent Markup Language), OIL, and DAML+OIL.
XOL is an ontology exchange language based on XML syntax and OKBC semantics. It was designed by the US bioinformatics community for exchanging ontology definitions among a group of heterogeneous software systems in that field. It draws on Ontolingua and OML, combining the high-level expressiveness of OKBC with the syntax of OML. There is currently no dedicated tool for developing XOL ontologies, but because XOL uses XML syntax, an XML editor can be used to create XOL files. SHOE, developed by the University of Maryland, combines machine-readable semantic knowledge with HTML or other Web documents, allowing ontologies to be designed and applied directly on the WWW. More recently, SHOE's syntax has shifted to XML, enabling agents to collect meaningful information about Web pages and documents and to improve search and knowledge collection. OML was developed at Washington University, partly on the basis of SHOE. It has four layers: OML core (related to the language's logic layer), simple OML (a direct mapping of RDF and RDFS), abbreviated OML, and standard OML.
RDF is an information description method recommended by the W3C. Its purpose is to overcome XML's semantic limitations and provide a simple schema for representing various types of resources. On top of RDF, RDFS establishes some basic modeling constraints. RDF is expressive, but it still has shortcomings: it does not define mechanisms for reasoning or axioms, it does not describe containment features, and it has no version control.
OIL is based on RDF. Its main advantage is providing formal semantics and reasoning based on description logic. OIL integrates three technologies: frame systems, description logic, and Web languages based on XML and RDF syntax. The frame system adopts an approach to data modeling similar to object orientation, providing the modeling primitives; description logic expresses structured knowledge, queries, and reasoning in a standardized way; and Web languages based on XML and RDF syntax provide OIL's language elements. OIL's data objects mainly include class definitions, slot definitions, and axiom definitions. A class definition includes the definition type, the class hierarchy, and slot or attribute constraints; a slot definition defines a binary relation between entities, with the primitives slot-def, domain, range, inverse, and subslot-of; axioms are defined by adding rules to the ontology, for example that the extensions of classes are disjoint, covering, intersecting, or equivalent.
DAML was developed under DARPA (the U.S. Defense Advanced Research Projects Agency) in an attempt to combine the advantages of RDF, OIL, and other languages. It is based on RDF as well as OIL, and is grounded in description logic. Its main goal is a language that expresses semantic relations in a machine-readable way and is compatible with current and future technologies, in particular a set of tools and techniques enabling agent programs to identify and understand information sources and to interoperate with one another on a semantic basis. The earliest version of DAML was DAML-ONT, which was later merged closely with OIL to form DAML+OIL. DAML+OIL was developed jointly by the United States and the European Union in the context of DAML. It shares OIL's goals and is the most widely used ontology language. As an extension of RDF(S), it has sufficient expressive power (uniqueness, transitivity, inverseness, equivalence, and so on), has a certain reasoning capability, and essentially fixes the overall framework of knowledge representation languages.
Of course, XML and RDF alone do not realize the Semantic Web. The more important technical problem is how much "thinking" and "inference" computers can actually carry out. Faced with numerous complicated problems, especially social ones, even people find it hard to make decisions, let alone computers. Much work therefore remains before a practical Semantic Web truly arrives.

Application examples

Various Web technologies are likely to be applied to the Semantic Web (in the sense of a semantic World Wide Web), for example:
  • DOM (Document Object Model), a set of standard interfaces for accessing the components of XML and HTML documents
  • XPath, XLink, XPointer
  • XInclude, XML Fragments, XML query languages, XHTML
  • XML Schema, RDF (Resource Description Framework)
  • XSL, XSLT (Extensible Stylesheet Language)
  • SVG (Scalable Vector Graphics)
  • SMIL
  • SOAP
  • DTD
  • Microformat
  • Metadata concept

Research trends

The Semantic Web is an advanced intelligent product of the network era, with a wide range of applications and bright prospects. The main application technologies and research trends are introduced below.
Classic bottom-up and emerging top-down approaches. The bottom-up approach focuses on labeled information, typically represented in RDF so that it is machine-readable. The top-down approach focuses on using existing page information as it is, extracting meaningful information automatically. In recent years each approach has seen some development.
Good news for the bottom-up approach came from Yahoo's announcement that its search engine would support RDF and microformats, a win-win for content publishers, Yahoo, and consumers: publishers gain an incentive to label their information, Yahoo can use that information more effectively, and users get better, more accurate results. More good news came from Dapper's announcement of a semantic web service that lets content publishers add semantic annotations to existing web pages. It can be expected that the more semantic tools there are, the easier it will be for publishers to label web pages; the development of automatic annotation tools and growing incentives to annotate will make the bottom-up approach more attractive. Even with tools and incentives, however, making the bottom-up approach universal remains quite difficult.
In fact, today's Google technology can already understand unstructured web information to a certain extent. Likewise, top-down semantic tools focus on handling existing, imperfect information. These methods mainly use natural language processing and text analysis techniques to identify specific entities in documents (names of people, companies, places, and so on), together with vertical search engines for information in specific fields.
Top-down techniques focus on acquiring knowledge from unstructured information, although they can also process structured information; the more bottom-up annotation there is, the better top-down methods perform. Among the bottom-up annotation methods there are several candidate technologies, all of them powerful, and choosing between them requires a trade-off between simplicity and completeness. The most complete is RDF: a powerful graph-based language for representing things, their attributes, and the relationships between things. In short, you can think of RDF as a language that expresses facts such as: Alex IS a human, Alex HAS a brain, and Alex IS the father of Alice, Lily, and Sofia. RDF is very powerful, but it is also notoriously recursive, precise, and mathematical, and therefore complex. Currently, most RDF is used to solve data-interoperability problems. For example, medical organizations use RDF to represent genome databases; because the information is standardized, formerly isolated databases can be queried together and compared with one another. Generally speaking, beyond semantics itself, the main advantage of RDF is interoperability and standardization, especially for enterprises (discussed below). Microformats offer a simpler method: adding semantic tags to existing HTML documents through CSS class names, so that concise metadata is embedded directly in the original HTML. Popular microformats include hCard, which describes personal and company contact information; hReview, which adds meta-information to review pages; and hCalendar, which describes events. Microformats are popular because of their simplicity, but their capabilities are limited. For example, they cannot express the hierarchies that the traditional semantic community considers necessary, and because the tag set is kept minimal, the meanings the tags express are inevitably somewhat vague.
This raises another question: is it appropriate to embed tags in HTML documents at all? Despite the open problems, microformats remain popular because of their simplicity; Flickr, Eventful, LinkedIn, and many other companies have adopted them, especially after Yahoo's search announcement. A still simpler method is to put the metadata in the page's meta header. This method has seen some use, but unfortunately it is not widespread. The New York Times recently launched an annotation extension for its news pages, and the benefits of the approach show on topic and event pages. For example, a news page can be identified by a set of keywords: place, date, time, person, and category. Another example is a book page, where book information is added to the meta header: author, ISBN, and book category. Although these methods all differ, what they share is usefulness. The more pages are labeled and the more widely the standards are implemented, the more powerful and accessible the information becomes.
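The meta-header approach described above can be sketched with Python's standard `html.parser` module. The tag names and values below are illustrative, not a published standard; real pages may use other naming conventions.

```python
# A sketch of page-level metadata (author, ISBN, keywords) carried in
# <meta> tags and read back with Python's standard html.parser.
# All tag names and values here are purely illustrative.
from html.parser import HTMLParser

class MetaReader(HTMLParser):
    """Collects name/content pairs from <meta> tags into a dict."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if "name" in a and "content" in a:
                self.meta[a["name"]] = a["content"]

page = """
<html><head>
  <meta name="author" content="Jane Doe">
  <meta name="isbn" content="978-0-00-000000-2">
  <meta name="keywords" content="semantic web, metadata">
</head><body>...</body></html>
"""

reader = MetaReader()
reader.feed(page)
print(reader.meta["author"])  # Jane Doe
```

Once pages expose metadata this way, a crawler can classify them without analyzing the body text at all, which is exactly the benefit the annotation approaches above are after.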
In discussions of the Semantic Web, users and enterprises have different concerns. From the consumer's perspective, what is needed is a killer app that delivers real, simple value, because users care only about a product's usefulness, not about the technology it is built on. The problem is that the focus of the Semantic Web still sits largely at the theoretical level, such as tagging information to make it machine-readable. The promise is that once information is labeled, the web becomes a large RDF database on which a host of exciting applications can be built. But skeptics point out that this assumption must be realized first: without widespread labeling, those applications cannot appear.
There are already many applications based on the Semantic Web, such as general and vertical search engines, text assistants, personal information management systems, and semantic browsing tools, but they are still a long way from public acceptance. Even where these technologies succeed, users are not interested in knowing what runs behind them. Promoting semantic web technology at the user level therefore has little prospect.
Enterprises are different. First, enterprises are more receptive to technical arguments: for them, semantic technology can make products more intelligent and thereby create market value. "Our products are better and smarter because we use the Semantic Web" sounds like a good pitch for an enterprise.
At the enterprise level, RDF addresses the problem of data-interoperability standards, a problem that in fact dates back to the early days of the software industry. You can set the Semantic Web aside and simply regard RDF as a standard protocol that allows two programs to exchange information; this alone is of great value to enterprises. RDF provides an XML-based communication scheme, and the prospects it opens make enterprises willing to overlook its complexity. There is, however, still a scalability problem: unlike relational databases, XML-based databases have not become popular, largely because of doubts about their scalability and query capability. Like the object databases of the late 1990s, XML-based databases carry too many expectations. Let's wait and see.
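The triple model behind RDF's interoperability story can be sketched with plain Python tuples, using the Alex facts from the earlier paragraph. This is only an illustration of the subject-predicate-object idea; real RDF uses URIs and serializations such as RDF/XML or Turtle, and libraries such as rdflib provide full graph support.

```python
# A minimal sketch of RDF's subject-predicate-object model using plain
# Python tuples. Not the rdflib API; names are purely illustrative.
triples = {
    ("Alex", "is_a", "Human"),
    ("Alex", "has", "Brain"),
    ("Alex", "father_of", "Alice"),
    ("Alex", "father_of", "Lily"),
    ("Alex", "father_of", "Sofia"),
}

def query(s=None, p=None, o=None):
    """Return every triple matching the pattern (None = wildcard)."""
    return {
        (ts, tp, to) for (ts, tp, to) in triples
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    }

# Who are Alex's children? Match on the predicate, collect the objects.
children = sorted(o for (_, _, o) in query("Alex", "father_of", None))
print(children)  # ['Alice', 'Lily', 'Sofia']
```

Because two databases that agree on the predicate vocabulary can pool their triples and run the same pattern queries over the union, this tiny model already shows why standardization is the selling point for enterprises.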
Semantic APIs have developed alongside the Semantic Web. These web services take unstructured text as input and output entities and relationships. One example is Reuters' Open Calais API, which accepts raw text and returns the names of people, locations, companies, and other entities found in it, marked up in the original text. Another example is TextWise's Semantic Hacker API; the company has offered a one-million-dollar prize for the best commercial semantic web application built on its API. This API sorts the information in a document into categories (called semantic fingerprints) and outputs the document's entities and topics. It is similar to Calais, but additionally provides a hierarchy of topics, with the actual objects in the document as leaf nodes of the structure. A further example comes from Dapper, a web service that helps extract structured information from unstructured HTML pages. Dapper relies on the user defining attributes for the objects on a page; for example, a book publisher would define where the author, ISBN, and page count appear. Dapper can then create an identifier for the site, and applications can read its information through the API. From a technical point of view this may look like a step backwards, but in practice Dapper's technology is very useful: for a website with no API of its own, even a non-technical person can use Dapper to construct one in a short time. It is one of the most powerful and fastest ways to turn a website into a web service.
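The input/output shape of such a semantic API — unstructured text in, typed entities out — can be illustrated with a toy sketch. Real services like Open Calais use trained NLP models; this version just matches a small hand-made gazetteer, and every name in it is an assumption for illustration only.

```python
# A toy illustration of a semantic API's contract: raw text in,
# (entity, type, position) tuples out. Not the Open Calais API;
# the gazetteer and example text are invented for illustration.
import re

GAZETTEER = {
    "Reuters": "Company",
    "London": "Place",
    "Tim Berners-Lee": "Person",
}

def extract_entities(text):
    """Return (entity, type, offset) for every known name in the text."""
    found = []
    for name, kind in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            found.append((name, kind, m.start()))
    return sorted(found, key=lambda e: e[2])  # in document order

doc = "Tim Berners-Lee spoke in London; Reuters covered the talk."
for name, kind, pos in extract_entities(doc):
    print(f"{kind}: {name} @ {pos}")
```

Returning character offsets alongside the entities is what lets a service mark the entities up in the original text, as the Calais description above mentions.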
Perhaps the original motivation for developing the Semantic Web was that search quality had long been hard to improve, and the hypothesis that understanding page semantics can improve search has been borne out. Hakia and Powerset, the two main competitors in semantic search, have made real progress, but still not enough, because Google's statistics-based algorithms perform as well as semantic techniques when handling entities such as people, cities, and companies. Ask "who is the president of France" and Google returns a good-enough answer. More and more people realize that it is hard to beat Google with incremental improvements to search technology, so they have turned to looking for the Semantic Web's killer application. Understanding semantics may help a search engine, but it is not, by itself, enough to build a better one. The search experience of the next generation of engines may improve by fully combining semantics, novel presentation methods, and user identification. Other efforts apply semantics to search results; Google, too, is experimenting with dividing results into categories among which users can choose. Search is the race many semantic companies are running. There may be yet another route to better search: combining text-processing technology with semantic databases, which we turn to now. More and more text-processing tools are entering the consumer market. Text-navigation applications such as Snap, Yahoo Shortcuts, and SmartLinks can "understand" the objects in text and links and attach relevant information to them, so users can understand the information without searching at all. Going a step further, the ways text tools use semantics can be more interesting still.
Rather than analyzing the keywords a user types into a search box, a text tool can analyze web documents directly; semantic understanding then becomes more accurate, or at least less speculative, and the tool can present the user with several categories of relevant results to choose from. This differs fundamentally from the traditional approach of piling the "correct" results from a mass of documents in front of the user. More and more text-processing tools are also being combined with browsers. Top-down semantic technology requires nothing of the publisher, so it is easy to imagine context and text tools combined in the browser. Firefox's recommended-extensions page lists many text-browsing solutions, such as Interclue, ThumbStrips, Cooliris, and BlueOrganizer.
Semantic databases are one direction for annotated semantic web applications. Twine, in beta testing, focuses on building a private knowledge base about people, companies, events, and places. Its data sources are unstructured content from forums and elsewhere, submitted through bookmarks, email, or manual entry. The technology has yet to mature, but its benefits are obvious: you can imagine a Twine-based application as personalized search, filtering results through your personal knowledge base. Twine's underlying data representation is RDF, usable by other semantic web services, but its core algorithms, such as entity extraction, are commercialized through semantic APIs; Reuters provides a similar API interface. Another semantic-database pioneer is a company called Metaweb, whose product is Freebase. In the form it presents, Freebase looks like a more structured, RDF-based version of Wikipedia, but its goal is to build a Wikipedia-scale database of the world's information. The strength of such a database is that it supports precise queries, just like a relational database, so its promise is, again, better search. The question is how Freebase can keep pace with the world's information. Google re-indexes web documents continually as the web grows and develops, whereas Freebase's information comes only from personal edits and from data imported from Wikipedia and other databases. To scale, the product must improve its ability to gather unstructured information from across the whole web, parse it, and update the database. Keeping up with the world is a challenge for every database approach: Twine needs a constant stream of user-contributed data, Freebase a constant stream of data from the network. These problems are not easy to solve, and they must be handled properly before such systems become truly practical. Every new technology must define its concepts and establish its categories.
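The "personalized search" idea attributed above to Twine-style semantic databases can be sketched as re-ranking generic search results by their overlap with a user's private knowledge base of topics. Everything below — the topic names, result titles, and ranking rule — is an invented illustration, not Twine's actual algorithm.

```python
# A sketch of personalized search: generic results re-ranked by how
# many of their topics appear in the user's private knowledge base.
# All names and the ranking rule are illustrative assumptions.

knowledge_base = {"semantic web", "rdf", "metadata"}

results = [
    {"title": "RDF primer", "topics": {"rdf", "semantic web"}},
    {"title": "Cooking pasta", "topics": {"food"}},
    {"title": "Metadata in libraries", "topics": {"metadata"}},
]

def personalize(results, kb):
    """Order results by the number of topics shared with the KB."""
    return sorted(results, key=lambda r: len(r["topics"] & kb),
                  reverse=True)

for r in personalize(results, knowledge_base):
    print(r["title"])
# "RDF primer" (2 shared topics) ranks first; "Cooking pasta" (0) last.
```

The point of the sketch is only the data flow: the knowledge base is built up over time from the user's own submissions, then consulted at query time to filter and reorder results.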
The Semantic Web offers an exciting prospect: better discoverability of information, complex search, and novel ways of browsing the web. It also means different things to different people: it has different definitions for enterprises and for consumers, and it comes in different flavors, such as top-down versus bottom-up and microformats versus RDF. Alongside these patterns we have seen the growth of semantic APIs and text-browsing tools. All of this is still at an early stage of development, but all of it carries the hope of changing how we interact with information on the network.
In its advanced stage, the Semantic Web will let libraries, ticket-sales and booking systems, customer-management systems, and decision systems work well together. For example, if you want to travel, you need only give a query system backed by the Semantic Web your schedule and your preferred kinds of domestic tourism, and the best itineraries for matching attractions, together with precautions, tips, and travel-agency reviews, can be assembled quickly on the browser page.
The Semantic Web will eventually bring this advanced stage of the network to every corner of the world. Everyone will have a network identity, with personal consumption credit, medical care, records, and so on attached to it. Meanwhile, online communities will be more active than real-world communities, and the networked society more orderly and harmonious.