[New era, new journey, new great cause] Intelligent technology puts the collation of ancient books in the "fast lane"
Guangming Daily 2022-12-18
Author: Chen Xue, staff reporter

Whether the source is stone rubbings, Republic of China-era periodicals, or printed ancient books, batch OCR (optical character recognition) of hundreds of pages of document images takes only about 5 minutes, after which proofreading can begin online. In early November, Gulian Company of Zhonghua Book Company released the "Gulian OCR System," an important achievement in applying intelligent technology to the collation of ancient books.

"The recognition rate is very high, and there are basically no errors," said some users who tried the system as soon as it was released. In fact, the seemingly simple step of turning paper text into digital data is an important link in the collation and study of ancient books, and it draws on multiple intelligent technologies.

"The entry of intelligent technology into ancient book collation is a remarkable innovation in working methods that the field has used for a long time. The key step in turning classical texts from paper into data is text acquisition, and the accuracy and ease of that acquisition have a great impact on all subsequent work." Hong Tao, general manager of Gulian Company of Zhonghua Book Company, said that poor recognition results would add a great deal of work to the later proofreading and collation stages. Built on machine learning and extensive font support, and supplemented by a convenient online proofreading and editing environment, the Gulian intelligent OCR system can greatly reduce the workload of manual proofreading and help editors and authors process text more efficiently and conveniently.

The report to the 20th National Congress of the Communist Party of China called for advancing the digitalization of education and building a learning society and a learning nation in which everyone pursues lifelong learning. Reportedly, the OCR system, together with the automatic punctuation and traditional-simplified Chinese conversion tools already launched by Gulian, turns technical tools once used mainly by the ancient-book research community into intelligent products within reach of ordinary users and general readers. The system can also support traditional disciplines such as classical philology at colleges and universities in their shift toward building the "New Liberal Arts," letting students learn about emerging technologies and the cutting edge of the field while still in school.

"Golgi" is Gorky; "Shi Li" is Shelley; Jia Jiansheng, Gong Han, and Sui Luowen are all pseudonyms of Lu Xun... When reading early Chinese translations of foreign literature, readers are often left "guessing names." This is because the Chinese names given to early foreign writers and their works were extremely inconsistent, and translators often used pseudonyms and changed them frequently. For a long time, this field lacked basic, systematic collation work. On November 12, the Modern Chinese Translated Literature Chronological Research Database went online. It is another major database product launched by Gulian since the 20th CPC National Congress. The project was led by Professor Li Jin of Renmin University of China and reviewed by Professors Xia Xiaohong and Fang Xide of Peking University, Professor Sun Yu of Renmin University of China, and Professor Xie Zhixi of Tsinghua University, with many young scholars taking part in the cataloguing. Developed and built by Gulian Company of Zhonghua Book Company, the database aims to give the academic community convenient tools for studying modern Chinese translated literature and to serve as a thematic historical-materials database, knowledge base, and catalog index.

According to Hong Tao, the database has so far included 226 modern and contemporary periodicals. The Chinese translations of foreign literature in them involve 51 countries, 1,580 foreign writers, and 2,130 translators, for a total of nearly 9,000 entries. A large number of other important periodicals are still being examined and will be added online soon. On an unprecedented scale, the project team catalogued, collated, and annotated the Chinese translated literature, and related phenomena, that appeared in periodicals from 1896 to 1949, and provided brief biographies of the translators along with lists of their pen names. The database is a comprehensive new research tool that combines a thematic literature database, a knowledge base, and a catalog index. It is suited to teaching and research in modern and contemporary Chinese literature, comparative literature, world literature, foreign literature, and other disciplines, as well as in related fields of history, culture, and the humanities.

Distinguishing schools of learning and tracing their origins: the reporter saw that the database examines the information for each entry in detail. For example, the story Mourning Dust, published in Zhejiang Tide in 1903, is credited as a French "novel written by Xiao Ru and translated by Gengchen," and its page carries an annotation of several hundred words explaining that Xiao Ru is the writer now known as Hugo and that the translator Gengchen is a pen name of Lu Xun. Reportedly, the database links the different Chinese names that refer to the same person: the various early Chinese renderings of a foreign writer's name are tied to the modern standard translation, and a translator's signatures are tied to the name he or she is commonly known by. For example, a search for "Lu Xun" returns, all at once, the relevant documents published under his other pen names, such as Suozi, Fengsheng, and Zhang Luru. This solves the long-standing problem that earlier retrieval could not find one person under different names, and thereby brings the historical materials of Chinese translated literature in periodicals back to life.

The report to the 20th National Congress of the Communist Party of China called for "implementing the national cultural digitalization strategy." Hong Tao said this has given impetus and direction to Gulian's development. As the digital arm of Zhonghua Book Company, Gulian should make good use of technology to extract elements of China's fine traditional culture from historical literature, do a good job of transforming and disseminating them, strengthen cultural confidence, and give cultural products more contemporary relevance and vitality.
