"Defeating magic with magic": how to build a data security ecosystem in the AI era

05:31, April 30, 2024 · China Youth News

In the AI era, data security governance requires full-process control. In the view of some experts, "it is very difficult to solve today's problems by relying on mature technologies of the past. Today, only by 'using AI against AI' can we solve the problem of content security."

——————————

A new bill proposed in the United States has further heightened public attention to data security in the era of large models.

Recently, a member of the US House of Representatives introduced the Generative AI Copyright Disclosure Act, which would require companies to notify the government before releasing a generative AI system and to list all copyrighted works used to build or alter the system's training dataset. If passed, the bill would push American AI companies toward greater transparency in their use of model training data.

Coincidentally, the Artificial Intelligence Act (hereinafter, the "Act"), which the European Parliament voted to approve just over a month ago, likewise explicitly requires that data used for training, testing, and validating artificial intelligence tools be properly managed.

"In recent years, countries have introduced legislation and policies related to artificial intelligence. Most countries have taken a cautious attitude when formulating relevant regulations to avoid excessive restrictions on the development of artificial intelligence, so as not to hinder innovation and progress, which highlights the importance of the international community on data security governance." At the recent "2024 Zhongguancun At the Forum - Data Security Governance and Development Forum, Guo Yike, the Chief Vice President of the Hong Kong University of Science and Technology and an academician of the Royal Academy of Engineering, said.

At present, the application of new technologies represented by artificial intelligence (AI) has become an important engine for developing new quality productive forces, and the accompanying questions of data security governance and development have attracted wide attention. During the 2024 Zhongguancun Forum (hereinafter, the "Forum"), a number of AI-related forum events were held, and "data security governance in the AI era" became a hot topic. Experts, scholars, and industry insiders from home and abroad exchanged forward-looking ideas, shared research results, and explored how to make data security governance in frontier fields more effective.

A new landscape for data security governance

In the AI era, "data" is a keyword that has to be mentioned.

"Data is very important for the progress of AI." Piero Sgarufi, president of the Silicon Valley Institute of Artificial Intelligence, said at the opening ceremony of the 2024 Zhongguancun Forum Annual Conference. He mentioned that the main driving factor behind the rapid development of AI technology is a large amount of data, from IBM's "Deep Blue" computer defeated Kasparov, the then world chess champion, in 1997, to Google's artificial intelligence program AlphaGo defeated Lee Shishi, the South Korean Go world champion, in 2016, to the emergence of OpenAI's GPT.

However, as AI technology changes rapidly, and especially as generative AI (AIGC) develops at speed, improved performance has come with amplified data security risks, raising public concern over value bias, privacy leakage, false information, and other issues. "Many diagnoses are now made by AI. For example, if you want to use GPT to diagnose diseases, can we rest assured? Most of the time, we cannot," Scaruffi said.

According to Huang Minlie, professor in the Department of Computer Science and Technology at Tsinghua University and deputy director of the Foundation Model Research Center at Tsinghua's Institute for Artificial Intelligence, the emergence of AIGC makes today's data security governance very different from the past: through training, AIGC can combine and generate new content that never appeared in past corpora, and some of that content may be harmful.

"In the era of generative artificial intelligence (AIGC), we are faced with the problem of constantly combining and creating new data security risks at the data security level. To completely solve this risk, we need the support of algorithms and tools, rather than relying on people or some static methods to do relatively fixed detection." Huang Minlie said, We can study and develop some targeted countermeasures and attack methods, such as letting the algorithm discover the vulnerabilities and risks of the model in advance; The algorithm can also be used to track the risks existing in the model, and then targeted security measures can be formulated.

Today, in addition to general-purpose large models like ChatGPT, industry models focused on particular vertical fields are emerging one after another. Liu Qianwei, Vice President of Qi An Xin Group (Qianxin), has long followed data security governance for industry models. He has observed a concern common to many industry model vendors: will the model's pre-training corpus be stolen?

He explained that most of the data "fed" to an industry model during pre-training is industry knowledge. "These corpora may be among the most competitive data of different enterprises." If there are vulnerabilities, these core data assets may be leaked. "This is a point we did not pay special attention to with general models."

At the same time, as users of general large models, many people ask: when I put a question to a large model, will it leak trade secrets or personal information? Liu Qianwei raised the same question at the forum.

On how to solve these problems, Liu Qianwei agrees with Huang Minlie: "It is very difficult to solve today's problems by relying on mature technologies of the past. Today, the problem of content security can only be solved by 'using AI against AI.'"

Building a secure and trusted data governance ecosystem

In May 2023, 350 AI authorities, including Sam Altman, the "father of ChatGPT," signed a joint open letter stating: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."

"AI highlights the importance of data, creating a global governance framework, and building mutual trust is the most important factor." Liu Junhong, Director of Singapore Information and Communication Media Development Bureau, pointed out in his speech at the forum that trust in data and AI is the basis for balanced protection and innovation in the digital era.

During the forum, many experts invoked one term in discussion: trusted AI. At its core is the security of large AI models. "We must realize that data security is an eternal proposition, and we must be 'secure and trustworthy,'" said Shen Changxiang, academician of the Chinese Academy of Engineering, adding that active immune trusted computing should be used to build a strong line of defense for AI security.

In his view, data security governance requires full-process control that achieves six "cannots": first, attackers "cannot get in"; even if they get in, they "cannot take" the data; even if they take it, they gain nothing, because the data is encrypted and they "cannot understand" it; and the system has an automatic immune function, so attackers "cannot alter" the data. At the same time, the equipment "cannot be paralyzed," with measures taken promptly after a fault is found to keep operation stable; and finally, attacks can be traced, so attackers "cannot deny" them.

"Achieving these six 'no' effects can make data security governance achieve effective goals." Shen Changxiang introduced that after more than 30 years of development, China has built a relatively complete new industrial space. He said that at present, China has domestic CPUs with trusted computing functions, embedded trusted chips and trusted roots, and equipment with trusted computing 3.0 technology.

Beyond trusted computing, a series of cutting-edge technologies also support building a secure and trusted data governance ecosystem. Guo Yike believes that blockchain, quantum cryptography, and other technologies have great application prospects for enhancing data security.

"Blockchain has transparent and unchangeable technical characteristics." Guo Yike said that the application of blockchain technology in the field of data security governance can further reduce the risk of data authorization and data tampering while ensuring data integrity. He also mentioned that intellectual property rights can be protected and unauthorized data disclosure can be prevented through data anonymization technology, user consent and privacy design rules, as well as data classification, access control and encryption.

In Guo Yike's view, encryption can protect data both at rest and in transit. In addition, anonymization techniques such as differential privacy and data masking can strip personal identity information, keeping data confidential while retaining its usefulness for AI model training.
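As a minimal sketch of one such technique, the snippet below applies differential privacy to a simple count query: Laplace noise calibrated to the query's sensitivity and a privacy parameter epsilon masks any single individual's contribution. The dataset and the epsilon value are illustrative assumptions, not a recommended privacy budget.

```python
# Minimal differential-privacy sketch for a count query.
# A count has sensitivity 1 (one person changes it by at most 1),
# so Laplace noise with scale 1/epsilon hides any individual.
import numpy as np

rng = np.random.default_rng(0)

def dp_count(values, threshold, epsilon=1.0):
    true_count = sum(v > threshold for v in values)
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

salaries = [48_000, 52_000, 61_000, 75_000, 90_000]   # toy sensitive data
print("noisy count above 60k:", round(dp_count(salaries, 60_000), 2))
```

Smaller epsilon means more noise and stronger privacy; the released statistic stays useful in aggregate while no single record can be inferred from it.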

"Use magic (i.e. AI technology) to defeat magic" is a major way out for data security governance in the AI era proposed by Liu Qianwei. In the discussion on the technical path related to AI data security governance on the forum, privacy computing, federated learning, etc. were repeatedly mentioned.

Beyond the problems it raises, AIGC also provides new and more effective means for data governance. Sun Maosong, academician of the European Academy of Sciences and executive vice president of the Institute for Artificial Intelligence at Tsinghua University, noted that while today's data involves many privacy issues, generative AI can be used to produce data that matches real-world patterns while avoiding users' real private data. "So generative AI in fact also plays a very important positive role in promoting our data governance."
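A minimal sketch of that idea: fit a generative model to sensitive records and release only samples drawn from it. Here a Gaussian mixture from scikit-learn stands in for a modern generative model, and the "real" records are themselves toy data; note that this alone does not give formal privacy guarantees.

```python
# Minimal synthetic-data sketch: train on samples from a fitted generative
# model instead of the original sensitive records. A Gaussian mixture is a
# simple stand-in for a modern generator.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)

# Toy "sensitive" records (e.g., age, monthly spend).
real = np.column_stack([
    rng.normal(40, 10, size=500),
    rng.normal(3000, 800, size=500),
])

gm = GaussianMixture(n_components=3, random_state=0).fit(real)
synthetic, _ = gm.sample(500)        # shares statistics, not actual records

print("real mean:     ", np.round(real.mean(axis=0), 1))
print("synthetic mean:", np.round(synthetic.mean(axis=0), 1))
```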

Innovating AI regulatory mechanisms

"Data security governance is a crucial and evolving issue in the era of artificial intelligence and digital transformation." Guo Yike said that in the era of artificial intelligence, it is necessary to establish an international institutional framework and regulations to regulate data security while protecting data privacy and sensitive information.

In recent years, many countries and regions have successively introduced policies and laws to regulate the development of artificial intelligence, including the UK's pro-innovation approach to AI regulation and the EU's Artificial Intelligence Act; many of these provisions concern data security governance.

On August 15, 2023, the world's first dedicated legal instrument on generative AI governance, the Interim Measures for the Administration of Generative Artificial Intelligence Services (hereinafter, the "Measures"), took effect in China. Wu Shenkuo, deputy director of the research center of the Internet Society of China, believes the Measures focus on data governance in the context of generative AI services and introduce a number of special norms, providing important institutional guidance for building a new data governance ecology for the AI era.

In exploring the controllable development of AI, the regulatory sandbox is an innovative mechanism. A regulatory sandbox allows innovative products, technical services, or business models to be tested in depth with real users in a real market environment, under restrictive conditions and risk-control measures that keep risks controllable. The EU's AI Act explicitly requires member states to create AI regulatory sandboxes, and countries such as Norway and Spain have already begun sandbox work.

Liu Junhong mentioned that many technologies and policies already enable enterprises to extract value from consumer datasets while ensuring those datasets remain protected. The next step, he believes, is to establish regulatory sandboxes in which these technologies and policies can be applied and developed, maximizing the use of data while keeping it secure.

During the forum, the "Beijing AI Data Training Base Regulatory Sandbox" was officially launched, the first regulatory sandbox in China's AI field. Mao Dongjun, deputy director of the Beijing Municipal Bureau of Economy and Information Technology, said that on the management side, the sandbox mechanism helps enterprises avoid data risks while staying within legal compliance; on the technical side, advanced technologies such as data encryption, desensitization, cloud desktop operation, and security management provide basic support for model enterprises and data enterprises, "truly making data usable and visible but impossible to take away, and avoiding high-risk events such as data leakage."

From the introduction of policies and laws to institutional and technical experiments in regulatory sandboxes, the aim is the same: to build a new ecology of data security governance for the AI era.

At present, installing "brakes" on AI has become a hot topic in the industry, and it came up repeatedly at the forum. "The 'brake' is a governance system for technical risks," Xue Lan, dean of Schwarzman College at Tsinghua University, said in an interview at the forum, adding that the purpose of building a governance system is to curb the misuse and abuse of AI.

Xue Lan explained that China's AI governance system has three layers. The first is broad basic principles that all actors in society should follow, such as the Ethical Norms for New Generation Artificial Intelligence; the second is laws and regulations targeting specific AI domains, such as the Measures; the third encourages enterprises to strengthen internal mechanisms, such as establishing ethics committees.

If the "brake" is not installed in place, how to deal with it? Xue Lan said, "We especially encourage enterprises to strengthen their own mechanism construction, which is very critical. On the other hand, it also needs public supervision from the whole society."

China Youth Daily · China Youth Network intern reporter Jia Jiye and reporter Zhu Caiyun | Source: China Youth Daily

April 30, 2024, Page 06
