A deep dive into neural networks and deep learning

Machine learning technology is coming to the data center. It can improve internal IT management, and it can also make key business processes more intelligent. You may have heard deep learning hyped as covering everything from systems management to autonomous vehicles. So which is it: a genuinely smart new form of AI that has just been unveiled to the world, or a marketing move that repackages existing, complex machine learning algorithms as a new selling point?

Deep learning undoubtedly stirs the public imagination, but it is not as complicated as it sounds. Technically, deep learning mainly refers to large-scale, compute-intensive neural networks. These networks are typically trained on large data sets that are difficult to handle with logic- and rule-based machine learning methods: images, speech, video, and other dense data with complex patterns.

Neural networks themselves are not new. Almost since the pioneering days of modern computing, neural network algorithms have been studied as a way to identify hidden patterns in complex data streams. In that sense, deep learning builds on well-established machine learning technology. However, when newer neural network algorithms of much higher computational complexity are applied to today's big data sets, significant new opportunities emerge. Using low-cost cloud services or commodity scale-out big data infrastructure, these "deep" models can be built and applied to large-scale applications in real time.

Making sense of neural networks

Neural network research began in the 1950s and 1960s, originally as a way to model how the human brain works. A neural network is composed of layers of nodes connected into a large network, like neurons in the brain. Each node receives input signals and, through a predefined "activation function", produces an output signal that is passed along to other nodes; the activation function determines when the node should fire. Put simply, you can think of a node's behavior as depending on how excited it is: when a set of inputs pushes a node past its threshold, it emits an output signal of some strength to its downstream nodes. Interestingly, a node's output signal can be positive or negative; the firing of some nodes actually inhibits the firing of others. Nodes are interconnected by links, and each link has its own weight variable; a link's weight scales the signal transmitted through it. By gradually adjusting the link weights across the entire network, a neural network adapts and learns to recognize patterns. Ultimately, only correctly recognized patterns produce a full cascade of activation through the network.
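
To make this concrete, here is a minimal sketch, in plain Python, of how a single node might compute its output. The tanh activation and all of the weights are illustrative assumptions, not drawn from any particular system; tanh is used here because its output can be positive or negative, matching the excitatory and inhibitory signals described above:

    import math

    def node_output(inputs, weights, bias):
        # Weighted sum of incoming signals, passed through a tanh
        # activation function. tanh can emit positive (excitatory)
        # or negative (inhibitory) output signals.
        weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
        return math.tanh(weighted_sum)

    # Illustrative values only: two upstream signals feeding one node.
    print(node_output([0.8, -0.3], weights=[1.5, 2.0], bias=0.1))  # positive: excites downstream nodes
    print(node_output([0.1, 0.9], weights=[0.2, -2.5], bias=0.0))  # negative: inhibits downstream nodes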

Typically, the input data is formatted as input signals wired to the outward-facing nodes of the first layer. Those nodes then send signals on to one or more hidden layers, and finally the output-layer nodes send a "response" back to the outside world. Since the learning (i.e., the intelligence) lives implicitly in the link weights, the core practical problem is how to adjust, or train, all of the link weights so the network responds correctly to each pattern. Today, neural networks mostly learn through an incremental technique called backpropagation, which finds the correct patterns in the training data: when the network identifies a training sample correctly, the links responsible are, in effect, "rewarded", and when it identifies a sample incorrectly, they are penalized.
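
As a rough sketch of what backpropagation does, the toy program below trains a tiny two-layer network on the classic XOR pattern using NumPy. The network size, learning rate, step count, and sigmoid activation are all arbitrary illustrative choices; real frameworks automate this at far greater scale:

    import numpy as np

    rng = np.random.default_rng(0)

    # XOR: a pattern a single layer cannot learn, but one hidden layer can.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Randomly initialized link weights: 2 inputs -> 4 hidden nodes -> 1 output.
    W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
    W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))

    learning_rate = 1.0
    for step in range(10000):
        # Forward pass: signals flow input -> hidden -> output.
        hidden = sigmoid(X @ W1 + b1)
        output = sigmoid(hidden @ W2 + b2)

        # Backward pass: propagate the error back through the links,
        # nudging each weight in the direction that reduces the error.
        error = output - y
        grad_output = error * output * (1 - output)
        grad_hidden = (grad_output @ W2.T) * hidden * (1 - hidden)
        W2 -= learning_rate * hidden.T @ grad_output
        b2 -= learning_rate * grad_output.sum(axis=0, keepdims=True)
        W1 -= learning_rate * X.T @ grad_hidden
        b1 -= learning_rate * grad_hidden.sum(axis=0, keepdims=True)

    print(np.round(output, 2))  # should approach [[0], [1], [1], [0]]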

However, no single neural network architecture can be applied to any given problem. This is where machine learning expertise matters: for any given number of nodes, their activation functions, the number of hidden layers, and the way the nodes are connected (densely or sparsely, with or without internal feedback or recurrent loops), there is an effectively limitless space of potential network configurations. In earlier research, hardware constraints kept the number of hidden layers very small. Even so, neural networks demonstrated an astonishing learning ability, at some tasks exceeding human performance. Today's deep learning networks may have hundreds of layers and are fully capable of tackling problems whose patterns are buried deep.
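
These architectural choices are exactly what you spell out when defining a model in a modern framework. As one illustrative configuration among the countless possibilities (the layer sizes and activations below are arbitrary assumptions, not a recommendation), a densely connected network defined with the Keras API might look like this:

    from tensorflow import keras

    # Each design decision discussed above appears explicitly here:
    # how many layers, how many nodes per layer, and which activation
    # function each layer uses.
    model = keras.Sequential([
        keras.layers.Input(shape=(64,)),               # 64 input signals
        keras.layers.Dense(128, activation="relu"),    # first hidden layer
        keras.layers.Dense(128, activation="relu"),    # second hidden layer
        keras.layers.Dense(10, activation="softmax"),  # output layer: 10 classes
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    model.summary()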

The key to putting deep learning into practice is figuring out how to efficiently scale large neural networks across hundreds of thousands of parallel computing cores, and then train them efficiently on huge data sets. In the past, this required HPC: high-performance computing equipment far beyond the scale of an enterprise data center. Today, NVIDIA, Mellanox and DataDirect Networks are launching HPC products sized for the ordinary enterprise data center. NVIDIA's DGX-1, for example, is essentially a tightly integrated supercomputer designed for deep learning, packing eight high-end GPU compute cards. Remarkably, it occupies only 4U of rack space, which is clearly acceptable to an ordinary company.

Google's AlphaGo Era

Cloud computing providers such as Google also offer hosted machine learning tools. Google's AlphaGo program, for example, recently defeated a world-class Go champion at the highest level of play. Go had been considered one of the last frontiers where machines could not compete with human intelligence, because it cannot be solved by simple brute-force calculation in any reasonable time (fully computing the best move on a 19x19 Go board requires more computing power than the strongest existing computers possess). You could argue, though, that the AlphaGo team took a shortcut: it trained a deep learning program on the best games human players have ever played. Moreover, the program can keep getting stronger by playing Go against itself.

Under the hood, AlphaGo mainly consists of two large neural networks linked together, with some Monte Carlo simulation used to prune the enormous set of candidate moves, too many to calculate, down to a smaller set of likely better ones. The first network was trained on the game records of millions of past matches, so it can judge which moves are most likely to be played by the eventual winner. The second network was trained to estimate the value of each new board position, in principle assigning higher value to positions held by the winner at the end of a game record. The two networks are used together, recursively, to select the best move by looking ahead a limited number of moves for both sides. The essential point is that, by learning from the game records of the best players, deep learning can now beat the best human player in real time without relying on brute-force computation.
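
The toy sketch below is in no way AlphaGo's actual code; the stand-in "networks", the move representation, and the search depth are all invented for illustration. It only shows the general shape of the idea: a policy function proposes a few promising moves, a value function scores positions, and a shallow recursive lookahead combines the two:

    import random

    # Hypothetical stand-ins for the trained networks. In a real system
    # these would be large neural networks; here they are trivial placeholders.
    def policy_net(position):
        """Return a few promising candidate moves for this position."""
        return random.sample(range(10), k=3)  # pretend moves are ints 0-9

    def value_net(position):
        """Estimate how good this position is for the player to move."""
        return random.uniform(-1, 1)

    def apply_move(position, move):
        return position + (move,)  # toy representation: a tuple of moves

    def search(position, depth):
        # Shallow lookahead: my best move assumes the opponent then plays
        # *their* best move, so values flip sign at each level.
        if depth == 0:
            return value_net(position)
        # The policy network prunes the move set instead of trying everything.
        candidates = policy_net(position)
        return max(-search(apply_move(position, m), depth - 1)
                   for m in candidates)

    def best_move(position, depth=3):
        return max(policy_net(position),
                   key=lambda m: -search(apply_move(position, m), depth - 1))

    print(best_move(position=()))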

How can deep learning be applied in enterprise IT?

A deep learning program can surprise even its own designers and trainers. By learning complex patterns from historical data, it can perform well in scenarios beyond expectations, or in ones that seem unrelated. Fundamentally, though, a deep learning program cannot truly predict patterns it has never been trained on; it can only learn from scenarios it has already encountered. Nor can it describe what it has learned in logical terms or rules, so it cannot simply abstract and hand over its results.

With any machine learning technique, there is always a balance to strike between becoming too specific (for example, recording the historical data so precisely that the model is as rigid as a lookup table, i.e., overfitting) and staying too general (for example, returning a single most-likely value no matter what the input is, i.e., underfitting), either of which makes the computation useless. A data scientist's job is to find the best balance for a particular problem.
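
A quick way to see this trade-off (a generic illustration, unrelated to any system discussed here) is to fit noisy data with polynomials of different degrees using NumPy:

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(0, 1, 12)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)  # noisy samples

    x_test = np.linspace(0, 1, 100)
    truth = np.sin(2 * np.pi * x_test)  # the underlying pattern we hope to learn

    for degree in (0, 3, 9):
        coeffs = np.polyfit(x, y, degree)
        pred = np.polyval(coeffs, x_test)
        err = np.sqrt(np.mean((pred - truth) ** 2))
        print(f"degree {degree}: test error {err:.2f}")

    # degree 0 is "too general": one value for every input (underfitting);
    # degree 9 bends to follow the noise almost point by point, like a
    # lookup table (overfitting); degree 3 lands near the balance a data
    # scientist is looking for.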

Deep learning will undoubtedly be useful in any scenario where a large amount of training data is available. Every day, IT departments generate more and more machine data that could be used to develop useful artificial intelligence. In security applications, for example, neural networks can learn to recognize the deep patterns of likely intrusion or hacker behavior. They can even be trained on time-series data to learn what normal (and therefore abnormal) behavior looks like as workloads and resources shift. Google may well be studying how to use AlphaGo-like AI capabilities to help manage its cloud infrastructure, avoid failures, and optimize resource allocation.
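
One common pattern for this kind of time-series learning is to train a small autoencoder on windows of normal metric data and flag any window the network reconstructs poorly. The sketch below uses the Keras API; the synthetic data, window size, layer widths, and threshold are all assumptions made for illustration:

    import numpy as np
    from tensorflow import keras

    rng = np.random.default_rng(2)

    # Pretend metric stream: sliding 32-sample windows of "normal" load.
    normal = np.sin(np.linspace(0, 40, 2048)) + rng.normal(scale=0.1, size=2048)
    windows = np.stack([normal[i:i + 32] for i in range(0, 2016, 8)])

    # A small autoencoder: it learns to compress and reconstruct normal windows.
    model = keras.Sequential([
        keras.layers.Input(shape=(32,)),
        keras.layers.Dense(8, activation="relu"),  # compressed representation
        keras.layers.Dense(32),                    # reconstruction
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(windows, windows, epochs=20, verbose=0)

    # Windows the model cannot reconstruct well are flagged as anomalies.
    def is_anomalous(window, threshold=0.05):
        recon = model.predict(window[None, :], verbose=0)[0]
        return np.mean((recon - window) ** 2) > threshold

    spike = windows[0].copy()
    spike[10:14] += 3.0  # inject an abnormal burst
    print(is_anomalous(windows[0]), is_anomalous(spike))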

If you want to learn more about neural networks, I suggest taking a few minutes to try some of the interactive examples available online. And we should all be ready for the day we connect our brains directly to the data center network.