Lei Feng Wang note: The author of this article is Shan Shiguang, Ph.D., Researcher at the Institute of Computing Technology, Chinese Academy of Sciences, and Deputy Director of the Key Laboratory of Intelligent Information Processing, CAS. He works mainly on computer vision, pattern recognition, and machine learning, with a particular focus on face recognition. The original title of this article is "Discussion on the Inspiration and Openness of Deep Learning in Computer Vision."
The revival of deep learning supported by big data is certainly a milestone in the AI field, but it does not mean that deep learning can solve all AI problems.
【Abstract】In recent years, deep learning has achieved great success in many computer vision tasks such as image classification, object detection and recognition, and even image captioning. This article first explores some implications of deep learning's success and then discusses related open problems. The views below are only the author's rough personal opinions.
| Inspirations from the Success of Deep Learning
The success of deep learning has not only driven rapid progress in AI-related technologies and solved many problems long considered intractable; more importantly, it has brought us revolutionary changes in thinking. In my view, these are mainly reflected in the following aspects.
1. Advances in optimization methods were the key that opened the door to the deep learning revival
Looking back over the decade of deep learning since 2006 (the so-called first year of deep learning), we must first note the important role played by continuous improvements in optimization methods. Deep learning is not an entirely new technique but a renaissance of the multilayer neural networks that emerged in the 1980s. The deep models now popular in computer vision (such as the deep convolutional neural network, DCNN) had largely taken shape by the 1980s, yet they fell out of favor for a long time. One of the reasons was the lack of effective methods for optimizing multilayer networks, in particular an efficient way to initialize them.
In this sense, the main contribution of Hinton et al. in 2006 was to pioneer unsupervised, layer-wise pre-training of multilayer neural networks, which restored many researchers' confidence in them.
In fact, however, the prosperity of DCNNs over the past three years has had little to do with unsupervised layer-wise pre-training. It owes more to optimization methods, or techniques that aid optimization, such as mini-batch SGD, the ReLU activation function, Batch Normalization, and the shortcut connections in ResNet. These techniques, especially the means of combating vanishing gradients, enabled DCNNs to grow ever deeper and their performance to improve continuously.
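The vanishing-gradient point can be made concrete with a toy calculation (my illustration; the per-layer derivative values are contrived, and the real backpropagated gradient also involves weight matrices). A backpropagated gradient scales roughly with the product of per-layer activation derivatives: the sigmoid's derivative never exceeds 0.25, so the product shrinks geometrically with depth, while ReLU's derivative is exactly 1 on the active side.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # at most 0.25, reached at x = 0

def grad_product(deriv, pre_activations):
    # crude proxy for the backpropagated gradient magnitude:
    # the product of activation derivatives across layers
    prod = 1.0
    for x in pre_activations:
        prod *= deriv(x)
    return prod

pre_acts = [0.5] * 20  # same positive pre-activation at all 20 layers
sig_prod = grad_product(sigmoid_grad, pre_acts)
relu_prod = grad_product(lambda x: 1.0 if x > 0 else 0.0, pre_acts)
print(sig_prod)   # astronomically small (~0.235 ** 20)
print(relu_prod)  # 1.0
```

With sigmoid the signal reaching layer 1 is vanishingly small after 20 layers, while ReLU passes it through unattenuated, which is one reason ReLU (together with BatchNorm and shortcuts) made very deep training practical.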
2. From an experience-driven paradigm of hand-crafted features to a data-driven paradigm of representation learning
Before the rise of deep learning, an AI paradigm driven by expert knowledge and experience dominated fields such as speech processing, computer vision, and pattern recognition for many years, especially in information representation and feature design. This heavy reliance on manual design severely limited the effectiveness and generality of intelligent processing technology. Deep learning has completely overturned this "hand-crafted feature" paradigm and opened up a data-driven "representation learning" paradigm. This is reflected in two points:
1) So-called experience and knowledge are themselves in the data. When the amount of data is large enough, there is no need to explicitly embed experience or knowledge; it can be learned directly from the data.
2) It is possible to learn a representation directly from the raw signal, without first converting it by hand into some supposedly "better" space. The data-driven representation learning paradigm frees developers from designing different pipelines for different problems based on experience and knowledge, which greatly improves the generality of AI algorithms and greatly reduces the difficulty of tackling new problems.
3. From "step-by-step, divide-and-conquer" to "end-to-end learning"
Decomposing a complex problem into several simpler sub-problems or sub-steps (divide-and-conquer, or step-by-step processing) was long a common approach to solving complex problems, and in the AI field it is also a widely used methodology.
For example, image pattern recognition has often been decomposed into steps such as preprocessing, feature extraction and selection, and classifier design. Similarly, to handle a nonlinear problem, a piecewise-linear approach can be used to approximate the global nonlinearity. The motivation is clear: the sub-problems or sub-steps become simple, controllable, and easier to solve. From the perspective of deep learning, however, the disadvantage is equally obvious: the optimum of each sub-problem does not imply a global optimum, and even if every sub-step is solved optimally, the pipeline as a whole may not be optimal.
In contrast, deep learning emphasizes end-to-end learning: instead of hand-designed steps or sub-problems, a neural network directly learns the mapping from raw input to desired output. Compared with the divide-and-conquer strategy, end-to-end learning has the advantage of synergy among components and is more likely to reach a globally better solution. Of course, one may still regard each layer as a "sub-step or sub-problem," but the function of these layers is not set by us in advance; it emerges automatically through data-driven global optimization.
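The claim that stage-wise optima need not be globally optimal can be shown with a contrived toy (my illustration, not the author's): a two-stage pipeline whose first stage selects a feature by its own unsupervised criterion (variance), versus "end-to-end" selection of whichever feature minimizes the final classification error.

```python
def variance(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

def error_with_feature(samples, labels, idx, threshold=0.0):
    # final-task loss: error of a fixed threshold classifier on feature idx
    preds = [1 if s[idx] > threshold else 0 for s in samples]
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)

# contrived data: feature 0 has high variance but is uninformative;
# feature 1 has low variance but separates the two classes perfectly
samples = [(-9.0, -0.1), (8.0, -0.2), (-7.0, 0.1), (9.5, 0.2)]
labels  = [0, 0, 1, 1]

# stage-wise choice: maximize the stage's own criterion (variance)
greedy = max(range(2), key=lambda i: variance([s[i] for s in samples]))
# end-to-end choice: minimize the final task error directly
joint = min(range(2), key=lambda i: error_with_feature(samples, labels, i))
print(greedy, error_with_feature(samples, labels, greedy))  # 0 0.5
print(joint, error_with_feature(samples, labels, joint))    # 1 0.0
```

The stage-wise criterion is "optimal" for its own stage yet yields chance-level accuracy, while optimizing against the final objective picks the right feature, which is the essence of the end-to-end argument.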
4. Deep learning provides superior nonlinear modeling capability
Many complex problems are highly nonlinear in nature. Deep learning realizes a highly nonlinear transformation from input to output, which is one of the important reasons it has achieved breakthroughs on many complex problems.
Before deep learning, linear or approximately linear models were popular for a long time. In particular, since the 1990s, linear subspace methods for discriminative dimensionality reduction attracted much attention, such as principal component analysis, Fisher linear discriminant analysis, and independent component analysis.
Later, to handle nonlinear problems, methods such as the kernel trick and manifold learning drew attention. The kernel approach implicitly applies a nonlinear transformation to the original input, but it cannot define an explicit nonlinear transformation: only a limited set of kernel functions can be used, each defining the dot product in the target space and thereby achieving nonlinearity indirectly.
The manifold learning methods highly valued after 2000 tried to learn nonlinear mappings by preserving geodesic distances or local neighborhood relationships between sample points. Unfortunately, such methods find it difficult to apply an explicit nonlinear transformation to samples outside the training set. Deep learning, by contrast, acquires the ability to fit sufficiently complex nonlinear transformations by stacking large numbers of neurons with nonlinear activation functions (such as sigmoid or ReLU).
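The implicit-versus-explicit contrast can be sketched in a few lines (my illustration; the weights are arbitrary stand-ins for learned parameters, not from any real model). An RBF kernel only returns inner products in an implicit feature space, whereas one affine-plus-ReLU layer produces an explicit feature vector that can be applied to any new sample.

```python
import math

# Kernel view: similarity in an implicit space; the feature map itself
# is never materialized.
def rbf_kernel(x, z, sigma=1.0):
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2 * sigma ** 2))

# Deep-net view: an explicit nonlinear map phi(x) = ReLU(W x + b),
# applicable to arbitrary new inputs once W, b are learned.
def relu_layer(x, W, b):
    out = []
    for row, bi in zip(W, b):
        pre = sum(w * xi for w, xi in zip(row, x)) + bi
        out.append(max(0.0, pre))
    return out

x = [1.0, -2.0]
W = [[0.5, -0.3], [-1.0, 0.8], [0.2, 0.4]]  # illustrative "learned" weights
b = [0.1, 0.0, -0.2]
phi_x = relu_layer(x, W, b)  # explicit 3-d feature vector
print(rbf_kernel(x, x))      # 1.0: a point has maximal similarity to itself
print(phi_x)
```

Stacking several such layers composes the per-layer nonlinearities into an arbitrarily complex explicit transformation, which is exactly what the kernel and manifold approaches could not offer.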
5. A big model is not necessarily a bad model
Occam's razor is widely known in many fields, machine learning especially. It tells us: "Entities should not be multiplied without necessity." In other words, a model for solving a problem should be as simple as possible, no more complex than needed.
This principle is an important guideline in machine learning for improving a model's generalization ability, and it has long cast doubt on complex, large models. Deep learning is puzzling precisely on this point. Take AlexNet as an example: it has some 60 million parameters (weights) to learn. Such an enormous parameter count seems to indicate a very complicated, perhaps overly complicated, model.
Of course, the number of learnable parameters does not directly equal model complexity, but deep learning models do appear, at first sight, very "complex." So has Occam's razor failed? Or is the complexity of these seemingly complex deep models actually not that high? At present there seems to be no clear theoretical answer. Recent work shows that many well-trained, complex deep models can be reduced by pruning, compression, and similar techniques with no loss, and sometimes even a gain, in performance.
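The pruning result mentioned above can be sketched with magnitude pruning, the simplest such technique (a toy of mine with contrived weights, not a real trained layer): zero out the smallest weights and check how little the layer's output changes.

```python
def prune(weights, keep_ratio):
    # magnitude pruning: keep only the largest-|w| fraction of weights
    k = int(len(weights) * keep_ratio)
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

# contrived "trained" weights: a few large ones carry the signal
weights = [0.9, -0.02, 0.01, 0.8, -0.03, 0.7, 0.005, -0.6]
x = [1.0] * len(weights)

dense_out = sum(w * xi for w, xi in zip(weights, x))
sparse = prune(weights, keep_ratio=0.5)      # drop half the weights
sparse_out = sum(w * xi for w, xi in zip(sparse, x))
print(sparse)
print(abs(dense_out - sparse_out))  # small: the big weights dominate
```

Half the parameters are gone yet the output barely moves, suggesting the effective complexity of the trained model was far below its raw parameter count, which is one way to reconcile big models with Occam's razor.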
The key here may lie in the "dividend" brought by big data. One possibility is that researchers long faced "small data" problems and therefore became overly fond of simple models. With today's steep increase in data volume, moderately complex models are better matched to the complex problems researchers face. When training data are abundant enough to cover essentially the same distribution as the test data, so that test samples rarely fall outside the range of the training data, "overfitting" the training data becomes much less frightening.
6. Ideas inspired by brain neuroscience deserve more attention
Deep learning as a multi-layer neural network is inspired by brain neuroscience.
In particular, the convolutional neural network is rooted in the Neocognitron model proposed by Fukushima in the 1980s. That model's motivation was to simulate how receptive fields in the mammalian visual nervous system grow gradually larger, and how simple and then complex features are extracted layer by layer, thus realizing the semantic abstraction performed along the visual neural pathway. Thanks largely to the work of Nobel laureates Hubel and Wiesel, this pathway became gradually clearer from the 1960s onward, providing a good reference for the birth of the CNN. It is worth noting, however, that the biological visual pathway is extremely complex. Neuroscientists have a clear picture of the edge-extraction function of simple cells in the primary visual cortex and have explored the functions of the increasingly complex cells further along the pathway, but the functions of hyper-complex cells at higher levels, and their mechanisms of action, remain unclear.
This means that whether deep models such as CNNs really simulate the biological visual pathway remains unknown. What is certain is that the connectivity of the biological nervous system is far more complex: it has not only bottom-up feed-forward and same-layer recurrent connections but also abundant top-down feedback and connections from other neural subsystems, none of which current deep models capture.
In any case, advances in brain neuroscience can open more possibilities for the development of deep models and deserve attention. For example, more and more recent neuroscience studies show that neurons once thought to be highly specialized actually have good plasticity. Large numbers of nerve cells in the visual cortex, for instance, are "reshaped" to handle tactile or other modal data shortly after visual input is lost. This plasticity of the nervous system implies good generality across different intelligent processing tasks, providing a reference for the development of general artificial intelligence.
| Open Problems
The revival of deep learning supported by big data is certainly a milestone in the AI field, but it does not mean that deep learning can solve all AI problems. The following discusses some open problems in deep learning.
1. Inferring the general from the particular: is big data necessary for learning?
Big data is the cornerstone of deep learning's success; big data is to deep learning what fuel is to a rocket. More and more applications are accumulating increasingly rich data, which is crucial for the further development and application of deep learning. However, heavy dependence on labeled big data is precisely one of deep learning's limitations.
Data collection is costly, labeling costs keep rising, and in some domains data are simply hard to collect. In medical diagnosis, for example, gathering data on rarer diseases is difficult.
More importantly, when we take human intelligence as the reference system, a natural question arises: is human intelligence the result of big-data learning? The answer is not obvious. From the standpoint of the human individual, it seems to be no: a person can see a single apple (or even just a picture of one) and learn to recognize apples, without observing hundreds of different ones. Yet this criticism of deep learning, however reasonable it sounds, may not be entirely fair: as a species, humans have seen far more than a thousand apples over the course of evolution.
In any case, how to drive deep learning or other machine learning methods with "small data" is a new direction worth exploring. In this sense, methods based on unsupervised learning, transfer learning across similar domains, domain adaptation of general models, and the embedding of knowledge and experience all deserve attention.
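The transfer-learning route can be sketched with a toy (my illustration; the "pretrained" extractor is a contrived stand-in, not a real network): reuse a frozen feature extractor and fit only a small linear head on a handful of labeled target examples.

```python
def pretrained_features(x):
    # frozen "pretrained" extractor: two fixed ReLU-like features
    return [max(0.0, x), max(0.0, -x)]

def solve2(A, b):
    # solve a 2x2 linear system by Cramer's rule
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

def fit_head(feats, ys):
    # least-squares fit of a 2-weight linear head via normal equations
    A = [[sum(f[0] * f[0] for f in feats), sum(f[0] * f[1] for f in feats)],
         [sum(f[1] * f[0] for f in feats), sum(f[1] * f[1] for f in feats)]]
    b = [sum(f[0] * y for f, y in zip(feats, ys)),
         sum(f[1] * y for f, y in zip(feats, ys))]
    return solve2(A, b)

xs = [-2.0, -1.0, 1.0, 2.0]  # only four labeled target examples
ys = [6.0, 3.0, 2.0, 4.0]    # generated by y = 2*relu(x) + 3*relu(-x)
feats = [pretrained_features(x) for x in xs]
w = fit_head(feats, ys)
print(w)  # [2.0, 3.0]: head recovered exactly from tiny data
```

Because the frozen features already capture the right structure, four labeled samples suffice to fit the head, which is the intuition behind fine-tuning only a small head on scarce target data.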
2. Learning without a teacher: how to obtain unsupervised learning ability?
Labeled data are costly in time and money, while unlabeled data cost almost nothing to obtain. At present, deep learning's ability to learn from unlabeled data is severely inadequate: vast amounts of unlabeled data are like a sea of gold-bearing sand for which we lack an efficient panning tool. Interestingly, recalling the history of deep learning, we should remember that what Hinton and others advocated in 2006 was precisely unsupervised pre-training of deep neural networks. Since then, however, and especially after the rise of the DCNN, unsupervised pre-training seems to have been abandoned by many researchers (particularly in the CV field).
Learning a model directly from large amounts of unlabeled data is indeed very difficult. Even for the human "machine," cases of feral children warn us that learning entirely without a teacher seems unrealistic. A paradigm of "a small amount of supervised data plus a large amount of unsupervised data," however, may well deserve more study.
3. From parameter learning to structure learning?
Deep learning replaced the "hand-crafted feature" paradigm with a "data-driven" one, a major advance. At the same time, however, it has fallen into a paradigm of "hand-crafted structure."
Whether the original AlexNet or the later VGG, GoogLeNet, and ResNet, all were designed by experienced experts. Given a new problem, what network structure is optimal (for example, how many convolutional layers) is unknown, which to some extent hinders the application of deep learning to more intelligent tasks. Learning the network structure and the network parameters simultaneously is therefore a research direction deserving great attention.
From a computational point of view, fully learning the network structure is extremely complex. There have been some recent attempts, such as pruning algorithms and network simplification, which can adjust the structure to some extent, as well as a small amount of exploratory work on learning structural hyperparameters (such as the number of convolution kernels in a DCNN), but this is still in its infancy.
4. How to perform feedback and network modulation in the prediction phase?
We know that when the human visual system "sees," nerve cells along the visual pathway receive input not only from lower-level and same-level neurons but also a large number of feedback signals from higher-level neurons, as well as modulation signals from other neural subsystems (e.g., the auditory system).
In contrast, in the feature extraction or prediction phase of the deep networks we currently use (especially DCNNs), once training is complete, low-level neurons cannot receive feedback from high-level neurons, nor is there any mechanism for receiving other signals (such as priors or information from other modalities). This means that "intelligent" capabilities such as prior knowledge, context, guessing, and imagination are hard to exploit in existing deep networks. How to break through this predicament and give deep networks adaptive modulation during the perception stage is worth studying.
5. How to give machines the ability of deductive reasoning?
Deep learning based on big data can be regarded as an inductive method, while deduction from general principles is another important human ability. In cognition and decision-making especially, we rely heavily on deductive reasoning.
Deductive reasoning often seems to have nothing to do with data. For example, we can learn to recognize certain objects we have never seen purely from symbolic (linguistic) descriptions, without being given any examples.
Such zero-shot learning problems seem beyond the reach of deep learning, but perhaps not forever. In recent years, for example, more and more deep generative models have attempted to generate images from symbols (concepts).
【Author introduction】 Shan Shiguang, Ph.D., Researcher at the Institute of Computing Technology, Chinese Academy of Sciences, and Executive Deputy Director of the Key Laboratory of Intelligent Information Processing, CAS. He works mainly on computer vision, pattern recognition, and machine learning, with a particular focus on face recognition. He has published more than 50 papers in Class A international journals and conferences recommended by the China Computer Federation, cited more than 9,000 times on Google Scholar. He has served as Area Chair for international conferences including ICCV, ACCV, ICPR, and FG, and is currently an Associate Editor of IEEE Transactions on Image Processing, Neurocomputing, and Pattern Recognition Letters. His research won the second prize of the National Science and Technology Progress Award in 2005 and the second prize of the National Natural Science Award in 2015. He received the National Natural Science Foundation's Excellent Young Scientists Fund ("Youqing") in 2012 and the CCF Young Scientist Award in 2015.
Lei Feng Wang note: This article is published by Lei Feng Wang (search for the "Lei Feng Wang" public account) with authorization from the Deep Learning Lecture Hall. For reprints, please contact us for authorization, retain the author and source, and do not delete content.