At CVPR 2018, the top computer vision conference to be held in Salt Lake City in June, two of Tencent's accepted papers have drawn attention from academia and industry because of their high application value.
As one of the highest-level conferences in computer vision, CVPR publishes papers that usually represent the field's latest directions and state of the art.
Several papers from Tencent Youtu Lab were accepted to CVPR 2018. "Scale-recurrent Network for Deep Image Deblurring" introduces the use of AI to deblur images of non-specific scenes. "Facelet-Bank for Fast Portrait Manipulation" introduces the use of AI to process portraits quickly. Both techniques address problems that have long plagued image processing, and their considerable application value has attracted the industry's attention.
Demystifying motion blur: toward practical deblurring in non-specific scenes
When shooting with a slow shutter or capturing fast motion, photographers are often plagued by blurred images. Researchers at Youtu Lab have developed an effective new algorithm that can restore blurred images.
Image deblurring has long been a difficult problem in image processing. The causes of blur can be complicated: the camera shakes, the shot is out of focus, or the subject moves at high speed. The tools in existing photo editing software are usually unsatisfactory. For example, the "Camera Shake Reduction" tool in Photoshop CC can handle only simple camera shake. This type of blur is known as "uniform blur" in the computer vision community. Most blurred pictures are not uniformly blurred, so existing photo editing software is of limited use.
Blurred photo
Deblurred photos
The new algorithm from Tencent's Youtu Lab can handle image blur in non-specific scenes. It is based on a blur model assumption called "motion blur", which models the motion of each pixel individually and can therefore handle almost any type of motion blur. In the figure above, for example, each person's motion trajectory is different because of the translation and rotation caused by camera shake. After processing by the new algorithm, the picture is restored almost completely, and even the words on the books in the background are clearly legible.
According to a researcher at Tencent's Youtu Lab, the underlying technique is a deep neural network. After training on thousands of pairs of blurred and sharp images, the network automatically learned how to sharpen blurred image structures.
Using neural networks for image deblurring is not a new idea, but Youtu Lab incorporates physical intuition to make the model easier to train. In the paper, the network imitates a mature image restoration strategy called "coarse to fine". The strategy first downscales the blurred image to several sizes, then starts from the smallest image, which is relatively sharp and easier to recover, and gradually processes larger ones. The sharp image produced at each step guides the recovery of the next larger image, which reduces the difficulty of training the network.
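The "coarse to fine" loop can be sketched in a few lines of Python. This is only an illustration of the control flow, not the network from the paper: the 2x2 averaging, the nearest-neighbour upsampling, the number of levels, and the `placeholder_restore` function standing in for the trained network are all assumptions made for this example.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image stored as rows of floats


def downsample(img: Image) -> Image:
    """Halve the resolution by averaging each 2x2 block (even sizes assumed)."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def upsample(img: Image) -> Image:
    """Double the resolution by repeating every pixel (nearest neighbour)."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def build_pyramid(img: Image, levels: int) -> List[Image]:
    """Return the image at `levels` scales, coarsest first."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid[::-1]


def coarse_to_fine(blurred: Image,
                   restore: Callable[[Image, Image], Image],
                   levels: int = 3) -> List[Image]:
    """Restore scale by scale, feeding each result to the next finer scale."""
    outputs: List[Image] = []
    for observed in build_pyramid(blurred, levels):
        # The guess is the upsampled result of the previous (coarser) scale.
        # At the coarsest scale there is none, so the blurred input is used.
        guess = upsample(outputs[-1]) if outputs else observed
        outputs.append(restore(observed, guess))
    return outputs


def placeholder_restore(observed: Image, guess: Image) -> Image:
    """Hypothetical stand-in for the trained network: averages its two inputs."""
    return [[(o + g) / 2.0 for o, g in zip(row_o, row_g)]
            for row_o, row_g in zip(observed, guess)]
```

Calling `coarse_to_fine(image, placeholder_restore, levels=3)` on an image whose height and width are divisible by 4 returns one result per scale, with the last entry at full resolution.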
AI Portrait Artist: Quickly Process Portrait Attributes in a Clean, Elegant Way
Modifying face attributes (not just beautification) in portrait photos is very difficult. Artists usually need to do a lot of processing on portraits to make the modified images beautiful and natural. Can AI take over these complex operations?
Researchers led by Prof. Jia Jiaya at Tencent's Youtu Lab presented their latest model for "automatic portrait manipulation". With this model, the user simply provides a high-level description of the desired effect, and the model automatically renders the photo accordingly, for example making the subject look younger or older.
The main challenge of this task is that paired "input-output" samples cannot be collected for training. Generative adversarial networks (GANs), which are popular in unsupervised learning, are therefore usually used for it. The method proposed by the Youtu team, however, does not rely on GANs. It trains the neural network on generated noisy targets. Thanks to the denoising effect of deep convolutional networks, the network's output is even better than the targets it learns from.
"Generative adversarial networks are a powerful tool, but they are difficult to optimize. We hope to find an easier way to solve this problem. We hope this work will reduce the burden not only on artists but also on the engineers who train models," Tencent researchers said.
According to reports, another attractive feature of the model is that it supports local model updates: when switching between manipulation tasks, only a small part of the model needs to be replaced. This is very friendly to system developers, and at the application level it also means that apps can be "incrementally updated".
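The idea of a local model update can be illustrated with a toy structure in Python. Everything here is hypothetical: the class name `PortraitModel`, the split into a shared encoder and decoder, and the assumption that a task module outputs a feature offset that is added to the encoded features are choices made for this sketch, not details given in the article.

```python
from typing import Callable, Dict, List

Features = List[float]


class PortraitModel:
    """Toy model: a fixed shared encoder/decoder plus small swappable task modules."""

    def __init__(self,
                 encode: Callable[[List[float]], Features],
                 decode: Callable[[Features], List[float]]) -> None:
        self.encode = encode  # shared, never replaced when a task changes
        self.decode = decode  # shared, never replaced when a task changes
        self.task_modules: Dict[str, Callable[[Features], Features]] = {}

    def install_task(self, name: str,
                     module: Callable[[Features], Features]) -> None:
        """Add or replace only the small task-specific part of the model."""
        self.task_modules[name] = module

    def run(self, photo: List[float], task: str, strength: float = 1.0) -> List[float]:
        feats = self.encode(photo)
        # Assumption: the task module predicts an offset in feature space.
        delta = self.task_modules[task](feats)
        edited = [f + strength * d for f, d in zip(feats, delta)]
        return self.decode(edited)
```

Under this layout, shipping a new manipulation task means distributing one small module and calling `install_task`, while the shared encoder and decoder stay untouched.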
Even if the face in the photo is not well cropped and aligned, the model can implicitly attend to the correct face region. In many cases, simply feeding the original photo to the model is enough to produce high-quality results. If a video is fed to the model frame by frame, the face attributes throughout the video can be processed as well.
Appendix: besides the two papers above, introductions to the other Tencent Youtu Lab papers accepted to CVPR 2018
1. Referring Image Segmentation via Recurrent Refinement Networks
Semantic segmentation of designated image regions with recurrent neural networks
Segmenting a specific region of a picture according to a natural-language description is a challenging problem. Previous neural-network methods perform segmentation by fusing image and language features but ignore multi-scale information, which leads to poor segmentation quality. To address this, we propose a model based on a recurrent convolutional neural network that adds features from lower layers of the convolutional network at each iteration, so that the network gradually captures information at different scales of the picture. We visualize the intermediate results of the model, which achieves state-of-the-art results on all relevant public datasets.
2. Weakly and Semi Supervised Human Body Part Parsing via Pose-Guided Knowledge Transfer
Weakly supervised and semi-supervised human body part parsing through pose-guided knowledge transfer
Human body part parsing, or semantic segmentation of human parts, is the basis of many computer vision tasks. Traditional semantic segmentation methods require manually labeled annotations for end-to-end training with Fully Convolutional Networks (FCN). Although such methods can achieve good results, their performance depends heavily on the quantity and quality of the training data.
In this paper, we propose a new way of obtaining training data that uses easily available human keypoint annotations to generate human part parsing data. Our main idea is to exploit the morphological similarity between people and transfer one person's part parsing result to another person with a similar pose. With the generated results as additional training data, our semi-supervised model outperforms the strongly supervised method by 6 mIOU on the PASCAL-Person-Part dataset and achieves the best human part parsing results. Our method is very general. It can easily be extended to part parsing for other objects or animals, as long as their morphological similarity can be represented by keypoints. Our model and source code will be released later.
3. Learning Dual Convolutional Neural Networks for Low-Level Vision
A dual convolutional neural network method for low-level vision
This paper proposes a dual convolutional neural network for low-level vision problems such as image super-resolution, edge-preserving image filtering, image deraining, and image dehazing. These problems usually involve estimating both the structure and the details of the target result. Inspired by this, the proposed network contains two branches that estimate the structure and the details of the target result end to end. From the estimated structure and detail information, the target result is then obtained through the imaging model of the specific problem. The dual convolutional neural network is a general framework that can use existing convolutional neural networks to handle related low-level vision problems. Extensive experiments show that it can be applied to most low-level vision problems and achieves good results.
4. GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation
GeoNet: Joint depth and surface normal estimation with a geometric neural network
In this paper, we propose a geometric neural network that simultaneously predicts the depth and surface normals of a scene from a picture. Our model builds on two different convolutional neural networks and iteratively updates the depth and surface normal information by modeling the geometric relation between them, which makes the final predictions highly consistent and accurate. We validate the proposed network on the NYU dataset. The experimental results show that our model accurately predicts depth and surface normals that are geometrically consistent.
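The geometric relation between depth and surface (plane) normals can be made concrete with a small Python example. It is a simplified illustration under stated assumptions, not the procedure used in GeoNet: it assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, and it estimates the normal at one pixel from the cross product of two finite differences between back-projected neighbours.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def backproject(u: int, v: int, z: float,
                fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Lift pixel (u, v) with depth z to a 3D point in camera coordinates."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def normal_from_depth(depth: List[List[float]], u: int, v: int,
                      fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Estimate the unit surface normal at (u, v) from neighbouring depths."""
    p = backproject(u, v, depth[v][u], fx, fy, cx, cy)
    right = backproject(u + 1, v, depth[v][u + 1], fx, fy, cx, cy)
    down = backproject(u, v + 1, depth[v + 1][u], fx, fy, cx, cy)
    a = tuple(r - q for r, q in zip(right, p))  # tangent along image x
    b = tuple(d - q for d, q in zip(down, p))   # tangent along image y
    n = (a[1] * b[2] - a[2] * b[1],
         a[2] * b[0] - a[0] * b[2],
         a[0] * b[1] - a[1] * b[0])             # cross product a x b
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("degenerate neighbourhood: cannot define a normal")
    # Flip so that the z component is non-positive, i.e. facing the camera.
    sign = -1.0 if n[2] > 0 else 1.0
    return (sign * n[0] / length, sign * n[1] / length, sign * n[2] / length)
```

For a wall parallel to the image plane (constant depth) this yields a normal along the negative z axis, which is the kind of consistency between the two predictions that the summary describes.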
5. Path Aggregation Network for Instance Segmentation
Instance segmentation with a path aggregation network
In a neural network, the quality of information flow is very important. In this paper, we propose a path aggregation network designed to improve information flow in an instance segmentation framework. Specifically, we build a bottom-up path that carries the accurate localization information stored in the lower layers, shortening the distance between the lower and higher layers and strengthening the entire feature hierarchy. We present adaptive feature pooling, which links region features with all feature levels so that useful information from every level can pass directly to the subsequent region subnetworks. We also add a complementary branch that captures different views of each region and ultimately improves mask prediction quality.
These improvements are easy to implement and add little computational overhead. They helped us take first place in the COCO 2017 instance segmentation challenge and second place in the object detection challenge. Our method also achieves the best results on the MVD and Cityscapes datasets.
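Adaptive feature pooling, as described above, gathers features for one region from every feature level instead of a single level. The Python sketch below shows that idea on single-channel feature maps. The mean pooling inside the region, the element-wise maximum used to fuse the levels, and the `(feature_map, stride)` representation are assumptions made for illustration and may differ from the paper.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in image pixels


def pool_region(feature_map: FeatureMap, stride: int, box: Box) -> float:
    """Average the cells of one level whose centres fall inside the box."""
    x0, y0, x1, y1 = box
    values = [feature_map[r][c]
              for r in range(len(feature_map))
              for c in range(len(feature_map[0]))
              if x0 <= (c + 0.5) * stride < x1 and y0 <= (r + 0.5) * stride < y1]
    if not values:
        raise ValueError("box does not cover any cell centre at this level")
    return sum(values) / len(values)


def adaptive_feature_pooling(levels: List[Tuple[FeatureMap, int]],
                             box: Box) -> Tuple[float, List[float]]:
    """Pool the same region from every level, then fuse with a maximum."""
    pooled = [pool_region(fmap, stride, box) for fmap, stride in levels]
    return max(pooled), pooled
```

With a fine level of stride 4 and a coarse level of stride 8 over a 16x16 image, the call `adaptive_feature_pooling([(fine, 4), (coarse, 8)], (0, 0, 8, 8))` returns the fused value together with the per-level values, so the strongest response is kept regardless of which level produced it.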
6. FSRNet: End-to-End Learning Face Super-Resolution with Facial Priors
FSRNet: End-to-end trained face super-resolution network based on prior information
This paper was led by Tencent's Youtu Lab and Nanjing University of Science and Technology and was selected as a Spotlight paper. Face super-resolution is a special case of super-resolution in which prior information unique to faces can be used to better super-resolve face images. This paper proposes a new end-to-end trained face super-resolution network that improves the quality of very low resolution face images, without requiring face alignment, by making better use of geometric information such as facial landmark heatmaps and segmentation maps. Specifically, the paper first builds a coarse super-resolution network to recover a coarse high-resolution image. Next, the image is sent to a fine super-resolution encoder and a prior estimation network. The encoder extracts image features, and the prior network estimates the landmarks and segmentation information of the face. Finally, the outputs of the two branches are merged and fed to a fine super-resolution decoder that reconstructs the final high-resolution image.
To generate even more realistic faces, the paper proposes a face super-resolution generative adversarial network, which integrates the adversarial idea into the super-resolution network. In addition, we introduce two related tasks, face alignment and face segmentation, as new evaluation criteria for face super-resolution. These two criteria overcome the inconsistency between numerical scores and visual quality in traditional metrics (such as PSNR/SSIM). Extensive experiments show that the proposed method is significantly better than previous super-resolution methods in both numerical and visual quality when dealing with very low resolution face images.
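For readers unfamiliar with PSNR, one of the traditional metrics named above, here is a small Python sketch of the standard definition, 10 * log10(MAX^2 / MSE). The function name `psnr` and the default peak value of 255 (8-bit images) are choices made for this example. Because the score depends only on per-pixel squared error, it can disagree with perceived visual quality, which is the inconsistency the paragraph refers to.

```python
import math
from typing import List


def psnr(reference: List[List[float]],
         restored: List[List[float]],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in decibels between two same-sized images."""
    squared_errors = [(r - t) ** 2
                      for row_r, row_t in zip(reference, restored)
                      for r, t in zip(row_r, row_t)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value means a smaller mean squared error against the reference image.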
7. Generative Adversarial Learning Towards Fast Weakly Supervised Detection
Fast weakly supervised object detection based on generative adversarial learning
The paper proposes a generative adversarial learning algorithm for fast weakly supervised object detection. In recent years there has been a great deal of work on weakly supervised object detection. In the absence of manually labeled bounding boxes, most existing methods are multi-stage pipelines that include a region proposal extraction stage. This makes online testing an order of magnitude slower than fast fully supervised object detectors (e.g., SSD and YOLO). The paper speeds up detection with a novel generative adversarial learning algorithm. In this process, the generator is a single-stage object detector, an agent is introduced to mine high-quality bounding boxes, and a discriminator is used to determine the source of the bounding boxes. The final algorithm combines a structural similarity loss and an adversarial loss to train the model. Experimental results show that the algorithm achieves a significant performance improvement.
8. GroupCap: Group-based Image Captioning with Structured Relevance and Diversity Constraints
Group-based automatic image description with structured relevance and diversity constraints
This paper proposes an automatic image captioning method (GroupCap) that analyzes the semantic associations within a group of images and models the semantic relevance and diversity between images. Specifically, the paper first uses a deep convolutional neural network to extract the semantic features of each image and uses the proposed visual parsing model to build a semantic association structure tree. It then applies a triplet loss and a classification loss on top of the structure trees to model the semantic relationships (relevance and diversity) between images, and uses this relevance as a constraint to guide a deep recurrent neural network in generating text. The method is novel and effective: it addresses the inaccuracy and poor discriminability of current automatic image captioning methods and achieves higher performance on several image captioning metrics.
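The triplet loss referred to in the GroupCap summary has a simple general form, sketched below in Python. This is the generic textbook version, not the exact loss from the paper: the Euclidean distance, the margin of 1.0, and the plain tuple embeddings are assumptions made for this example.

```python
import math
from typing import Sequence


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 1.0) -> float:
    """Penalize the anchor being closer to the negative than to the positive.

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    d_pos = math.dist(anchor, positive)  # distance to the related sample
    d_neg = math.dist(anchor, negative)  # distance to the unrelated sample
    return max(0.0, d_pos - d_neg + margin)
```

Minimizing this quantity pulls related items together and pushes unrelated ones apart in the embedding space, which is how a loss of this kind can encode relevance and diversity between images.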