Node.js method for identifying objects in images using a trained neural network model

Editor's note: Vincent Mühler, maintainer of opencv4nodejs and face-recognition.js, describes how to use a trained neural network model to identify objects in an image in a Node.js environment.

[Image: Node.js recognizes objects in the image using a trained neural network model]

Today we will look at OpenCV's deep neural network (DNN) module in Node.js.

If you want to unleash the magic of neural networks to recognize and classify objects in an image, but (like me) have no idea how deep learning works and don't know how to create and train neural networks, then this article is for you!

So what will we create today?

In this tutorial, we will learn how to load pre-trained models from Tensorflow and Caffe through OpenCV's DNN module, and then we will delve into two examples of object recognition based on Node.js and OpenCV.

First we will use Tensorflow's Inception model to identify the objects in the image, and then we will use the COCO SSD model to detect and identify multiple different objects in the same image.

You can find the sample code in my GitHub repository: justadudewhohacks/opencv4nodejs

Tensorflow Inception

The trained Tensorflow Inception model can distinguish about 1000 object classes. If you pass an image to the network, it returns the likelihood of each class for the object in the image.

To use the Inception model with OpenCV, we need to load the binary file tensorflow_inception_graph.pb and the class names file imagenet_comp_graph_label_strings.txt. You can download inception5h.zip and unzip it to get these files (the download link is in the code below):

const cv = require('opencv4nodejs');
const fs = require('fs');
const path = require('path');

// Replace with the path where you unzipped the inception model
const inceptionModelPath = '../data/dnn/tf-inception';
const modelFile = path.resolve(inceptionModelPath, 'tensorflow_inception_graph.pb');
const classNamesFile = path.resolve(inceptionModelPath, 'imagenet_comp_graph_label_strings.txt');

if (!fs.existsSync(modelFile) || !fs.existsSync(classNamesFile)) {
  console.log('Exit: inception model not found');
  console.log('Download model from https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip');
  return;
}

// Read the class names (one per line) and store them in an array
const classNames = fs.readFileSync(classNamesFile).toString().split('\n');

// Initialize the Tensorflow Inception network from modelFile
const net = cv.readNetFromTensorflow(modelFile);

Classifying objects in the image

To classify the objects in an image, we will write the following helper function:

const classifyImg = (img) => {
  // The inception model works with 224 x 224 images, so we resize
  // the input image and pad it to a square with white pixels
  const maxImgDim = 224;
  const white = new cv.Vec(255, 255, 255);
  const imgResized = img.resizeToMax(maxImgDim).padToSquare(white);

  // The network accepts blobs as input
  const inputBlob = cv.blobFromImage(imgResized);
  net.setInput(inputBlob);

  // Forward pass the input through the entire network; this returns
  // the classification result as a 1xN Mat with a confidence for each class
  const outputBlob = net.forward();

  // Find all labels with a confidence above the minimum
  const minConfidence = 0.05;
  const locations =
    outputBlob
      .threshold(minConfidence, 1, cv.THRESH_BINARY)
      .convertTo(cv.CV_8U)
      .findNonZero();

  const result =
    locations.map(pt => ({
      // Truncate the confidence to two decimal places
      confidence: parseInt(outputBlob.at(0, pt.x) * 100) / 100,
      className: classNames[pt.x]
    }))
      // Sort by confidence, highest first
      .sort((r0, r1) => r1.confidence - r0.confidence)
      .map(res => `${res.className} (${res.confidence})`);

  return result;
};

This function does the following:

Prepare input image

The Tensorflow Inception network accepts 224x224 input images. So we resize the image so that its larger dimension is 224, then pad it to a square with white pixels.
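To make the geometry of this step concrete, here is a minimal sketch in plain JavaScript that does not use OpenCV. The helper name fitAndPad is hypothetical, and rounding with Math.round is an assumption of the sketch; opencv4nodejs may round differently.

```javascript
// Hypothetical helper: computes the size of an image after scaling its longer
// side to maxDim (aspect ratio preserved) and the total padding needed to make
// it square. Rounding with Math.round is an assumption of this sketch.
const fitAndPad = (cols, rows, maxDim) => {
  const scale = maxDim / Math.max(cols, rows);
  const newCols = Math.round(cols * scale);
  const newRows = Math.round(rows * scale);
  return {
    cols: newCols,
    rows: newRows,
    padCols: maxDim - newCols, // total columns of white pixels to add
    padRows: maxDim - newRows  // total rows of white pixels to add
  };
};

// A 448 x 336 image is scaled to 224 x 168 and needs 56 rows of padding.
console.log(fitAndPad(448, 336, 224));
```

The sketch only reports how much padding is needed in total; where the white pixels end up relative to the image is left to padToSquare.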

Pass the image through the network

We can create a blob directly from the image, then call net.forward() to propagate the input through the network and get the output blob.

Extract the result from the output blob

To keep things general, the output blob is represented simply as a matrix (cv.Mat), and its dimensions depend on the model. With Inception this is simple: the blob is just a 1xN matrix (where N is the number of classes) that describes the probability distribution over all classes. Each entry is a floating-point number representing the confidence for the corresponding class. All entries sum to 1.0 (100%).

We want to take a closer look at the most likely classes for the image, so we consider all confidences greater than minConfidence (5% in this case). Finally, we sort the results by confidence and return the className and confidence pairs.
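The selection logic can also be sketched without OpenCV, using an ordinary array in place of the 1xN output blob. The confidences and class names below are made up for illustration, and topPredictions is a hypothetical helper, not part of opencv4nodejs.

```javascript
// Made-up stand-ins for the network output and the label file.
const sampleClassNames = ['banana', 'Siberian husky', 'Eskimo dog', 'sports car'];
const sampleConfidences = [0.01, 0.78, 0.21, 0.0];

// Keeps every class whose confidence exceeds minConfidence,
// sorts by confidence (highest first) and formats each entry.
const topPredictions = (confidences, classNames, minConfidence) =>
  confidences
    .map((confidence, i) => ({ className: classNames[i], confidence }))
    .filter(p => p.confidence > minConfidence)
    .sort((a, b) => b.confidence - a.confidence)
    .map(p => `${p.className} (${p.confidence})`);

console.log(topPredictions(sampleConfidences, sampleClassNames, 0.05));
// prints: [ 'Siberian husky (0.78)', 'Eskimo dog (0.21)' ]
```

The helper function in the article gets the same effect with threshold and findNonZero, which do the comparison inside OpenCV rather than in JavaScript.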

Testing

Now we will read some sample images that we want the network to classify:

const testData = [
  {
    image: '../data/banana.jpg',
    label: 'banana'
  },
  {
    image: '../data/husky.jpg',
    label: 'husky'
  },
  {
    image: '../data/car.jpeg',
    label: 'car'
  },
  {
    image: '../data/lenna.png',
    label: 'lenna'
  }
];

testData.forEach((data) => {
  const img = cv.imread(data.image);
  console.log('%s: ', data.label);
  // Classify the image and print one prediction per line
  const predictions = classifyImg(img);
  predictions.forEach(p => console.log(p));
  console.log();
  cv.imshowWait('img', img);
});

The output is (you can refer to the picture at the beginning of this article):

banana:
banana (0.95)

husky:
Siberian husky (0.78)
Eskimo dog (0.21)

car:
sports car (0.57)
racer (0.12)

lenna:
sombrero (0.34)
cowboy hat (0.3)

Very interesting. We got very accurate descriptions for the husky and banana images. For the car image, the specific category of car is not accurate, but the model does recognize a car in the image. Of course, the network cannot be trained on an unlimited number of classes, so it does not return a "woman" description for the last image. However, it does recognize the hat.

COCO SSD

So the model performs well. But how do we deal with images that contain multiple objects? To identify multiple objects in a single image, we will use the Single Shot Multibox Detector (SSD). In our second example, we will look at an SSD model trained on the COCO (Common Objects in Context) dataset. The model we use was trained on 84 different categories.

This model comes from Caffe, so we will load the binary file VGG_coco_SSD_300x300_iter_400000.caffemodel and the prototxt file deploy.prototxt:

// Replace with the path where you unzipped the coco-SSD model
const ssdcocoModelPath = '../data/dnn/coco-SSD_300x300';
const prototxt = path.resolve(ssdcocoModelPath, 'deploy.prototxt');
const modelFile = path.resolve(ssdcocoModelPath, 'VGG_coco_SSD_300x300_iter_400000.caffemodel');

if (!fs.existsSync(prototxt) || !fs.existsSync(modelFile)) {
  console.log('Exit: cannot find ssdcoco model');
  console.log('Download the model from https://drive.google.com/file/d/0BzKzrI_SkD1_dUY1Ml9GRTFpUWc/view');
  return;
}

// Initialize the ssdcoco model from prototxt and modelFile
const net = cv.readNetFromCaffe(prototxt, modelFile);

Classification based on COCO

Our classification function is almost the same as the Inception-based classification function, but this time the input will be a 300x300 image and the output will be a 1x1xNx7 matrix.

const classifyImg = (img) => {
  const white = new cv.Vec(255, 255, 255);

  // The ssdcoco model accepts 300 x 300 images
  const imgResized = img.resize(300, 300);

  // The network accepts blobs as input
  const inputBlob = cv.blobFromImage(imgResized);
  net.setInput(inputBlob);

  // Forward pass the input through the entire network; this returns
  // the classification result as a 1x1xNxM Mat
  let outputBlob = net.forward();

  // Extract the NxM Mat
  outputBlob = outputBlob.flattenFloat(outputBlob.sizes[2], outputBlob.sizes[3]);

  const results = Array(outputBlob.rows).fill(0)
    .map((res, i) => {
      // Assumption: classNames is an array of COCO class names indexed
      // by class id; the article does not show how it is loaded
      const className = classNames[outputBlob.at(i, 1)];
      const confidence = outputBlob.at(i, 2);
      // Box coordinates are relative, so scale them to the image size
      const topLeft = new cv.Point(
        outputBlob.at(i, 3) * img.cols,
        outputBlob.at(i, 6) * img.rows
      );
      const bottomRight = new cv.Point(
        outputBlob.at(i, 5) * img.cols,
        outputBlob.at(i, 4) * img.rows
      );

      return ({
        className,
        confidence,
        topLeft,
        bottomRight
      });
    });

  return results;
};

I'm not quite sure why the output is a 1x1xNx7 matrix, but we actually only care about the Nx7 part. We can use the flattenFloat utility function to map the third and fourth dimensions to a 2D matrix. Compared to the Inception output matrix, this time N corresponds not to the classes but to the detected objects. In addition, there are 7 entries for each object.
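As a rough illustration of what this flattening amounts to, the sketch below regroups a flat list of values into rows of 7 in plain JavaScript. It does not use OpenCV; toRows and the sample values are hypothetical, and it assumes the blob's values are stored row by row.

```javascript
// Splits a flat list of values into rows of `cols` entries each, the same
// regrouping that turns a 1x1xNx7 blob into an Nx7 matrix (assuming the
// values are stored row by row).
const toRows = (flat, cols) => {
  if (flat.length % cols !== 0) {
    throw new Error(`length ${flat.length} is not a multiple of ${cols}`);
  }
  const rows = [];
  for (let i = 0; i < flat.length; i += cols) {
    rows.push(Array.from(flat.slice(i, i + cols)));
  }
  return rows;
};

// Two made-up detections with 7 values each.
const flatValues = [
  0, 3, 0.9, 0.1, 0.2, 0.5, 0.6,
  0, 17, 0.4, 0.3, 0.3, 0.8, 0.9
];
const rowsOfSeven = toRows(flatValues, 7);
console.log(rowsOfSeven.length, rowsOfSeven[1][1]); // prints: 2 17
```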

Why 7 entries?

Remember, the problem we face here is a bit different from before. We want to detect multiple objects in a single image, so we can't just give the confidence of each class. What we actually want is a rectangle that indicates the position of each object in the image. The seven entries are:

I actually have no idea

The class label of the object

The classification confidence

The left x of the rectangle

The bottom y of the rectangle

The right x of the rectangle

The top y of the rectangle

The output matrix gives us a lot of information about the result, which is pretty neat. We can again filter the results by confidence and draw a rectangle around each detected object in the image.
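Here is a minimal sketch of that filtering step, assuming detections shaped like the objects returned by classifyImg. The sample detections and the filterDetections helper are made up for illustration, and plain { x, y } objects stand in for cv.Point so the sketch runs without OpenCV.

```javascript
// Made-up detections in the shape returned by classifyImg;
// plain { x, y } objects stand in for cv.Point here.
const sampleDetections = [
  { className: 'car', confidence: 0.92, topLeft: { x: 40, y: 210 }, bottomRight: { x: 380, y: 60 } },
  { className: 'car', confidence: 0.12, topLeft: { x: 5, y: 30 }, bottomRight: { x: 50, y: 10 } },
  { className: 'person', confidence: 0.67, topLeft: { x: 300, y: 180 }, bottomRight: { x: 350, y: 40 } }
];

// Keeps detections above minConfidence; if className is given,
// keeps only detections of that class.
const filterDetections = (detections, minConfidence, className) =>
  detections.filter(d =>
    d.confidence > minConfidence &&
    (className === undefined || d.className === className));

const cars = filterDetections(sampleDetections, 0.3, 'car');
console.log(cars.length, cars[0].confidence); // prints: 1 0.92
```

Leaving out the class name keeps every detection above the threshold, which is what you would want before drawing all rectangles.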

Let's see it in action!

For the sake of brevity, I will skip the code for drawing rectangles and the other visualization code. If you want to know exactly how it is done, you can visit the GitHub repository mentioned earlier.

Let's pass a car image to the network, then filter the results to see whether the car class is detected:

[Image: Node.js recognizes objects in the image using a trained neural network model]

Great! Let's raise the difficulty. Let's try... a breakfast table?

[Image: Node.js recognizes objects in the image using a trained neural network model]

Not bad!
