There are many different approaches to image recognition. Google recently released a new TensorFlow API for detecting objects in images, intended to advance computer vision everywhere. Anything Google releases in this area is worth a serious look, so we decided to try this new API on some YouTube videos. You can see the results below.
Identifying objects with the TensorFlow API
Definition of API
Introducing API
Use documentation
First, we tried the lightest model (ssd_mobilenet). The basic steps were:
- Download the frozen model (.pb, protobuf) and load it into memory
- Use the built-in helper code to load labels, categories, visualization tools, and so on
- Start a new session and run the model on an image
None of these steps is complicated, and the API documentation is a good guide to the main ones.
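The load-and-run steps above can be sketched roughly as follows. This is a minimal sketch, not the exact code we ran: it assumes TensorFlow 1.13 or later (or 2.x, through `tf.compat.v1`), and it assumes the frozen model exposes the tensor names `image_tensor`, `detection_boxes`, `detection_scores` and `detection_classes`, which are the names used by the models published with the API. The function names and the 0.5 score threshold are our own, chosen for illustration.

```python
# Assumed tensor names for a frozen detection model exported by the API.
INPUT_TENSOR = "image_tensor:0"
OUTPUT_TENSORS = ["detection_boxes:0", "detection_scores:0", "detection_classes:0"]


def load_frozen_graph(pb_path):
    """Read a frozen .pb file and import its graph definition into a new graph."""
    # Imported lazily so keep_confident() below works without TensorFlow installed.
    import tensorflow as tf

    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.compat.v1.GraphDef()
        with tf.io.gfile.GFile(pb_path, "rb") as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name="")
    return graph


def detect(graph, image):
    """Run one image (H x W x 3 uint8 NumPy array) through the model."""
    import tensorflow as tf

    # A new session per call keeps the sketch short; for video, reuse one session.
    with tf.compat.v1.Session(graph=graph) as sess:
        fetches = [graph.get_tensor_by_name(name) for name in OUTPUT_TENSORS]
        feed = {graph.get_tensor_by_name(INPUT_TENSOR): image[None, ...]}  # add batch axis
        boxes, scores, classes = sess.run(fetches, feed_dict=feed)
    # Drop the batch axis again: one image in, one set of detections out.
    return boxes[0], scores[0], classes[0]


def keep_confident(boxes, scores, classes, min_score=0.5):
    """Keep only detections whose score reaches min_score (hypothetical helper)."""
    return [(b, s, int(c)) for b, s, c in zip(boxes, scores, classes) if s >= min_score]
```

The class IDs returned here are integers; mapping them to readable names is what the API's label-map helper code is for.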
Run on video
Next, we tried the API on some videos, using the MoviePy library. The main steps were as follows:
- Use the VideoFileClip function to extract frames from the video
- Use the fl_image function to run object detection on every frame extracted from the video. This is a handy function that takes a frame and replaces it with a modified frame.
- Finally, combine all the modified frames into a new video.
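The video steps above can be sketched like this. It is a sketch under assumptions: it uses the MoviePy 1.x names (`moviepy.editor.VideoFileClip`, `fl_image`, `write_videofile`); in MoviePy 2.x the import path changed and `fl_image` was renamed `image_transform`. The `detect_fn` and `draw_fn` callables are hypothetical placeholders for the detection and box-drawing code.

```python
def make_frame_processor(detect_fn, draw_fn):
    """Combine a detector and a drawing routine into one per-frame function."""
    def process(frame):
        detections = detect_fn(frame)       # e.g. boxes, scores, classes
        return draw_fn(frame, detections)   # must return the modified frame
    return process


def annotate_video(src_path, dst_path, process_frame):
    """Apply process_frame to every frame of src_path and write the result."""
    # Imported lazily so make_frame_processor() is usable without MoviePy installed.
    from moviepy.editor import VideoFileClip  # MoviePy 1.x import path

    clip = VideoFileClip(src_path)
    annotated = clip.fl_image(process_frame)  # replaces each frame with the returned one
    annotated.write_videofile(dst_path, audio=False)
```

A call would look like `annotate_video("input.mp4", "output.mp4", make_frame_processor(my_detect, my_draw))`, where the file names and the two callables are placeholders.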
It took about a minute to run this code on a 2 to 5 second clip. But since we used a frozen model that was loaded into memory, we could also run it on a computer without a GPU.
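To put that timing in per-frame terms, here is a rough back-of-the-envelope calculation. The frame rate is an assumption (we did not record it); 30 frames per second and a 3 second clip are used purely for illustration.

```python
def seconds_per_frame(total_seconds, clip_seconds, fps):
    """Average processing time per frame for a clip of the given length."""
    frame_count = clip_seconds * fps
    return total_seconds / frame_count


# Assumed values: about 60 s of processing, a 3 s clip, 30 frames per second.
cost = seconds_per_frame(total_seconds=60, clip_seconds=3, fps=30)
print(f"{cost:.2f} s per frame")
```

Under these assumptions each frame costs well over half a second, far above the roughly 0.03 s per frame that real-time playback at 30 frames per second would need.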
The results were impressive. With very little code, rectangles can be drawn with good accuracy around a large number of common objects. The birds in the video below, however, were generally not detected.
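For readers wondering how the rectangles get placed: the detection models in this API return each box as normalized `[ymin, xmin, ymax, xmax]` values between 0 and 1, which have to be scaled to pixel coordinates before drawing. The API's visualization helpers do this for you; the function below is only a hypothetical stand-alone illustration of the conversion.

```python
def to_pixel_box(box, width, height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel (left, top, right, bottom)."""
    ymin, xmin, ymax, xmax = box
    left = int(round(xmin * width))
    top = int(round(ymin * height))
    right = int(round(xmax * width))
    bottom = int(round(ymax * height))
    return left, top, right, bottom


# A box covering the middle of a 100 x 200 (width x height) frame.
print(to_pixel_box((0.25, 0.25, 0.75, 0.75), width=100, height=200))
# prints (25, 50, 75, 150)
```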
Future steps
Some ideas for exploring this API further:
- Try the more accurate, higher-overhead models to see how much the performance differs
- Find ways to speed up the API so it can be used for real-time object detection on mobile devices
- Google also makes these models usable for transfer learning. That is, a frozen model can be loaded and given a new output layer with different image categories.