Earlier this week at its GPU Technology Conference, Nvidia announced a new video analytics platform, Metropolis, that promises to make cities safer and smarter and should eventually bring game-changing capabilities to other industries.
The heart of Metropolis is deep learning, enabled by Nvidia’s range of GPUs, which provide the horsepower needed to run artificial intelligence on every video stream.
The GPU Technology Conference is the right place to show off advancements in something like video analytics, as it has become the flagship event for showcasing how GPUs can change the world by enabling AI to do some things smarter and faster than people can.
This topic has been theorized for a number of years, but it’s quickly becoming a reality for the following reasons:
- Pervasiveness of video cameras creating massive amounts of data to be analyzed. According to Nvidia, there are currently about 500 million security cameras worldwide, growing to 1 billion by 2020. Given how fast and easy these are to deploy, I could see the number easily exceeding 1 billion by then.
- Growth of the cloud as the dominant compute model. Data that is stored on premises has value, but it’s typically limited to that location. Moving it to the cloud makes it easier to access from more places and enables the video content to be shared with whoever may need it.
- Maturity of GPUs. These chips have come a long way in a very short period of time. What was once a niche technology used primarily to improve the experience for gamers has exploded and become the de facto standard for tasks such as AI and machine learning.
Another important factor is that Nvidia has grown into a company that is thinking “architecturally” about how its technology can be used to enable some of these advanced use cases, such as an “AI City.” With video analytics, it’s critical to understand that processing of the video streams needs to be done differently for the wide variety of use cases, and this requires a “data center to edge” architecture.
Making video analytics a reality
The first step in video analytics is to train the AI. This requires the AI to go through thousands of hours of video to “learn” what different objects are, such as people, cars, license plates and trucks. Initially, a human must help the AI by tagging different objects, but very quickly the AI can become self-learning and educate itself at a rate that’s orders of magnitude faster than what people can do. The learning process needs to take place in a data center enabled by the Nvidia DGX platform or on Nvidia GPUs in the cloud. Cloud service providers such as Amazon Web Services, Microsoft Azure and IBM Bluemix all offer GPUs as a service for deep learning training.
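The tag-first, self-label-later bootstrap described above can be illustrated with a deliberately tiny sketch. This is not Nvidia's training pipeline (real systems train deep networks on GPUs); it is a toy nearest-centroid classifier on invented 2-D features, showing how a handful of human-tagged samples can seed automatic labeling of the rest, with low-confidence cases left for a human.

```python
# Toy sketch only: human-tagged samples seed class centroids, then the
# "model" auto-labels untagged samples that fall close to a centroid.
# Features, classes and thresholds are all invented for illustration.

def dist(a, b):
    """Euclidean distance between two 2-D feature vectors."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def centroid(points):
    """Mean of a list of (x, y) feature vectors."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def self_label(tagged, untagged, confidence_radius=1.0):
    """Build centroids from human-tagged samples, then auto-label any
    untagged sample within confidence_radius of its nearest centroid."""
    centroids = {lbl: centroid(pts) for lbl, pts in tagged.items()}
    labels = {}
    for point in untagged:
        lbl = min(centroids, key=lambda c: dist(point, centroids[c]))
        if dist(point, centroids[lbl]) <= confidence_radius:
            labels[point] = lbl  # confident: accept the auto-label
        # otherwise leave it untagged for a human to review
    return labels

# A few human-tagged examples per class ("car" vs. "person" features).
tagged = {"car": [(0.0, 0.0), (0.5, 0.2)], "person": [(5.0, 5.0), (5.2, 4.8)]}
auto = self_label(tagged, [(0.2, 0.1), (5.1, 5.1), (2.5, 2.5)])
print(auto)  # the ambiguous middle point is left unlabeled
```

The same loop scaled up, with a deep network in place of the centroids, is why the self-learning phase can run orders of magnitude faster than manual tagging.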
Once the AI understands what objects are what, it can scan through the massive amounts of video data to quickly and accurately infer things. Historically, organizations did not trust video analytics because the process was slow and error-prone. A person can look through data for only so long before fatigue sets in, making it easy to miss an individual, a license plate or any other object, particularly in low light or bad weather. In the movie Terminator, one of the most famous AIs uttered the phrase, “I see everything,” and that is indeed true. The AI sees all and can accurately identify all.
Powering the inference phase is a wide range of different Nvidia GPUs, including its Jetson and Tesla models, depending on the use case. The location of the analysis is specific to the task and the processing needed, hence the different flavors of GPUs.
For example, at the entrance to a parking garage or on a wearable camera for law enforcement purposes, the analytics for that single camera should be done on the camera itself. At an airport or in a building lobby, where tens to hundreds of camera feeds need to be analyzed, the inference could be done on a localized server. And for city-wide resource optimization, where thousands of video feeds are searched and analyzed, the cloud is the right platform. Nvidia has a product for each, hence the “cloud to edge” strategy.
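The camera / local server / cloud split above can be sketched as a simple routing rule. The thresholds and tier names here are invented for illustration; the article describes the split only qualitatively, and a real deployment would also weigh bandwidth, latency and privacy.

```python
# Illustrative sketch only: pick an inference tier from the number of
# camera streams, mirroring the article's camera / server / cloud split.
# Thresholds are assumptions, not Nvidia guidance.

def inference_tier(num_streams):
    """Map a deployment size to where inference should run."""
    if num_streams == 1:
        return "on-camera"     # e.g. a garage entrance or a body camera
    if num_streams <= 500:
        return "local-server"  # e.g. an airport or building lobby
    return "cloud"             # e.g. city-wide analysis of thousands of feeds

print(inference_tier(1))     # on-camera
print(inference_tier(120))   # local-server
print(inference_tier(5000))  # cloud
```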
Business uses for video analytics
At the GPU Technology Conference, Nvidia showed off video analytics in the context of building a safer, smarter city, but the fact is most industries can benefit from the technology. For example, a retailer could use the solution to quickly determine the demographics of its customer base. Each store could customize inventory based on how many males or females entered and if there was any variance based on day of the week or time of the year.
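The retail scenario above boils down to a simple tally of detector output by demographic and day of week. A minimal sketch, assuming a made-up per-entry event format (nothing here is a real Metropolis API):

```python
# Illustrative sketch only: count store entries by inferred demographic
# and day of week, the kind of tally a retailer might build from
# detector output. The event tuples are an invented format.
from collections import Counter

# Each event: (store_id, inferred_demographic, day_of_week)
events = [
    ("store-1", "female", "Sat"),
    ("store-1", "male", "Sat"),
    ("store-1", "female", "Sat"),
    ("store-1", "female", "Wed"),
]

by_day = Counter((demo, day) for _, demo, day in events)
print(by_day[("female", "Sat")])  # 2
print(by_day[("male", "Sat")])   # 1
```

Variance across days or seasons then falls out of comparing these counters over time.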
Building lobbies could use facial recognition to scan people and direct them to the right elevator, getting them to their meetings faster instead of having them stop at a desk staffed by a human, who then needs to take a picture and access a separate system to correlate the information. If you want to understand how much of a problem this is, go to the lobby of any Midtown Manhattan office building at 8 a.m. any day of the week.
From license plate identification to emotion detection to object recognition, there are literally thousands of use cases for video analytics that are now made possible by the deep learning algorithms that can be performed on GPUs. GTC gave us a glimpse of what an “AI City” would look like, but soon we will be seeing the solutions used in many other verticals.