Think of the modern, software-defined car as a smartphone on wheels. Just like the smartphone is still a phone, a car will still be moving people around. But just like the smartphone, it will offer much more than its original, basic functionality – and this is where the data infrastructure for automotive AI comes into play.
Cars will reach new levels of sophistication in infotainment and interactivity. They will integrate seamlessly with the digital lives of their users and offer just as many features and possibilities for personalization as a state-of-the-art smartphone.
What is more, they will perform advanced analytics for autonomous driving and active safety. Automotive AI will integrate data from the vehicle's surroundings, from other vehicles, from traffic monitoring systems, from GPS services and much more.
Massive amounts of data
To develop the software-defined dream car of the future you need Artificial Intelligence, and you need data. A lot of data. No, in fact it’s not enough to have a lot of data. You need massive amounts of data.
Machine Learning algorithms need to be trained on data – the more, the better – to achieve higher precision and refinement when developing, for instance, driver-assist technologies. To do their job, automotive software developers need a pool of car data from on-board sensors and cameras, along with other information relevant for designing new services.
Heart and brain
For a large company in the German automotive industry, Stefan Jahncke from Data Respons subsidiary EPOS CAT GmbH is designing the heart and brain of such a computer infrastructure containing massive automotive data.
However, while it’s no secret that software is eating the automotive industry, there are a lot of secrets surrounding the work he’s doing. Exactly what data the system will store, and what the engineers will do with it, is a business secret. To newcomers, these working conditions might seem difficult, but they’re actually quite normal for the EPOS infrastructure team.
They have installed similar data infrastructures before, and customers often have strict secrecy rules. And basically, according to Stefan, the exact nature of the data isn’t that important.
- The customer has asked us to do a proof-of-concept of an infrastructure to be used by approximately 100 developers for designing and training machine learning solutions for automotive applications, Stefan explains.
- The infrastructure will contain mostly sensor and video data, together with other data sources needed to increase the “intelligence” of the vehicle. To put this into context, you could compare a car to a human being. When you’re driving and you see a red light, you’ll react right away and stop the car. But the on-board camera of a car only registers a red light. It doesn’t know what to do with that information. You must build a system that translates the red light into computer language and makes the car react correctly to that input.
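The perception-to-decision step Stefan describes can be sketched in a few lines. This is purely illustrative – the states, distances and actions are made up for the example and are not taken from any real vehicle stack:

```python
# Illustrative sketch only: translating a perceived traffic-light state
# into a driving command, the "teach the car what to do" step described
# above. All states, thresholds and actions here are hypothetical.

def decide_action(light_state: str, distance_m: float) -> str:
    """Map a detected traffic-light state to a driving action."""
    if light_state == "red":
        return "brake"
    if light_state == "yellow":
        # Brake only if there is enough room to stop safely.
        return "brake" if distance_m > 30.0 else "continue"
    return "continue"  # green, or no light detected
```

The hard part in practice is not this mapping but the perception in front of it: reliably turning raw camera pixels into the label "red light" is what the trained machine learning models provide.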
- To develop autonomous or semi-autonomous vehicles we need to teach the car what to do in various situations. For that you need massive amounts of data. But there’s much more to it than just processing data. In fact, data is only used to form a kind of pattern or frame. To get the car to put the data into context, to really “understand” it, you need neural networking and a lot of processing time. The infrastructure we’re building is a tool for developers designing software to make vehicles react as correctly as possible, and to make them as autonomous as possible.
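To make the "massive amounts of data" point concrete, here is a minimal, self-contained training loop: a single logistic neuron learning a pattern from labelled examples. Real driver-assist models are deep neural networks trained on petabytes of sensor data; the toy data and model below are invented purely to show the mechanics of learning from examples:

```python
import math

# Toy training loop: a single logistic neuron learns a threshold rule
# from labelled examples. The data is synthetic (label = 1 if x > 0.5);
# the principle – more labelled data and more training passes yield a
# better fit – is the same one driving the infrastructure in the article.

data = [(x / 100.0, 1.0 if x / 100.0 > 0.5 else 0.0) for x in range(100)]

w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):                # repeated passes over the data
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # sigmoid prediction
        grad = p - y                              # cross-entropy gradient
        w -= lr * grad * x                        # gradient-descent update
        b -= lr * grad

def predict(x: float) -> float:
    """Probability that x belongs to the positive class."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))
```

After training, the neuron has "understood" the pattern in the data: inputs above the threshold score high, inputs below it score low.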
Hardware and software
Stefan is looking into hardware as well as software, with Graphics Processing Units from Nvidia forming the core of the infrastructure. GPUs were originally developed for the gaming industry and have spread from there to other areas requiring ultra-fast image processing. Nvidia hardware is widely used in the automotive industry.
- Regarding software, we’re still evaluating our options. We’ll be running Kubernetes for cluster management, but apart from that we haven’t decided yet. For our proof of concept we’ve chosen VMware’s vSphere with Tanzu virtualization platform to manage and optimize our Kubernetes cluster. Hopefully, in a few weeks we’ll have the proof of concept ready for the customer to test.
20 to 30 terabytes per day
Provided that the customer approves the design, Stefan and his team will start building the actual infrastructure, gradually adding more components and equipment depending on the customer’s requirements.
The new infrastructure will incorporate data from an existing data center containing several petabytes of data, primarily produced by the on-board cameras and sensors of the cars connected to it. And more data will be coming in: Stefan estimates that between 20 and 30 terabytes will be added per day.
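A quick back-of-envelope calculation puts these figures in perspective. The starting size of 3 PB is an assumption (the article only says "several petabytes"), and 25 TB/day is the midpoint of Stefan's estimate:

```python
# Back-of-envelope sketch of the ingest figures quoted above.
# ASSUMPTIONS: 3 PB existing data (article says "several petabytes"),
# 25 TB/day ingest (midpoint of the 20-30 TB/day estimate),
# decimal units (1 PB = 1000 TB), as storage vendors count.

EXISTING_PB = 3
INGEST_TB_PER_DAY = 25
TB_PER_PB = 1000

yearly_tb = INGEST_TB_PER_DAY * 365                        # TB added per year
days_to_double = EXISTING_PB * TB_PER_PB / INGEST_TB_PER_DAY
```

Under these assumptions the store grows by roughly 9 PB a year and doubles in about four months – which is why permanent capacity planning, not one-off provisioning, is part of the job.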
He predicts that once the new infrastructure has been tested and approved by the customer, more developers will start using the system.
When fully operational, up to 300 software developers will probably be working on it. To accommodate them, and to secure enough storage space for the ever-increasing amount of data, a team of EPOS specialists will be permanently assigned to maintain the infrastructure: managing updates, handling security, and all the other nitty-gritty of running a massive automotive data infrastructure that grows every day.