Car Navigation is the generic name for an application I was developing in 2012. It is an Android project which sought to explore the potential of car navigation with mobile devices and their built-in cameras. A whitepaper about the project can be found here [HTML] | [PDF]. Many videos with contextual blog posts are listed at the bottom, and a short introduction follows for those wishing to extend the application or use it. Be warned that Car Navigation is not market-ready: it is not feature-rich and it lacks accuracy because the classifier was trained on a relatively small annotated set.
Introduction
For this tracking-based computer vision project to be versatile, I have added some car templates which act as variable program input (the program fetches the files over the Web, connecting to this site, schestowitz.com, or others), and the program now latches onto features using Local Binary Patterns rather than the sliding-window, template-based approach, which slowed things down a great deal. The frame rate is steady at around 5 FPS for now, depending on the hardware at hand. I am training a classifier based on Local Binary Patterns (LBP) and leaving Haar methods aside for the time being, simply because they are slower and older (particularly as implemented in OpenCV).
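As a concrete illustration, here is a minimal sketch (not the project's exact code) of how an LBP-trained cascade can be loaded and run on a single frame with the OpenCV 2.4 Java API; the file name lbp_car_rear.xml is a placeholder for whatever cascade one trains.

```java
// Minimal sketch: load an LBP-trained cascade and detect car rears in one
// greyscale frame using the OpenCV 2.4 Java bindings.
import org.opencv.core.Mat;
import org.opencv.core.MatOfRect;
import org.opencv.core.Rect;
import org.opencv.core.Size;
import org.opencv.objdetect.CascadeClassifier;

public class CarRearDetector {
    private final CascadeClassifier cascade;

    public CarRearDetector(String cascadePath) {
        // cascadePath would point at e.g. "lbp_car_rear.xml" (placeholder name)
        cascade = new CascadeClassifier(cascadePath);
        if (cascade.empty()) {
            throw new IllegalArgumentException("Failed to load cascade: " + cascadePath);
        }
    }

    /** Returns bounding boxes of detected car rears in a greyscale frame. */
    public Rect[] detect(Mat greyFrame) {
        MatOfRect hits = new MatOfRect();
        // scaleFactor 1.1 and minNeighbors 3 are common defaults; tune per device.
        cascade.detectMultiScale(greyFrame, hits, 1.1, 3, 0,
                new Size(24, 24), new Size());
        return hits.toArray();
    }
}
```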
I tried to find out whether there was any large database of car images in our lab; in particular, a database containing images of the rear of cars, with a rectangular selection around every car in each image. I should be able to detect those at a rate of about 5 FPS on my Archos hardware, depending on the amount of preprocessing and postprocessing (e.g. comparing sizes between frames). The training basically works, but the sample set of positives and of background images (negatives, i.e. non-matches) is meagre, so I needed to turn to other labs.
Training and Targets
We are using a Caltech-produced database of the rear of cars (there are more options here). Training is done on those images and testing is done on unseen videos or images similar to the training set. I have begun showing my Android 4.0 tablet (it would be the same for an Android phone) some cars which were not included in the training set, and it picks them up correctly most of the time, despite training on just 15 positives and 1,000 negatives (commonly called the "background"). The processing pace is 1.5-3 FPS when done without optimisation; this can be sped up easily. To get more positives I will need to do manual work or write a good script (which would not be so trivial, as it requires a scanner/parser to pick up and collate the pertinent tokens and then do some arithmetic, because annotation formats vary; PASCAL is different from OpenCV).
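To give an idea of what such a conversion script involves, below is a hedged sketch in Java (not the project's actual tool) that reads PASCAL VOC style XML annotations and writes the plain-text listing OpenCV's sample-creation tooling expects: image path, object count, then x y width height per box. The directory name "annotations", the output file name and the tag names follow the standard VOC layout and are my assumptions.

```java
// Hedged sketch: convert PASCAL VOC style XML annotations into the
// "info" listing used when creating OpenCV training samples
// (one line per image: path, object count, then x y w h per object).
import java.io.File;
import java.io.PrintWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class VocToOpenCvInfo {
    public static void main(String[] args) throws Exception {
        File[] files = new File("annotations").listFiles();   // assumed input directory
        PrintWriter out = new PrintWriter("positives.info");  // assumed output name
        for (File xml : files) {
            if (!xml.getName().endsWith(".xml")) continue;
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(xml);
            String image = doc.getElementsByTagName("filename").item(0).getTextContent().trim();
            NodeList boxes = doc.getElementsByTagName("bndbox");
            StringBuilder line = new StringBuilder(image + " " + boxes.getLength());
            for (int i = 0; i < boxes.getLength(); i++) {
                Element b = (Element) boxes.item(i);
                int xmin = coord(b, "xmin"), ymin = coord(b, "ymin");
                int xmax = coord(b, "xmax"), ymax = coord(b, "ymax");
                line.append(" ").append(xmin).append(" ").append(ymin)
                    .append(" ").append(xmax - xmin).append(" ").append(ymax - ymin);
            }
            out.println(line);
        }
        out.close();
    }

    private static int coord(Element box, String tag) {
        // VOC sometimes stores coordinates as floats, hence the parse-then-cast
        return (int) Double.parseDouble(
                box.getElementsByTagName(tag).item(0).getTextContent().trim());
    }
}
```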
Visual and Vocal Cues
Sounds are cool, but we do not care for them unless we are debugging the program; both visual and vocal cues are nevertheless incorporated into the program. They help me when I point the camera at the screen. The camera is positioned at the front of the tablet, so when I give it some input sequence I cannot quite see -- in real time at least -- when an obstruction is detected (this is shown on the screen which faces the monitor). The sounds vary depending on the number of detected obstructions. At the very least this helps debugging when screencasting is disabled (see the videos for a better understanding of this process).
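To illustrate, here is a minimal sketch of such an audible cue on Android, assuming the detector reports a per-frame count of obstructions; the mapping from count to tone is my illustrative choice, not the project's exact scheme.

```java
// Illustrative sketch: emit a different beep depending on how many
// obstructions were detected in the current frame (debugging aid).
import android.media.AudioManager;
import android.media.ToneGenerator;

public class DetectionBeeper {
    private final ToneGenerator tones =
            new ToneGenerator(AudioManager.STREAM_MUSIC, 80);

    /** Call once per processed frame with the number of detected obstructions. */
    public void signal(int detections) {
        if (detections == 0) {
            return;                                              // silence when nothing is seen
        } else if (detections == 1) {
            tones.startTone(ToneGenerator.TONE_PROP_BEEP, 100);  // one obstruction: short beep
        } else {
            tones.startTone(ToneGenerator.TONE_PROP_BEEP2, 150); // several: double beep
        }
    }
}
```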
Performance
We are sure we can do better than 5 FPS, but while developing and debugging the program we do not worry about performance too much, at least not yet. For standard video sequences on this ARM-based device I can only get about 8 FPS (idle). Maybe it is a hardware limitation, related perhaps to throughput or lack of hardware acceleration (some x86 chipsets have MPEG acceleration on board). The Tegra tablets I saw at the shop (they are all expensive) had very smooth video, in HD as well.
So basically, the program at present is not much slower than the actual capture frame rate on this particular device. If needed we can do off-line processing, get the required functionality working, and accelerate right afterwards. If I remove some cruft from the code (debugging stuff, for example, of which there is plenty), it will speed up threefold. In the videos one can see that I use up a lot of CPU capacity and bandwidth just running a Java-based desktop application that captures in real time (losslessly) what one can see on the tablet. If I disconnect this development service, the frame rate shoots up. So it is not nearly as bad as the videos seem to show -- that is why I tried to just shoot them with a webcam at first.
Implemented Features
To say a word about what we are running: a mostly green square highlights the current area, if any, which is detected as looking like the back of a car. There may be multiple such objects. The baseline object from which distances to nearby car rears are calculated is marked in green hues. A large portion of the videos so far used a classifier trained on just a few positives; we need to improve that in order to account for more types of cars, distances (scale), and so on. We had not tested it heavily at the time of writing, but it is work in progress. We have trained about 30 different cascade classifiers and I am trying to see how sensitive to make them, what objects to include as positives, etc.
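A minimal sketch of that overlay follows, assuming the detections come back as OpenCV Rects; the colour values and the rule of treating the largest (nearest-looking) box as the baseline object are illustrative assumptions rather than the project's exact logic.

```java
// Illustrative sketch: draw every detected car rear in green and the
// baseline object (here: the largest box) in a brighter green.
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Rect;
import org.opencv.core.Scalar;

public class DetectionOverlay {
    private static final Scalar GREEN        = new Scalar(0, 180, 0);
    private static final Scalar BRIGHT_GREEN = new Scalar(0, 255, 0);

    public static void draw(Mat rgbFrame, Rect[] detections) {
        Rect baseline = null;
        for (Rect r : detections) {
            if (baseline == null || r.area() > baseline.area()) {
                baseline = r;   // assume the largest detection is the baseline object
            }
        }
        for (Rect r : detections) {
            boolean isBaseline = (r == baseline);
            Core.rectangle(rgbFrame, r.tl(), r.br(),
                    isBaseline ? BRIGHT_GREEN : GREEN,
                    isBaseline ? 3 : 2);
        }
    }
}
```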
Rather than taking lots of movies with a phone and running on those, we used a few static images and several videos from YouTube (dashboard-mounted cameras and urban cruising). We trigger for collision, e.g. when the size of the object in front of the car seems too great. The size of the red blobs grows or shrinks depending on the distance measured between bounding boxes, and a sound is played if that distance falls beneath a particular (predefined) threshold. We do not yet try to detect cars in the opposite lane as well; since those show the front of cars, I am hoping they do not get falsely detected as the rear of cars (on which, exclusively, the classifier is trained). Pedestrians are not recognised either; there is great variation there depending on the angle, scale, how one moves, etc. I have not trained for detection of pedestrians, but we could use multiple classifiers in turn, e.g. one for the rear of cars, one for the front of cars, one for pedestrians, and so on.
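A hedged sketch of that alerting rule is below, assuming the distance is measured between bounding-box centres in pixels and the thresholds are predefined; the actual tuned values are not shown here. In use, its result would feed the same per-frame beeper shown earlier.

```java
// Illustrative sketch: warn when a detected car rear gets too close to the
// baseline box (small centre-to-centre distance) or simply looks too large.
import org.opencv.core.Rect;

public class CollisionTrigger {
    private final double distanceThreshold;  // centre-to-centre distance in pixels (assumed unit)
    private final double areaThreshold;      // bounding-box area in pixels^2

    public CollisionTrigger(double distanceThreshold, double areaThreshold) {
        this.distanceThreshold = distanceThreshold;
        this.areaThreshold = areaThreshold;
    }

    /** True when a warning sound should be played for this pair of boxes. */
    public boolean shouldWarn(Rect baseline, Rect other) {
        double dx = (baseline.x + baseline.width / 2.0) - (other.x + other.width / 2.0);
        double dy = (baseline.y + baseline.height / 2.0) - (other.y + other.height / 2.0);
        double distance = Math.sqrt(dx * dx + dy * dy);
        return distance < distanceThreshold || other.area() > areaThreshold;
    }
}
```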
Future Extensions
An application such as this could, in fact, simply strive to map the scene, perhaps even doing a mashup of satellite data, Google StreetView, and some of the arbitrary objects that happen to be temporarily at the scene. At the time of writing we did not have a clear plan and timeline. I could attempt to add more classifiers within weeks of additional work. Depending on what the main objective is (e.g. collision detection, mapping, automatic congestion statistics generation), one can take the next steps along those logical lines.
Since I had the foundations for development in place (the SDK and a jailbroken tablet), I have made decent progress towards a system of real utility and value. A successor program would just need a little more time to mature, and in part this depends on how big a training set one can pass to it (some further manual work is required for that). I have only spent about 50 hours planning, thinking, organising, and then coding for this project. I would have liked to have more time, but duty calls. I hope someone else uses the code, so I made it Free/open source software.
Latest Steps
I have created about 30 different classifiers so far and tested them, taking a dozen videos for comparative analysis. The aim has been to find the right sensitivity levels (the minimum hit rate is now at 0.993 and the maximum false alarm rate at 0.6), sample sizes (24x24 pixels works well), and sizes for the training sets (expanding them requires a lot of manual work). This is an essential stage, because without good detection of moving obstructions (mostly cars) the alerting would be spurious and therefore counter-productive.
One of the difficulties one encounters is that of multiple scales: as objects get closer or drift further away, their level of granularity varies, so a stack of classifiers becomes needed. Otherwise, the classifier is insensitive to objects as they shrink or grow relative to the viewpoint (shape and aspect ratio remain unchanged). While there is a level of flexibility, scale-wise, in each classifier, it is not enough to account for distant objects and nearby objects in one fell swoop. One workaround is to train multiple classifiers; another is to make multiple passes at different scales with the same classifier, which is what is currently done, highlighting obstructions differently depending on distance, marking the distance visually, and changing colours depending on whether the object gets closer (towards collision) or moves further away. A rough sketch of this multi-pass approach appears below. The frame rate attained is still above 5 FPS, which is actually very good for this hardware (idle, standalone video capture only reaches about 10 FPS on this inexpensive ARM-based board). For further information and some examples see the pages below and consider getting the code. The program requires the Android SDK, the NDK, typically Eclipse, and of course OpenCV (I used version 2.4.x).
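The multi-pass workaround might look roughly like the following sketch, which runs the same cascade several times with different minimum and maximum window sizes so that distant and nearby car rears are both covered; the three size bands are illustrative values, not the project's.

```java
// Illustrative sketch: run one LBP cascade over several size bands so that
// far (small) and near (large) car rears are both detected; the band index
// then serves as a coarse distance estimate for colouring the overlay.
import java.util.ArrayList;
import java.util.List;
import org.opencv.core.Mat;
import org.opencv.core.MatOfRect;
import org.opencv.core.Rect;
import org.opencv.core.Size;
import org.opencv.objdetect.CascadeClassifier;

public class MultiScaleDetector {
    // {minSide, maxSide} in pixels: roughly far / mid / near (assumed values).
    private static final int[][] BANDS = { {24, 60}, {60, 140}, {140, 320} };

    public static List<List<Rect>> detectPerBand(CascadeClassifier cascade, Mat grey) {
        List<List<Rect>> perBand = new ArrayList<List<Rect>>();
        for (int[] band : BANDS) {
            MatOfRect hits = new MatOfRect();
            cascade.detectMultiScale(grey, hits, 1.1, 3, 0,
                    new Size(band[0], band[0]), new Size(band[1], band[1]));
            perBand.add(new ArrayList<Rect>(hits.toList()));
        }
        return perBand;
    }
}
```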
Related Blog Posts and Videos
Source Code