Summary: | Traditional vision-based traffic monitoring is costly and performs poorly in low-light conditions. This research explores audio-based vehicle classification using deep learning to address these limitations. It investigates how different spectrogram types, namely Short-time Fourier Transform (STFT), Mel, Gammatonegram (GAM), Constant-Q, and Wavelet, and their combinations affect classification accuracy, using an AlexNet-based Convolutional Neural Network on the IDMT-traffic dataset. Individual spectrogram performance varied by vehicle class: STFT performed best for cars and no-traffic, and GAM for motorcycles and trucks. Combining spectrograms yielded slight improvements (up to 3%) in some cases but occasional drops in others, likely due to feature loss during fusion. The study highlights the importance of choosing the spectrogram type based on the target vehicle class. While simple combinations showed limited improvements, more sophisticated fusion techniques hold promise for further improving audio-based vehicle classification accuracy. © 2024 IEEE.
|