
Robustness Analysis of Machine Learning Models using Domain-Specific Test Data Perturbation
EPIA 2023
In my talk at EPIA 2023 in the Azores, I presented our research on “Robustness Analysis of Machine Learning Models using Domain-Specific Test Data Perturbation.” The study focuses on how image, audio, and text input perturbations affect the performance of various classification models.
The increasing use of machine learning (ML) systems in critical fields has highlighted the importance of testing their robustness. Robustness, in this context, refers to a model's ability to keep functioning correctly on noisy or perturbed data. This matters because, in real-world settings, models often encounter data that deviates from the clean, high-quality data they were trained on.
Our research contributes by examining a broad spectrum of perturbation types, especially in the audio and text domains, and by sampling perturbation strengths more densely for a more granular analysis. We ran experiments on three datasets (ImageNetV2 for images, SpeakerRecognition for audio, and AclImDB for text) and a variety of ML models, applying perturbations such as noise, compression, pitch changes, and typos at varying strengths.
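To make the experimental setup concrete, here is a minimal sketch of such a strength sweep in the image domain, using Gaussian noise as the perturbator. The helper names and the toy brightness classifier are illustrative assumptions on my part, not our actual experiment code.

```python
import numpy as np

def add_gaussian_noise(images, strength, rng):
    """Perturb images (floats in [0, 1]) with zero-mean Gaussian noise."""
    noisy = images + rng.normal(0.0, strength, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)

def accuracy(model, images, labels):
    """Fraction of images the model labels correctly."""
    return float(np.mean(model(images) == labels))

def robustness_sweep(model, images, labels, strengths, seed=0):
    """Evaluate the model once per perturbation strength.

    Returns a dict mapping strength -> accuracy on the perturbed data.
    """
    rng = np.random.default_rng(seed)
    return {
        s: accuracy(model, add_gaussian_noise(images, s, rng), labels)
        for s in strengths
    }

# Toy example: classify 8x8 "images" as bright (1) or dark (0) by mean value.
rng = np.random.default_rng(1)
bright = rng.uniform(0.7, 1.0, size=(50, 8, 8))
dark = rng.uniform(0.0, 0.3, size=(50, 8, 8))
images = np.concatenate([bright, dark])
labels = np.array([1] * 50 + [0] * 50)
model = lambda x: (x.mean(axis=(1, 2)) > 0.5).astype(int)

curve = robustness_sweep(model, images, labels, strengths=[0.0, 0.1, 0.3, 0.5])
```

Plotting accuracy against strength for each perturbator yields the robustness curves discussed below; at strength 0.0 the sweep reproduces the clean-data accuracy.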
The key finding is that larger perturbations consistently lower model performance, though the size of the decline depends on the model, dataset, and perturbator. In the image domain, for instance, performance dropped rapidly even under low-strength noise and occlusion perturbations, while models were comparatively robust to compression and pixelization at low strengths. In the audio domain, white noise had minimal impact at low strengths but caused a significant decline at higher levels, and compression had no significant impact until the perturbation strength became high. In the text domain, typo and word-removal perturbations decreased model performance roughly linearly, whereas word switching had negligible impact, suggesting that keyword extraction might be crucial in sentiment analysis.
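As an illustration of the text perturbators discussed above, here is a minimal, self-contained sketch of typo and word-removal perturbation. The function names and the swap-adjacent-characters typo model are simplified assumptions for this post, not the paper's exact implementation.

```python
import random

def typo_perturb(text, strength, seed=0):
    """Introduce a typo (swap two adjacent characters) in roughly a
    `strength` fraction of the words longer than three characters."""
    rng = random.Random(seed)
    words = text.split()
    for i, word in enumerate(words):
        if len(word) > 3 and rng.random() < strength:
            j = rng.randrange(len(word) - 1)
            words[i] = word[:j] + word[j + 1] + word[j] + word[j + 2:]
    return " ".join(words)

def word_removal_perturb(text, strength, seed=0):
    """Drop each word independently with probability `strength`."""
    rng = random.Random(seed)
    return " ".join(w for w in text.split() if rng.random() >= strength)

review = "the movie was surprisingly good and the acting felt genuine"
print(typo_perturb(review, 0.5))
print(word_removal_perturb(review, 0.3))
```

At strength 0.0 both functions return the input unchanged, so the clean-data accuracy is recovered as the baseline of each robustness curve.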
Our study suggests that the type of noise introduced and the model’s properties significantly affect performance. This underscores the need for comprehensive testing with various perturbators and strength levels to uncover robustness issues. Future work includes expanding experiments to more models, datasets, and perturbation types, particularly in under-researched audio and text domains.
I thoroughly enjoyed giving this talk at EPIA 2023. It was an enriching experience to share our findings and engage with the audience. Meeting and interacting with so many people passionate about AI and machine learning was particularly rewarding, and I am grateful for the opportunity to contribute to this vibrant community.