Advancements in Machine Learning for Static Application Security Testing (ML-SAST) in Germany

A comprehensive study on the application of machine learning techniques to static application security testing (ML-SAST) has been completed, offering insights into both the potential and the challenges of this emerging field. Alongside the study, a prototype was developed and released under a free and open-source license, providing a practical foundation for future research.

The study drew on several sources of information, including interviews with software security experts, user opinions collected through an online questionnaire, and a systematic mapping study that followed established methodologies. Combining these sources made it possible to identify key requirements and open research gaps in the ML-SAST domain.

One of the study's primary findings was the scarcity of suitable training data for supervised ML-SAST approaches. Synthetic datasets, such as the Juliet test suite, lack realism, while real-world datasets often suffer from skewed ground truth and limited context. To mitigate this problem, the prototype includes a feedback loop that lets users refine the models based on real-world findings, making the models more effective over time.
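The study does not describe how the feedback loop is implemented. As a rough illustration only, the sketch below assumes that each reported code location has already been turned into a numeric feature vector, and it uses a nearest-centroid classifier as a stand-in for the prototype's real models. `FeedbackLoop`, `add_feedback`, and the other names are hypothetical. A user's verdict on a finding (true or false positive) is stored as a new labelled sample, and the model is retrained.

```python
"""Hypothetical sketch of a user-feedback loop for an ML-SAST model.

Assumptions (not from the study): every reported code location is already a
fixed-length feature vector, and a nearest-centroid classifier stands in for
the prototype's real models. The model is retrained after each verdict.
"""
from dataclasses import dataclass, field
from math import dist

Vector = tuple[float, ...]


@dataclass
class FeedbackLoop:
    # Labelled samples: (feature vector, True if the code is defective).
    samples: list[tuple[Vector, bool]] = field(default_factory=list)
    # One centroid per class label, rebuilt by retrain().
    centroids: dict[bool, Vector] = field(default_factory=dict)

    def retrain(self) -> None:
        """Recompute the mean vector of each class from all labelled samples."""
        self.centroids = {}
        for label in (True, False):
            vecs = [v for v, y in self.samples if y == label]
            if vecs:
                self.centroids[label] = tuple(sum(col) / len(vecs) for col in zip(*vecs))

    def predict(self, vec: Vector) -> bool:
        """Return True (defective) if vec is closest to the defective centroid."""
        return min(self.centroids, key=lambda label: dist(vec, self.centroids[label]))

    def add_feedback(self, vec: Vector, is_true_positive: bool) -> None:
        """Store a user's verdict on a reported finding, then retrain."""
        self.samples.append((tuple(vec), is_true_positive))
        self.retrain()


# Usage: a finding is first flagged, then the user marks it a false positive.
loop = FeedbackLoop(samples=[((0.0, 0.0), False), ((1.0, 1.0), True)])
loop.retrain()
print(loop.predict((0.6, 0.6)))       # True: flagged as defective
loop.add_feedback((0.6, 0.6), False)  # user verdict: false positive
print(loop.predict((0.6, 0.6)))       # False after retraining
```

Retraining from scratch after every verdict keeps the sketch simple and is only practical for small sample sets; `predict()` also requires at least one labelled sample.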

The prototype combines supervised learning and clustering methods to distinguish defective code from benign code. It uses a novel technique to map program code into a metric space, addressing the difficulty of embedding the interprocedural control flow graph. Its adaptability and its incorporation of user feedback make the prototype a valuable contribution to the ML-SAST landscape.
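The summary does not describe the prototype's embedding technique. To illustrate the general idea of placing program graphs in a metric space, the hypothetical sketch below swaps in a well-known and much simpler method: Weisfeiler-Lehman (WL) subtree label counts computed over a small control flow graph, compared with Euclidean distance and used by a 1-nearest-neighbour classifier. It treats all edges alike and does not attempt to solve the interprocedural embedding problem; the node labels (such as `call_strcpy`) and all function names are illustrative assumptions.

```python
"""Hypothetical sketch: embedding control flow graphs in a metric space.

Uses Weisfeiler-Lehman subtree label counts as features. This is NOT the
prototype's technique, only a simple stand-in to show the general idea.
"""
from collections import Counter
from math import sqrt

Graph = tuple[dict[str, str], dict[str, list[str]]]  # (node labels, successors)


def wl_features(graph: Graph, iterations: int = 2) -> Counter:
    """Count WL subtree labels of a directed graph over several rounds."""
    labels, succ = graph
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        new = {}
        for node, lab in current.items():
            # New label: own label plus the sorted labels of its successors.
            neighbours = sorted(current[s] for s in succ.get(node, []))
            new[node] = lab + "(" + ",".join(neighbours) + ")"
        current = new
        features.update(current.values())
    return features


def distance(a: Counter, b: Counter) -> float:
    """Euclidean distance between two sparse label-count vectors."""
    return sqrt(sum((a[k] - b[k]) ** 2 for k in a.keys() | b.keys()))


def classify(query: Counter, labelled: list[tuple[Counter, str]]) -> str:
    """1-nearest-neighbour: return the label of the closest known graph."""
    return min(labelled, key=lambda item: distance(query, item[0]))[1]


# Toy CFGs; node labels are illustrative statement kinds.
unchecked = ({"a": "entry", "b": "call_strcpy", "c": "return"},
             {"a": ["b"], "b": ["c"]})
checked = ({"a": "entry", "b": "if_len_check", "c": "call_strncpy", "d": "return"},
           {"a": ["b"], "b": ["c", "d"], "c": ["d"]})
training = [(wl_features(unchecked), "defective"), (wl_features(checked), "benign")]

query = ({"x": "entry", "y": "call_strcpy", "z": "return"}, {"x": ["y"], "y": ["z"]})
print(classify(wl_features(query), training))  # defective
```

Strictly speaking, this distance is only a pseudometric on graphs, since two different graphs can share the same label counts. The same distance function could also feed a clustering algorithm, which is not shown here.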

The study also identified several areas for future research. Free and open-source static analysis tools are scarce, particularly for the C and C++ programming languages, so existing tools need further development and improvement. Graph embedding methods for program graph representations also require further work, especially for more complex graphs.

Both the expert interviews and the literature review underscored the relevance of ML-SAST. However, questions remain open, particularly concerning the limitations of ML-based approaches and the application scenarios for which they are suited.

In conclusion, the study clarified the potential of ML-SAST and identified areas that need further investigation. Together with the study's findings, the prototype gives researchers and practitioners a starting point for exploring machine learning for application security testing in greater depth.

As ML-SAST evolves, continued collaboration, the creation of realistic datasets, and further tool development will be pivotal to improving the ability of machine learning to identify and mitigate security vulnerabilities in software systems.