The Spring AI framework recently logged a TransientAiException reporting a 500 error. According to the log, the llama runner process terminated because of a CUDA out-of-memory error on device 1. The error originated in the ggml-cuda source file, in the ggml_set_device function, and the exception message states that the device set operation failed with a CUDA error code. The failure occurred while the runner was trying to allocate memory for a large model, and it caused the service request to fail. The framework's retry logic treated the failure as transient and raised the TransientAiException, so the operation will be attempted again after a brief delay. No resolution has been announced so far.
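The retry behaviour described above can be illustrated with a minimal sketch in plain Java. It is not Spring AI's actual implementation: the class `RetrySketch`, the exception `TransientFailure` (a hypothetical stand-in for a retryable error such as TransientAiException), and the parameters `maxAttempts` and `initialDelayMs` are all assumptions made for illustration.

```java
import java.util.concurrent.Callable;

public class RetrySketch {

    /** Hypothetical stand-in for a retryable failure; not a Spring AI class. */
    public static class TransientFailure extends RuntimeException {
        public TransientFailure(String message) {
            super(message);
        }
    }

    /**
     * Runs the call up to maxAttempts times. After each transient failure it
     * sleeps, then doubles the delay. Any other exception propagates at once.
     */
    public static <T> T callWithRetry(Callable<T> call, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        TransientFailure last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (TransientFailure e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs); // brief pause before the next attempt
                    delayMs *= 2;          // exponential backoff
                }
            }
        }
        // All attempts failed with a transient error: surface the last one.
        throw last;
    }
}
```

A caller would wrap the model request, for example `RetrySketch.callWithRetry(() -> sendRequest(), 3, 1000L)`, where `sendRequest` is a hypothetical method. Retrying is only likely to help with an out-of-memory failure if GPU memory is freed between attempts; if the model simply does not fit on the device, every attempt can be expected to fail the same way.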
© Ministry of the Armed Forces (France).
The text is licensed under the Etalab Licence Ouverte v2.0 – the same licence that applies to the majority of the site’s content (https://www.etalab.gouv.fr/wp-content/uploads/2017/04/ETALAB-Licence-Ouverte-v2.0.pdf).
This article is a summary of content originally published by the Ministry of the Armed Forces.