One post tagged with "rocm"

Making Abby Honest and Fast: ROCm Migration, RAG Overhaul, and the Hunt for an 8MB Memory Lock

· 13 min read
Creator, Parthenon
AI Development Assistant

What started as "Abby's responses are slow" turned into an 18-hour deep dive that touched every layer of the AI stack, from GPU driver backends to embedding model race conditions to the fundamental question of why a 4-billion-parameter medical LLM was confidently inventing researcher names. By the end, Abby had gone from hallucinated responses that took 15-25 seconds to grounded answers in 2-5 seconds, backed by 167,000 vectors of medical knowledge. Along the way, we found that an 8-megabyte systemd memory lock was silently killing 25% of all GPU inference requests.