Daniel Cárdenas

Full-Stack Builder: Firmware · Web · AI

I design and ship end-to-end systems—from STM32 firmware and edge data capture to Go/Python backends and multilingual RAG experiences.

Aug 15, 2025

RAG evaluation notes: recall@k vs latency

Benchmarking recall@k trade-offs against multilingual latency budgets.

RAG · Evaluation

Ran controlled tests on multilingual corpora comparing HNSW parameters and prompt strategies. Highlights:

  • BGE-M3 embeddings with cosine distance outperformed ada-002 on the Spanish and Portuguese (ES/PT) corpora by ~7% in recall@5.
  • Hybrid reranking (embedding + Mistral-7B classifier) adds ~220 ms but lifts citation accuracy to 91%.
  • For 30-document contexts, prompt compression with LlamaGuard reduces hallucinations without hurting latency.
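
For reference, here is a minimal sketch of how a recall@k figure like the ones above can be computed. It assumes recall@k means the fraction of a query's relevant documents that appear in the top k results, averaged over queries; the post does not state which definition the benchmark used. The function names and toy document IDs are hypothetical.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        raise ValueError("each query needs at least one relevant document")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs: Iterable[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Average recall@k over (ranked results, relevant set) pairs."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    if not scores:
        raise ValueError("need at least one query")
    return sum(scores) / len(scores)


# Hypothetical toy data: ranked document IDs and the labelled relevant set.
runs = [
    (["d1", "d7", "d3", "d9", "d2"], {"d1", "d2"}),   # both relevant docs found
    (["d4", "d5", "d6", "d8", "d0"], {"d4", "d11"}),  # one of two found
]
print(mean_recall_at_k(runs, k=5))  # 0.75
```

Some evaluations report a hit rate instead (1 if any relevant document is in the top k, else 0), which gives higher numbers on the same data, so it is worth stating the definition next to any reported figure.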

Production target: keep p95 latency under 5 s while retaining ≥88% recall@5. The next iteration will add a quantized reranker and streaming of partial answers.
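
The two production targets can be checked together with a small gate like the sketch below. It assumes end-to-end latencies are collected in seconds and uses the nearest-rank definition of the 95th percentile, which is one of several common conventions. The function names, default thresholds (taken from the targets above), and sample values are illustrative, not measurements from the benchmark.

```python
from typing import Sequence


def p95(latencies_s: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a list of latencies in seconds."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_targets(
    latencies_s: Sequence[float],
    recall_at_5: float,
    max_p95_s: float = 5.0,
    min_recall: float = 0.88,
) -> bool:
    """True when p95 latency is under the budget and recall@5 is at or above the floor."""
    return p95(latencies_s) < max_p95_s and recall_at_5 >= min_recall


# Hypothetical samples: 20 requests, mostly fast, with one slow outlier.
samples = [1.2] * 18 + [4.1, 6.3]
print(p95(samples))                  # 4.1
print(meets_targets(samples, 0.90))  # True
print(meets_targets(samples, 0.85))  # False
```

With only 20 samples the single 6.3 s outlier sits above the 95th percentile and does not fail the gate; a larger sample gives a more stable p95 estimate.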