LLM Automation for Clinical Simulations (Magnus)
LLM-powered automation tools for clinical simulations and survey feedback, with Azure-hosted RAG and alignment workflows.
Confidential Context
This case study is sanitized. Client data and proprietary integrations are omitted. Work performed at Magnus; content is anonymized and code is not public.
Outcomes
- Azure-hosted LLM pipelines for clinical simulations and survey feedback analysis
- FAISS + PostgreSQL vector search backends enabling RAG-style workflows
- Internal tools for prompt evaluation and response quality monitoring
Context
At Magnus I worked as a Machine Learning Engineer on automation tools powered by large language models (LLMs) for two primary use cases: clinical simulations and survey feedback analysis. The work was production-oriented, deployed on Azure, and focused on retrieval-augmented generation (RAG), evaluation, and alignment. The code is private, so this page summarizes my contributions at a high level.
What I Worked On
- LLM automation for clinical simulations and surveys: contributed to systems that used LLMs to generate, adapt, and analyze clinical simulation scenarios and to summarize and categorize open-ended survey feedback.
- Azure-hosted LLM pipelines: assisted in deploying and fine-tuning LLM pipelines on Azure, including prompt-template configuration, safety checks, and monitoring for the inference services.
- Retrieval-Augmented Generation: implemented vector search using FAISS and PostgreSQL as the backbone for RAG workflows, wiring retrieval steps into LLM prompts so outputs were grounded in domain-specific data (see the first sketch below).
- Embedding model evaluation: evaluated embedding providers such as OpenAI, SBERT, and Cohere to optimize semantic search quality across different domains and languages (a comparison harness is sketched below).
- Alignment workflows: participated in reinforcement-learning-style workflows that aligned smaller LLMs using feedback generated from stronger models, improving stability and task performance (see the feedback-scoring sketch below).
- Serving and APIs: supported the development of Flask APIs and asynchronous services responsible for inference and summarization, integrating them into existing product surfaces (a minimal endpoint is sketched below).
- Internal evaluation tools: helped design internal tools for prompt evaluation and response quality monitoring, collaborating with senior ML engineers to close the loop between qualitative feedback and model iterations (a toy evaluation harness closes out the sketches below).
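Illustrative Sketches
Since the code is private, the sketches in this section are simplified reconstructions of the general patterns, not the production implementations; every model name, schema, table, and parameter is illustrative rather than a value used at Magnus.

First, a minimal RAG retrieval flow combining FAISS for vector search with PostgreSQL for document storage, assuming a toy documents(id, body) table and a small open-source embedding model:

```python
# Minimal RAG retrieval sketch: FAISS holds the vectors, PostgreSQL holds
# the document text. Table, DSN, model, and dimensions are all placeholders.
import faiss
import numpy as np
import psycopg2
from sentence_transformers import SentenceTransformer

EMBED_DIM = 384  # output dimension of all-MiniLM-L6-v2

model = SentenceTransformer("all-MiniLM-L6-v2")
index = faiss.IndexIDMap(faiss.IndexFlatIP(EMBED_DIM))  # inner product = cosine on unit vectors
conn = psycopg2.connect("dbname=rag_demo")  # placeholder DSN

def add_documents(docs: list[tuple[int, str]]) -> None:
    """Embed documents, add vectors to FAISS, store bodies in PostgreSQL."""
    ids = np.array([doc_id for doc_id, _ in docs], dtype=np.int64)
    vecs = model.encode([text for _, text in docs], normalize_embeddings=True)
    index.add_with_ids(np.asarray(vecs, dtype=np.float32), ids)
    with conn, conn.cursor() as cur:
        cur.executemany("INSERT INTO documents (id, body) VALUES (%s, %s)", docs)

def retrieve(query: str, k: int = 5) -> list[str]:
    """Return the k most similar document bodies for a query
    (SQL does not preserve similarity order; fine for prompt context)."""
    qvec = model.encode([query], normalize_embeddings=True)
    _, ids = index.search(np.asarray(qvec, dtype=np.float32), k)
    with conn.cursor() as cur:
        cur.execute("SELECT body FROM documents WHERE id = ANY(%s)", (ids[0].tolist(),))
        return [row[0] for row in cur.fetchall()]

def build_prompt(query: str) -> str:
    """Ground the LLM prompt in retrieved domain context."""
    context = "\n---\n".join(retrieve(query))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
```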
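A comparison harness for embedding models can be as simple as recall@k over labeled query-document pairs; the two SBERT-style models below stand in for the OpenAI, SBERT, and Cohere providers actually compared, and the corpus is a toy example:

```python
# Embedding evaluation sketch: compare candidate models by recall@k on
# labeled (query, relevant_doc_id) pairs. Corpus and queries are toys.
import numpy as np
from sentence_transformers import SentenceTransformer

corpus = {
    1: "Scenario: patient presents with chest pain and shortness of breath.",
    2: "Survey comment: the onboarding process was confusing and too long.",
}
eval_pairs = [("cardiac symptoms case", 1), ("feedback about onboarding", 2)]

def recall_at_k(model_name: str, docs: dict[int, str],
                labeled: list[tuple[str, int]], k: int = 1) -> float:
    """Fraction of queries whose relevant document lands in the top k."""
    model = SentenceTransformer(model_name)
    doc_ids = list(docs)
    doc_vecs = model.encode([docs[i] for i in doc_ids], normalize_embeddings=True)
    hits = 0
    for query, relevant_id in labeled:
        qvec = model.encode([query], normalize_embeddings=True)[0]
        scores = doc_vecs @ qvec  # cosine similarity on unit vectors
        top = [doc_ids[i] for i in np.argsort(-scores)[:k]]
        hits += relevant_id in top
    return hits / len(labeled)

for name in ["all-MiniLM-L6-v2", "paraphrase-multilingual-MiniLM-L12-v2"]:
    print(name, recall_at_k(name, corpus, eval_pairs))
```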
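For the alignment workflows, one simplified pattern (a best-of-n rejection-sampling stand-in for the RL-style loop, not the actual Magnus pipeline) is to have a stronger judge model score candidates from the smaller model and keep the winners as fine-tuning data. The judge model name and prompts here are placeholders:

```python
# Model-feedback alignment sketch: a stronger "judge" model rates candidate
# outputs from a smaller model; the best candidates form a fine-tuning set.
import json
from openai import OpenAI

client = OpenAI()  # assumes an API-hosted judge model is available

def judge_score(prompt: str, candidate: str) -> float:
    """Ask the stronger model to rate a candidate response from 1 to 10."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder judge model
        messages=[{
            "role": "user",
            "content": (f"Rate this response to the prompt on a 1-10 scale. "
                        f"Reply with only the number.\n\nPrompt: {prompt}\n\n"
                        f"Response: {candidate}"),
        }],
    )
    return float(resp.choices[0].message.content.strip())

def best_of_n(prompt: str, candidates: list[str]) -> dict:
    """Keep the highest-scoring candidate as a fine-tuning example."""
    best = max(candidates, key=lambda c: judge_score(prompt, c))
    return {"prompt": prompt, "completion": best}

# Accumulated examples would then be written out for supervised fine-tuning:
# with open("sft_data.jsonl", "a") as f:
#     f.write(json.dumps(example) + "\n")
```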
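On the serving side, a minimal Flask endpoint of the kind described above might look like the following; the summarize() body and the safety check are placeholders for the real prompt templates and moderation steps:

```python
# Serving sketch: a Flask endpoint wrapping an LLM summarization call,
# with a stub safety check in front of inference.
from flask import Flask, jsonify, request

app = Flask(__name__)

def passes_safety_check(text: str) -> bool:
    # Placeholder: production used dedicated safety/moderation checks.
    return bool(text) and len(text) < 20_000

def summarize(text: str) -> str:
    # Placeholder for the actual LLM inference call.
    return text[:200] + "..."

@app.post("/summarize")
def summarize_endpoint():
    payload = request.get_json(force=True)
    text = payload.get("text", "")
    if not passes_safety_check(text):
        return jsonify({"error": "input rejected by safety check"}), 400
    return jsonify({"summary": summarize(text)})

if __name__ == "__main__":
    app.run()
```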
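Finally, a toy version of a prompt-evaluation harness: run each prompt variant over a shared set of test cases and report a pass rate. The cases and checks are invented for illustration; the production tooling combined heuristics like these with human review:

```python
# Prompt-evaluation sketch: score prompt variants against shared test cases.
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestCase:
    input_text: str
    check: Callable[[str], bool]  # does the response meet expectations?

def evaluate_prompt(render: Callable[[str], str],
                    run_llm: Callable[[str], str],
                    cases: list[TestCase]) -> float:
    """Return the fraction of test cases a prompt variant passes."""
    passed = sum(case.check(run_llm(render(case.input_text))) for case in cases)
    return passed / len(cases)

# Example: compare two summarization templates (call_model is your LLM client).
cases = [TestCase("Patient reports mild dizziness...", lambda r: len(r) < 300)]
v1 = lambda text: f"Summarize:\n{text}"
v2 = lambda text: f"Summarize in two sentences for a clinician:\n{text}"
# pass_rate_v1 = evaluate_prompt(v1, call_model, cases)
# pass_rate_v2 = evaluate_prompt(v2, call_model, cases)
```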
Skills Demonstrated
- Practical experience with Retrieval-Augmented Generation (RAG) systems combining FAISS, PostgreSQL, and LLM prompts.
- Cloud deployment and monitoring of LLM pipelines on Azure.
- Comparative evaluation of embedding models (OpenAI, SBERT, Cohere) for semantic search and retrieval quality.
- Collaboration with senior ML engineers and non-technical stakeholders to translate domain requirements into ML workflows.