LLM Automation for Clinical Simulations (Magnus)
LLM-powered automation tools for clinical simulations and survey feedback, with Azure-hosted RAG and alignment workflows.
Confidential Context
This case study is sanitized. Client data and proprietary integrations are omitted. Work performed at Magnus; content is anonymized and code is not public.
Outcomes
- Azure-hosted LLM pipelines for clinical simulations and survey feedback analysis
- FAISS + PostgreSQL vector search backends enabling RAG-style workflows
- Internal tools for prompt evaluation and response quality monitoring
Context
At Magnus I worked as a Machine Learning Engineer on automation tools powered by large language models (LLMs) for two primary use cases: clinical simulations and survey feedback analysis. The work was production-oriented, deployed on Azure, and focused on retrieval-augmented generation (RAG), evaluation, and alignment. The code is private, so this page summarizes my contributions at a high level.
What I Worked On
- LLM automation for clinical simulations and surveys: contributed to systems that used LLMs to generate, adapt, and analyze clinical simulation scenarios and to summarize and categorize open-ended survey feedback.
- Azure-hosted LLM pipelines: assisted in deploying and fine-tuning LLM pipelines on Azure, including prompt-template configuration, safety checks, and monitoring for the inference services.
- Retrieval-Augmented Generation: implemented vector search using FAISS and PostgreSQL as the backbone for RAG workflows, wiring retrieval steps into LLM prompts so outputs were grounded in domain-specific data (see the first sketch below).
- Embedding model evaluation: evaluated embedding providers such as OpenAI, SBERT, and Cohere to optimize semantic search quality across different domains and languages (a comparison harness is sketched below).
- Alignment workflows: participated in reinforcement-learning-style workflows that aligned smaller LLMs using feedback generated from stronger models, improving stability and task performance (see the feedback-scoring sketch below).
- Serving and APIs: supported the development of Flask APIs and asynchronous services responsible for inference and summarization, integrating them into existing product surfaces (a minimal endpoint is sketched below).
- Internal evaluation tools: helped design internal tools for prompt evaluation and response quality monitoring, collaborating with senior ML engineers to close the loop between qualitative feedback and model iterations (a toy evaluation harness closes out the sketches below).
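Illustrative Sketches
Since the code is private, the sketches in this section are simplified reconstructions of the general patterns, not the production implementations; every model name, schema, table, and parameter is illustrative rather than a value used at Magnus.

First, a minimal RAG retrieval flow combining FAISS for vector search with PostgreSQL for document storage, assuming a toy documents(id, body) table and a small open-source embedding model:

```python
# Minimal RAG retrieval sketch: FAISS holds the vectors, PostgreSQL holds
# the document text. Table, DSN, model, and dimensions are all placeholders.
import faiss
import numpy as np
import psycopg2
from sentence_transformers import SentenceTransformer

EMBED_DIM = 384  # output dimension of all-MiniLM-L6-v2

model = SentenceTransformer("all-MiniLM-L6-v2")
index = faiss.IndexIDMap(faiss.IndexFlatIP(EMBED_DIM))  # inner product = cosine on unit vectors
conn = psycopg2.connect("dbname=rag_demo")  # placeholder DSN

def add_documents(docs: list[tuple[int, str]]) -> None:
    """Embed documents, add vectors to FAISS, store bodies in PostgreSQL."""
    ids = np.array([doc_id for doc_id, _ in docs], dtype=np.int64)
    vecs = model.encode([text for _, text in docs], normalize_embeddings=True)
    index.add_with_ids(np.asarray(vecs, dtype=np.float32), ids)
    with conn, conn.cursor() as cur:
        cur.executemany("INSERT INTO documents (id, body) VALUES (%s, %s)", docs)

def retrieve(query: str, k: int = 5) -> list[str]:
    """Return the k most similar document bodies for a query
    (SQL does not preserve similarity order; fine for prompt context)."""
    qvec = model.encode([query], normalize_embeddings=True)
    _, ids = index.search(np.asarray(qvec, dtype=np.float32), k)
    with conn.cursor() as cur:
        cur.execute("SELECT body FROM documents WHERE id = ANY(%s)", (ids[0].tolist(),))
        return [row[0] for row in cur.fetchall()]

def build_prompt(query: str) -> str:
    """Ground the LLM prompt in retrieved domain context."""
    context = "\n---\n".join(retrieve(query))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
```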
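A comparison harness for embedding models can be as simple as recall@k over labeled query-document pairs; the two SBERT-style models below stand in for the OpenAI, SBERT, and Cohere providers actually compared, and the corpus is a toy example:

```python
# Embedding evaluation sketch: compare candidate models by recall@k on
# labeled (query, relevant_doc_id) pairs. Corpus and queries are toys.
import numpy as np
from sentence_transformers import SentenceTransformer

corpus = {
    1: "Scenario: patient presents with chest pain and shortness of breath.",
    2: "Survey comment: the onboarding process was confusing and too long.",
}
eval_pairs = [("cardiac symptoms case", 1), ("feedback about onboarding", 2)]

def recall_at_k(model_name: str, docs: dict[int, str],
                labeled: list[tuple[str, int]], k: int = 1) -> float:
    """Fraction of queries whose relevant document lands in the top k."""
    model = SentenceTransformer(model_name)
    doc_ids = list(docs)
    doc_vecs = model.encode([docs[i] for i in doc_ids], normalize_embeddings=True)
    hits = 0
    for query, relevant_id in labeled:
        qvec = model.encode([query], normalize_embeddings=True)[0]
        scores = doc_vecs @ qvec  # cosine similarity on unit vectors
        top = [doc_ids[i] for i in np.argsort(-scores)[:k]]
        hits += relevant_id in top
    return hits / len(labeled)

for name in ["all-MiniLM-L6-v2", "paraphrase-multilingual-MiniLM-L12-v2"]:
    print(name, recall_at_k(name, corpus, eval_pairs))
```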
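For the alignment workflows, one simplified pattern (a best-of-n rejection-sampling stand-in for the RL-style loop, not the actual Magnus pipeline) is to have a stronger judge model score candidates from the smaller model and keep the winners as fine-tuning data. The judge model name and prompts here are placeholders:

```python
# Model-feedback alignment sketch: a stronger "judge" model rates candidate
# outputs from a smaller model; the best candidates form a fine-tuning set.
import json
from openai import OpenAI

client = OpenAI()  # assumes an API-hosted judge model is available

def judge_score(prompt: str, candidate: str) -> float:
    """Ask the stronger model to rate a candidate response from 1 to 10."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder judge model
        messages=[{
            "role": "user",
            "content": (f"Rate this response to the prompt on a 1-10 scale. "
                        f"Reply with only the number.\n\nPrompt: {prompt}\n\n"
                        f"Response: {candidate}"),
        }],
    )
    return float(resp.choices[0].message.content.strip())

def best_of_n(prompt: str, candidates: list[str]) -> dict:
    """Keep the highest-scoring candidate as a fine-tuning example."""
    best = max(candidates, key=lambda c: judge_score(prompt, c))
    return {"prompt": prompt, "completion": best}

# Accumulated examples would then be written out for supervised fine-tuning:
# with open("sft_data.jsonl", "a") as f:
#     f.write(json.dumps(example) + "\n")
```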
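On the serving side, a minimal Flask endpoint of the kind described above might look like the following; the summarize() body and the safety check are placeholders for the real prompt templates and moderation steps:

```python
# Serving sketch: a Flask endpoint wrapping an LLM summarization call,
# with a stub safety check in front of inference.
from flask import Flask, jsonify, request

app = Flask(__name__)

def passes_safety_check(text: str) -> bool:
    # Placeholder: production used dedicated safety/moderation checks.
    return bool(text) and len(text) < 20_000

def summarize(text: str) -> str:
    # Placeholder for the actual LLM inference call.
    return text[:200] + "..."

@app.post("/summarize")
def summarize_endpoint():
    payload = request.get_json(force=True)
    text = payload.get("text", "")
    if not passes_safety_check(text):
        return jsonify({"error": "input rejected by safety check"}), 400
    return jsonify({"summary": summarize(text)})

if __name__ == "__main__":
    app.run()
```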
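Finally, a toy version of a prompt-evaluation harness: run each prompt variant over a shared set of test cases and report a pass rate. The cases and checks are invented for illustration; the production tooling combined heuristics like these with human review:

```python
# Prompt-evaluation sketch: score prompt variants against shared test cases.
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestCase:
    input_text: str
    check: Callable[[str], bool]  # does the response meet expectations?

def evaluate_prompt(render: Callable[[str], str],
                    run_llm: Callable[[str], str],
                    cases: list[TestCase]) -> float:
    """Return the fraction of test cases a prompt variant passes."""
    passed = sum(case.check(run_llm(render(case.input_text))) for case in cases)
    return passed / len(cases)

# Example: compare two summarization templates (call_model is your LLM client).
cases = [TestCase("Patient reports mild dizziness...", lambda r: len(r) < 300)]
v1 = lambda text: f"Summarize:\n{text}"
v2 = lambda text: f"Summarize in two sentences for a clinician:\n{text}"
# pass_rate_v1 = evaluate_prompt(v1, call_model, cases)
# pass_rate_v2 = evaluate_prompt(v2, call_model, cases)
```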
Skills Demonstrated
- Practical experience with Retrieval-Augmented Generation (RAG) systems combining FAISS, PostgreSQL, and LLM prompts.
- Cloud deployment and monitoring of LLM pipelines on Azure.
- Comparative evaluation of embedding models (OpenAI, SBERT, Cohere) for semantic search and retrieval quality.
- Collaboration with senior ML engineers and non-technical stakeholders to translate domain requirements into ML workflows.