Engineering Manager, Model Inference

Company: Abridge
Location: San Francisco
Posted on: April 2, 2026

Job Description:

About Abridge

Abridge was founded in 2018 with the mission of powering deeper understanding in healthcare. Our AI-powered platform was purpose-built for medical conversations, improving clinical documentation efficiencies while enabling clinicians to focus on what matters most: their patients.

Our enterprise-grade technology transforms patient-clinician conversations into structured clinical notes in real time, with deep EMR integrations. Powered by Linked Evidence and our purpose-built, auditable AI, we are the only company that maps AI-generated summaries to ground truth, helping providers quickly trust and verify the output. As pioneers in generative AI for healthcare, we are setting the industry standards for the responsible deployment of AI across health systems.

We are a growing team of practicing MDs, AI scientists, PhDs, creatives, technologists, and engineers working together to empower people and make care make more sense. We have offices located in the Mission District in San Francisco, the SoHo neighborhood of New York, and East Liberty in Pittsburgh.

The Role

Our generative AI-powered products are transforming the practice of medicine, and the inference systems that power them need to be fast, reliable, and world-class. We’re looking for an Engineering Manager to lead and grow our Model Inference team.

The Inference team owns the end-to-end technical direction of how our models are served: from architecting low-latency, high-throughput infrastructure to pushing the frontier of LLM serving techniques. You’ll lead a high-performing team of AI inference engineers, partner closely with ML Research and the broader AI Platform, and ensure the systems underpinning every clinician interaction are operating at peak efficiency and reliability.

What You’ll Do

- Lead and grow a high-performing team of AI inference engineers focused on building and scaling infrastructure for Abridge’s products and APIs
- Own the technical direction of our inference systems, making key decisions around batching, throughput, latency, and GPU utilization
- Architect and scale inference infrastructure for reliability, efficiency, and observability; lead incident response
- Benchmark and eliminate bottlenecks throughout the inference stack
- Partner with ML Research teams on model optimization, quantization, and deployment
- Develop APIs for AI inference used by both internal teams and external customers
- Recruit, mentor, and develop engineering talent; establish team processes, engineering standards, and operational excellence
- Work closely with the GenAI Platform, Data, and Product teams to plan and execute projects that directly impact clinicians and patients

What You’ll Bring

- 5 years of engineering experience with 1 year in a technical leadership or management role
- Deep, hands-on experience with ML systems and inference frameworks (e.g., PyTorch, TensorRT, vLLM, TensorFlow)
- Strong understanding of LLM architecture (e.g., Multi-Head Attention, Multi/Grouped-Query Attention, and common transformer components)
- Experience with inference optimizations (e.g., batching, quantization, kernel fusion, FlashAttention)
- Familiarity with GPU characteristics, roofline models, and performance analysis
- Experience deploying reliable, distributed, real-time systems at scale
- Experience with parallelism strategies: tensor parallelism, pipeline parallelism, expert parallelism
- Skilled at hiring and mentorship, with a demonstrated track record of helping engineers grow their skills and careers
- Strong technical communication and cross-functional collaboration skills
- Comfortable giving constructive feedback on technical designs and code reviews
- Has thrived in a fast-growing startup and knows how to operate with urgency and focus

Added Bonus

- Background in training infrastructure and RL workloads
- Skilled in building secure, compliant systems on major cloud platforms (GCP preferred, AWS experience welcome)
- Experience with Kubernetes and container orchestration at scale
- Published work or contributions to inference optimization research

Why Work at Abridge?

At Abridge, we’re transforming healthcare delivery experiences with generative AI, enabling clinicians and patients to connect in deeper, more meaningful ways. Our mission is clear: to power deeper understanding in healthcare. We’re driving real, lasting change, with millions of medical conversations processed each month.

Joining Abridge means stepping into a fast-paced, high-growth startup where your contributions truly make a difference. Our culture requires extreme ownership: every employee has the ability to (and is expected to) make an impact on our customers and our business.

Beyond individual impact, you will have the opportunity to work alongside a team of curious, high-achieving people in a supportive environment where success is shared, growth is constant, and feedback fuels progress. At Abridge, it’s not just what we do; it’s how we do it. Every decision is rooted in empathy, always prioritizing the needs of clinicians and patients.

We’re committed to supporting your growth, both professionally and personally. Whether it's flexible work hours, an inclusive culture, or ongoing learning opportunities, we are here to help you thrive and do the best work of your life. If you are ready to make a meaningful impact alongside passionate people who care deeply about what they do, Abridge is the place for you.

How we take care of Abridgers:

- Generous Time Off: 14 paid holidays, flexible PTO for salaried employees, and accrued time off for hourly employees.
- Comprehensive Health Plans: Medical, Dental, and Vision coverage for all full-time employees and their families.
- Generous HSA Contribution: If you choose a High Deductible Health Plan, Abridge makes monthly contributions to your HSA.
- Paid Parental Leave: Generous paid parental leave for all full-time employees.
- Family Forming Benefits: Resources and financial support to help you build your family.
- 401(k) Matching: Contribution matching to help invest in your future.
- Personal Device Allowance: Tax-free funds for personal device usage.
- Pre-tax Benefits: Access to Flexible Spending Accounts (FSA) and Commuter Benefits.
- Lifestyle Wallet: Monthly contributions for fitness, professional development, coworking, and more.
- Mental Health Support: Dedicated access to therapy and coaching to help you reach your goals.
- Sabbatical Leave: Paid Sabbatical Leave after 5 years of employment.
- Compensation and Equity: Competitive compensation and equity grants for full-time employees.
- And much more!

Equal Opportunity Employer

Abridge is an equal opportunity employer and considers all qualified applicants equally without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, or disability.

Staying safe - Protect yourself from recruitment fraud

We are aware of individuals and entities fraudulently representing themselves as Abridge recruiters and/or hiring managers. Abridge will never ask for financial information or payment, or for personal information such as bank account number or social security number during the job application or interview process. Any emails from the Abridge recruiting team will come from an @abridge.com email address. You can learn more about how to protect yourself from these types of fraud by referring to this article. Please exercise caution and cease communications if something feels suspicious about your interactions.
