Engineering Manager, Model Inference
Company: Abridge
Location: San Francisco
Posted on: April 2, 2026
Job Description:
About Abridge
Abridge was founded in 2018 with the mission of powering deeper
understanding in healthcare. Our AI-powered platform was
purpose-built for medical conversations, improving clinical
documentation efficiency while enabling clinicians to focus on what
matters most: their patients. Our enterprise-grade technology
transforms patient-clinician conversations into structured clinical
notes in real time, with deep EMR integrations. Powered by Linked
Evidence and our purpose-built, auditable AI, we are the only
company that maps AI-generated summaries to ground truth, helping
providers quickly trust and verify the output. As pioneers in
generative AI for healthcare, we are setting the industry standards
for the responsible deployment of AI across health systems. We are a
growing team of practicing MDs, AI scientists, PhDs, creatives,
technologists, and engineers working together to empower people and
make care make more sense. We have offices located in the Mission
District in San Francisco, the SoHo neighborhood of New York, and
East Liberty in Pittsburgh.

The Role
Our generative AI-powered products are transforming the practice of
medicine, and the inference systems that power them need to be fast,
reliable, and world-class. We’re looking for an Engineering Manager
to lead and grow our Model Inference team. The Inference team owns
the end-to-end technical direction of how our models are served:
from architecting low-latency, high-throughput infrastructure to
pushing the frontier of LLM serving techniques. You’ll lead a
high-performing team of AI inference engineers, partner closely
with ML Research and the broader AI Platform, and ensure the
systems underpinning every clinician interaction are operating at
peak efficiency and reliability.

What You’ll Do
- Lead and grow a high-performing team of AI inference engineers
  focused on building and scaling infrastructure for Abridge’s
  products and APIs
- Own the technical direction of our inference systems, making key
  decisions around batching, throughput, latency, and GPU
  utilization
- Architect and scale inference infrastructure for reliability,
  efficiency, and observability; lead incident response
- Benchmark and eliminate bottlenecks throughout the inference stack
- Partner with ML Research teams on model optimization,
  quantization, and deployment
- Develop APIs for AI inference used by both internal teams and
  external customers
- Recruit, mentor, and develop engineering talent; establish team
  processes, engineering standards, and operational excellence
- Work closely with the GenAI Platform, Data, and Product teams to
  plan and execute projects that directly impact clinicians and
  patients

What You’ll Bring
- 5 years of engineering experience, with 1 year in a technical
  leadership or management role
- Deep, hands-on experience with ML systems and inference frameworks
  (e.g., PyTorch, TensorRT, vLLM, TensorFlow)
- Strong understanding of LLM architecture (e.g., Multi-Head
  Attention, Multi/Grouped-Query Attention, and common transformer
  components)
- Experience with inference optimizations (e.g., batching,
  quantization, kernel fusion, FlashAttention)
- Familiarity with GPU characteristics, roofline models, and
  performance analysis
- Experience deploying reliable, distributed, real-time systems at
  scale
- Experience with parallelism strategies: tensor parallelism,
  pipeline parallelism, expert parallelism
- Skilled at hiring and mentorship, with a demonstrated track record
  of helping engineers grow their skills and careers
- Strong technical communication and cross-functional collaboration
  skills
- Comfortable giving constructive feedback on technical designs and
  code reviews
- Has thrived in a fast-growing startup and knows how to operate
  with urgency and focus

Added Bonus
- Background in training infrastructure and RL workloads
- Skilled in building secure, compliant systems on major cloud
  platforms (GCP preferred, AWS experience welcome)
- Experience with Kubernetes and container orchestration at scale
- Published work or contributions to inference optimization research

Why Work at Abridge?
At Abridge, we’re
transforming healthcare delivery experiences with generative AI,
enabling clinicians and patients to connect in deeper, more
meaningful ways. Our mission is clear: to power deeper
understanding in healthcare. We’re driving real, lasting change,
with millions of medical conversations processed each month.
Joining Abridge means stepping into a fast-paced, high-growth
startup where your contributions truly make a difference. Our
culture requires extreme ownership—every employee has the ability
to (and is expected to) make an impact on our customers and our
business. Beyond individual impact, you will have the opportunity
to work alongside a team of curious, high-achieving people in a
supportive environment where success is shared, growth is constant,
and feedback fuels progress. At Abridge, it’s not just what we
do—it’s how we do it. Every decision is rooted in empathy, always
prioritizing the needs of clinicians and patients. We’re committed
to supporting your growth, both professionally and personally.
Whether it's flexible work hours, an inclusive culture, or ongoing
learning opportunities, we are here to help you thrive and do the
best work of your life. If you are ready to make a meaningful
impact alongside passionate people who care deeply about what they
do, Abridge is the place for you.

How we take care of Abridgers:
- Generous Time Off: 14 paid holidays, flexible PTO for salaried
  employees, and accrued time off for hourly employees.
- Comprehensive Health Plans: Medical, Dental, and Vision coverage
  for all full-time employees and their families.
- Generous HSA Contribution: If you choose a High Deductible Health
  Plan, Abridge makes monthly contributions to your HSA.
- Paid Parental Leave: Generous paid parental leave for all
  full-time employees.
- Family Forming Benefits: Resources and financial support to help
  you build your family.
- 401(k) Matching: Contribution matching to help invest in your
  future.
- Personal Device Allowance: Tax-free funds for personal device
  usage.
- Pre-tax Benefits: Access to Flexible Spending Accounts (FSA) and
  Commuter Benefits.
- Lifestyle Wallet: Monthly contributions for fitness, professional
  development, coworking, and more.
- Mental Health Support: Dedicated access to therapy and coaching to
  help you reach your goals.
- Sabbatical Leave: Paid Sabbatical Leave after 5 years of
  employment.
- Compensation and Equity: Competitive compensation and equity
  grants for full-time employees.
- And much more!

Equal Opportunity Employer
Abridge
is an equal opportunity employer and considers all qualified
applicants equally without regard to race, color, religion, sex,
sexual orientation, gender identity, national origin, veteran
status, or disability.

Staying safe - Protect yourself from recruitment fraud
We are aware of individuals and entities fraudulently representing
themselves as Abridge recruiters and/or hiring managers. Abridge
will never ask for financial information or payment, or for personal
information such as bank account number or social security number
during the job application or interview process. Any emails from the
Abridge recruiting team will come from an @abridge.com email
address. You can learn more about how to protect yourself from these
types of fraud by referring to this article. Please exercise caution
and cease communications if something feels suspicious about your
interactions.
Keywords: Abridge, Davis, Engineering Manager, Model Inference, IT / Software / Systems, San Francisco, California