Job Overview
We’re looking for an HPC Storage Engineer to design, operate, and improve high-throughput storage platforms that power demanding compute workloads. This role is for someone who can move comfortably between architecture and hands-on operations—building reliable parallel filesystems, tuning performance under real load, and keeping availability high when the cluster is busy.
You’ll work across the full lifecycle: deploying and upgrading Lustre, IBM Spectrum Scale (GPFS), and/or BeeGFS; integrating with compute schedulers and network fabrics; and establishing clean operational practices around monitoring, capacity planning, and incident response. You’ll be expected to diagnose complex I/O and metadata bottlenecks, validate changes with meaningful benchmarks, and translate findings into practical improvements.
What success looks like
You leave storage in a clearly better state than you found it—faster, more stable, and easier to operate. Stakeholders trust the platform because performance is measured, issues are understood, and changes are executed with discipline.
- Predictable performance for mixed HPC/AI workloads through tuning and evidence-based benchmarking
- Operational resilience via sensible automation, monitoring, and upgrade/rollback planning
- Clear ownership of capacity trends, failure modes, and documentation that other engineers can use
This is a collaborative engineering role: you’ll partner closely with HPC, Linux, networking, and platform teams to ensure storage design matches workload reality—and you’ll communicate tradeoffs clearly when reliability, cost, and performance pull in different directions.






