Fellow - Data Center GPU Systems Design Validation Architect

Oct 08, 2024
Austin, United States
... Not specified
... Intermediate
Full time
... Office work


WHAT YOU DO AT AMD CHANGES EVERYTHING

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. 

AMD together we advance_




THE TEAM: 

AMD's Data Center GPU organization is transforming the industry with our AI-based graphics processors. Our primary objective is to design exceptional products that drive the evolution of computing experiences, serving as the cornerstone for enterprise data centers, Artificial Intelligence (AI), HPC, and embedded systems. If this resonates with you, come and join our Data Center GPU organization, where we are building amazing AI-powered products with amazing people.

 

THE ROLE:  

We are seeking an exceptional Validation Architect to join our Data Center GPU Systems Design and Enablement team. In this pivotal role, you will lead the charge in the AI and HPC domains, working closely with our customers to ensure robust system-level integration and validation, particularly for Baseboard and System Management Controller, PCIe, and HBM technologies. Our cutting-edge Data Center GPU solutions, encompassing APUs and GPUs, demand a proactive approach to testing that aims not just for performance but also for identifying and mitigating potential failures.

THE PERSON:  

As a Systems Design Engineering Architect and validation visionary, your mission is to orchestrate an end-to-end system validation plan and lead a team to rigorously test our products, deliberately pushing them to their limits to uncover vulnerabilities. This is a hands-on technical leadership position that requires expertise in systems design engineering, which will be crucial for comprehensive product development, innovative validation strategies, and efficient problem-solving.

  

KEY RESPONSIBILITIES:  

  • Orchestrating the development and implementation of advanced validation strategies, specifically designed to stress and break the system, thereby identifying potential product weaknesses.
  • Leading a team in pioneering technical validation initiatives, focusing on high-impact areas like PCIe, HBM and SMC/BMC firmware to identify vulnerabilities during system-level integration.
  • Creating and executing validation test plans that address both functional and stress scenarios, including emulation of end-customer systems.
  • Ensuring compliance with OCP standards and secure solution development, including Out of Band Management and Redfish features.
  • Collaborating with multiple teams to devise and execute exhaustive validation test plans that simulate real-world stress scenarios and customer workloads.
  • Championing the process of debugging, root cause analysis, and resolution of issues discovered during the validation phases of AI and HPC systems.
  • Working closely with development teams to ensure all identified issues are addressed and rectified before production.
  • Advancing end-to-end validation test content, utilizing creative debugging skills and innovative approaches.

 

PREFERRED EXPERIENCE:  

  • Proficiency in programming/scripting languages (e.g., C/C++, Perl, Ruby, Python).
  • Expertise in state-of-the-art debugging techniques and methodologies.
  • Extensive experience with lab equipment such as protocol/logic analyzers and oscilloscopes.
  • Deep knowledge in board/platform-level debug, including delivery, sequencing, analysis, and optimization.
  • Comprehensive understanding of system architecture, with a focus on technical debug and validation strategy development.
  • Exceptional analytical and problem-solving skills, with meticulous attention to detail.
  • Self-driven with the ability to lead tasks independently to successful completion.

  

ACADEMIC CREDENTIALS:  

  • Bachelor's or Master's degree in electrical or computer engineering

 

LOCATION:

Austin, TX

 

#LI-BW1

 

#LI-hybrid

 




At AMD, your base pay is one part of your total rewards package.  Your base pay will depend on where your skills, qualifications, experience, and location fit into the hiring range for the position. You may be eligible for incentives based upon your role such as either an annual bonus or sales incentive. Many AMD employees have the opportunity to own shares of AMD stock, as well as a discount when purchasing AMD stock if voluntarily participating in AMD’s Employee Stock Purchase Plan. You’ll also be eligible for competitive benefits described in more detail here.

 

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law.   We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

