GPU RAS/RESET Post Silicon Validation Engineer
What you do at AMD changes everything
At AMD, we push the boundaries of what is possible. We believe in changing the world for the better by driving innovation in high-performance computing, graphics, and visualization technologies – building blocks for gaming, immersive platforms, and the data center.
Developing great technology takes more than talent: it takes amazing people who understand collaboration, respect, and who will go the “extra mile” to achieve unthinkable results. It takes people who have the passion and desire to disrupt the status quo, push boundaries, deliver innovation, and change the world. If you have this type of passion, we invite you to take a look at the opportunities available to come join our team.
The Datacenter Graphics and Accelerated Computing Validation Team is looking for dynamic and energetic RAS Validation Engineers to join our growing team. As a key contributor to the success of AMD’s IP, you will be part of a leading team to drive and enhance AMD’s abilities to deliver the highest quality, industry leading technologies used for Datacenter, Machine Learning, and High Performance Computing. The team fosters and encourages continuous technical innovation to showcase successes as well as facilitate continuous career development.
We are looking for a self-starter who can execute complex test plans independently and collaborate with others to resolve the problems found. As a lead, you are expected to guide junior engineers and should feel comfortable working in a lab environment.
Technical, hands-on engineer responsible for Datacenter GPU SoC Post-Silicon features validation and enablement at the silicon SoC level up through VBIOS, system firmware, and OS levels on AMD Server products. This individual will be primarily responsible for interfacing with silicon, firmware, and platform design groups to develop and execute validation test plans for Datacenter Server SoC products during emulation and post-silicon bring-up / validation, to debug SoC feature issues, and to run and maintain validation infrastructure.
- Drive post-silicon debug efforts to identify the root cause and resolution of issues on AMD's newest Datacenter GPU SoC
- Perform system-level debug and root cause analysis to narrow down the issues to various HW/SW blocks
- Develop and execute feature enablement and validation test plans for SoC- and system-level features across all AMD Server products
- Develop post-silicon validation infrastructure (software, hardware, automation environment, and lab setup)
- Execute test cases in the pre-silicon environment
- Test interactions between various Datacenter GPU SoC features using validation infrastructure
- Work with cross-functional teams to improve post-silicon validation test strategy, methodology, and process
- Lead collaborative technical discussions to drive resolution on technical issues and roll out technical initiatives
- Be able to work in a high-demand, fast-paced environment with lots of real-time problem solving and critical thinking
- Develop knowledge of system architecture, technical debug, and validation strategy
- Drive technical innovation to enhance AMD’s capabilities in RAS IP validation, including tools and script development, technical and procedural methodology enhancement, and various internal and cross-functional technical initiatives
- Provide support on customer platforms as requested by customer support teams
- Provide meaningful execution and issue updates to program management
- Experience in the following technical areas:
- RAS IP design and architecture
- Digital logic / platform design, verification, or post-silicon validation
- Strong programming/scripting skills (C/C++, Python, Perl)
- Silicon debug techniques and methodologies at both the SoC and system levels
- Board / platform-level debug, including clock/power delivery, sequencing, analysis, and optimization
- Knowledge of the physical and protocol levels of common high-speed interfaces is an asset
- Common lab equipment, including protocol/logic analyzers, oscilloscopes, etc.
- Knowledge of computer hardware architecture (CPU/GPU, x86, PCIe, memory, bus logic) and software architecture (driver, BIOS, firmware usage)
- Working knowledge of Server OSes (Linux, Windows Server)
- Test plan and test development experience
- Experience developing validation methodologies and infrastructure
- Strong communication and collaboration skills
- Must excel in a dynamic team working environment
- Must be a self-starter and be able to independently drive tasks to completion
- Leadership and mentoring skills a definite asset
- Graphics and PCIe protocol experience a plus
- Bachelor's or Master's degree in Electrical or Computer Engineering
- 2-3+ years of experience in SoC validation and debug
Austin, Texas, US
Markham, Ontario, Canada
Requisition Number: 164501
Country: Canada Province: Ontario City: Markham
AMD is an inclusive employer dedicated to building a diverse workforce. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective provincial human rights codes throughout all stages of the recruitment and selection process. Any applicant who requires accommodation should contact AskHR@amd.com.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services.