Reliability Engineer
Milwaukee, WI 
Posted Today
Job Description

The future is being built today, and Johnson Controls is making that future more productive, more secure and more sustainable. We are harnessing the power of cloud, data analytics, the Internet of Things, and user design thinking to deliver on the promise of intelligent buildings and smart cities that connect communities in ways that make people's lives and the world better.

What you will do

The Site Reliability Engineer is responsible for the operations activities, reliability, scalability, and resilience of the products/platforms related to the Site Reliability Engineering (SRE) managed subscriptions. This individual will serve as a lead and escalation point for unresolved issues and problems, and will oversee and drive the diagnosis, root-cause analysis, automation, and fix of critical application and infrastructure issues. The engineer must also engage with product teams to understand new features/projects and transition them to SRE while maintaining scalability and reliability standards and best practices.

How you will do it

  • Drive reliability, observability, and efficiency in SRE platforms, tools, and supported products by identifying and maintaining reliability best practices (e.g. SLIs/SLOs/Error Budgets) and by following the product development life cycle
  • Reduce toil by automating repeated activities, reduce the risk of manual failures through automation/scripting, and take responsibility for platform automation, including infrastructure provisioning, deployments, issue detection, remediation, and self-healing
  • Optimize resources by assessing system behavior and consumption and by doing capacity planning
  • Test, maintain, and improve the flexibility and resilience of the systems by using chaos engineering techniques such as destructive testing, failure modes, and DR
  • Continually evaluate and adopt the latest industry technologies to streamline processes
  • Drive incident response and problem management exercises, bringing skills to effectively respond to critical production issues and security events in line with SRE incident response and problem management playbooks.
  • Perform analytics on previous alerts, incidents and usage patterns to better predict issues and take proactive actions
  • Drive blameless retrospectives and root-cause analyses (RCAs) to foster a culture of continuous improvement.
  • Work with product teams, SRE design, and L2 teams to identify and maintain reliability best practices (e.g. SLIs/SLOs/Error Budgets), and to ensure proper monitoring, processes, and tools are in place before the application moves into a live environment.
  • Ensure critical product issues, security events, and alerts are addressed in a compliant manner; coordinate incidents with product business units; act as incident commander for critical incidents; and act as escalation contact for all product issues.
  • Reduce operational deficiencies, refine/improve processes, improve efficiencies in systems/tools, and optimize resources by constantly looking for improvement opportunities with a continuous-improvement mindset.
  • Create and maintain Terraform and Ansible automation scripts for essential tools, services, and configurations.
  • Assist in preparing operational reports and present them to product teams.

Qualifications

What we look for

Required:

  • 5 years of experience in Site Reliability Engineering/System Administration or DevOps
  • 3 years of experience in PaaS and containers: OpenShift, Kubernetes, Docker, AKS, GKE, EKS, etc.
  • 5 years of experience with one or more configuration management systems or related tools, such as Chef, Ansible, Jenkins, GitHub/Azure DevOps, Sophos, and Nginx
  • 4 years of experience with application and infrastructure troubleshooting, on-call, incident and problem management, and continuous integration/deployment

Preferred:

  • Strong experience with customer support, project management, operations, or a related area
  • Strong organizational and communication skills
  • Good understanding of systems architecture, UNIX internals, networking topologies, multi-cluster applications, multi-tenant platforms, and systems/network security

Johnson Controls is a global diversified technology and multi-industrial leader serving a wide range of customers in more than 150 countries. Our commitment to sustainability dates back to our roots in 1885, with the invention of the first electric room thermostat. We are committed to helping our customers win everywhere, every day, and to creating greater value for all of our stakeholders through our strategic focus on buildings.

Job: Engineering
Primary Location: US-WI-Milwaukee
Organization: Bldg Technologies & Solutions


Johnson Controls is an equal employment opportunity and affirmative action employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, age, protected veteran status, status as a qualified individual with a disability, or any other characteristic protected by law. For more information, please view EEO is the Law. If you are an individual with a disability and you require an accommodation during the application process, please visit www.johnsoncontrols.com/tomorrowneedsyou.

 

Job Summary
Start Date: As soon as possible
Employment Term and Type: Regular, Full Time
Required Experience: 5+ years