Site Reliability Engineer - Observability Job at Second Front Systems, Remote

  • Second Front Systems
  • Remote

Job Description

ABOUT THE ROLE

Second Front Systems' (2F) Product team is seeking a highly skilled and motivated Senior Site Reliability Engineer to join our Observability team. We are a small team working to accelerate the deployment of emerging technology into national security use-cases. We are seeking technical professionals who want to operate on the front lines of an exciting and disruptive mission.

As a Senior SRE for Second Front Systems, you'll be responsible for deploying, maintaining, and scaling our observability infrastructure across multiple DoD networks. You'll work with Kubernetes-based platforms and BigBang charts from DoD Platform One, and you'll build automation that makes our monitoring stack easier to deploy for new customers. You'll be empowered to collaborate with others to implement infrastructure that delivers unique capabilities for our commercial and government customers, including the Department of Defense.

The Observability team is looking for a strong SRE with deep DevSecOps and Kubernetes experience: someone who has deployed and maintained monitoring infrastructure at scale, with an eye for security in highly regulated environments. Experience with DoD software deployments, Platform One, and single-tenant architectures is highly valued.

We are a fast-growing entrepreneurial team working at the convergence of technology and national security. If this type of effort interests you, come join us!

Note: This position requires U.S. citizenship due to government contract requirements.

What You’ll Do
  • Deploy and maintain the observability stack (Grafana, Mimir, Prometheus) across multiple customer clusters and DoD networks
  • Build Helm chart abstractions and automation to streamline monitoring deployments for new customers
  • Troubleshoot and debug complex Kubernetes issues, networking problems, and monitoring stack failures
  • Configure and maintain BigBang charts and DoD Platform One integrations
  • Design and implement infrastructure automation using tools like Pulumi, ArgoCD, and Flux
  • Work with Istio service mesh and Keycloak for authentication in secure environments
  • Monitor and optimize performance of monitoring infrastructure across multiple environments
  • Collaborate with security teams to ensure compliance with NIST requirements and DoD standards
  • Participate in on-call rotation and incident response for production environments

Skills You’ll Bring to Our Team
  • 5+ years of Site Reliability Engineering or DevOps experience
  • Deep experience with Kubernetes administration, troubleshooting, and scaling
  • Hands-on experience deploying and maintaining observability tools (Prometheus, Grafana, Mimir/Cortex)
  • Strong understanding of Helm charts, GitOps practices, and CNCF tooling
  • Experience with service mesh technologies (Istio preferred)
  • Proven ability to debug complex distributed systems and networking issues
  • Understanding of authentication systems and security in regulated environments
  • Ability to work independently and collaborate with team members in a remote environment

Preferred Qualifications
  • Active security clearance or ability to obtain a Secret-level security clearance
  • Previous experience with DoD software deployments and Platform One
  • Experience with BigBang charts and Iron Bank containers
  • Experience working in national security or highly regulated environments
  • Familiarity with compliance frameworks (NIST, FedRAMP, etc.)
  • Experience with infrastructure as code (Pulumi, Terraform)

Technologies We Use
  • Observability: Grafana stack, Prometheus, custom alerting tools
  • Kubernetes: Helm, ArgoCD, Flux, Tekton, BigBang charts
  • Security: Istio, Keycloak, Kyverno
  • Infrastructure: AWS/GCP/Azure, Pulumi, Git/GitLab
  • Languages: YAML, Bash, Go

Job Tags

Remote job, Full time, Contract work