Elastic Platform Governance at Scale

Managing distributed log ingestion in Elastic while balancing source owner autonomy with security team control over platform availability and log integrity.

The Challenge

Enterprises running the Elastic Stack at scale face a common log management and governance problem. Hundreds of teams within the organization need to ingest their own logs, build dashboards to their own requirements, configure monitoring alerts in Kibana, or tune noisy security rules that produce false positives. Each team knows its data sources and use cases best. Yet these teams share responsibility with the security and platform engineering teams, which are accountable for the availability of the platform, the data quality, and the confidentiality and integrity of the logs. This shared responsibility comes with several challenges:

Fully centralized control is inflexible: Security and site-reliability teams become bottlenecks when they fully control all agent policies, ingest pipelines, index patterns, and dashboards. Managing observability in the SIEM turns into a heavy load of tickets and requests.
Fully decentralized control comes with security and quality risks: Teams without guardrails might deploy misconfigured agents, run expensive queries that affect the availability of the platform, send or expose sensitive data that, based on its classification, should never end up in the SIEM (e.g. PII), or overwrite or mix critical security indices.
Ingestion pipeline sprawl: Uncoordinated Elastic Agent policies, Logstash configurations, and Elasticsearch ingest pipelines create duplicate processing and parsing inconsistencies. Teams know their data sources best but do not necessarily have the expertise to parse and manage their logs according to the Elastic Common Schema and log management best practices.
Data confidentiality and integrity risks: Data source owners might accidentally modify or delete security-critical logs through bulk API operations or misconfigured agent outputs.
Limited auditing: There is no visibility into, and therefore no accountability for, who deployed which agent policies, modified ingest pipelines, or changed index lifecycle policies.

Solution

Implementing governance across the full Elastic Stack is best done using Infrastructure as Code (IaC) and DevSecOps best practices such as Git-based CI/CD workflows that separate test and production environments but keep a single source of truth. Source owners get full control in test; site-reliability engineers and security teams get full control in production. Tickets still exist, but security reviews and approvals are less tedious thanks to automated deployment to production and standardized processes. For Elastic in particular, two kinds of objects need to be managed in different ways.

Infrastructure Layer (Terraform-managed)

Using the official Elastic Terraform provider, organizations can store Elastic Agent Fleet policies, ingest pipelines, index templates, ILM policies, and data streams as Terraform configurations. Source owners create pull requests with their changes.

The CI pipeline automatically validates syntax, runs terraform plan against the test environment, and auto-deploys to test on merge. For production deployment, the same PR requires approval from the security or site-reliability team (review requests can be routed automatically through a CODEOWNERS file); CI then runs terraform apply and deploys to staging and production. This extends Detection Engineering as Code principles to the platform layer.
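The review step can be supported by a small CI check that flags destructive changes before a human looks at the PR. The following is a minimal sketch, assuming the plan has been exported to JSON with `terraform show -json` and that the resource type names in `PROTECTED_TYPES` match the provider version in use. The set itself and the function names are illustrative assumptions, not part of any Elastic or Terraform tooling.

```python
import json

# Assumption: resource types whose deletion or replacement should block
# an automated deployment. Adjust to the provider version actually in use.
PROTECTED_TYPES = {
    "elasticstack_elasticsearch_index_lifecycle",
    "elasticstack_elasticsearch_ingest_pipeline",
    "elasticstack_fleet_agent_policy",
}


def load_plan(path: str) -> dict:
    """Load a plan previously written with `terraform show -json`."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def find_risky_changes(plan: dict) -> list[str]:
    """Return addresses of protected resources the plan would delete or replace.

    A replacement appears in the plan JSON as a combination of "delete" and
    "create" actions, so checking for "delete" covers both cases.
    """
    risky = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if rc.get("type") in PROTECTED_TYPES and "delete" in actions:
            risky.append(rc["address"])
    return risky
```

A CI job could fail, or require an extra approval, whenever `find_risky_changes(load_plan("plan.json"))` returns a non-empty list; the file name `plan.json` is hypothetical.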

Kibana Objects Layer (API-managed)

Not all objects in Elastic can be efficiently managed with Terraform. For example, dashboards and their visualizations exist as NDJSON exports (Kibana saved objects). The solution is to use custom CI/CD pipelines, possibly automated with Python or Bash scripts, to move and validate these JSON objects across environments:

Source owners create dashboards and alerts directly in the test Kibana space
Automated scripts pull saved objects from test via the Kibana API and commit the NDJSON to Git
The pull request shows the JSON diff for review
The CI pipeline validates object structure and references
On approval, import scripts push the NDJSON to production Kibana via the API
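The validation and diff steps in this workflow can be sketched as follows. This is a minimal illustration, assuming the usual shape of a Kibana saved objects export: one JSON object per line with `type`, `id`, and a `references` list, followed by a summary line that carries neither `type` nor `id`. The function names are hypothetical, and a real pipeline would likely add further checks.

```python
import json


def parse_ndjson(text: str) -> list[dict]:
    """Parse an NDJSON export into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def saved_objects_only(objects: list[dict]) -> list[dict]:
    """Drop the export summary line, which has no "type" or "id"."""
    return [o for o in objects if "type" in o and "id" in o]


def find_missing_references(objects: list[dict]) -> list[tuple[str, str, str]]:
    """Return (source_id, ref_type, ref_id) for references not in the export."""
    saved = saved_objects_only(objects)
    known = {(o["type"], o["id"]) for o in saved}
    missing = []
    for obj in saved:
        for ref in obj.get("references", []):
            if (ref["type"], ref["id"]) not in known:
                missing.append((obj["id"], ref["type"], ref["id"]))
    return missing


def to_stable_ndjson(objects: list[dict]) -> str:
    """Serialize objects in a deterministic order so Git diffs stay small."""
    saved = sorted(saved_objects_only(objects), key=lambda o: (o["type"], o["id"]))
    return "\n".join(json.dumps(o, sort_keys=True) for o in saved)
```

A CI job could fail the pull request when `find_missing_references` returns anything, since a dashboard pointing at an object absent from the export may not resolve in production. Sorting objects and keys before committing is a design choice that keeps the reviewed diff limited to actual changes rather than export ordering.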

Security-owned spaces (such as SOC dashboards) have stricter RBAC: source owners can view but not edit.
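As a rough illustration of such a view-only role, the sketch below builds a request body in the shape accepted by the Kibana role API, granting the `read` base privilege in a single space. The space identifier `soc` and the function name are hypothetical, and the Elasticsearch index privileges are deliberately left empty here; a working role would also need read access to the relevant indices.

```python
def read_only_space_role(space_id: str) -> dict:
    """Build a Kibana role body that grants read-only access to one space."""
    return {
        # Assumption: index privileges are filled in per environment.
        "elasticsearch": {"cluster": [], "indices": []},
        "kibana": [
            {
                "base": ["read"],      # view saved objects, no edits
                "feature": {},          # no per-feature overrides
                "spaces": [space_id],   # scope the grant to one space
            }
        ],
    }


# Hypothetical space identifier for a security-owned space.
soc_viewer = read_only_space_role("soc")
```

Keeping role definitions as code, like the other objects described here, means that changes to who can edit security-owned spaces go through the same review path.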

Impact

Architecting Elastic along the lines of this indicative solution is common among enterprises. It allows teams to operate independently within defined guardrails that protect the confidentiality, integrity, and availability (CIA) of the SIEM and its logs while giving them the freedom they need. As a result:

Deployment velocity increases as teams are able to self-serve their log onboarding and observability in the SIEM
The availability guarantees of the platform become stronger through enforced quotas and limits that prevent resource contention
Storage costs decrease, as standardized ILM policies managed by the site-reliability and platform engineering teams automatically transition aging data from the hot tier to the warm, cold, and frozen tiers
Data quality improves through standardized ingest pipelines that enforce parsing consistency according to the Elastic Common Schema (ECS)
Security log integrity is protected through appropriate RBAC and control of production across the whole log management pipeline, from agent collection to ingestion and presentation
Audit compliance becomes easier through Git version control and history
Alert quality improves as teams iterate on their own detection rules without cross-team noise, while security teams keep full autonomy over the security alerts
Cross-team collaboration becomes smoother through shared, well-defined processes that depend on automation rather than time- and energy-intensive ticketing
Incident response becomes easier and more effective with consistent field and index naming across log sources and well-understood detection and response playbooks across the team
