Integrations
Jaeger
Overview
This document provides a comprehensive guide to integrating Jaeger with Callgoose SQIBS using Prometheus and Alertmanager as the alerting pipeline.
The integration enables:
- Automatic incident creation in Callgoose SQIBS when Jaeger metrics indicate issues such as high latency, dropped spans, or collector saturation.
- Automatic incident resolution when the underlying alert clears.
- Real-time distributed tracing–driven alerting for production systems.
Prerequisites
Before continuing, ensure the following:
- A running Kubernetes cluster.
- Helm v3 installed.
- Prometheus and Alertmanager deployed (kube-prometheus-stack or standalone).
- A Callgoose SQIBS API Token.
- Network connectivity from Alertmanager to Callgoose.
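A quick way to sanity-check these prerequisites from a workstation (a minimal sketch, assuming the monitoring stack runs in the monitoring namespace; adjust names to your environment):
kubectl cluster-info
helm version --short
kubectl get pods -n monitoring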
1. Install Jaeger
1.1 Install Jaeger using Helm (Recommended)
Add the official Jaeger Helm chart repository:
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update
Install Jaeger with production-ready structure (collector, query, agent):
helm install jaeger jaegertracing/jaeger \
  --namespace observability --create-namespace \
  --set collector.replicas=2 \
  --set storage.type=elasticsearch \
  --set query.ingress.enabled=true
If using in-memory storage (testing only), add:
--set storage.type=memory
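Before moving on, it is worth confirming that the Helm release came up cleanly. A minimal check, assuming the observability namespace used above:
helm status jaeger -n observability
kubectl get pods -n observability
kubectl get svc -n observability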
1.2 Install Jaeger Using the Jaeger Operator
The operator-based model is preferred by many organizations because the Jaeger deployment is described declaratively in a custom resource and managed by the operator.
Step 1 — Install Cert-Manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.1/cert-manager.yaml
Step 2 — Install the Jaeger Operator
The jaegertracing.io/v1 Jaeger resource used in the next step is reconciled by the Jaeger Operator (replace v1.57.0 with the latest operator release):
kubectl create namespace observability
kubectl create -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.57.0/jaeger-operator.yaml -n observability
Step 3 — Deploy Jaeger Instance
Create jaeger.yaml:
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
spec:
  strategy: production
  collector:
    maxReplicas: 2
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch-master:9200
Apply:
kubectl apply -f jaeger.yaml
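To confirm the operator reconciled the custom resource, list the Jaeger instances and their pods. This is a sketch, assuming the operator's default app.kubernetes.io/instance label; add -n <namespace> if you applied the resource to a specific namespace:
kubectl get jaegers -A
kubectl get pods -A -l app.kubernetes.io/instance=jaeger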
2. Expose Jaeger Metrics to Prometheus
Jaeger components expose Prometheus metrics such as:
- Collector metrics
- Query metrics
- Agent metrics
- Span ingestion/dropping statistics
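Before wiring these endpoints into Prometheus, you can query them directly to confirm they respond. A sketch, assuming the collector's admin port (14269) is exposed by the jaeger-collector Service in the observability namespace:
# In one terminal: forward the collector admin port
kubectl port-forward svc/jaeger-collector 14269:14269 -n observability
# In another terminal: confirm metrics are being served
curl -s http://localhost:14269/metrics | grep jaeger_collector | head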
2.1 Add Scrape Config for Prometheus (Standalone Prometheus)
Create jaeger-scrape-config.yaml:
scrape_configs:
  - job_name: jaeger-collector
    static_configs:
      - targets:
          - 'jaeger-collector.observability.svc.cluster.local:14269'
    metrics_path: /metrics
  - job_name: jaeger-query
    static_configs:
      - targets:
          - 'jaeger-query.observability.svc.cluster.local:16687'
    metrics_path: /metrics
Apply:
kubectl create configmap prometheus-additional-scrape --from-file=jaeger-scrape-config.yaml -n monitoring
(Then reload Prometheus, for example via its /-/reload endpoint when --web.enable-lifecycle is enabled, or redeploy it, depending on your stack.)
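Once Prometheus has reloaded, confirm the new jobs appear on its Targets page. A sketch, assuming Prometheus is reachable through the prometheus-operated Service in the monitoring namespace (substitute the Service name used by your stack):
kubectl port-forward svc/prometheus-operated 9090:9090 -n monitoring
# Then open http://localhost:9090/targets and check that the jaeger-collector and jaeger-query jobs are UP.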
2.2 Prometheus Operator Users (kube-prometheus-stack)
Create jaeger-servicemonitor.yaml:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: jaeger-collector-sm
  namespace: observability
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: jaeger-collector
  endpoints:
    - port: metrics
      path: /metrics
Apply:
kubectl apply -f jaeger-servicemonitor.yaml
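Note that the Prometheus Operator only discovers ServiceMonitors that match its serviceMonitorSelector; kube-prometheus-stack typically selects on a release: <release-name> label. A hedged check, assuming the Prometheus resource lives in the monitoring namespace:
kubectl get prometheus -n monitoring -o jsonpath='{.items[0].spec.serviceMonitorSelector}'
# If a label such as release: <release-name> is required, add it under metadata.labels in jaeger-servicemonitor.yaml and re-apply.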
3. Create Prometheus Alert Rules for Jaeger
Create jaeger-alert-rules.yaml:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: jaeger-alerts
  namespace: monitoring
spec:
  groups:
    - name: jaeger.rules
      rules:
        - alert: HighP95Latency
          expr: |
            histogram_quantile(
              0.95,
              sum(rate(http_server_duration_seconds_bucket[5m])) by (le, service)
            ) > 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High p95 latency detected for {{ $labels.service }}"
            description: "Latency exceeded 1s for more than 5 minutes."
        - alert: JaegerCollectorDroppingSpans
          expr: increase(jaeger_collector_spans_dropped_total[5m]) > 0
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "Jaeger collector dropping spans"
            description: "Spans were dropped in the last 5 minutes."
        - alert: JaegerQueueSaturation
          expr: jaeger_collector_queue_length > (jaeger_collector_queue_capacity * 0.8)
          for: 3m
          labels:
            severity: warning
          annotations:
            summary: "Jaeger collector queue saturation"
            description: "Queue usage is above 80% capacity."
Apply:
kubectl apply -f jaeger-alert-rules.yaml
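To verify the rules were accepted, list the PrometheusRule objects and query the Prometheus rules API. A minimal sketch, reusing the port-forward from section 2:
kubectl get prometheusrules -n monitoring
curl -s http://localhost:9090/api/v1/rules | grep -E -o 'HighP95Latency|JaegerCollectorDroppingSpans|JaegerQueueSaturation' | sort -u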
4. Configure Alertmanager Webhook for Callgoose SQIBS
Edit your Alertmanager configuration or create alertmanager-config.yaml:
global:
  resolve_timeout: 5m
route:
  receiver: callgoose-sqibs
  group_by: ['alertname', 'service']
  group_wait: 10s
  group_interval: 30s
  repeat_interval: 1m
receivers:
  - name: callgoose-sqibs
    webhook_configs:
      - url: 'https://****.callgoose.com/v1/process?from=Jaeger&token=xxxx'
        send_resolved: true
Apply the configuration according to your deployment. For a standalone Alertmanager, save this content as alertmanager.yml and reload Alertmanager. For kube-prometheus-stack, the configuration is read from the Alertmanager secret (or supplied through Helm values) rather than applied directly with kubectl apply.
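As a hedged sketch for kube-prometheus-stack (the secret name below is illustrative; list the secrets in the monitoring namespace to find the one the operator created for your Alertmanager), the running configuration can be replaced like this:
kubectl get secrets -n monitoring | grep alertmanager
kubectl -n monitoring create secret generic alertmanager-<alertmanager-name> \
  --from-file=alertmanager.yaml=alertmanager-config.yaml \
  --dry-run=client -o yaml | kubectl apply -f -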
5. Create API Filters in Callgoose SQIBS
5.1 Trigger Filter (Incident Creation)
Configure the Trigger filter as:
- Payload JSON Key: status
- Key Value Contains: firing
- Map Incident Using: groupKey
- Incident Title: alerts[0].annotations.summary
- Incident Description: alerts[0].annotations.description
5.2 Resolve Filter (Auto-Resolution)
Configure the Resolve filter as:
- Payload JSON Key: status
- Key Value Contains: resolved
- Incident Mapped Using: The same field used in the Trigger filter (e.g., groupKey)
6. Sample Payloads
6.1 Firing Payload
{
  "receiver": "callgoose-sqibs",
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighP95Latency",
        "service": "checkout-service",
        "severity": "critical"
      },
      "annotations": {
        "summary": "High p95 latency detected for checkout-service",
        "description": "Latency exceeded 1s for more than 5 minutes."
      }
    }
  ],
  "groupKey": "{}:{alertname=\"HighP95Latency\",service=\"checkout-service\"}",
  "version": "4"
}
6.2 Resolved Payload
{
  "receiver": "callgoose-sqibs",
  "status": "resolved",
  "alerts": [
    {
      "status": "resolved",
      "labels": {
        "alertname": "HighP95Latency",
        "service": "checkout-service",
        "severity": "critical"
      },
      "annotations": {
        "summary": "High p95 latency detected for checkout-service",
        "description": "Alert has returned to normal thresholds."
      }
    }
  ],
  "groupKey": "{}:{alertname=\"HighP95Latency\",service=\"checkout-service\"}",
  "version": "4"
}
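To exercise the filters without waiting for a real alert, the firing sample payload can be posted directly to the integration URL. A sketch, assuming the payload above is saved as firing.json and the placeholders are replaced with your actual Callgoose SQIBS endpoint and token:
curl -X POST 'https://<your-subdomain>.callgoose.com/v1/process?from=Jaeger&token=<your-token>' \
  -H 'Content-Type: application/json' \
  -d @firing.json
Repeat with the resolved payload (saved as resolved.json) to confirm the incident closes automatically.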
7. Verification Steps
- Confirm Jaeger is running.
- Confirm Prometheus is successfully scraping Jaeger metrics.
- Confirm alert rules are loaded in Prometheus.
- Trigger a test alert (e.g., temporarily lower an alert threshold).
- Check Alertmanager for firing/resolved alerts.
- Verify incidents appear automatically in Callgoose SQIBS.
- Ensure resolved alerts close incidents automatically.
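The Alertmanager side of these checks can be scripted. A minimal sketch, assuming the operator-created alertmanager-operated Service in the monitoring namespace and jq installed locally:
# In one terminal: forward the Alertmanager API
kubectl port-forward svc/alertmanager-operated 9093:9093 -n monitoring
# In another terminal: list the alerts Alertmanager currently knows about
curl -s http://localhost:9093/api/v2/alerts | jq '.[] | {alertname: .labels.alertname, state: .status.state}'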
8. Troubleshooting
No incidents created
- Check Alertmanager logs for webhook errors.
- Confirm outbound access to Callgoose.
Incidents not resolving
- Ensure send_resolved: true is configured.
- Verify Resolve Filter JSON paths.
Alerts never fire
- Validate Prometheus is scraping Jaeger metrics.
- Check rule syntax and metric names.
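Most of these checks reduce to a couple of commands. A hedged sketch, assuming the operator-managed Alertmanager in the monitoring namespace and an illustrative Callgoose URL:
# Look for webhook delivery errors in the Alertmanager logs
kubectl logs -n monitoring -l app.kubernetes.io/name=alertmanager --tail=200 | grep -iE 'callgoose|webhook|error'
# Verify outbound connectivity to Callgoose from inside the cluster
kubectl run callgoose-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -sv 'https://<your-subdomain>.callgoose.com/v1/process'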
9. Conclusion
This integration provides a complete tracing-to-incident workflow by combining Jaeger, Prometheus, Alertmanager, and Callgoose SQIBS. Teams gain real-time observability and automated incident handling based on distributed tracing signals.
For further customization or advanced use cases, refer to the official documentation for both Jaeger and Callgoose SQIBS.
