CALLGOOSE

Jaeger

Overview

This document provides a comprehensive guide to integrating Jaeger with Callgoose SQIBS using Prometheus and Alertmanager as the alerting pipeline.

The integration enables:

  • Automatic incident creation in Callgoose SQIBS when Jaeger metrics indicate issues such as high latency, dropped spans, or collector saturation.
  • Automatic incident resolution when the underlying alert clears.
  • Real-time distributed tracing–driven alerting for production systems.

Prerequisites

Before continuing, ensure the following:

  • A running Kubernetes cluster.
  • Helm v3 installed.
  • Prometheus and Alertmanager deployed (kube-prometheus-stack or standalone).
  • A Callgoose SQIBS API Token.
  • Network connectivity from Alertmanager to Callgoose.

1. Install Jaeger

1.1 Install Jaeger using Helm (Recommended)

Add the official Jaeger Helm chart repository:

helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update

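Install Jaeger with a production-style layout (collector, query, agent). Value names differ between chart versions, so confirm them with helm show values jaegertracing/jaeger before installing: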

helm install jaeger jaegertracing/jaeger \
  --namespace observability --create-namespace \
  --set collector.replicas=2 \
  --set storage.type=elasticsearch \
  --set query.ingress.enabled=true

For testing only, select in-memory storage by replacing the storage flag with:

--set storage.type=memory

1.2 Install Jaeger Using OpenTelemetry Operator

Many organizations now prefer an operator-based deployment model.

Step 1 — Install Cert-Manager

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.1/cert-manager.yaml

Step 2 — Install the OpenTelemetry Operator

kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

Step 3 — Deploy Jaeger Instance

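Create jaeger.yaml. The Jaeger kind under jaegertracing.io/v1 is a custom resource defined by the Jaeger Operator, so that operator's CRDs must be present in the cluster for this manifest to apply: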

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
spec:
  strategy: production
  collector:
    maxReplicas: 2
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch-master:9200

Apply:

kubectl apply -f jaeger.yaml

2. Expose Jaeger Metrics to Prometheus

Jaeger components expose Prometheus metrics such as:

  • Collector metrics
  • Query metrics
  • Agent metrics
  • Span ingestion/dropping statistics

2.1 Add Scrape Config for Prometheus (Standalone Prometheus)

Create jaeger-scrape-config.yaml:

scrape_configs:
  - job_name: jaeger-collector
    static_configs:
      - targets:
          - 'jaeger-collector.observability.svc.cluster.local:14269'
    metrics_path: /metrics
  - job_name: jaeger-query
    static_configs:
      - targets:
          - 'jaeger-query.observability.svc.cluster.local:16687'
    metrics_path: /metrics

Apply:

kubectl create configmap prometheus-additional-scrape --from-file=jaeger-scrape-config.yaml -n monitoring

(Then reload Prometheus or redeploy based on your stack.)

2.2 Prometheus Operator Users (kube-prometheus-stack)

Create jaeger-servicemonitor.yaml:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: jaeger-collector-sm
  namespace: observability
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: jaeger-collector
  endpoints:
    - port: metrics
      path: /metrics

Apply:

kubectl apply -f jaeger-servicemonitor.yaml

3. Create Prometheus Alert Rules for Jaeger

Create jaeger-alert-rules.yaml:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: jaeger-alerts
  namespace: monitoring
spec:
  groups:
  - name: jaeger.rules
    rules:

    - alert: HighP95Latency
      expr: |
        histogram_quantile(
          0.95,
          sum(rate(http_server_duration_seconds_bucket[5m])) by (le, service)
        ) > 1
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "High p95 latency detected for {{ $labels.service }}"
        description: "Latency exceeded 1s for more than 5 minutes."

    - alert: JaegerCollectorDroppingSpans
      expr: increase(jaeger_collector_spans_dropped_total[5m]) > 0
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "Jaeger collector dropping spans"
        description: "Spans were dropped in the last 5 minutes."

    - alert: JaegerQueueSaturation
      expr: jaeger_collector_queue_length > (jaeger_collector_queue_capacity * 0.8)
      for: 3m
      labels:
        severity: warning
      annotations:
        summary: "Jaeger collector queue saturation"
        description: "Queue usage is above 80% capacity."

Apply:

kubectl apply -f jaeger-alert-rules.yaml
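
The HighP95Latency rule relies on histogram_quantile, which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The following is a simplified Python sketch of that estimate, not Prometheus's actual implementation. The function name and the bucket values are made up for illustration, and edge cases such as native histograms or negative bounds are ignored.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    `buckets` must end with a (math.inf, total) entry, mirroring the
    le="+Inf" bucket of a classic Prometheus histogram.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in the range (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Illustrative cumulative counts per upper bound (seconds); not real data.
example_buckets = [(0.25, 50), (0.5, 80), (1.0, 90), (2.5, 100), (math.inf, 100)]
p95 = estimate_quantile(0.95, example_buckets)
would_fire = p95 > 1  # same comparison as the alert expression
```

With these example buckets the 95th-percentile rank lands in the 1.0 to 2.5 second bucket, so the estimate sits above the 1 second threshold and the rule's condition would hold. The estimate is only as precise as the bucket boundaries, which is worth keeping in mind when choosing thresholds.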

4. Configure Alertmanager Webhook for Callgoose SQIBS

Edit your Alertmanager configuration or create alertmanager-config.yaml:

global:
  resolve_timeout: 5m

route:
  receiver: callgoose-sqibs
  group_by: ['alertname', 'service']
  group_wait: 10s
  group_interval: 30s
  repeat_interval: 1m

receivers:
  - name: callgoose-sqibs
    webhook_configs:
      - url: 'https://****.callgoose.com/v1/process?from=Jaegar&token=xxxx'
        send_resolved: true

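Apply (for kube-prometheus-stack). The file above is in native Alertmanager format rather than a Kubernetes manifest, so with kube-prometheus-stack it is typically supplied through the chart's alertmanager.config Helm values or the Alertmanager configuration Secret. The command below succeeds only if the configuration is wrapped in a Kubernetes resource: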

kubectl apply -f alertmanager-config.yaml -n monitoring

5. Create API Filters in Callgoose SQIBS

5.1 Trigger Filter (Incident Creation)

Configure the Trigger filter as:

  • Payload JSON Key: status
  • Key Value Contains: firing
  • Map Incident Using: groupKey
  • Incident Title: alerts[0].annotations.summary
  • Incident Description: alerts[0].annotations.description

5.2 Resolve Filter (Auto-Resolution)

Configure the Resolve filter as:

  • Payload JSON Key: status
  • Key Value Contains: resolved
  • Incident Mapped Using: The same field used in the Trigger filter (e.g., groupKey)
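
The two filters above amount to a small routing rule: the top-level status decides whether an incident is opened or closed, and groupKey ties both events to the same incident. The Python sketch below illustrates that logic under those assumptions. It is not Callgoose's implementation, and the names classify_payload and apply_payload are hypothetical.

```python
def classify_payload(payload):
    """Return (action, incident_key) for an Alertmanager webhook payload."""
    status = str(payload.get("status", ""))
    incident_key = payload.get("groupKey")
    if "firing" in status:
        return ("trigger", incident_key)
    if "resolved" in status:
        return ("resolve", incident_key)
    return ("ignore", incident_key)


def apply_payload(open_incidents, payload):
    """Update a dict of open incidents keyed by groupKey; return the action."""
    action, key = classify_payload(payload)
    if action == "trigger":
        # Title comes from the first alert's summary, as in the Trigger filter.
        open_incidents[key] = payload["alerts"][0]["annotations"]["summary"]
    elif action == "resolve":
        # Resolving an unknown key is a no-op rather than an error.
        open_incidents.pop(key, None)
    return action
```

Because both filters map on groupKey, changing group_by in the Alertmanager route also changes which alerts share an incident.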

6. Sample Payloads

6.1 Firing Payload

{
  "receiver": "callgoose-sqibs",
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighP95Latency",
        "service": "checkout-service",
        "severity": "critical"
      },
      "annotations": {
        "summary": "High p95 latency detected for checkout-service",
        "description": "Latency exceeded 1s for more than 5 minutes."
      }
    }
  ],
  "groupKey": "{}:{alertname=\"HighP95Latency\",service=\"checkout-service\"}",
  "version": "4"
}

6.2 Resolved Payload

{
  "receiver": "callgoose-sqibs",
  "status": "resolved",
  "alerts": [
    {
      "status": "resolved",
      "labels": {
        "alertname": "HighP95Latency",
        "service": "checkout-service",
        "severity": "critical"
      },
      "annotations": {
        "summary": "High p95 latency detected for checkout-service",
        "description": "Alert has returned to normal thresholds."
      }
    }
  ],
  "groupKey": "{}:{alertname=\"HighP95Latency\",service=\"checkout-service\"}",
  "version": "4"
}
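
The filter fields in section 5 refer to payload values by paths such as alerts[0].annotations.summary. As a way to check a path against the sample payloads before entering it in a filter, the following Python sketch resolves dotted paths with list indexes. The function resolve_path is hypothetical, and how Callgoose SQIBS evaluates these paths internally is an assumption here.

```python
import re


def resolve_path(payload, path):
    """Resolve a path like 'alerts[0].annotations.summary' in parsed JSON."""
    current = payload
    for part in path.split("."):
        match = re.fullmatch(r"(\w+)((?:\[\d+\])*)", part)
        if match is None:
            raise ValueError(f"unsupported path segment: {part!r}")
        # Descend into the named key, then through any [n] list indexes.
        current = current[match.group(1)]
        for index in re.findall(r"\[(\d+)\]", match.group(2)):
            current = current[int(index)]
    return current


# Trimmed copy of the firing payload from section 6.1.
sample = {
    "status": "firing",
    "alerts": [
        {
            "annotations": {
                "summary": "High p95 latency detected for checkout-service",
                "description": "Latency exceeded 1s for more than 5 minutes.",
            }
        }
    ],
    "groupKey": '{}:{alertname="HighP95Latency",service="checkout-service"}',
}
title = resolve_path(sample, "alerts[0].annotations.summary")
```

A missing key or index raises KeyError or IndexError, which makes a mistyped path visible immediately.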

7. Verification Steps

  • Confirm Jaeger is running.
  • Confirm Prometheus is successfully scraping Jaeger metrics.
  • Confirm alert rules are loaded in Prometheus.
  • Trigger a test alert (e.g., temporarily lower a rule's threshold so its expression evaluates to true).
  • Check Alertmanager for firing/resolved alerts.
  • Verify incidents appear automatically in Callgoose SQIBS.
  • Ensure resolved alerts close incidents automatically.
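
To exercise the Callgoose filters without waiting for a real alert, one option is to post a hand-built payload directly to the webhook URL. The Python sketch below builds such a request using only the standard library. The alert name CallgooseTestAlert and the helper names are hypothetical, and the URL keeps the placeholders from section 4, which must be replaced with your own endpoint and token before sending.

```python
import json
import urllib.request

# Placeholder URL from section 4; substitute your own host and API token.
WEBHOOK_URL = "https://****.callgoose.com/v1/process?from=Jaegar&token=xxxx"


def build_test_payload(status="firing"):
    """Build a minimal Alertmanager-style payload for a hypothetical test alert."""
    return {
        "receiver": "callgoose-sqibs",
        "status": status,
        "alerts": [
            {
                "status": status,
                "labels": {"alertname": "CallgooseTestAlert", "severity": "warning"},
                "annotations": {
                    "summary": "Test alert for the Jaeger integration",
                    "description": "Manually sent to verify the filters.",
                },
            }
        ],
        "groupKey": '{}:{alertname="CallgooseTestAlert"}',
        "version": "4",
    }


def build_request(url, payload):
    """Wrap the payload in a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(request, timeout=10):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling send(build_request(WEBHOOK_URL, build_test_payload("firing"))) and then repeating the call with "resolved" should open and close one incident, since both payloads share the same groupKey.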

8. Troubleshooting

No incidents created

  • Check Alertmanager logs for webhook errors.
  • Confirm outbound access to Callgoose.

Incidents not resolving

  • Ensure send_resolved: true is configured.
  • Verify Resolve Filter JSON paths.

Alerts never fire

  • Validate Prometheus is scraping Jaeger metrics.
  • Check rule syntax and metric names.
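
A quick way to check metric names is to compare what the rules reference against what a component's /metrics endpoint actually exposes. The Python sketch below extracts metric names from text in the Prometheus exposition format and reports the ones that are missing. The helper names and the sample text are illustrative only, and real output from your Jaeger version may use different metric names.

```python
import re

METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def metric_names(exposition_text):
    """Collect metric names from Prometheus text exposition output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = METRIC_NAME.match(line)
        if match:
            names.add(match.group(0))
    return names


def missing_metrics(exposition_text, required):
    """Return the required metric names absent from the scraped text, sorted."""
    return sorted(set(required) - metric_names(exposition_text))


# Illustrative sample, not captured from a real collector.
sample_text = """
# HELP jaeger_collector_spans_dropped_total Number of spans dropped
# TYPE jaeger_collector_spans_dropped_total counter
jaeger_collector_spans_dropped_total{host="collector-0"} 3
jaeger_collector_queue_length{host="collector-0"} 12
"""
absent = missing_metrics(
    sample_text,
    ["jaeger_collector_spans_dropped_total", "jaeger_collector_queue_capacity"],
)
```

Any name reported as missing points to a rule that can never fire against that endpoint, either because the metric is named differently or because it is not exposed at all.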

9. Conclusion

This integration provides a complete tracing-to-incident workflow by combining Jaeger, Prometheus, Alertmanager, and Callgoose SQIBS. Teams gain real-time observability and automated incident handling based on distributed tracing signals.

For further customization or advanced use cases, refer to the official documentation for both Jaeger and Callgoose SQIBS.
