Integrations
Splunk Observability
Overview
This document provides a detailed guide to integrating Splunk Observability (formerly SignalFx) with Callgoose SQIBS for real-time Incident Management, Incident Auto Remediation, Event-Driven Automation, and other Automation purposes. The integration enables automatic creation, updating, and resolution of incidents in Callgoose SQIBS based on alerts triggered in Splunk Observability. The guide includes steps for setting up alerts in Splunk Observability, configuring notifications, creating API filters in Callgoose SQIBS, and troubleshooting.
Prerequisites
- Splunk Observability Account: Access to Splunk Observability for creating alerts and managing notifications.
- Callgoose SQIBS Account: With valid privileges to set up API filters and receive notifications.
- Webhook/API Endpoint: Available in Callgoose SQIBS to receive alerts from Splunk Observability.
1. Obtain API Token and Endpoint Details
To integrate with Callgoose SQIBS, you first need to obtain an API token and find the API endpoint details.
- Generate an API Token:
- Follow the guide on How to Create API Token in Callgoose SQIBS.
- Find the API Endpoint:
- Refer to the Callgoose SQIBS API Endpoint Documentation to get the endpoint details where the JSON payloads from Splunk Observability will be sent.
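Once you have the token and endpoint details, you can optionally confirm basic connectivity before wiring up Splunk Observability. The sketch below is illustrative only: the endpoint URL, the token placement, and the minimal test payload are placeholders and assumptions, so substitute the exact format described in the Callgoose SQIBS API Endpoint Documentation.

```python
# Illustrative connectivity check only -- the endpoint URL and the way the token
# is passed are placeholders; use the exact format from the Callgoose SQIBS
# API Endpoint Documentation.
import requests

CALLGOOSE_ENDPOINT = "https://<your-callgoose-sqibs-endpoint>"  # placeholder
API_TOKEN = "<your-api-token>"                                  # placeholder

test_payload = {
    "status": "anomalous",
    "incidentId": "connectivity-test-001",
    "messageTitle": "Connectivity test from Splunk Observability integration",
}

# Token placement (query parameter vs. header) is an assumption here.
response = requests.post(CALLGOOSE_ENDPOINT, json=test_payload, params={"token": API_TOKEN})
print(response.status_code, response.text)
```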
2. Debugging and Troubleshooting
You can enable debugging in the API tokens used with Splunk Observability notifications for troubleshooting purposes.
- Enable Debugging:
- You can update the debug value when adding or updating an API token.
- When debugging (API tracking) is enabled, logs are stored in the API log section for your review. Debugging is automatically disabled after 48 hours.
- When API tracking is turned off, no logs are saved in the API log.
- Using API Log for Troubleshooting:
- The API log provides detailed information on all API calls made to Callgoose SQIBS.
- You can check the JSON values in each API log entry for troubleshooting purposes.
- Use the information in the API log to create or refine API filters so that incidents are created correctly based on the API payloads received (a payload-inspection sketch follows this list).
- Callgoose SQIBS creates incidents according to your API filter configuration, giving you full control over how alerts from different services trigger incidents and alerts for your support team or automation processes.
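One practical way to pick filter keys is to copy a payload from the API log and list its top-level keys along with likely candidates such as status, incidentId, and messageTitle. The sketch below is a generic helper (the file name is an assumption), not part of Callgoose SQIBS:

```python
# Inspect a payload copied from the Callgoose SQIBS API log to choose filter keys.
import json

# File name is just an example -- paste a payload from the API log into it.
with open("payload_from_api_log.json") as fh:
    payload = json.load(fh)

print("Top-level keys:", sorted(payload.keys()))
for key in ("status", "incidentId", "messageTitle", "description"):
    print(f"{key!r} -> {payload.get(key)!r}")
```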
3. Configuring Splunk Observability to Send JSON Payloads
To configure Splunk Observability (formerly SignalFx) to generate JSON payloads similar to the examples provided, follow the steps outlined below. These steps guide you through setting up the necessary alerts and webhook notifications within Splunk Observability so that the JSON payloads match those expected by Callgoose SQIBS.
3.1 Setting Up Alerts in Splunk Observability
To generate the required JSON payloads, you first need to set up alerts within Splunk Observability.
- Log in to Splunk Observability:
- Access the Splunk Observability platform using your account credentials.
- Navigate to the Detectors & SLOs section in Splunk Observability.
- Create a New Alert:
- Click on Create Detector and select the appropriate detector (e.g., Custom Detector).
- Enter the detector name, then click Create Alert Rule.
- In the new window, you’ll see sections for Alert Signal, Alert Condition, Alert Settings, and Alert Message.
- In the Alert Signal section, choose the metric signal you want to alert on (e.g., Latency). Proceed to Alert Condition.
- Select a condition for this alert. Refer to the Built-in Alert Conditions documentation for more information. Proceed to Alert Settings.
- Customize the threshold in Alert Settings (e.g., Latency > 200 ms, CPU usage > 80%, Memory usage > 90%).
- Optionally, customize the message body for each alert in the Alert Message section.
3.2 Configuring the Webhook Notification
- Add Alert Recipients:
- Click on Add Recipients, select Webhook, and then choose Custom.
- In the URL field, provide the API endpoint URL obtained from Callgoose SQIBS. Refer to the API Endpoint documentation for more details on selecting the final endpoint format. Then click Update.
- Click Proceed to Alert Activation, and then Activate Alert Rule.
3.3 Finalizing and Testing
- Validate the Integration:
- Manually trigger the alert condition if possible to verify that the correct JSON payload is sent to Callgoose SQIBS (or replay the example payloads, as shown in the sketch after this list).
- Resolve the alert to ensure the resolved state payload is also correctly sent and processed.
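If it is difficult to trip the detector on demand, you can also replay the example trigger and resolve payloads from section 4.1.1 against your Callgoose SQIBS endpoint. The sketch below assumes you have saved the two example payloads to local files and replaced the placeholder endpoint URL with your real endpoint:

```python
# Simulate the Splunk Observability webhook by replaying the example payloads.
# The endpoint URL is a placeholder; use the value from the API Endpoint docs.
import json
import requests

CALLGOOSE_ENDPOINT = "https://<your-callgoose-sqibs-endpoint>"  # placeholder

# Save the trigger and resolve examples from section 4.1.1 to these files first.
for name in ("alert_triggered.json", "alert_resolved.json"):
    with open(name) as fh:
        payload = json.load(fh)
    resp = requests.post(CALLGOOSE_ENDPOINT, json=payload)
    print(name, "->", resp.status_code)
```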
4. Configuring Callgoose SQIBS
4.1 Create API Filters in Callgoose SQIBS
To correctly map incidents from the Splunk Observability alerts, you need to create API filters based on the JSON payloads received.
4.1.1 Example JSON Payloads from Splunk Observability
Alert Triggered (status: "anomalous")
json { "severity": "Critical", "originatingMetric": "demo.trans.latency", "detectOnCondition": "when(A > threshold(275))", "messageBody": "Rule \"test\" in detector \"test\" triggered at Wed, 4 Sep 2024 06:07:59 GMT.\n\nTriggering condition: The value of demo.trans.latency is above 275.\n\nSignal value for demo.trans.latency: 276.01743470331917\n\nSignal details:\n{demo_datacenter=Tokyo, sf_metric=demo.trans.latency, demo_customer=zibobodesign.net, demo_host=server4}", "src": null, "inputs": { "A": { "value": "276.01743470331917", "fragment": "data('demo.trans.latency').publish(label='A')", "key": { "demo_customer": "zibobodesign.net", "demo_datacenter": "Tokyo", "demo_host": "server4", "sf_metric": "demo.trans.latency" } }, "_S2": { "value": "275", "fragment": "threshold(275)" } }, "rule": "test", "description": "The value of demo.trans.latency is above 275.", "messageTitle": "Critical Alert: test (test)", "sf_schema": 2, "eventType": "GWmfi3aCEAA__GWmfi3fCMAA__test", "runbookUrl": null, "orgId": "GWmYhVoCMAA", "triggeredWhileMuted": false, "detectorId": "GWmfi3fCMAA", "imageUrl": "https://static.jp0.signalfx.com/signed/GWmfi3fCMAA async", "tip": null, "statusExtended": "anomalous", "incidentId": "GWmQ3k6CEFY", "detector": "test", "detectorUrl": "https://app.jp0.signalfx.com/#/detector/GWmfi3fCMAA/edit?incidentId=GWmQ3k6CEFY&is=anomalous", "status": "anomalous", "timestamp": "2024-09-04T06:07:59Z", "dimensions": { "demo_datacenter": "Tokyo", "sf_metric": "demo.trans.latency", "demo_customer": "zibobodesign.net", "demo_host": "server4" } }
Alert Resolved (status: "ok")
json { "severity": "Critical", "originatingMetric": "demo.trans.latency", "detectOnCondition": "when(A > threshold(275))", "messageBody": "Rule \"test\" in detector \"test\" cleared at Wed, 4 Sep 2024 06:08:00 GMT.\n\nCurrent signal value for demo.trans.latency: 254.76949684719662\n\nSignal details:\n{demo_datacenter=Tokyo, sf_metric=demo.trans.latency, demo_customer=zibobodesign.net, demo_host=server4}", "src": null, "inputs": { "A": { "value": "254.76949684719662", "fragment": "data('demo.trans.latency').publish(label='A')", "key": { "demo_customer": "zibobodesign.net", "demo_datacenter": "Tokyo", "demo_host": "server4", "sf_metric": "demo.trans.latency" } }, "_S2": { "value": "275", "fragment": "threshold(275)" } }, "rule": "test", "description": "The value of demo.trans.latency is above 275.", "messageTitle": "Back to normal: test (test)", "sf_schema": 2, "eventType": "GWmfi3aCEAA__GWmfi3fCMAA__test", "runbookUrl": null, "orgId": "GWmYhVoCMAA", "triggeredWhileMuted": false, "detectorId": "GWmfi3fCMAA", "imageUrl": "https://static.jp0.signalfx.com/signed/****/async", "tip": null, "statusExtended": "ok", "incidentId": "GWmQ3k6CEFY", "detector": "test", "detectorUrl": "https://app.jp0.signalfx.com/#/detector/GWmfi3fCMAA/edit?incidentId=GWmQ3k6CEFY&is=ok", "status": "ok", "timestamp": "2024-09-04T06:08:00Z", "dimensions": { "demo_datacenter": "Tokyo", "sf_metric": "demo.trans.latency", "demo_customer": "zibobodesign.net", "demo_host": "server4" } }
4.2 Configuring API Filters
4.2.1 Integration Templates
If you see a Splunk Observability integration template in the "Select Integration Template" dropdown in the API filter settings, you can use it to automatically add the necessary Trigger and Resolve filters along with other values. The values added by the template can be modified to customize the integration according to your requirements.
4.2.2 Manually Add/Edit the Filter
There are two filters that you can manually edit: Trigger and Resolve. A conceptual sketch of how these filters map the payload follows at the end of this subsection.
- Trigger Filter (For Creating Incidents):
- Payload JSON Key: "status"
- Key Value Contains: [anomalous]
- Map Incident With: "incidentId"
- This corresponds to the unique incidentId from the Splunk Observability payload.
- Incident Title From: "messageTitle"
- This will use the message title as the title for the incident in Callgoose SQIBS.
- You can also configure the API endpoint URL from Callgoose SQIBS using the 'from' URL parameter. The 'from' parameter value is used when the payload does not contain the specific incident title key specified in the API filter. Refer to the API Endpoint documentation for more details.
- Incident Description From: Leave this empty unless you want to use a specific key-value from the JSON payload. If a key is entered, only the value for that key will be used as the Incident Description instead of the full JSON. By default, the Incident Description will include the full JSON values.
- Example: If you enter "description" in the Incident Description From field, the incident description will be the value of that key. In the example JSON payload provided earlier, this would result in the description "The value of demo.trans.latency is above 275."
- Resolve Filter (For Resolving Incidents):
- Payload JSON Key: "status"
- Key Value Contains: [ok]
- Incident Mapped With: "incidentId"
- This ensures the incident tied to the specific incidentId is resolved when the alert state returns to ok.
Refer to the API Filter Instructions and FAQ for more details.
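To make the mapping concrete, the sketch below mimics in plain Python what the Trigger and Resolve filters above express: match on status, map the incident on incidentId, and take the title from messageTitle. It is a conceptual illustration of the filter logic only, not Callgoose SQIBS code:

```python
# Conceptual illustration of the Trigger/Resolve filters described above --
# not actual Callgoose SQIBS code, just the mapping the filters express.
def route_payload(payload: dict) -> dict:
    status = payload.get("status")
    incident_key = payload.get("incidentId")  # "Map Incident With" / "Incident Mapped With"
    if status == "anomalous":  # Trigger filter: Key Value Contains [anomalous]
        return {
            "action": "create_incident",
            "mapped_with": incident_key,
            "title": payload.get("messageTitle"),       # Incident Title From
            "description": payload.get("description"),  # optional; full JSON if left empty
        }
    if status == "ok":  # Resolve filter: Key Value Contains [ok]
        return {"action": "resolve_incident", "mapped_with": incident_key}
    return {"action": "no_match"}
```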
4.3 Finalizing Setup
- Save the API Filters:
- Ensure that the filters are correctly configured and saved in Callgoose SQIBS.
- Double-check that all key mappings, incident titles, and descriptions are correctly aligned with the payload structure sent by Splunk Observability.
5. Testing and Validation
5.1 Triggering Alarms
- Simulate High CPU Usage:
- Increase the CPU load on a host or application monitored by Splunk Observability to trigger the alert (a simple load-generator sketch follows this list).
- Verify that an incident is created in Callgoose SQIBS with the correct title and urgency.
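If you have no convenient way to raise CPU usage on a monitored test host, a crude load generator such as the sketch below can push the metric over the threshold; run it only on a test system, adjust the duration to your detector settings, and stop it to let usage drop back for the resolve test in the next step:

```python
# Crude CPU-load generator for test systems only: spins one busy process per
# core for the given duration so a CPU-usage detector can cross its threshold.
import multiprocessing
import time

def burn(seconds: int) -> None:
    """Busy-wait to keep one CPU core near 100% for the given duration."""
    end = time.time() + seconds
    while time.time() < end:
        pass

if __name__ == "__main__":
    duration = 300  # seconds; long enough for the detector to evaluate and fire
    workers = [
        multiprocessing.Process(target=burn, args=(duration,))
        for _ in range(multiprocessing.cpu_count())
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```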
5.2 Resolving Alarms
- Reduce CPU Usage:
- Bring the CPU usage back below the threshold to resolve the alert in Splunk Observability.
- Verify that the incident in Callgoose SQIBS is marked as resolved.
6. Security Considerations
- API Security: Ensure that the Callgoose SQIBS API endpoint is correct and that you are using the correct API token.
- Splunk Observability Permissions: Restrict access to your Splunk Observability alerts and notifications with appropriate roles and permissions to ensure that only authorized actions can be performed.
7. Troubleshooting
- No Incident Created: Verify that the Splunk Observability webhook is correctly set up and that the JSON payload structure matches the API filters configured in Callgoose SQIBS.
- Incident Not Resolved: Ensure the resolve filter is correctly configured and that the payloads from Splunk Observability are being received and processed by Callgoose SQIBS.
8. Conclusion
This guide provides a comprehensive overview of how to integrate Splunk Observability (formerly SignalFx) with Callgoose SQIBS for effective incident management. By following the steps outlined, you can ensure that alerts from Splunk Observability are automatically reflected as incidents in Callgoose SQIBS, with proper resolution tracking when the issues are resolved.
For further customization or advanced use cases, refer to the official documentation for both Splunk Observability and Callgoose SQIBS:
- Splunk Observability Documentation
- Callgoose SQIBS API Token Documentation
- Callgoose SQIBS API Endpoint Documentation
- API Filter Instructions and FAQ
- How to Send API
This documentation will guide you through the integration process, ensuring that your incidents are managed effectively within Callgoose SQIBS based on real-time alerts from Splunk Observability.