Gremlin
Overview
This document provides a complete, beginner-friendly guide to integrating Gremlin with Callgoose SQIBS using direct webhook notifications.
Gremlin is a leading Chaos Engineering platform used to safely and systematically inject failure into systems to proactively identify weaknesses. When a Gremlin attack is initiated, completed, or halted, Gremlin can send webhook payloads directly to Callgoose SQIBS. Callgoose will automatically create, update, and resolve incidents based on these attack state changes.
This integration enables:
- Real-time Incident Creation in Callgoose SQIBS when a Chaos Engineering attack starts.
- Automatic Resolution when the Gremlin attack successfully halts or is manually stopped.
- End-to-end Observability from controlled chaos → SQIBS incidents.
- Support for linking specific attack details (e.g., target, magnitude) to the incident for quicker context.
Prerequisites
Before proceeding, ensure you have the following:
- An active Gremlin account with permissions to configure webhooks.
- A Gremlin Agent deployed on your target infrastructure (Kubernetes, VMs, etc.).
- A valid Callgoose API token and API endpoint URL.
- Admin access to the Callgoose SQIBS API Filters section.
- Basic familiarity with Gremlin concepts (attacks, scenarios, and their statuses).
1. Configure Gremlin Webhooks
Gremlin uses webhooks to notify external systems of key changes in the lifecycle of an attack or scenario. You will configure one webhook that sends these state changes directly to the Callgoose SQIBS endpoint.
1.1. Access Gremlin Webhook Settings
- Log in to the Gremlin UI.
- Navigate to: Settings → Webhooks
- Click New Webhook.
1.2. Configure the Callgoose Webhook
Provide the following configuration details in the Gremlin UI:
- Name: Callgoose SQIBS Incident Bridge
- URL: Paste the full Callgoose SQIBS Webhook URL (see section 2).
- Request Method: POST
- Request Body: JSON (Gremlin uses a standard JSON payload).
- Events: Select the following events, which cover all incident lifecycle stages:
  - Attack: Started
  - Attack: Halted
  - Scenario: Started
  - Scenario: Halted
1.3. Gremlin Payload Key Fields
Gremlin sends a rich JSON payload. We will use the following standard Gremlin keys for mapping incidents in Callgoose SQIBS:
- event_type: The status key (e.g., "Attack: Started", "Attack: Halted"). Callgoose uses this for Trigger/Resolve filters.
- attackId / scenarioId: The unique identifier for the specific run. Callgoose uses this as the Incident Mapped With key.
- name: The name of the attack or scenario. Callgoose uses this for the Incident Title.
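To check which values Callgoose will see for a given payload, the three mappings above can be modelled locally in a few lines of Python. This is a sketch, not part of Gremlin or Callgoose: the function name `extract_incident_fields` is hypothetical, and the fallback from `attackId` to `scenarioId` is an assumption based on the "attackId / scenarioId" description above.

```python
from typing import Any, Optional


def extract_incident_fields(
    payload: dict[str, Any],
) -> tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (event_type, run_id, title) from a Gremlin webhook payload.

    Hypothetical helper: mirrors the key mapping described in section 1.3.
    """
    event_type = payload.get("event_type")
    # Attack runs carry attackId; scenario runs are assumed to carry scenarioId.
    run_id = payload.get("attackId") or payload.get("scenarioId")
    # The attack or scenario name becomes the incident title.
    title = payload.get("name")
    return event_type, run_id, title


sample = {
    "attackId": "cpu-spike-20251209-1",
    "name": "CPU Stress Test on Web App",
    "event_type": "Attack: Started",
}
print(extract_incident_fields(sample))
```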
2. Obtain the Callgoose SQIBS Webhook URL
Use the Callgoose process endpoint with token authentication:
https://****.callgoose.com/v1/process?from=gremlin&token=xxxx
Treat the API token as a secret: keep it private and never store it in publicly exposed configuration files or source repositories.
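One way to follow that advice when scripting against the endpoint is to assemble the URL at runtime from environment variables. The sketch below assumes you supply the base URL yourself; the variable names `CALLGOOSE_BASE_URL` and `CALLGOOSE_API_TOKEN` and the `YOUR-SUBDOMAIN` placeholder are hypothetical, not defined by Callgoose.

```python
import os
from urllib.parse import urlencode


def build_callgoose_url(base_url: str, token: str, source: str = "gremlin") -> str:
    """Append the from/token query parameters to the Callgoose process endpoint."""
    # urlencode percent-escapes any special characters in the token.
    query = urlencode({"from": source, "token": token})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Hypothetical variable names: keep the token out of committed config files.
    token = os.environ.get("CALLGOOSE_API_TOKEN")
    if token:
        base = os.environ.get(
            "CALLGOOSE_BASE_URL", "https://YOUR-SUBDOMAIN.callgoose.com/v1/process"
        )
        print(build_callgoose_url(base, token))
    else:
        print("Set CALLGOOSE_API_TOKEN first.")
```

Paste the resulting URL into the Gremlin webhook's URL field; the token only ever lives in your shell environment or secret store.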
3. Example Payloads from Gremlin
Gremlin generates the payload automatically. For Callgoose SQIBS, the critical difference between the two payloads below is the value of the event_type field.
3.1. Trigger Payload (Attack Started)
When an attack or scenario starts, the payload includes the identifier and the initiation status.
JSON
{
  "teamId": "e7352a6b-a9a0-513c-81e4-980f680a70c4",
  "teamName": "MyChaosTeam",
  "attackId": "cpu-spike-20251209-1",
  "name": "CPU Stress Test on Web App",
  "event_type": "Attack: Started",
  "attackStatus": "RUNNING",
  "source": "WebApp",
  "time": "2025-12-09T12:00:00Z",
  "targets": {
    "hosts": ["ip-10-0-1-5"],
    "containers": null
  }
}
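To exercise the Trigger Filter without running a real attack, you can replay this example payload to your Callgoose URL. The sketch below builds the POST request with Python's standard library; it assumes Callgoose accepts a plain JSON body with a `Content-Type: application/json` header, matching the POST/JSON webhook settings in section 1.2. `build_replay_request` is a hypothetical helper name.

```python
import json
import urllib.request

# The example trigger payload from section 3.1 (null becomes None in Python).
TRIGGER_PAYLOAD = {
    "teamId": "e7352a6b-a9a0-513c-81e4-980f680a70c4",
    "teamName": "MyChaosTeam",
    "attackId": "cpu-spike-20251209-1",
    "name": "CPU Stress Test on Web App",
    "event_type": "Attack: Started",
    "attackStatus": "RUNNING",
    "source": "WebApp",
    "time": "2025-12-09T12:00:00Z",
    "targets": {"hosts": ["ip-10-0-1-5"], "containers": None},
}


def build_replay_request(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that mimics a Gremlin webhook delivery."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually send it (a real network call), pass your full Callgoose URL:
# with urllib.request.urlopen(build_replay_request(url, TRIGGER_PAYLOAD), timeout=10) as resp:
#     print(resp.status)
```

Building the request separately from sending it keeps the payload easy to inspect before anything leaves your machine.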
3.2. Resolve Payload (Attack Halted)
When the attack completes or is manually halted, the payload is sent with the completion status.
JSON
{
  "teamId": "e7352a6b-a9a0-513c-81e4-980f680a70c4",
  "teamName": "MyChaosTeam",
  "attackId": "cpu-spike-20251209-1",
  "name": "CPU Stress Test on Web App",
  "event_type": "Attack: Halted",
  "attackStatus": "FINISHED",
  "source": "WebApp",
  "time": "2025-12-09T12:01:00Z",
  "targets": {
    "hosts": ["ip-10-0-1-5"],
    "containers": null
  }
}
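The two example payloads differ only in event_type, attackStatus, and time; the attackId is identical, which is what lets Callgoose pair them. The sketch below checks that property for any trigger/resolve pair and computes the run duration from the time fields. The helper names are hypothetical, and the ISO 8601 handling assumes timestamps in the "...Z" form shown above.

```python
from datetime import datetime


def parse_gremlin_time(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def correlate(trigger: dict, resolve: dict) -> float:
    """Return the run duration in seconds, or raise if the pair will not correlate."""
    if trigger.get("attackId") != resolve.get("attackId"):
        raise ValueError("attackId differs; the incident would not be resolved")
    start = parse_gremlin_time(trigger["time"])
    end = parse_gremlin_time(resolve["time"])
    return (end - start).total_seconds()


trigger = {"attackId": "cpu-spike-20251209-1", "time": "2025-12-09T12:00:00Z"}
resolve = {"attackId": "cpu-spike-20251209-1", "time": "2025-12-09T12:01:00Z"}
print(correlate(trigger, resolve))  # 60.0
```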
4. Configure API Filters in Callgoose SQIBS
You must set up two filters within the Callgoose SQIBS API Integration section: a Trigger Filter to create the incident and a Resolve Filter to close it.
4.1. Trigger Filter (Create Incident)
This filter uses the Gremlin event_type to identify the start of an experiment.
- Payload JSON Key: "event_type"
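- Key Value Contains: Started (this matches both Attack: Started and Scenario: Started).
- Map Incident With: "attackId" (use this field to link the incident to the unique Gremlin run ID). Scenario payloads identify the run with scenarioId instead (see section 1.3), so if you forward Scenario events, confirm in the Callgoose API log which key they carry.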
- Incident Title From: "name"
- Incident Description From: (Leave empty to include the full JSON payload for rich context, or use a specific key like targets.hosts for a brief description).
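The "Key Value Contains" rule can be approximated as a substring test on event_type. The Python sketch below is a simplified local model for predicting which filter a payload would hit, not Callgoose's implementation; it assumes the match is case-sensitive, and `classify_event` is a hypothetical name.

```python
from typing import Optional

TRIGGER_VALUE = "Started"  # Key Value Contains for the Trigger Filter
RESOLVE_VALUE = "Halted"   # Key Value Contains for the Resolve Filter


def classify_event(payload: dict) -> Optional[str]:
    """Predict which filter a payload would match (assumes case-sensitive 'contains')."""
    event_type = payload.get("event_type", "")
    if TRIGGER_VALUE in event_type:
        return "trigger"
    if RESOLVE_VALUE in event_type:
        return "resolve"
    return None  # neither filter matches, so no incident action is expected


for value in ["Attack: Started", "Scenario: Started", "Attack: Halted", "Scenario: Halted"]:
    print(value, "->", classify_event({"event_type": value}))
```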
4.2. Resolve Filter (Resolve Incident)
This filter uses the Gremlin event_type to identify the end of an experiment and matches the incident key.
- Payload JSON Key: "event_type"
- Key Value Contains: Halted (this matches both Attack: Halted and Scenario: Halted).
- Incident Mapped With: "attackId" (this key must exactly match the Map Incident With key from the Trigger Filter).
Refer to the API Filter Instructions and FAQ for more details.
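Taken together, the two filters behave like a small state machine keyed on attackId: a "Started" event opens an incident and a "Halted" event with the same key closes it. The toy model below illustrates that pairing only; `IncidentStore` is hypothetical and does not reproduce Callgoose's real deduplication or storage behaviour.

```python
from typing import Any


class IncidentStore:
    """Toy model of the trigger/resolve lifecycle keyed on attackId."""

    def __init__(self) -> None:
        self.open: dict[str, str] = {}      # attackId -> incident title
        self.resolved: dict[str, str] = {}  # attackId -> incident title

    def handle(self, payload: dict[str, Any]) -> str:
        event_type = payload.get("event_type", "")
        key = payload.get("attackId")
        if key is None:
            return "ignored: no attackId"
        if "Started" in event_type:
            # Trigger Filter: open an incident titled from "name".
            self.open[key] = payload.get("name", "")
            return "created"
        if "Halted" in event_type:
            # Resolve Filter: only an identical attackId closes the incident.
            if key in self.open:
                self.resolved[key] = self.open.pop(key)
                return "resolved"
            return "no matching open incident"
        return "ignored: no filter matched"


store = IncidentStore()
print(store.handle({"event_type": "Attack: Started", "attackId": "cpu-spike-20251209-1",
                    "name": "CPU Stress Test on Web App"}))  # created
print(store.handle({"event_type": "Attack: Halted", "attackId": "cpu-spike-20251209-1"}))  # resolved
```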
5. Testing the Integration
- Initiate a Gremlin Attack: Log in to the Gremlin UI and start a controlled, low-impact attack (e.g., a 10-second CPU spike on a single staging host).
- Check Gremlin Webhook Logs: On the Gremlin webhook configuration page, verify that the webhook fired and received a successful 2xx status code (e.g., 200 or 202) from the Callgoose URL.
- Check Callgoose Incident Dashboard (Trigger): Verify that:
  - An incident is created in Callgoose SQIBS immediately.
  - The Incident Title and Description match the Gremlin name and payload details.
  - The incident key is set to the Gremlin attackId.
- Halt the Gremlin Attack: Allow the attack to finish naturally, or stop it manually with the HALT button in the Gremlin UI.
- Confirm Resolution (Resolve): Verify that the corresponding incident in Callgoose SQIBS automatically transitions to Resolved.
6. Troubleshooting
- No incident created: Check the Gremlin webhook logs for delivery errors, and confirm that the Callgoose webhook URL and API token are correct. In Callgoose, check the API logs to see whether the payload was received but rejected by the filter.
- Incident not resolving: The attackId in the resolve payload must be identical to the one in the trigger payload. Also confirm that the Resolve filter's Key Value Contains value (Halted) actually appears in the event_type value Gremlin sends.
- Payload data incorrect: Use the API Log section in Callgoose SQIBS to view the raw JSON sent by Gremlin and adjust your filter keys (event_type, attackId) accordingly.
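When inspecting a raw payload copied from the API Log, a quick local check can flag the usual causes listed above. The sketch assumes the filter setup in section 4 (keys event_type, attackId, and name; values containing Started or Halted); `diagnose` is a hypothetical helper, and its checks are only as good as those assumptions.

```python
import json

REQUIRED_KEYS = ("event_type", "attackId", "name")


def diagnose(raw_json: str) -> list[str]:
    """List reasons a payload copied from the API log might not match the filters."""
    try:
        payload = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(payload, dict):
        return ["top-level JSON value is not an object"]
    # Keys the filters in section 4 read from the payload.
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in payload]
    event_type = str(payload.get("event_type", ""))
    if "event_type" in payload and "Started" not in event_type and "Halted" not in event_type:
        problems.append(f"event_type {event_type!r} matches neither Started nor Halted")
    return problems


print(diagnose('{"event_type": "Scenario: Started", "scenarioId": "s-1", "name": "Demo"}'))
```

An empty list means the payload has the shape the filters expect; anything else points at the filter key or value to adjust.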
7. Conclusion
You now have a fully operational integration between Gremlin and Callgoose SQIBS using direct webhooks. Gremlin's Chaos Engineering events are automatically translated into structured Callgoose incidents, enabling your team to monitor, manage, and track the impact of resilience testing alongside real production incidents.
For further customization, refer to the official documentation for both Gremlin and Callgoose SQIBS:
- Callgoose SQIBS API Token Documentation
- Callgoose SQIBS API Endpoint Documentation
- API Filter Instructions and FAQ
- How to Send API
- Gremlin documentation
