🔍🌐 Open Sesame - Finding Exposed Services Across Your Cloud Estate
📚 DuckDB for Cloud Security
This series explores how to leverage DuckDB for cloud security analysis, taking you from basic concepts to advanced security monitoring techniques. Learn to build your own CSPM solutions, analyze security tool outputs, and create powerful DIY security workflows.
Introduction: The Security Visibility Problem
If you’ve spent any time in cloud security, you’re familiar with the term CSPM (Cloud Security Posture Management). These commercial tools promise to monitor your cloud environments, detect misconfigurations, and keep you safe from the ever-growing list of cloud security threats. They’re also expensive—often costing thousands of dollars per month, which puts them out of reach for smaller teams, educational environments, or labs.
Managing security across multiple cloud providers introduces significant challenges:
- Complexity: Each cloud has its own security model and configuration paradigms
- Lack of unified visibility: Different interfaces for each cloud provider
- Alert fatigue: Commercial tools often overwhelm with alerts, many irrelevant
- Cost barriers: Enterprise CSPM solutions often charge per-resource pricing
And perhaps most critically, firewall and security group misconfigurations remain one of the leading causes of breaches. A single mistaken rule allowing 0.0.0.0/0 access to sensitive ports can compromise your entire environment.
But what if I told you that with a bit of SQL knowledge and some free tools, you could build much of this capability yourself?
Enter DuckDB: The Hero We Need
DuckDB is a lightweight, in-process analytical database that’s perfect for security analysis. Think of it as “SQLite for analytics” with some key advantages that make it ideal for DIY CSPM:
- Ease of Use: Zero-configuration setup, familiar SQL interface
- Performance: Lightning-fast for analytical queries, even with millions of rows
- Versatility: Excellent at ingesting JSON (the native output format for cloud CLIs)
- Cost-Effective: Completely open-source and free
In this post, I’ll show you how to use DuckDB and SQL to create your own “CSPM at home” solution that can:
- Collect resource configurations from AWS, Azure, and GCP
- Analyze them for security issues using SQL
- Track changes over time
- Generate reports on your security posture
This approach won’t replace all the features of a commercial CSPM, but it can provide remarkable value at a fraction of the cost—or completely free for small environments.
Prerequisites: Setting Up Your Environment
Before we begin, you’ll need the following tools installed and configured:
- DuckDB: Install with `brew install duckdb` on macOS, or download from duckdb.org for other platforms
- Cloud CLI tools:
  - AWS CLI (`aws`): Configured with credentials having at least read access to EC2, VPC, and IAM
  - Azure CLI (`az`): Authenticated to your Azure subscriptions
  - Google Cloud CLI (`gcloud`): Configured for your GCP projects

Pro-Tip: For production use, create specific IAM roles or service accounts with the minimal permissions needed for security auditing. For AWS, you'd want permissions like `ec2:Describe*`, `iam:List*`, etc.
Let’s create a project directory to organize our work:
mkdir cloud-cspm && cd cloud-cspm
If you’re following along with our honeypot lab setup from the previous post, you already have the perfect environment to experiment with these techniques.
Part 1: Collecting Cloud Resources with CLI Tools
Let’s start with the simplest approach: using the built-in CLI tools to extract resource configurations and pipe them directly into DuckDB.
AWS Resources
# Collect EC2 instance information
aws ec2 describe-instances > aws_instances.json
# Collect security group information
aws ec2 describe-security-groups > aws_security_groups.json
# Collect VPC information
aws ec2 describe-vpcs > aws_vpcs.json
# Start DuckDB and load the data
duckdb cspm.db
The `describe-security-groups` command fetches all security group configurations in the default region, or the region configured in your AWS CLI profile. The output is JSON, which DuckDB can consume directly without any transformation.
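Those commands only cover one region, though; EC2 resources are region-scoped. A minimal sketch for sweeping every enabled region (the filenames are illustrative):
# Collect security groups from every region visible to this account
for region in $(aws ec2 describe-regions --query 'Regions[].RegionName' --output text); do
  aws ec2 describe-security-groups --region "$region" > "aws_security_groups_${region}.json"
done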
Azure Resources
# Collect virtual machine information
az vm list --show-details > azure_vms.json
# Collect network security group information
az network nsg list > azure_nsgs.json
# Collect network security group rules
az network nsg rule list --nsg-name <nsg-name> --resource-group <resource-group> --include-default > azure_nsg_rules.json
For Azure, the second command lists all NSGs, while the rule-list command retrieves rules for a single NSG, so a full audit means iterating over every NSG returned by the previous command.
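A small shell loop handles that iteration. A minimal sketch, assuming `jq` is installed; it writes one file per NSG and then merges them into the azure_nsg_rules.json used below:
# Dump rules for every NSG, then merge into a single file
az network nsg list --query '[].{name:name, rg:resourceGroup}' -o json |
  jq -r '.[] | "\(.name) \(.rg)"' |
  while read -r nsg rg; do
    az network nsg rule list --nsg-name "$nsg" --resource-group "$rg" \
      --include-default > "azure_nsg_rules_${rg}_${nsg}.json"
  done
jq -s 'add' azure_nsg_rules_*.json > azure_nsg_rules.json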
GCP Resources
# Collect compute instances
gcloud compute instances list --format=json > gcp_instances.json
# Collect firewall rules
gcloud compute firewall-rules list --format=json > gcp_firewall_rules.json
# Collect VPC networks
gcloud compute networks list --format=json > gcp_networks.json
The `--format=json` flag ensures we get structured JSON output that DuckDB can process.
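These commands only cover the active gcloud project. To sweep several projects, a loop like this works (a sketch; it assumes the Compute API is enabled in each project):
# Collect instances from every project your account can list
for project in $(gcloud projects list --format='value(projectId)'); do
  gcloud compute instances list --project "$project" --format=json > "gcp_instances_${project}.json"
done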
Part 2: DuckDB Table Creation - Structuring Your Security Data
Now that we have our raw data, let's use DuckDB to structure it. DuckDB's `read_json_auto` function is incredibly useful for inferring schemas from JSON files. We'll also use `UNNEST` to expand nested JSON arrays into rows.
AWS Security Group Tables
-- Create a table for EC2 instances
-- Note: the AWS CLI emits PascalCase keys, and read_json_auto infers
-- structs, so we use dot notation rather than JSON operators
CREATE TABLE aws_instances AS
SELECT
    inst.InstanceId AS instance_id,
    inst.InstanceType AS instance_type,
    inst.PrivateIpAddress AS private_ip,
    inst.PublicIpAddress AS public_ip,
    inst.State.Name AS state,
    inst.VpcId AS vpc_id,
    inst.SubnetId AS subnet_id,
    inst.KeyName AS key_name,
    inst.IamInstanceProfile.Arn AS instance_profile
FROM read_json_auto('aws_instances.json') AS root_data
CROSS JOIN UNNEST(root_data.Reservations) AS res_list(res)
CROSS JOIN UNNEST(res.Instances) AS inst_list(inst);
-- Create a table for security groups with CIDR sources
CREATE TABLE aws_security_group_ingress_cidr_rules AS
SELECT
sg.GroupId AS group_id,
sg.GroupName AS group_name,
sg.VpcId AS vpc_id,
sg.Description AS group_description,
'Ingress' AS direction,
perm.IpProtocol AS ip_protocol,
perm.FromPort AS from_port,
perm.ToPort AS to_port,
cidr.CidrIp AS source_cidr,
NULL AS source_cidr_description
FROM
read_json_auto('aws_security_groups.json') AS root_data
CROSS JOIN UNNEST(root_data.SecurityGroups) AS sg_list(sg)
CROSS JOIN UNNEST(sg.IpPermissions) AS perm_list(perm)
CROSS JOIN UNNEST(perm.IpRanges) AS cidr_list(cidr);
The `CROSS JOIN UNNEST` pattern is powerful for flattening the nested JSON structure that AWS returns. Each security group contains an array of permissions, and each permission contains an array of CIDR ranges, which we transform into individual rows.
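If the double-UNNEST is new to you, here is a tiny self-contained illustration you can paste into the DuckDB shell (the data is hypothetical):
-- Two nesting levels become one row per (group, cidr) pair
SELECT g.name AS group_name, r.ip AS source_cidr
FROM (SELECT [
    {'name': 'web', 'ranges': [{'ip': '0.0.0.0/0'}, {'ip': '10.0.0.0/8'}]}
] AS groups) t
CROSS JOIN UNNEST(t.groups) AS g_list(g)
CROSS JOIN UNNEST(g.ranges) AS r_list(r);
-- group_name | source_cidr
-- web        | 0.0.0.0/0
-- web        | 10.0.0.0/8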
Azure NSG Rules Table
-- Create a table for Azure VMs
-- Note: az vm list flattens the ARM properties bag, so vmId,
-- provisioningState, and publicIps (from --show-details) are top-level
CREATE TABLE azure_vms AS
SELECT
    id,
    name,
    resourceGroup AS resource_group,
    location,
    hardwareProfile.vmSize AS vm_size,
    vmId AS vm_id,
    provisioningState AS state,
    networkProfile.networkInterfaces[1].id AS primary_nic,  -- DuckDB lists are 1-indexed
    publicIps AS public_ips,
    storageProfile.osDisk.osType AS os_type,
    tags
FROM read_json_auto('azure_vms.json');
-- Create a table for NSG rules
CREATE TABLE azure_nsg_rules AS
SELECT
    'Azure' AS cloud_provider,
    id AS rule_id,
    -- A rule's id embeds its parent NSG, so strip the rule suffix
    regexp_replace(id, '/(securityRules|defaultSecurityRules)/.*', '') AS nsg_id,
    name AS rule_name,
    protocol,
    sourceAddressPrefix AS source_cidr,
    destinationPortRange AS port_range,
    CASE
        WHEN destinationPortRange = '*' THEN NULL
        WHEN destinationPortRange = '' THEN NULL
        WHEN POSITION('-' IN destinationPortRange) > 0 THEN
            CAST(SPLIT_PART(destinationPortRange, '-', 1) AS INTEGER)
        ELSE CAST(destinationPortRange AS INTEGER)
    END AS port_from,
    CASE
        WHEN destinationPortRange = '*' THEN NULL
        WHEN destinationPortRange = '' THEN NULL
        WHEN POSITION('-' IN destinationPortRange) > 0 THEN
            CAST(SPLIT_PART(destinationPortRange, '-', 2) AS INTEGER)
        ELSE CAST(destinationPortRange AS INTEGER)
    END AS port_to,
    access,
    direction,
    priority
FROM read_json_auto('azure_nsg_rules.json');
Notice the handling of `destinationPortRange`: we need special cases for '*', empty strings, and hyphenated ranges. This illustrates one of the challenges of working with cloud provider APIs: handling diverse data formats.
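To sanity-check that parsing logic without a full table load, you can run it against a few literal values (a quick throwaway query):
-- Expected port_from values: NULL, 443, 1000
SELECT r AS raw,
    CASE
        WHEN r IN ('*', '') THEN NULL
        WHEN POSITION('-' IN r) > 0 THEN CAST(SPLIT_PART(r, '-', 1) AS INTEGER)
        ELSE CAST(r AS INTEGER)
    END AS port_from
FROM (VALUES ('*'), ('443'), ('1000-2000')) AS t(r);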
GCP Firewall Rules Table
-- Create a table for GCP instances
CREATE TABLE gcp_instances AS
SELECT
    id,
    name,
    machineType,
    zone,
    status,
    networkInterfaces[1].networkIP AS internal_ip,  -- DuckDB lists are 1-indexed
    networkInterfaces[1].accessConfigs[1].natIP AS external_ip,
    serviceAccounts[1].email AS service_account,
    labels,
    creationTimestamp
FROM read_json_auto('gcp_instances.json');
-- Create a table for firewall rules with security focus
CREATE TABLE gcp_firewall_rules AS
SELECT
id,
name,
network,
direction,
priority,
array_to_string(sourceRanges, ',') AS source_cidr,
CASE
WHEN sourceRanges IS NOT NULL
AND array_contains(sourceRanges, '0.0.0.0/0') THEN TRUE
ELSE FALSE
END AS internet_exposed,
CASE
WHEN name LIKE '%ssh%' THEN 'SSH'
WHEN name LIKE '%rdp%' THEN 'RDP'
WHEN name LIKE '%all%' THEN 'ALL TRAFFIC'
ELSE 'OTHER'
END AS service,
CASE
WHEN name LIKE '%all%' THEN 'CRITICAL'
WHEN name LIKE '%ssh%' OR name LIKE '%rdp%' THEN 'HIGH'
ELSE 'MEDIUM'
END AS risk_level
FROM read_json_auto('gcp_firewall_rules.json');
We've added derived columns like `internet_exposed`, `service`, and `risk_level` based on common patterns in GCP firewall rule names and configurations. This kind of enrichment makes our later analysis more powerful.
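Keep in mind that name-based matching is only a heuristic; nothing forces a rule's name to mention SSH or RDP. For stricter classification you could unnest the allowed array GCP returns for each rule. A sketch, assuming the standard allowed[].IPProtocol / allowed[].ports shape of the API output:
-- Classify by what each rule actually allows, not what it is named
SELECT
    f.name,
    a.IPProtocol AS protocol,
    array_to_string(a.ports, ',') AS ports
FROM read_json_auto('gcp_firewall_rules.json') f
CROSS JOIN UNNEST(f.allowed) AS a_list(a);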
Part 3: Finding Security Issues with SQL Queries
Now that we have our cloud resource data in DuckDB tables, we can write SQL queries to find security issues. Here are some examples:
AWS: Public EC2 Instances with Admin IAM Roles
-- This query identifies AWS instances with public IPs that also have admin-like IAM roles attached
SELECT
i.instance_id,
i.instance_type,
i.public_ip,
i.instance_profile
FROM
aws_instances i
WHERE
i.public_ip IS NOT NULL
AND (
i.instance_profile LIKE '%admin%'
OR i.instance_profile LIKE '%Admin%'
OR i.instance_profile LIKE '%full%'
);
This query identifies a particularly concerning configuration: instances that are both internet-accessible (have public IPs) and possess administrative IAM roles. Such instances could be prime targets for attackers, as compromising them would provide both network access and elevated privileges.
AWS: Security Groups with Unrestricted Access
-- Find security groups with internet access (0.0.0.0/0)
SELECT
group_name,
ip_protocol,
from_port,
to_port,
source_cidr,
CASE
WHEN ip_protocol = '-1' THEN 'ALL TRAFFIC (CRITICAL)'
WHEN from_port = 22 THEN 'SSH (HIGH)'
WHEN from_port = 3389 THEN 'RDP (HIGH)'
WHEN from_port = 3306 THEN 'MySQL (HIGH)'
WHEN from_port = 5432 THEN 'PostgreSQL (HIGH)'
WHEN from_port = 1433 THEN 'MSSQL (HIGH)'
WHEN from_port = 80 OR from_port = 443 THEN 'Web (MEDIUM)'
ELSE 'PORT ' || from_port || ' (LOW)'
END as service_risk
FROM aws_security_group_ingress_cidr_rules
WHERE source_cidr = '0.0.0.0/0'
ORDER BY
CASE
WHEN ip_protocol = '-1' THEN 1
WHEN from_port IN (22, 3389, 3306, 5432, 1433) THEN 2
ELSE 3
END,
group_name;
This query identifies AWS Security Groups allowing unrestricted access from the internet (0.0.0.0/0). The `CASE` statement for `service_risk` categorizes different ports by their risk level, helping prioritize findings. The `ORDER BY` clause ensures the most critical findings (all traffic) appear first, followed by high-risk services like SSH and RDP.
Azure: VMs with Public IPs
-- Find Azure VMs with public IPs
SELECT
id,
name,
resource_group,
vm_size,
CONCAT(
    SPLIT_PART(id, '/', 3), -- subscription ID (ids look like /subscriptions/<sub>/resourceGroups/...)
    ':', name
) AS resource_id
FROM
    azure_vms
WHERE
    public_ips IS NOT NULL
    AND public_ips <> '';
This query identifies Azure VMs that have public IPs, which might indicate unnecessary internet exposure (the `--show-details` flag we used during collection surfaces them in the `publicIps` field). The `CONCAT` function creates a unique identifier that combines the subscription ID and VM name for easier tracking.
Azure: NSG Rules Allowing Broad Access
-- Find NSG rules allowing broad access
SELECT
nsg_id,
rule_name,
protocol,
source_cidr,
port_range,
port_from,
port_to,
priority,
CASE
WHEN protocol = '*' THEN 'ALL TRAFFIC (CRITICAL)'
WHEN port_from IN (22, 3389, 3306) THEN 'HIGH'
ELSE 'MEDIUM'
END as service_risk
FROM azure_nsg_rules
WHERE
(source_cidr = '0.0.0.0/0' OR source_cidr = '*' OR source_cidr = 'Internet')
AND access = 'Allow'
ORDER BY
CASE
WHEN protocol = '*' THEN 1
WHEN port_from IN (22, 3389, 3306) THEN 2
ELSE 3
END,
priority;
This query identifies Azure Network Security Group rules that allow inbound traffic from the internet. We need to handle the multiple spellings of "internet" in Azure's syntax: '0.0.0.0/0', '*', and 'Internet'. The query prioritizes findings by protocol and port, putting the most critical ones first.
GCP: Instances with Public IPs
-- Find GCP instances with public IPs
SELECT
name,
machineType,
zone,
external_ip,
service_account
FROM
gcp_instances
WHERE
external_ip IS NOT NULL;
This query identifies GCP instances with public IP addresses, which might indicate unnecessary internet exposure.
GCP: Overly Permissive Firewall Rules
-- Find internet-exposed GCP firewall rules
SELECT
name,
direction,
source_cidr,
service,
priority,
risk_level
FROM gcp_firewall_rules
WHERE
internet_exposed = TRUE
AND direction = 'INGRESS'
ORDER BY
CASE
WHEN risk_level = 'CRITICAL' THEN 1
WHEN risk_level = 'HIGH' THEN 2
ELSE 3
END,
name;
This query leverages the derived columns we created during table setup to quickly identify internet-exposed firewall rules. The `ORDER BY` clause ensures the most critical findings appear first.
Cross-Cloud Query: Count of Vulnerable Resources
-- Count of public-facing resources across clouds
SELECT
'AWS' as cloud,
COUNT(*) as public_instances
FROM
aws_instances
WHERE
public_ip IS NOT NULL
UNION ALL
SELECT
'Azure' as cloud,
COUNT(*) as public_instances
FROM
azure_vms
WHERE
public_ips IS NOT NULL
AND public_ips <> ''
UNION ALL
SELECT
'GCP' as cloud,
COUNT(*) as public_instances
FROM
gcp_instances
WHERE
external_ip IS NOT NULL;
The real magic happens when we combine data from all clouds for a single pane of glass view. This query provides a simple count of public-facing resources across your multi-cloud environment, giving you a quick overview of your potential attack surface.
Part 4: Comprehensive Cross-Cloud Security Analysis
For a more unified view across cloud providers, we can use Common Table Expressions (CTEs) to normalize the data from different sources:
-- Comprehensive cross-cloud security findings
WITH aws_findings AS (
SELECT
'AWS' AS cloud,
group_name AS resource_name,
'Ingress' AS direction,
ip_protocol AS protocol,
CAST(from_port AS VARCHAR) || '-' || CAST(to_port AS VARCHAR) AS ports,
source_cidr AS cidr,
CASE
WHEN ip_protocol = '-1' THEN 'CRITICAL'
WHEN from_port IN (22, 3389, 3306) THEN 'HIGH'
ELSE 'MEDIUM'
END AS risk_level,
CASE
WHEN ip_protocol = '-1' THEN 'ALL TRAFFIC'
WHEN from_port = 22 THEN 'SSH'
WHEN from_port = 3389 THEN 'RDP'
WHEN from_port = 3306 THEN 'MySQL'
WHEN from_port = 80 THEN 'HTTP'
WHEN from_port = 443 THEN 'HTTPS'
ELSE 'PORT ' || from_port
END AS service
FROM aws_security_group_ingress_cidr_rules
WHERE source_cidr = '0.0.0.0/0'
),
azure_findings AS (
SELECT
'Azure' AS cloud,
nsg_id || '/' || rule_name AS resource_name,
direction AS direction,
protocol,
port_range AS ports,
source_cidr AS cidr,
CASE
WHEN protocol = '*' THEN 'CRITICAL'
WHEN port_from IN (22, 3389, 3306) THEN 'HIGH'
ELSE 'MEDIUM'
END AS risk_level,
CASE
WHEN protocol = '*' THEN 'ALL TRAFFIC'
WHEN port_from = 22 THEN 'SSH'
WHEN port_from = 3389 THEN 'RDP'
WHEN port_from = 3306 THEN 'MySQL'
WHEN port_from = 80 THEN 'HTTP'
WHEN port_from = 443 THEN 'HTTPS'
ELSE 'PORT ' || port_from
END AS service
FROM azure_nsg_rules
WHERE
(source_cidr = '0.0.0.0/0' OR source_cidr = '*' OR source_cidr = 'Internet')
AND access = 'Allow'
),
gcp_findings AS (
SELECT
'GCP' AS cloud,
name AS resource_name,
direction,
'tcp' AS protocol,  -- placeholder: our gcp_firewall_rules table doesn't capture protocols
service AS ports,   -- placeholder: nor exact ports, so we reuse the service label
source_cidr AS cidr,
risk_level,
service
FROM gcp_firewall_rules
WHERE
internet_exposed = TRUE
AND direction = 'INGRESS'
)
-- Union all findings together
SELECT
cloud,
resource_name,
direction,
service,
protocol,
ports,
cidr,
risk_level
FROM (
SELECT * FROM aws_findings
UNION ALL
SELECT * FROM azure_findings
UNION ALL
SELECT * FROM gcp_findings
) AS all_findings
ORDER BY
CASE
WHEN risk_level = 'CRITICAL' THEN 1
WHEN risk_level = 'HIGH' THEN 2
ELSE 3
END,
cloud,
resource_name;
Notice how each CTE (aws_findings, azure_findings, gcp_findings) transforms its cloud-specific data into a common schema with fields like cloud, resource_name, risk_level, etc. This normalization is key to unified analysis.
The `UNION ALL` combines the results into a single unified view, allowing you to quickly identify the highest-risk open firewall rules across your entire multi-cloud estate.
Part 5: Security Posture Summary Dashboard
Beyond individual findings, we can create summary statistics for a high-level overview of our security posture:
-- Summary of internet-exposed resources by cloud and risk
WITH aws_summary AS (
SELECT
'AWS' AS cloud,
CASE
WHEN ip_protocol = '-1' THEN 'CRITICAL'
WHEN from_port IN (22, 3389, 3306) THEN 'HIGH'
ELSE 'MEDIUM'
END AS risk_level,
COUNT(*) AS finding_count
FROM aws_security_group_ingress_cidr_rules
WHERE source_cidr = '0.0.0.0/0'
GROUP BY risk_level
),
azure_summary AS (
SELECT
'Azure' AS cloud,
CASE
WHEN protocol = '*' THEN 'CRITICAL'
WHEN port_from IN (22, 3389, 3306) THEN 'HIGH'
ELSE 'MEDIUM'
END AS risk_level,
COUNT(*) AS finding_count
FROM azure_nsg_rules
WHERE
(source_cidr = '0.0.0.0/0' OR source_cidr = '*' OR source_cidr = 'Internet')
AND access = 'Allow'
GROUP BY risk_level
),
gcp_summary AS (
SELECT
'GCP' AS cloud,
risk_level,
COUNT(*) AS finding_count
FROM gcp_firewall_rules
WHERE
internet_exposed = TRUE
AND direction = 'INGRESS'
GROUP BY risk_level
)
-- Combine all summaries
SELECT
cloud,
risk_level,
finding_count
FROM (
SELECT * FROM aws_summary
UNION ALL
SELECT * FROM azure_summary
UNION ALL
SELECT * FROM gcp_summary
) AS all_summaries
ORDER BY
CASE risk_level
WHEN 'CRITICAL' THEN 1
WHEN 'HIGH' THEN 2
ELSE 3
END,
cloud;
This kind of summary is useful for reporting, tracking remediation progress, or quickly understanding risk distribution. You might output this to a CSV file for further visualization in a tool like Excel or Tableau.
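DuckDB can write that CSV directly with `COPY`. A minimal sketch using just the GCP slice (wrap the full UNION query from above the same way):
-- Export summary findings for charting elsewhere
COPY (
    SELECT 'GCP' AS cloud, risk_level, COUNT(*) AS finding_count
    FROM gcp_firewall_rules
    WHERE internet_exposed AND direction = 'INGRESS'
    GROUP BY risk_level
) TO 'posture_summary.csv' (HEADER, DELIMITER ',');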
Part 6: Taking It Further - Advanced Techniques
Automation with Python
While CLI commands work for ad-hoc analysis, Python scripts provide more flexibility and can handle pagination, error handling, and more complex data transformations. Here’s a simplified example Python script to collect and analyze resources from AWS:
import duckdb
import pandas as pd
import boto3

# Connect to DuckDB
conn = duckdb.connect('cspm.db')

# AWS Collection
def collect_aws():
    ec2 = boto3.client('ec2')

    # Collect EC2 instances, following pagination
    instances = []
    paginator = ec2.get_paginator('describe_instances')
    for page in paginator.paginate():
        for reservation in page['Reservations']:
            for instance in reservation['Instances']:
                instances.append({
                    'instance_id': instance.get('InstanceId'),
                    'instance_type': instance.get('InstanceType'),
                    'private_ip': instance.get('PrivateIpAddress'),
                    'public_ip': instance.get('PublicIpAddress'),
                    'state': instance.get('State', {}).get('Name'),
                    'vpc_id': instance.get('VpcId'),
                    'subnet_id': instance.get('SubnetId'),
                    'key_name': instance.get('KeyName'),
                    'iam_profile': instance.get('IamInstanceProfile', {}).get('Arn')
                })

    # Create a DataFrame and load it into DuckDB (DuckDB's replacement
    # scan lets the SQL reference the local variable `df` directly)
    df = pd.DataFrame(instances)
    conn.execute('DROP TABLE IF EXISTS aws_instances')
    conn.execute('CREATE TABLE aws_instances AS SELECT * FROM df')

    # Security groups, IAM policies, etc. would follow a similar pattern

# Run collection
collect_aws()

# Run security queries (ILIKE makes the match case-insensitive)
results = conn.execute("""
    SELECT
        instance_id,
        instance_type,
        public_ip,
        iam_profile
    FROM aws_instances
    WHERE public_ip IS NOT NULL
      AND iam_profile ILIKE '%admin%'
""").fetchall()

print("Instances with public IPs and admin roles:")
for row in results:
    print(f"{row[0]}: {row[1]} - {row[2]} - {row[3]}")

# Close connection
conn.close()
This Python script could be scheduled to run daily, with results emailed to security teams or fed into a dashboard.
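Scheduling can be as simple as a cron entry (the paths here are hypothetical placeholders):
# Run the collector daily at 06:00, appending output to a log
0 6 * * * /usr/bin/python3 /opt/cloud-cspm/collect_aws.py >> /var/log/cspm.log 2>&1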
Historical Analysis for Drift Detection
One powerful use case is tracking changes over time to detect security drift. This can be done by:
- Saving dated snapshots of your cloud configurations
- Using DuckDB to compare snapshots and identify changes
-- Create a dated snapshot table
CREATE TABLE security_groups_20250505 AS
SELECT * FROM aws_security_group_ingress_cidr_rules;
-- Later, comparing with a new snapshot
WITH new_rules AS (
SELECT sg.* FROM aws_security_group_ingress_cidr_rules sg
LEFT JOIN security_groups_20250505 old
ON sg.group_id = old.group_id
AND sg.from_port = old.from_port
AND sg.to_port = old.to_port
AND sg.source_cidr = old.source_cidr
WHERE old.group_id IS NULL
),
removed_rules AS (
SELECT old.* FROM security_groups_20250505 old
LEFT JOIN aws_security_group_ingress_cidr_rules sg
ON sg.group_id = old.group_id
AND sg.from_port = old.from_port
AND sg.to_port = old.to_port
AND sg.source_cidr = old.source_cidr
WHERE sg.group_id IS NULL
)
SELECT 'Added' as change_type, * FROM new_rules
UNION ALL
SELECT 'Removed' as change_type, * FROM removed_rules
ORDER BY change_type, group_name;
This query identifies security group rules that have been added or removed since the last snapshot, helping detect unauthorized or risky changes.
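Creating the dated snapshots themselves is easy to automate, since the DuckDB CLI accepts SQL as a command-line argument. A sketch (re-run your collection first so the live table is fresh):
# Snapshot today's security group rules under a dated table name
STAMP=$(date +%Y%m%d)
duckdb cspm.db "CREATE TABLE security_groups_${STAMP} AS SELECT * FROM aws_security_group_ingress_cidr_rules;"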
Using Tailpipe for CloudTrail Analysis
For a more complete CSPM solution, you might want to combine configuration analysis with CloudTrail log analysis. As I covered in my previous post about Tailpipe, you can use this DuckDB-powered tool to analyze AWS CloudTrail logs for security-relevant events:
-- Find security group modifications in CloudTrail
SELECT
event_time,
event_name,
user_identity.arn AS user_arn,
request_parameters,
response_elements
FROM aws_cloudtrail_log
WHERE event_source = 'ec2.amazonaws.com'
AND event_name LIKE '%SecurityGroup%'
ORDER BY event_time DESC;
This kind of analysis helps identify not just the current state of your environment, but also who made changes and when.
Part 7: Comparing with Commercial CSPM Solutions
While our DIY solution provides tremendous value, it’s worth understanding how it compares to commercial CSPM tools:
| Feature | DIY CSPM | Commercial CSPM |
|---|---|---|
| Cost | Free to very low | $$$$ (often per-resource pricing) |
| Setup complexity | Medium | Low |
| Maintenance effort | Medium | Low |
| Customization | Very high | Limited |
| Coverage breadth | Limited to what you build | Extensive |
| Detection depth | High (custom queries) | Medium (predefined rules) |
| Update frequency | Manual | Automatic |
| Integration with workflows | Custom development | Built-in |
| Compliance reporting | Manual | Automated |
The DIY approach shines when:
- Budget is limited
- You need custom detection logic
- You want full control over your security data
- You’re focused on specific cloud resources
- You’re using this as a learning experience
Best Practices for DIY CSPM
Based on experience building and maintaining DIY CSPM solutions, here are some best practices:
- Version your queries: Store them in a git repo so you can track changes over time
- Create tables for benchmark results: Save the results of each scan to track improvements
- Add resource ownership and tagging: Ensure you can track who owns each resource
- Implement change detection: Alert on new public resources or security group changes
- Automate remediation where possible: For critical issues, write scripts to fix them immediately
- Establish escalation paths: Know who to contact when you find serious issues
- Document exceptions: Track authorized policy exceptions so you don't repeatedly alert on them (a minimal sketch follows this list)
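For that last point, a minimal exceptions table might look like this (a sketch; the columns and sample row are illustrative):
-- Track approved exceptions so scans can filter them out
CREATE TABLE IF NOT EXISTS policy_exceptions (
    cloud VARCHAR,
    resource_name VARCHAR,
    justification VARCHAR,
    approved_by VARCHAR,
    expires DATE
);
INSERT INTO policy_exceptions VALUES
    ('AWS', 'public-web-sg', 'Customer-facing load balancer', 'security-team', DATE '2025-12-31');
-- Then anti-join findings against unexpired exceptions before alerting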
Conclusion: Level Up Your Security Data Magic
Commercial CSPM tools have their place, but you don’t always need to spend thousands of dollars to get meaningful cloud security insights. With DuckDB, SQL, and a bit of scripting, you can build a surprisingly powerful “CSPM at home” solution that focuses on the issues most relevant to your organization.
This approach not only saves money but also gives you deeper insight into your cloud infrastructure and builds valuable skills in cloud security and data analysis. As your environment grows, you can decide whether to continue scaling your DIY solution or transition to a commercial offering.
The goal isn’t to replicate every feature of a commercial CSPM, but to focus on the security issues that matter most to your organization, detect them reliably, and remediate them promptly.
What DIY security tools have you built? Share your experiences in the comments!
The scripts and queries in this post are provided as examples and should be tested and adapted to your specific environment before use in production.
~jared gore