🔍🌐 Open Sesame - Finding Exposed Services Across Your Cloud Estate
📚 DuckDB for Cloud Security
This series explores how to leverage DuckDB for cloud security analysis, taking you from basic concepts to advanced security monitoring techniques. Learn to build your own CSPM solutions, analyze security tool outputs, and create powerful DIY security workflows.
Introduction: The Security Visibility Problem
If you’ve spent any time in cloud security, you’re familiar with the term CSPM (Cloud Security Posture Management). These commercial tools promise to monitor your cloud environments, detect misconfigurations, and keep you safe from the ever-growing list of cloud security threats. They’re also expensive—often costing thousands of dollars per month, which puts them out of reach for smaller teams, educational environments, or labs.
Managing security across multiple cloud providers introduces significant challenges:
- Complexity: Each cloud has its own security model and configuration paradigms
- Lack of unified visibility: Different interfaces for each cloud provider
- Alert fatigue: Commercial tools often overwhelm with alerts, many irrelevant
- Cost barriers: Enterprise CSPM solutions often charge per-resource pricing
And perhaps most critically, firewall and security group misconfigurations remain one of the leading causes of breaches. A single mistaken rule allowing 0.0.0.0/0 access to sensitive ports can compromise your entire environment.
But what if I told you that with a bit of SQL knowledge and some free tools, you could build much of this capability yourself?
Enter DuckDB: The Hero We Need
DuckDB is a lightweight, in-process analytical database that’s perfect for security analysis. Think of it as “SQLite for analytics” with some key advantages that make it ideal for DIY CSPM:
- Ease of Use: Zero-configuration setup, familiar SQL interface
- Performance: Lightning-fast for analytical queries, even with millions of rows
- Versatility: Excellent at ingesting JSON (the native output format for cloud CLIs)
- Cost-Effective: Completely open-source and free
In this post, I’ll show you how to use DuckDB and SQL to create your own “CSPM at home” solution that can:
- Collect resource configurations from AWS, Azure, and GCP
- Analyze them for security issues using SQL
- Track changes over time
- Generate reports on your security posture
This approach won’t replace all the features of a commercial CSPM, but it can provide remarkable value at a fraction of the cost—or completely free for small environments.
Prerequisites: Setting Up Your Environment
Before we begin, you’ll need the following tools installed and configured:
- DuckDB: Install with `brew install duckdb` on macOS, or download from duckdb.org for other platforms
- Cloud CLI tools:
  - AWS CLI (`aws`): Configured with credentials having at least read access to EC2, VPC, and IAM
  - Azure CLI (`az`): Authenticated to your Azure subscriptions
  - Google Cloud CLI (`gcloud`): Configured for your GCP projects

Pro-Tip: For production use, create specific IAM roles or service accounts with the minimal permissions needed for security auditing. For AWS, you'd want permissions like `ec2:Describe*`, `iam:List*`, etc.
Let’s create a project directory to organize our work:
mkdir cloud-cspm && cd cloud-cspm
If you’re following along with our honeypot lab setup from the previous post, you already have the perfect environment to experiment with these techniques.
Part 1: Collecting Cloud Resources with CLI Tools
Let’s start with the simplest approach: using the built-in CLI tools to extract resource configurations and pipe them directly into DuckDB.
AWS Resources
# Collect EC2 instance information
aws ec2 describe-instances > aws_instances.json
# Collect security group information
aws ec2 describe-security-groups > aws_security_groups.json
# Collect VPC information
aws ec2 describe-vpcs > aws_vpcs.json
# Start DuckDB and load the data
duckdb cspm.db
The `describe-security-groups` command fetches all security group configurations in the default region, or the region configured in your AWS CLI profile. The output is JSON, which DuckDB can consume directly without any transformation.
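Those commands only cover one region, though; EC2 resources are region-scoped. A minimal sketch for sweeping every enabled region (the filenames are illustrative):
# Collect security groups from every region visible to this account
for region in $(aws ec2 describe-regions --query 'Regions[].RegionName' --output text); do
  aws ec2 describe-security-groups --region "$region" > "aws_security_groups_${region}.json"
done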
Azure Resources
# Collect virtual machine information
az vm list --show-details > azure_vms.json
# Collect network security group information
az network nsg list > azure_nsgs.json
# Collect network security group rules
az network nsg rule list --nsg-name <nsg-name> --resource-group <resource-group> --include-default > azure_nsg_rules.json
For Azure, the second command lists all NSGs, while the rule-list command retrieves rules for a single NSG, so a full audit means iterating over every NSG returned by the previous command.
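A small shell loop handles that iteration. A minimal sketch, assuming `jq` is installed; it writes one file per NSG and then merges them into the azure_nsg_rules.json used below:
# Dump rules for every NSG, then merge into a single file
az network nsg list --query '[].{name:name, rg:resourceGroup}' -o json |
  jq -r '.[] | "\(.name) \(.rg)"' |
  while read -r nsg rg; do
    az network nsg rule list --nsg-name "$nsg" --resource-group "$rg" \
      --include-default > "azure_nsg_rules_${rg}_${nsg}.json"
  done
jq -s 'add' azure_nsg_rules_*.json > azure_nsg_rules.json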
GCP Resources
# Collect compute instances
gcloud compute instances list --format=json > gcp_instances.json
# Collect firewall rules
gcloud compute firewall-rules list --format=json > gcp_firewall_rules.json
# Collect VPC networks
gcloud compute networks list --format=json > gcp_networks.json
The `--format=json` flag ensures we get structured JSON output that DuckDB can process.
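These commands only cover the active gcloud project. To sweep several projects, a loop like this works (a sketch; it assumes the Compute API is enabled in each project):
# Collect instances from every project your account can list
for project in $(gcloud projects list --format='value(projectId)'); do
  gcloud compute instances list --project "$project" --format=json > "gcp_instances_${project}.json"
done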
Part 2: DuckDB Table Creation - Structuring Your Security Data
Now that we have our raw data, let's use DuckDB to structure it. DuckDB's `read_json_auto` function is incredibly useful for inferring schemas from JSON files. We'll also use `UNNEST` to expand nested JSON arrays into rows.
AWS Security Group Tables
-- Create a table for EC2 instances
-- Note: the AWS CLI emits PascalCase keys, and read_json_auto infers
-- structs, so we use dot notation rather than JSON operators
CREATE TABLE aws_instances AS
SELECT
    inst.InstanceId AS instance_id,
    inst.InstanceType AS instance_type,
    inst.PrivateIpAddress AS private_ip,
    inst.PublicIpAddress AS public_ip,
    inst.State.Name AS state,
    inst.VpcId AS vpc_id,
    inst.SubnetId AS subnet_id,
    inst.KeyName AS key_name,
    inst.IamInstanceProfile.Arn AS instance_profile
FROM read_json_auto('aws_instances.json') AS root_data
CROSS JOIN UNNEST(root_data.Reservations) AS res_list(res)
CROSS JOIN UNNEST(res.Instances) AS inst_list(inst);
-- Create a table for security groups with CIDR sources
CREATE TABLE aws_security_group_ingress_cidr_rules AS
SELECT
sg.GroupId AS group_id,
sg.GroupName AS group_name,
sg.VpcId AS vpc_id,
sg.Description AS group_description,
'Ingress' AS direction,
perm.IpProtocol AS ip_protocol,
perm.FromPort AS from_port,
perm.ToPort AS to_port,
cidr.CidrIp AS source_cidr,
NULL AS source_cidr_description
FROM
read_json_auto('aws_security_groups.json') AS root_data
CROSS JOIN UNNEST(root_data.SecurityGroups) AS sg_list(sg)
CROSS JOIN UNNEST(sg.IpPermissions) AS perm_list(perm)
CROSS JOIN UNNEST(perm.IpRanges) AS cidr_list(cidr);
The `CROSS JOIN UNNEST` pattern is powerful for flattening the nested JSON structure that AWS returns. Each security group contains an array of permissions, and each permission contains an array of CIDR ranges, which we transform into individual rows.
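If the double-UNNEST is new to you, here is a tiny self-contained illustration you can paste into the DuckDB shell (the data is hypothetical):
-- Two nesting levels become one row per (group, cidr) pair
SELECT g.name AS group_name, r.ip AS source_cidr
FROM (SELECT [
    {'name': 'web', 'ranges': [{'ip': '0.0.0.0/0'}, {'ip': '10.0.0.0/8'}]}
] AS groups) t
CROSS JOIN UNNEST(t.groups) AS g_list(g)
CROSS JOIN UNNEST(g.ranges) AS r_list(r);
-- group_name | source_cidr
-- web        | 0.0.0.0/0
-- web        | 10.0.0.0/8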
Azure NSG Rules Table
-- Create a table for Azure VMs
-- Note: az vm list flattens the ARM properties bag, so vmId,
-- provisioningState, and publicIps (from --show-details) are top-level
CREATE TABLE azure_vms AS
SELECT
    id,
    name,
    resourceGroup AS resource_group,
    location,
    hardwareProfile.vmSize AS vm_size,
    vmId AS vm_id,
    provisioningState AS state,
    networkProfile.networkInterfaces[1].id AS primary_nic,  -- DuckDB lists are 1-indexed
    publicIps AS public_ips,
    storageProfile.osDisk.osType AS os_type,
    tags
FROM read_json_auto('azure_vms.json');
-- Create a table for NSG rules
CREATE TABLE azure_nsg_rules AS
SELECT
    'Azure' AS cloud_provider,
    id AS rule_id,
    -- A rule's id embeds its parent NSG, so strip the rule suffix
    regexp_replace(id, '/(securityRules|defaultSecurityRules)/.*', '') AS nsg_id,
    name AS rule_name,
    protocol,
    sourceAddressPrefix AS source_cidr,
    destinationPortRange AS port_range,
    CASE
        WHEN destinationPortRange = '*' THEN NULL
        WHEN destinationPortRange = '' THEN NULL
        WHEN POSITION('-' IN destinationPortRange) > 0 THEN
            CAST(SPLIT_PART(destinationPortRange, '-', 1) AS INTEGER)
        ELSE CAST(destinationPortRange AS INTEGER)
    END AS port_from,
    CASE
        WHEN destinationPortRange = '*' THEN NULL
        WHEN destinationPortRange = '' THEN NULL
        WHEN POSITION('-' IN destinationPortRange) > 0 THEN
            CAST(SPLIT_PART(destinationPortRange, '-', 2) AS INTEGER)
        ELSE CAST(destinationPortRange AS INTEGER)
    END AS port_to,
    access,
    direction,
    priority
FROM read_json_auto('azure_nsg_rules.json');
Notice the handling of `destinationPortRange`: we need special cases for '*', empty strings, and hyphenated ranges. This illustrates one of the challenges of working with cloud provider APIs: handling diverse data formats.
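To sanity-check that parsing logic without a full table load, you can run it against a few literal values (a quick throwaway query):
-- Expected port_from values: NULL, 443, 1000
SELECT r AS raw,
    CASE
        WHEN r IN ('*', '') THEN NULL
        WHEN POSITION('-' IN r) > 0 THEN CAST(SPLIT_PART(r, '-', 1) AS INTEGER)
        ELSE CAST(r AS INTEGER)
    END AS port_from
FROM (VALUES ('*'), ('443'), ('1000-2000')) AS t(r);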
GCP Firewall Rules Table
-- Create a table for GCP instances
CREATE TABLE gcp_instances AS
SELECT
    id,
    name,
    machineType,
    zone,
    status,
    networkInterfaces[1].networkIP AS internal_ip,  -- DuckDB lists are 1-indexed
    networkInterfaces[1].accessConfigs[1].natIP AS external_ip,
    serviceAccounts[1].email AS service_account,
    labels,
    creationTimestamp
FROM read_json_auto('gcp_instances.json');
-- Create a table for firewall rules with security focus
CREATE TABLE gcp_firewall_rules AS
SELECT
id,
name,
network,
direction,
priority,
array_to_string(sourceRanges, ',') AS source_cidr,
CASE
WHEN sourceRanges IS NOT NULL
AND array_contains(sourceRanges, '0.0.0.0/0') THEN TRUE
ELSE FALSE
END AS internet_exposed,
CASE
WHEN name LIKE '%ssh%' THEN 'SSH'
WHEN name LIKE '%rdp%' THEN 'RDP'
WHEN name LIKE '%all%' THEN 'ALL TRAFFIC'
ELSE 'OTHER'
END AS service,
CASE
WHEN name LIKE '%all%' THEN 'CRITICAL'
WHEN name LIKE '%ssh%' OR name LIKE '%rdp%' THEN 'HIGH'
ELSE 'MEDIUM'
END AS risk_level
FROM read_json_auto('gcp_firewall_rules.json');
We've added derived columns like `internet_exposed`, `service`, and `risk_level` based on common patterns in GCP firewall rule names and configurations. This kind of enrichment makes our later analysis more powerful.
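Keep in mind that name-based matching is only a heuristic; nothing forces a rule's name to mention SSH or RDP. For stricter classification you could unnest the allowed array GCP returns for each rule. A sketch, assuming the standard allowed[].IPProtocol / allowed[].ports shape of the API output:
-- Classify by what each rule actually allows, not what it is named
SELECT
    f.name,
    a.IPProtocol AS protocol,
    array_to_string(a.ports, ',') AS ports
FROM read_json_auto('gcp_firewall_rules.json') f
CROSS JOIN UNNEST(f.allowed) AS a_list(a);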
Part 3: Finding Security Issues with SQL Queries
Now that we have our cloud resource data in DuckDB tables, we can write SQL queries to find security issues. Here are some examples:
AWS: Public EC2 Instances with Admin IAM Roles
-- This query identifies AWS instances with public IPs that also have admin-like IAM roles attached
SELECT
i.instance_id,
i.instance_type,
i.public_ip,
i.instance_profile
FROM
aws_instances i
WHERE
i.public_ip IS NOT NULL
AND (
i.instance_profile LIKE '%admin%'
OR i.instance_profile LIKE '%Admin%'
OR i.instance_profile LIKE '%full%'
);
This query identifies a particularly concerning configuration: instances that are both internet-accessible (have public IPs) and possess administrative IAM roles. Such instances could be prime targets for attackers, as compromising them would provide both network access and elevated privileges.
AWS: Security Groups with Unrestricted Access
-- Find security groups with internet access (0.0.0.0/0)
SELECT
group_name,
ip_protocol,
from_port,
to_port,
source_cidr,
CASE
WHEN ip_protocol = '-1' THEN 'ALL TRAFFIC (CRITICAL)'
WHEN from_port = 22 THEN 'SSH (HIGH)'
WHEN from_port = 3389 THEN 'RDP (HIGH)'
WHEN from_port = 3306 THEN 'MySQL (HIGH)'
WHEN from_port = 5432 THEN 'PostgreSQL (HIGH)'
WHEN from_port = 1433 THEN 'MSSQL (HIGH)'
WHEN from_port = 80 OR from_port = 443 THEN 'Web (MEDIUM)'
ELSE 'PORT ' || from_port || ' (LOW)'
END as service_risk
FROM aws_security_group_ingress_cidr_rules
WHERE source_cidr = '0.0.0.0/0'
ORDER BY
CASE
WHEN ip_protocol = '-1' THEN 1
WHEN from_port IN (22, 3389, 3306, 5432, 1433) THEN 2
ELSE 3
END,
group_name;
This query identifies AWS Security Groups allowing unrestricted access from the internet (0.0.0.0/0). The `CASE` statement for `service_risk` categorizes different ports by their risk level, helping prioritize findings. The `ORDER BY` clause ensures the most critical findings (all traffic) appear first, followed by high-risk services like SSH and RDP.
Azure: VMs with Public IPs
-- Find Azure VMs with public IPs
SELECT
id,
name,
resource_group,
vm_size,
CONCAT(
    SPLIT_PART(id, '/', 3), -- subscription ID (ids look like /subscriptions/<sub>/resourceGroups/...)
    ':', name
) AS resource_id
FROM
    azure_vms
WHERE
    public_ips IS NOT NULL
    AND public_ips <> '';
This query identifies Azure VMs that have public IPs, which might indicate unnecessary internet exposure (the `--show-details` flag we used during collection surfaces them in the `publicIps` field). The `CONCAT` function creates a unique identifier that combines the subscription ID and VM name for easier tracking.
Azure: NSG Rules Allowing Broad Access
-- Find NSG rules allowing broad access
SELECT
nsg_id,
rule_name,
protocol,
source_cidr,
port_range,
port_from,
port_to,
priority,
CASE
WHEN protocol = '*' THEN 'ALL TRAFFIC (CRITICAL)'
WHEN port_from IN (22, 3389, 3306) THEN 'HIGH'
ELSE 'MEDIUM'
END as service_risk
FROM azure_nsg_rules
WHERE
(source_cidr = '0.0.0.0/0' OR source_cidr = '*' OR source_cidr = 'Internet')
AND access = 'Allow'
ORDER BY
CASE
WHEN protocol = '*' THEN 1
WHEN port_from IN (22, 3389, 3306) THEN 2
ELSE 3
END,
priority;
This query identifies Azure Network Security Group rules that allow inbound traffic from the internet. We need to handle the multiple spellings of "internet" in Azure's syntax: '0.0.0.0/0', '*', and 'Internet'. The query prioritizes findings by protocol and port, putting the most critical ones first.
GCP: Instances with Public IPs
-- Find GCP instances with public IPs
SELECT
name,
machineType,
zone,
external_ip,
service_account
FROM
gcp_instances
WHERE
external_ip IS NOT NULL;
This query identifies GCP instances with public IP addresses, which might indicate unnecessary internet exposure.
GCP: Overly Permissive Firewall Rules
-- Find internet-exposed GCP firewall rules
SELECT
name,
direction,
source_cidr,
service,
priority,
risk_level
FROM gcp_firewall_rules
WHERE
internet_exposed = TRUE
AND direction = 'INGRESS'
ORDER BY
CASE
WHEN risk_level = 'CRITICAL' THEN 1
WHEN risk_level = 'HIGH' THEN 2
ELSE 3
END,
name;
This query leverages the derived columns we created during table setup to quickly identify internet-exposed firewall rules. The `ORDER BY` clause ensures the most critical findings appear first.
Cross-Cloud Query: Count of Vulnerable Resources
-- Count of public-facing resources across clouds
SELECT
'AWS' as cloud,
COUNT(*) as public_instances
FROM
aws_instances
WHERE
public_ip IS NOT NULL
UNION ALL
SELECT
'Azure' as cloud,
COUNT(*) as public_instances
FROM
azure_vms
WHERE
public_ips IS NOT NULL
AND public_ips <> ''
UNION ALL
SELECT
'GCP' as cloud,
COUNT(*) as public_instances
FROM
gcp_instances
WHERE
external_ip IS NOT NULL;
The real magic happens when we combine data from all clouds for a single pane of glass view. This query provides a simple count of public-facing resources across your multi-cloud environment, giving you a quick overview of your potential attack surface.
Part 4: Comprehensive Cross-Cloud Security Analysis
For a more unified view across cloud providers, we can use Common Table Expressions (CTEs) to normalize the data from different sources:
-- Comprehensive cross-cloud security findings
WITH aws_findings AS (
SELECT
'AWS' AS cloud,
group_name AS resource_name,
'Ingress' AS direction,
ip_protocol AS protocol,
CAST(from_port AS VARCHAR) || '-' || CAST(to_port AS VARCHAR) AS ports,
source_cidr AS cidr,
CASE
WHEN ip_protocol = '-1' THEN 'CRITICAL'
WHEN from_port IN (22, 3389, 3306) THEN 'HIGH'
ELSE 'MEDIUM'
END AS risk_level,
CASE
WHEN ip_protocol = '-1' THEN 'ALL TRAFFIC'
WHEN from_port = 22 THEN 'SSH'
WHEN from_port = 3389 THEN 'RDP'
WHEN from_port = 3306 THEN 'MySQL'
WHEN from_port = 80 THEN 'HTTP'
WHEN from_port = 443 THEN 'HTTPS'
ELSE 'PORT ' || from_port
END AS service
FROM aws_security_group_ingress_cidr_rules
WHERE source_cidr = '0.0.0.0/0'
),
azure_findings AS (
SELECT
'Azure' AS cloud,
nsg_id || '/' || rule_name AS resource_name,
direction AS direction,
protocol,
port_range AS ports,
source_cidr AS cidr,
CASE
WHEN protocol = '*' THEN 'CRITICAL'
WHEN port_from IN (22, 3389, 3306) THEN 'HIGH'
ELSE 'MEDIUM'
END AS risk_level,
CASE
WHEN protocol = '*' THEN 'ALL TRAFFIC'
WHEN port_from = 22 THEN 'SSH'
WHEN port_from = 3389 THEN 'RDP'
WHEN port_from = 3306 THEN 'MySQL'
WHEN port_from = 80 THEN 'HTTP'
WHEN port_from = 443 THEN 'HTTPS'
ELSE 'PORT ' || port_from
END AS service
FROM azure_nsg_rules
WHERE
(source_cidr = '0.0.0.0/0' OR source_cidr = '*' OR source_cidr = 'Internet')
AND access = 'Allow'
),
gcp_findings AS (
SELECT
'GCP' AS cloud,
name AS resource_name,
direction,
'tcp' AS protocol,  -- placeholder: our gcp_firewall_rules table doesn't capture protocols
service AS ports,   -- placeholder: nor exact ports, so we reuse the service label
source_cidr AS cidr,
risk_level,
service
FROM gcp_firewall_rules
WHERE
internet_exposed = TRUE
AND direction = 'INGRESS'
)
-- Union all findings together
SELECT
cloud,
resource_name,
direction,
service,
protocol,
ports,
cidr,
risk_level
FROM (
SELECT * FROM aws_findings
UNION ALL
SELECT * FROM azure_findings
UNION ALL
SELECT * FROM gcp_findings
) AS all_findings
ORDER BY
CASE
WHEN risk_level = 'CRITICAL' THEN 1
WHEN risk_level = 'HIGH' THEN 2
ELSE 3
END,
cloud,
resource_name;
Notice how each CTE (aws_findings, azure_findings, gcp_findings) transforms its cloud-specific data into a common schema with fields like cloud, resource_name, risk_level, etc. This normalization is key to unified analysis.
The `UNION ALL` combines the results into a single unified view, allowing you to quickly identify the highest-risk open firewall rules across your entire multi-cloud estate.
Part 5: Security Posture Summary Dashboard
Beyond individual findings, we can create summary statistics for a high-level overview of our security posture:
-- Summary of internet-exposed resources by cloud and risk
WITH aws_summary AS (
SELECT
'AWS' AS cloud,
CASE
WHEN ip_protocol = '-1' THEN 'CRITICAL'
WHEN from_port IN (22, 3389, 3306) THEN 'HIGH'
ELSE 'MEDIUM'
END AS risk_level,
COUNT(*) AS finding_count
FROM aws_security_group_ingress_cidr_rules
WHERE source_cidr = '0.0.0.0/0'
GROUP BY risk_level
),
azure_summary AS (
SELECT
'Azure' AS cloud,
CASE
WHEN protocol = '*' THEN 'CRITICAL'
WHEN port_from IN (22, 3389, 3306) THEN 'HIGH'
ELSE 'MEDIUM'
END AS risk_level,
COUNT(*) AS finding_count
FROM azure_nsg_rules
WHERE
(source_cidr = '0.0.0.0/0' OR source_cidr = '*' OR source_cidr = 'Internet')
AND access = 'Allow'
GROUP BY risk_level
),
gcp_summary AS (
SELECT
'GCP' AS cloud,
risk_level,
COUNT(*) AS finding_count
FROM gcp_firewall_rules
WHERE
internet_exposed = TRUE
AND direction = 'INGRESS'
GROUP BY risk_level
)
-- Combine all summaries
SELECT
cloud,
risk_level,
finding_count
FROM (
SELECT * FROM aws_summary
UNION ALL
SELECT * FROM azure_summary
UNION ALL
SELECT * FROM gcp_summary
) AS all_summaries
ORDER BY
CASE risk_level
WHEN 'CRITICAL' THEN 1
WHEN 'HIGH' THEN 2
ELSE 3
END,
cloud;
This kind of summary is useful for reporting, tracking remediation progress, or quickly understanding risk distribution. You might output this to a CSV file for further visualization in a tool like Excel or Tableau.
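DuckDB can write that CSV directly with `COPY`. A minimal sketch using just the GCP slice (wrap the full UNION query from above the same way):
-- Export summary findings for charting elsewhere
COPY (
    SELECT 'GCP' AS cloud, risk_level, COUNT(*) AS finding_count
    FROM gcp_firewall_rules
    WHERE internet_exposed AND direction = 'INGRESS'
    GROUP BY risk_level
) TO 'posture_summary.csv' (HEADER, DELIMITER ',');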
Part 6: Taking It Further - Advanced Techniques
Automation with Python
While CLI commands work for ad-hoc analysis, Python scripts provide more flexibility and can handle pagination, error handling, and more complex data transformations. Here’s a simplified example Python script to collect and analyze resources from AWS:
import duckdb
import pandas as pd
import boto3

# Connect to DuckDB
conn = duckdb.connect('cspm.db')

# AWS Collection
def collect_aws():
    ec2 = boto3.client('ec2')

    # Collect EC2 instances, following pagination
    instances = []
    paginator = ec2.get_paginator('describe_instances')
    for page in paginator.paginate():
        for reservation in page['Reservations']:
            for instance in reservation['Instances']:
                instances.append({
                    'instance_id': instance.get('InstanceId'),
                    'instance_type': instance.get('InstanceType'),
                    'private_ip': instance.get('PrivateIpAddress'),
                    'public_ip': instance.get('PublicIpAddress'),
                    'state': instance.get('State', {}).get('Name'),
                    'vpc_id': instance.get('VpcId'),
                    'subnet_id': instance.get('SubnetId'),
                    'key_name': instance.get('KeyName'),
                    'iam_profile': instance.get('IamInstanceProfile', {}).get('Arn')
                })

    # Create a DataFrame and load it into DuckDB (DuckDB's replacement
    # scan lets the SQL reference the local variable `df` directly)
    df = pd.DataFrame(instances)
    conn.execute('DROP TABLE IF EXISTS aws_instances')
    conn.execute('CREATE TABLE aws_instances AS SELECT * FROM df')

    # Security groups, IAM policies, etc. would follow a similar pattern

# Run collection
collect_aws()

# Run security queries (ILIKE makes the match case-insensitive)
results = conn.execute("""
    SELECT
        instance_id,
        instance_type,
        public_ip,
        iam_profile
    FROM aws_instances
    WHERE public_ip IS NOT NULL
      AND iam_profile ILIKE '%admin%'
""").fetchall()

print("Instances with public IPs and admin roles:")
for row in results:
    print(f"{row[0]}: {row[1]} - {row[2]} - {row[3]}")

# Close connection
conn.close()
This Python script could be scheduled to run daily, with results emailed to security teams or fed into a dashboard.
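Scheduling can be as simple as a cron entry (the paths here are hypothetical placeholders):
# Run the collector daily at 06:00, appending output to a log
0 6 * * * /usr/bin/python3 /opt/cloud-cspm/collect_aws.py >> /var/log/cspm.log 2>&1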
Historical Analysis for Drift Detection
One powerful use case is tracking changes over time to detect security drift. This can be done by:
- Saving dated snapshots of your cloud configurations
- Using DuckDB to compare snapshots and identify changes
-- Create a dated snapshot table
CREATE TABLE security_groups_20250505 AS
SELECT * FROM aws_security_group_ingress_cidr_rules;
-- Later, comparing with a new snapshot
WITH new_rules AS (
SELECT sg.* FROM aws_security_group_ingress_cidr_rules sg
LEFT JOIN security_groups_20250505 old
ON sg.group_id = old.group_id
AND sg.from_port = old.from_port
AND sg.to_port = old.to_port
AND sg.source_cidr = old.source_cidr
WHERE old.group_id IS NULL
),
removed_rules AS (
SELECT old.* FROM security_groups_20250505 old
LEFT JOIN aws_security_group_ingress_cidr_rules sg
ON sg.group_id = old.group_id
AND sg.from_port = old.from_port
AND sg.to_port = old.to_port
AND sg.source_cidr = old.source_cidr
WHERE sg.group_id IS NULL
)
SELECT 'Added' as change_type, * FROM new_rules
UNION ALL
SELECT 'Removed' as change_type, * FROM removed_rules
ORDER BY change_type, group_name;
This query identifies security group rules that have been added or removed since the last snapshot, helping detect unauthorized or risky changes.
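Creating the dated snapshots themselves is easy to automate, since the DuckDB CLI accepts SQL as a command-line argument. A sketch (re-run your collection first so the live table is fresh):
# Snapshot today's security group rules under a dated table name
STAMP=$(date +%Y%m%d)
duckdb cspm.db "CREATE TABLE security_groups_${STAMP} AS SELECT * FROM aws_security_group_ingress_cidr_rules;"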
Using Tailpipe for CloudTrail Analysis
For a more complete CSPM solution, you might want to combine configuration analysis with CloudTrail log analysis. As I covered in my previous post about Tailpipe, you can use this DuckDB-powered tool to analyze AWS CloudTrail logs for security-relevant events:
-- Find security group modifications in CloudTrail
SELECT
event_time,
event_name,
user_identity.arn AS user_arn,
request_parameters,
response_elements
FROM aws_cloudtrail_log
WHERE event_source = 'ec2.amazonaws.com'
AND event_name LIKE '%SecurityGroup%'
ORDER BY event_time DESC;
This kind of analysis helps identify not just the current state of your environment, but also who made changes and when.
Part 7: Comparing with Commercial CSPM Solutions
While our DIY solution provides tremendous value, it’s worth understanding how it compares to commercial CSPM tools:
| Feature | DIY CSPM | Commercial CSPM |
|---|---|---|
| Cost | Free to very low | $$$$ (often per-resource pricing) |
| Setup complexity | Medium | Low |
| Maintenance effort | Medium | Low |
| Customization | Very high | Limited |
| Coverage breadth | Limited to what you build | Extensive |
| Detection depth | High (custom queries) | Medium (predefined rules) |
| Update frequency | Manual | Automatic |
| Integration with workflows | Custom development | Built-in |
| Compliance reporting | Manual | Automated |
The DIY approach shines when:
- Budget is limited
- You need custom detection logic
- You want full control over your security data
- You’re focused on specific cloud resources
- You’re using this as a learning experience
Best Practices for DIY CSPM
Based on experience building and maintaining DIY CSPM solutions, here are some best practices:
- Version your queries: Store them in a git repo so you can track changes over time
- Create tables for benchmark results: Save the results of each scan to track improvements
- Add resource ownership and tagging: Ensure you can track who owns each resource
- Implement change detection: Alert on new public resources or security group changes
- Automate remediation where possible: For critical issues, write scripts to fix them immediately
- Establish escalation paths: Know who to contact when you find serious issues
- Document exceptions: Track authorized policy exceptions so you don't repeatedly alert on them (a minimal sketch follows this list)
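For that last point, a minimal exceptions table might look like this (a sketch; the columns and sample row are illustrative):
-- Track approved exceptions so scans can filter them out
CREATE TABLE IF NOT EXISTS policy_exceptions (
    cloud VARCHAR,
    resource_name VARCHAR,
    justification VARCHAR,
    approved_by VARCHAR,
    expires DATE
);
INSERT INTO policy_exceptions VALUES
    ('AWS', 'public-web-sg', 'Customer-facing load balancer', 'security-team', DATE '2025-12-31');
-- Then anti-join findings against unexpired exceptions before alerting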
Conclusion: Level Up Your Security Data Magic
Commercial CSPM tools have their place, but you don’t always need to spend thousands of dollars to get meaningful cloud security insights. With DuckDB, SQL, and a bit of scripting, you can build a surprisingly powerful “CSPM at home” solution that focuses on the issues most relevant to your organization.
This approach not only saves money but also gives you deeper insight into your cloud infrastructure and builds valuable skills in cloud security and data analysis. As your environment grows, you can decide whether to continue scaling your DIY solution or transition to a commercial offering.
The goal isn’t to replicate every feature of a commercial CSPM, but to focus on the security issues that matter most to your organization, detect them reliably, and remediate them promptly.
What DIY security tools have you built? Share your experiences in the comments!
The scripts and queries in this post are provided as examples and should be tested and adapted to your specific environment before use in production.
~jared gore