🦆💻 DuckDB for Security Engineers: SQL Magic for Your Security Tool Outputs

Introduction

Let’s face it, the cyber security industry has a data problem. Despite all the fancy tools and platforms, a lot of us often end up staring at yet another CSV export.

The Job Description vs the Job meme with cybersecurity vendor logos constrasting with Excel

When I was working in security operations, I’d receive exports from tools like Wiz, Tenable, or cloud native services - all as CSV, JSON files. I needed answers like “Show me all internet-exposed critical assets with unpatched vulnerabilities” but found myself writing custom Python and Golang scripts or manually filtering spreadsheets.

Even worse, I’d often need to correlate findings across multiple tool outputs (“cross-reference EC2 instance ids Wiz flagged with this CSV of vulnerabilities from some other platform”), which meant copying data between spreadsheets or writing even more complex scripts.

Enter DuckDB and SQL

This is where DuckDB becomes a security analyst’s secret weapon. DuckDB is an in-process SQL database that:

Runs locally (no server setup)
Directly queries files without importing them
Supports virtually all common data formats (CSV, JSON, Excel, Parquet)
Uses SQL - the language of data wizards

If you’re already familiar with SQL, you’re halfway there. If not, think of SQL as a set of incantations for manipulating data:

SELECT what_you_want
FROM where_it_lives
WHERE conditions_you_care_about

a Wizard commanding data from various sources

There are multiple dialects of SQL, but DuckDB closely resembles Postgres style syntax. Let’s break down our magical incantation:

SELECT what_you_want - Specify which columns (like “Severity” or “ResourceName”) to include in your results
FROM where_it_lives - Identify your data source (CSV, JSON, or any file format)
WHERE conditions_you_care_about - Filter for what matters to you (like “WHERE Severity = ‘CRITICAL’”)

For example, to find all AWS EC2 instances with high privileges exposed to the internet, you might cast:

SELECT ResourceName, PublicIP
FROM read_csv('aws_instances.csv')
WHERE PrivilegeLevel = 'High' AND InternetExposed = 'true'

Why DuckDB for Security Analysis?

Unlike traditional databases:

Zero setup (download and go)
Queries files in place (no import/export headache)
Blazing fast (analyzes millions of rows in seconds)
Works with any security tool’s output files

Querying Security Tool Outputs

Let’s see DuckDB in action with common security scenarios:

Real-World Example: Wiz Security Findings

Security tools like Wiz export findings as CSV files. Here’s how to analyze them with DuckDB:

-- First, let's examine the structure of the Wiz export
SELECT * FROM read_csv('wiz_findings.csv', auto_detect=true) LIMIT 3;

This initial incantation is like a wizard’s “Reveal Essence” spell - it lets you quickly peek at what’s inside your mysterious artifact (the CSV file) without overwhelming yourself with its full contents. Just like a wizard might examine a small sample of a potion before attempting to understand it completely, this query shows you just enough rows to understand the data structure.

-- Find critical severity publicly exposed resources
SELECT 
    "Title",
    "Resource Name",
    "Resource Type",
    "Resource Region",
    "Resource Platform"
FROM read_csv('wiz_findings.csv')
WHERE 
    "Severity" = 'CRITICAL'
    AND "Status" = 'OPEN'
    AND "Title" LIKE '%publicly exposed%'
ORDER BY "Created At" DESC;

Think of this as casting “Locate Vulnerable Gateways” - a spell that reveals your realm’s most critical weaknesses exposed to outside threats. This query finds all your most severe security issues (“CRITICAL” severity) that remain unfixed (“OPEN” status) and specifically involve public exposure to the internet. The query orders results by creation date, so you see the most recently discovered vulnerabilities first - just as a vigilant wizard would prioritize the newest breaches in their magical barriers.

-- Find resources with multiple security issues
SELECT 
    "Resource Name",
    "Resource Type",
    COUNT(*) AS issue_count,
    STRING_AGG(DISTINCT "Title", ' | ') AS issues,
    MIN("Due At") AS earliest_due_date
FROM read_csv('wiz_findings.csv')
WHERE "Status" = 'OPEN'
GROUP BY "Resource Name", "Resource Type"
HAVING COUNT(*) > 1
ORDER BY issue_count DESC;

This is your “Reveal Compound Curses” spell, identifying resources suffering from multiple security ailments at once. Like a wizard examining a patient afflicted by several hexes, this query groups your findings by resource and counts how many issues affect each one. It concatenates all the distinct issue titles into a single text field (using STRING_AGG), showing you everything plaguing each resource. It also identifies the earliest due date to help prioritize your remediation efforts. The results are ordered by the resources with the most issues first - these are your most cursed assets!

-- Analyze findings by risk category
SELECT 
    t.risk_category,
    COUNT(*) AS finding_count
FROM read_csv('wiz_findings.csv') wf
CROSS JOIN LATERAL UNNEST(string_split(wf."Risks", ', ')) AS t(risk_category)
WHERE wf."Status" = 'OPEN'
GROUP BY t.risk_category
ORDER BY finding_count DESC;

Consider this your “Categorize Arcane Threats” divination. This sophisticated spell breaks apart the comma-separated risk categories in your data and counts each type individually. The CROSS JOIN LATERAL UNNEST part is particularly clever magic - it splits the “Risks” field which contains multiple categories separated by commas, transforming it so each category appears as its own row. This reveals which kinds of security risks are most prevalent in your environment, helping you decide which protective countermeasures to prioritize, just as a wizard would allocate their magical defenses against the most common threats.

-- Find all Jenkins servers with security issues
SELECT
    "Resource Name",
    "Title",
    "Severity",
    "Resource Platform"
FROM read_csv('wiz_findings.csv')
WHERE 
    "Resource Name" LIKE '%jenkins%'
    AND "Status" = 'OPEN'
ORDER BY
    CASE "Severity"
        WHEN 'CRITICAL' THEN 1
        WHEN 'HIGH' THEN 2
        WHEN 'MEDIUM' THEN 3
        ELSE 4
    END;

This is your “Scry Specific Artifacts” spell, focusing your magical sight on a particular type of resource - in this case, Jenkins servers. Like a wizard using a specific reagent to enhance their divination for a particular target, the ‘%jenkins%’ pattern matches any resource containing “jenkins” in its name. The CASE expression in the ORDER BY clause is clever spell refinement, ensuring you see the most severe issues first, just as a wizard would prioritize the most dangerous curses affecting a precious magical artifact.

Choose Your Fighter: Single-Player vs. Enterprise - DuckDB and Motherduck

DuckDB is primarily designed as a single-user tool - one database, one server, one analyst. All the examples in this blog fall under this “single-player” mode, which is perfect for individual security analysts working with tool outputs. However, if you need to scale your security data analysis to multiple team members, the creators of DuckDB offer Motherduck, a cloud service that extends DuckDB with enterprise features. Motherduck integrates seamlessly with the DuckDB CLI while adding:

Multi-user access to security data
SSO integration (important for security teams)
Role-based access controls for sensitive findings
Collaborative security dashboards and notebooks

While this blog focuses on free tools, Motherduck is worth considering if you’re building a more robust security analytics platform for your team.

Prior Art - Forgien Data Wrappers

Before DuckDB became a popular choice for security analysis, other SQL-based approaches existed. Steampipe pioneered the concept of “infrastructure as SQL” by using PostgreSQL Foreign Data Wrappers (FDWs) to transform cloud APIs into queryable tables. This innovative approach allowed security engineers to write SQL queries that would reach out directly to AWS, Azure, and GCP APIs to pull security configurations in real-time. PostgreSQL FDWs have long enabled similar functionality, letting database administrators connect external data sources like CSV files, web services, and other databases directly to Postgres as virtual tables. While powerful, these solutions required running a PostgreSQL server and dealing with more complex configuration than DuckDB’s lightweight approach. Tools like CloudQuery followed a similar philosophy, but DuckDB’s ability to directly query files without a database server significantly lowers the barrier to entry for security analysts who just want to query their exported security data quickly.

Advanced Wiz Security Analysis

Let’s dig deeper with some more sophisticated queries for Wiz security findings. Advanced security analysis with CSPM or Cloud Native tooling isn’t just about finding individual vulnerabilities, it’s about uncovering patterns, relationships, and security gaps that span your entire environment. Using SQL, we can extract insights like trends over time, relationships between resource types and vulnerabilities, or correlations between deployment methods and security posture.

-- See which application teams have the most security issues
SELECT 
    JSON_EXTRACT_STRING("Resource Tags", '$.Application') AS application,
    COUNT(*) AS issue_count,
    COUNT(DISTINCT "Resource Name") AS affected_resources,
    STRING_AGG(DISTINCT "Severity", ', ') AS severity_levels
FROM read_csv('wiz_findings.csv')
WHERE "Status" = 'OPEN'
GROUP BY application
ORDER BY issue_count DESC;

-- Compare security posture across deployment methods
SELECT 
    JSON_EXTRACT_STRING("Resource Tags", '$.Deployer') AS deployment_method,
    COUNT(*) AS issue_count,
    COUNT(DISTINCT "Resource external ID") AS resource_count,
    ROUND(COUNT(*) * 1.0 / COUNT(DISTINCT "Resource external ID"), 2) AS issues_per_resource
FROM read_csv('wiz_findings.csv')
WHERE "Status" = 'OPEN'
GROUP BY deployment_method
ORDER BY issues_per_resource DESC;

-- Extract instance types from AWS resource JSON
SELECT 
    "Resource Name",
    "Title",
    "Severity",
    JSON_EXTRACT_STRING("Resource original JSON", '$.InstanceType') AS instance_type,
    JSON_EXTRACT_STRING("Resource original JSON", '$.LaunchTime') AS launch_time
FROM read_csv('wiz_findings.csv')
WHERE 
    "Resource Platform" = 'AWS'
    AND "Resource Type" = 'virtualMachine'
ORDER BY "Severity", launch_time;

These same analytical techniques work across almost any security tool that exports data—Prisma Cloud, Tenable, Qualys, CrowdStrike, or Azure Security Center. The beauty of the DuckDB approach is its consistency: once you’ve mastered querying one tool’s output, you can apply the same SQL patterns to others, simply adapting to their specific schema. You can even combine findings from multiple tools into a unified security view, letting you compare how different tools detect the same issues or identify gaps in coverage between tools.

Combining Multiple Security Data Sources

DuckDB really shines when joining data from different security tools. We can leverage SQL Views in DuckDB. A View in SQL is essentially a saved query that acts like a virtual table. Think of it as a lens through which you look at your data - the underlying data doesn’t change, but the View gives you a specific perspective on it. For security analysis, Views offer several key advantages:

They simplify complex security datasets by hiding messy details and column names; they standardize field names across different security tools (turning “resource_id” and “AssetID” into just “resource_id”).
They apply security-focused transformations automatically (like parsing JSON or extracting nested values); and they improve query readability when joining multiple security datasets. Rather than writing the same complex parsing logic repeatedly, you can define it once in a View and reuse it in all your security queries. This approach transforms disjointed security tool outputs into a cohesive security data platform.We can create SQL Views to project data, save as an external index or present the data in a different way:

-- Create views for each data source
CREATE OR REPLACE VIEW wiz_findings AS
SELECT
    "Issue ID" AS id,
    "Resource external ID" AS resource_id,
    "Severity" AS severity,
    "Title" AS finding_type,
    CASE WHEN "Title" LIKE '%publicly exposed%' THEN TRUE ELSE FALSE END AS public_exposure,
    "Status" AS status,
    "Resource Tags" AS tags,
    "Risks" AS risks
FROM read_csv('wiz_findings.csv');

CREATE OR REPLACE VIEW aws_cost_data AS
SELECT
    resource_id,
    monthly_cost,
    department,
    cost_center
FROM read_csv('aws_costs.csv');

-- Find expensive resources with security issues
SELECT 
    w.resource_id,
    w.severity,
    w.finding_type,
    c.monthly_cost,
    c.department,
    c.cost_center
FROM wiz_findings w
JOIN aws_cost_data c ON w.resource_id = c.resource_id
WHERE 
    w.severity IN ('CRITICAL', 'HIGH')
    AND w.public_exposure = TRUE
    AND w.status = 'OPEN'
    AND c.monthly_cost > 100
ORDER BY 
    c.monthly_cost DESC,
    CASE w.severity
        WHEN 'CRITICAL' THEN 1
        WHEN 'HIGH' THEN 2
        ELSE 3
    END;

This approach combines security findings with cost data to identify high-risk, expensive resources that should be prioritized for remediation. By creating views for each data source, you can write cleaner, more maintainable queries across multiple security tools - something that’s nearly impossible with spreadsheets alone.

Practical DuckDB Tips for Security Teams

Interactive SQL Shell

DuckDB comes with an interactive CLI that works great for quick analysis:

# Run DuckDB shell against a file
duckdb wiz_findings.csv

# Inside the shell
SELECT COUNT(*) FROM 'wiz_findings.csv';
.mode markdown
SELECT severity, COUNT(*) FROM 'wiz_findings.csv' GROUP BY severity;

One-Liners from Command Line

You can also run SQL directly from your command line:

# Find all severely exposed resources
duckdb -csv :memory: "SELECT \"Resource Name\", \"Resource Type\", \"Severity\" FROM read_csv('wiz_findings.csv') WHERE \"Severity\" IN ('CRITICAL', 'HIGH') AND \"Title\" LIKE '%publicly exposed%'"

# Export results to a new CSV
duckdb -csv :memory: "SELECT * FROM read_csv('wiz_findings.csv') 
WHERE \"Status\" = 'OPEN'" > open_findings.csv

Performance Comparison

For a typical security assessment with 100,000 findings:

Excel: Often crashes or becomes unresponsive
Python scripts: 20+ seconds to load and filter
DuckDB: ~2 seconds from query to results

Next Steps: Cloud Security Posture Management

Once you’re comfortable with these basics, DuckDB becomes a powerful tool for DIY Cloud Security Posture Management. In the next section, we’ll explore how to:

Directly query cloud logs from S3/Azure/GCP
Create alerts for security misconfigurations
Build a lightweight CSPM dashboard using SQL queries
Track security posture improvements over time

Instead of paying for expensive CSPM tools, you can leverage DuckDB and SQL to build a surprisingly robust cloud security monitoring system - essentially “CSPM at home” that rivals commercial solutions.

Conclusion: Level Up Your Security Data Magic

We’ve covered how DuckDB can transform your approach to security data analysis, from basic queries to advanced security analytics. But this is just the beginning of what’s possible when you combine powerful SQL tools with your security workflows.

If you enjoyed this approach to security automation, you might also be interested in a similar tool I’ve written about called Tailpipe. Where DuckDB excels at data analysis, Tailpipe provides a lightweight approach to building automated security pipelines. Together, these tools can form a powerful toolkit for the modern security engineer who wants to work smarter, not harder.

Part 2: Tailpipe

📚 DuckDB for Cloud Security