đŠđ» DuckDB for Security Engineers: SQL Magic for Your Security Tool Outputs
đ DuckDB for Cloud Security
This series explores how to leverage DuckDB for cloud security analysis, taking you from basic concepts to advanced security monitoring techniques. Learn to build your own CSPM solutions, analyze security tool outputs, and create powerful DIY security workflows.
Introduction
Letâs face it, the cyber security industry has a data problem. Despite all the fancy tools and platforms, a lot of us often end up staring at yet another CSV export.
When I was working in security operations, Iâd receive exports from tools like Wiz, Tenable, or cloud native services - all as CSV, JSON files. I needed answers like âShow me all internet-exposed critical assets with unpatched vulnerabilitiesâ but found myself writing custom Python and Golang scripts or manually filtering spreadsheets.
Even worse, Iâd often need to correlate findings across multiple tool outputs (âcross-reference EC2 instance ids Wiz flagged with this CSV of vulnerabilities from some other platformâ), which meant copying data between spreadsheets or writing even more complex scripts.
Enter DuckDB and SQL
This is where DuckDB becomes a security analystâs secret weapon. DuckDB is an in-process SQL database that:
- Runs locally (no server setup)
- Directly queries files without importing them
- Supports virtually all common data formats (CSV, JSON, Excel, Parquet)
- Uses SQL - the language of data wizards
If youâre already familiar with SQL, youâre halfway there. If not, think of SQL as a set of incantations for manipulating data:
SELECT what_you_want
FROM where_it_lives
WHERE conditions_you_care_about
There are multiple dialects of SQL, but DuckDB closely resembles Postgres style syntax. Letâs break down our magical incantation:
SELECT what_you_want
- Specify which columns (like âSeverityâ or âResourceNameâ) to include in your resultsFROM where_it_lives
- Identify your data source (CSV, JSON, or any file format)WHERE conditions_you_care_about
- Filter for what matters to you (like âWHERE Severity = âCRITICALââ)
For example, to find all AWS EC2 instances with high privileges exposed to the internet, you might cast:
SELECT ResourceName, PublicIP
FROM read_csv('aws_instances.csv')
WHERE PrivilegeLevel = 'High' AND InternetExposed = 'true'
Why DuckDB for Security Analysis?
Unlike traditional databases:
- Zero setup (download and go)
- Queries files in place (no import/export headache)
- Blazing fast (analyzes millions of rows in seconds)
- Works with any security toolâs output files
Querying Security Tool Outputs
Letâs see DuckDB in action with common security scenarios:
Real-World Example: Wiz Security Findings
Security tools like Wiz export findings as CSV files. Hereâs how to analyze them with DuckDB:
-- First, let's examine the structure of the Wiz export
SELECT * FROM read_csv('wiz_findings.csv', auto_detect=true) LIMIT 3;
This initial incantation is like a wizardâs âReveal Essenceâ spell - it lets you quickly peek at whatâs inside your mysterious artifact (the CSV file) without overwhelming yourself with its full contents. Just like a wizard might examine a small sample of a potion before attempting to understand it completely, this query shows you just enough rows to understand the data structure.
-- Find critical severity publicly exposed resources
SELECT
"Title",
"Resource Name",
"Resource Type",
"Resource Region",
"Resource Platform"
FROM read_csv('wiz_findings.csv')
WHERE
"Severity" = 'CRITICAL'
AND "Status" = 'OPEN'
AND "Title" LIKE '%publicly exposed%'
ORDER BY "Created At" DESC;
Think of this as casting âLocate Vulnerable Gatewaysâ - a spell that reveals your realmâs most critical weaknesses exposed to outside threats. This query finds all your most severe security issues (âCRITICALâ severity) that remain unfixed (âOPENâ status) and specifically involve public exposure to the internet. The query orders results by creation date, so you see the most recently discovered vulnerabilities first - just as a vigilant wizard would prioritize the newest breaches in their magical barriers.
-- Find resources with multiple security issues
SELECT
"Resource Name",
"Resource Type",
COUNT(*) AS issue_count,
STRING_AGG(DISTINCT "Title", ' | ') AS issues,
MIN("Due At") AS earliest_due_date
FROM read_csv('wiz_findings.csv')
WHERE "Status" = 'OPEN'
GROUP BY "Resource Name", "Resource Type"
HAVING COUNT(*) > 1
ORDER BY issue_count DESC;
This is your âReveal Compound Cursesâ spell, identifying resources suffering from multiple security ailments at once. Like a wizard examining a patient afflicted by several hexes, this query groups your findings by resource and counts how many issues affect each one. It concatenates all the distinct issue titles into a single text field (using STRING_AGG), showing you everything plaguing each resource. It also identifies the earliest due date to help prioritize your remediation efforts. The results are ordered by the resources with the most issues first - these are your most cursed assets!
-- Analyze findings by risk category
SELECT
t.risk_category,
COUNT(*) AS finding_count
FROM read_csv('wiz_findings.csv') wf
CROSS JOIN LATERAL UNNEST(string_split(wf."Risks", ', ')) AS t(risk_category)
WHERE wf."Status" = 'OPEN'
GROUP BY t.risk_category
ORDER BY finding_count DESC;
Consider this your âCategorize Arcane Threatsâ divination. This sophisticated spell breaks apart the comma-separated risk categories in your data and counts each type individually. The CROSS JOIN LATERAL UNNEST part is particularly clever magic - it splits the âRisksâ field which contains multiple categories separated by commas, transforming it so each category appears as its own row. This reveals which kinds of security risks are most prevalent in your environment, helping you decide which protective countermeasures to prioritize, just as a wizard would allocate their magical defenses against the most common threats.
-- Find all Jenkins servers with security issues
SELECT
"Resource Name",
"Title",
"Severity",
"Resource Platform"
FROM read_csv('wiz_findings.csv')
WHERE
"Resource Name" LIKE '%jenkins%'
AND "Status" = 'OPEN'
ORDER BY
CASE "Severity"
WHEN 'CRITICAL' THEN 1
WHEN 'HIGH' THEN 2
WHEN 'MEDIUM' THEN 3
ELSE 4
END;
This is your âScry Specific Artifactsâ spell, focusing your magical sight on a particular type of resource - in this case, Jenkins servers. Like a wizard using a specific reagent to enhance their divination for a particular target, the â%jenkins%â pattern matches any resource containing âjenkinsâ in its name. The CASE expression in the ORDER BY clause is clever spell refinement, ensuring you see the most severe issues first, just as a wizard would prioritize the most dangerous curses affecting a precious magical artifact.
Choose Your Fighter: Single-Player vs. Enterprise - DuckDB and Motherduck
DuckDB is primarily designed as a single-user tool - one database, one server, one analyst. All the examples in this blog fall under this âsingle-playerâ mode, which is perfect for individual security analysts working with tool outputs. However, if you need to scale your security data analysis to multiple team members, the creators of DuckDB offer Motherduck, a cloud service that extends DuckDB with enterprise features. Motherduck integrates seamlessly with the DuckDB CLI while adding:
- Multi-user access to security data
- SSO integration (important for security teams)
- Role-based access controls for sensitive findings
- Collaborative security dashboards and notebooks
While this blog focuses on free tools, Motherduck is worth considering if youâre building a more robust security analytics platform for your team.
Prior Art - Forgien Data Wrappers
Before DuckDB became a popular choice for security analysis, other SQL-based approaches existed. Steampipe pioneered the concept of âinfrastructure as SQLâ by using PostgreSQL Foreign Data Wrappers (FDWs) to transform cloud APIs into queryable tables. This innovative approach allowed security engineers to write SQL queries that would reach out directly to AWS, Azure, and GCP APIs to pull security configurations in real-time. PostgreSQL FDWs have long enabled similar functionality, letting database administrators connect external data sources like CSV files, web services, and other databases directly to Postgres as virtual tables. While powerful, these solutions required running a PostgreSQL server and dealing with more complex configuration than DuckDBâs lightweight approach. Tools like CloudQuery followed a similar philosophy, but DuckDBâs ability to directly query files without a database server significantly lowers the barrier to entry for security analysts who just want to query their exported security data quickly.
Advanced Wiz Security Analysis
Letâs dig deeper with some more sophisticated queries for Wiz security findings. Advanced security analysis with CSPM or Cloud Native tooling isnât just about finding individual vulnerabilities, itâs about uncovering patterns, relationships, and security gaps that span your entire environment. Using SQL, we can extract insights like trends over time, relationships between resource types and vulnerabilities, or correlations between deployment methods and security posture.
-- See which application teams have the most security issues
SELECT
JSON_EXTRACT_STRING("Resource Tags", '$.Application') AS application,
COUNT(*) AS issue_count,
COUNT(DISTINCT "Resource Name") AS affected_resources,
STRING_AGG(DISTINCT "Severity", ', ') AS severity_levels
FROM read_csv('wiz_findings.csv')
WHERE "Status" = 'OPEN'
GROUP BY application
ORDER BY issue_count DESC;
-- Compare security posture across deployment methods
SELECT
JSON_EXTRACT_STRING("Resource Tags", '$.Deployer') AS deployment_method,
COUNT(*) AS issue_count,
COUNT(DISTINCT "Resource external ID") AS resource_count,
ROUND(COUNT(*) * 1.0 / COUNT(DISTINCT "Resource external ID"), 2) AS issues_per_resource
FROM read_csv('wiz_findings.csv')
WHERE "Status" = 'OPEN'
GROUP BY deployment_method
ORDER BY issues_per_resource DESC;
-- Extract instance types from AWS resource JSON
SELECT
"Resource Name",
"Title",
"Severity",
JSON_EXTRACT_STRING("Resource original JSON", '$.InstanceType') AS instance_type,
JSON_EXTRACT_STRING("Resource original JSON", '$.LaunchTime') AS launch_time
FROM read_csv('wiz_findings.csv')
WHERE
"Resource Platform" = 'AWS'
AND "Resource Type" = 'virtualMachine'
ORDER BY "Severity", launch_time;
These same analytical techniques work across almost any security tool that exports dataâPrisma Cloud, Tenable, Qualys, CrowdStrike, or Azure Security Center. The beauty of the DuckDB approach is its consistency: once youâve mastered querying one toolâs output, you can apply the same SQL patterns to others, simply adapting to their specific schema. You can even combine findings from multiple tools into a unified security view, letting you compare how different tools detect the same issues or identify gaps in coverage between tools.
Combining Multiple Security Data Sources
DuckDB really shines when joining data from different security tools. We can leverage SQL Views in DuckDB. A View in SQL is essentially a saved query that acts like a virtual table. Think of it as a lens through which you look at your data - the underlying data doesnât change, but the View gives you a specific perspective on it. For security analysis, Views offer several key advantages:
-
They simplify complex security datasets by hiding messy details and column names; they standardize field names across different security tools (turning âresource_idâ and âAssetIDâ into just âresource_idâ).
-
They apply security-focused transformations automatically (like parsing JSON or extracting nested values); and they improve query readability when joining multiple security datasets. Rather than writing the same complex parsing logic repeatedly, you can define it once in a View and reuse it in all your security queries. This approach transforms disjointed security tool outputs into a cohesive security data platform.We can create SQL Views to project data, save as an external index or present the data in a different way:
-- Create views for each data source
CREATE OR REPLACE VIEW wiz_findings AS
SELECT
"Issue ID" AS id,
"Resource external ID" AS resource_id,
"Severity" AS severity,
"Title" AS finding_type,
CASE WHEN "Title" LIKE '%publicly exposed%' THEN TRUE ELSE FALSE END AS public_exposure,
"Status" AS status,
"Resource Tags" AS tags,
"Risks" AS risks
FROM read_csv('wiz_findings.csv');
CREATE OR REPLACE VIEW aws_cost_data AS
SELECT
resource_id,
monthly_cost,
department,
cost_center
FROM read_csv('aws_costs.csv');
-- Find expensive resources with security issues
SELECT
w.resource_id,
w.severity,
w.finding_type,
c.monthly_cost,
c.department,
c.cost_center
FROM wiz_findings w
JOIN aws_cost_data c ON w.resource_id = c.resource_id
WHERE
w.severity IN ('CRITICAL', 'HIGH')
AND w.public_exposure = TRUE
AND w.status = 'OPEN'
AND c.monthly_cost > 100
ORDER BY
c.monthly_cost DESC,
CASE w.severity
WHEN 'CRITICAL' THEN 1
WHEN 'HIGH' THEN 2
ELSE 3
END;
This approach combines security findings with cost data to identify high-risk, expensive resources that should be prioritized for remediation. By creating views for each data source, you can write cleaner, more maintainable queries across multiple security tools - something thatâs nearly impossible with spreadsheets alone.
Practical DuckDB Tips for Security Teams
Interactive SQL Shell
DuckDB comes with an interactive CLI that works great for quick analysis:
# Run DuckDB shell against a file
duckdb wiz_findings.csv
# Inside the shell
SELECT COUNT(*) FROM 'wiz_findings.csv';
.mode markdown
SELECT severity, COUNT(*) FROM 'wiz_findings.csv' GROUP BY severity;
One-Liners from Command Line
You can also run SQL directly from your command line:
# Find all severely exposed resources
duckdb -csv :memory: "SELECT \"Resource Name\", \"Resource Type\", \"Severity\" FROM read_csv('wiz_findings.csv') WHERE \"Severity\" IN ('CRITICAL', 'HIGH') AND \"Title\" LIKE '%publicly exposed%'"
# Export results to a new CSV
duckdb -csv :memory: "SELECT * FROM read_csv('wiz_findings.csv')
WHERE \"Status\" = 'OPEN'" > open_findings.csv
Performance Comparison
For a typical security assessment with 100,000 findings:
- Excel: Often crashes or becomes unresponsive
- Python scripts: 20+ seconds to load and filter
- DuckDB: ~2 seconds from query to results
Next Steps: Cloud Security Posture Management
Once youâre comfortable with these basics, DuckDB becomes a powerful tool for DIY Cloud Security Posture Management. In the next section, weâll explore how to:
- Directly query cloud logs from S3/Azure/GCP
- Create alerts for security misconfigurations
- Build a lightweight CSPM dashboard using SQL queries
- Track security posture improvements over time
Instead of paying for expensive CSPM tools, you can leverage DuckDB and SQL to build a surprisingly robust cloud security monitoring system - essentially âCSPM at homeâ that rivals commercial solutions.
Conclusion: Level Up Your Security Data Magic
Weâve covered how DuckDB can transform your approach to security data analysis, from basic queries to advanced security analytics. But this is just the beginning of whatâs possible when you combine powerful SQL tools with your security workflows.
If you enjoyed this approach to security automation, you might also be interested in a similar tool Iâve written about called Tailpipe. Where DuckDB excels at data analysis, Tailpipe provides a lightweight approach to building automated security pipelines. Together, these tools can form a powerful toolkit for the modern security engineer who wants to work smarter, not harder.
~jared gore