AWS Athena Query for AWS WAF S3 Logs

AWS WAF (Web Application Firewall) helps protect web applications from common web exploits. It can write detailed logs to Amazon S3 that record traffic patterns, blocked requests, and potential security threats. AWS Athena is an interactive query service that analyzes data stored in S3 using standard SQL. Combining the two lets you query your AWS WAF logs directly where they are stored.

Table of Contents

  1. Core Concepts
  2. Typical Usage Scenarios
  3. Common Practice
  4. Best Practices
  5. Conclusion
  6. FAQ
  7. References

Core Concepts

AWS WAF Logs

AWS WAF logs record each request evaluated by your web ACL. When delivered to Amazon S3, each record is a JSON object with fields such as the timestamp (in epoch milliseconds), client IP address, HTTP method, URI, and the action WAF took (e.g., ALLOW or BLOCK).

AWS Athena

AWS Athena is a serverless query service that allows you to run SQL queries directly on data stored in S3. Its query engine is based on Presto, an open-source distributed SQL query engine (newer Athena engine versions are based on Trino, a fork of Presto). Athena eliminates the need to manage a separate data processing infrastructure, making it easy to analyze large datasets stored in S3.

Schema Definition

To query AWS WAF logs in Athena, you need to define a table schema that maps to the JSON structure of the logs. This schema tells Athena how to interpret the data in the S3 objects.

Typical Usage Scenarios

Security Analysis

  • Identify Attack Patterns: By querying the WAF logs, you can identify patterns of malicious requests, such as multiple requests from the same IP address with suspicious URIs or HTTP methods.
  • Detect Bot Activity: Analyze the logs to detect automated bot traffic that may be trying to scrape your website or perform other malicious activities.
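As a rough illustration of both ideas, the queries below rank client IPs by blocked requests and list the most common User-Agent header values. This is a sketch, assuming the waf_logs.waf_log_table table defined under Common Practice; the LIMIT values are arbitrary, and what counts as suspicious depends on your own traffic.

```sql
-- Top client IPs by number of blocked requests
SELECT httprequest.clientip AS client_ip,
       COUNT(*) AS blocked_requests
FROM waf_logs.waf_log_table
WHERE action = 'BLOCK'
GROUP BY httprequest.clientip
ORDER BY blocked_requests DESC
LIMIT 20;

-- Most common User-Agent values: keep only the user-agent header from the
-- headers array, extract its value, and take the first match (NULL if absent)
SELECT element_at(
           transform(
               filter(httprequest.headers, h -> lower(h.name) = 'user-agent'),
               h -> h.value
           ),
           1
       ) AS user_agent,
       COUNT(*) AS request_count
FROM waf_logs.waf_log_table
GROUP BY 1
ORDER BY request_count DESC
LIMIT 20;
```

A single IP or User-Agent that accounts for a large share of requests is a starting point for investigation, not proof of malicious activity.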

Traffic Monitoring

  • Understand Traffic Sources: Determine where your web traffic is coming from by analyzing the client IP addresses in the logs.
  • Monitor Traffic Volume: Keep track of the number of requests over time to identify any sudden spikes or drops in traffic.
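The two queries below sketch how these checks might look, assuming the waf_logs.waf_log_table table from Common Practice and assuming the timestamp column holds epoch milliseconds as a BIGINT (which is how WAF writes it). The 7-day window and the LIMIT are arbitrary choices.

```sql
-- Request volume per hour over the last 7 days
SELECT date_trunc('hour', from_unixtime("timestamp" / 1000)) AS hour_start,
       COUNT(*) AS request_count
FROM waf_logs.waf_log_table
WHERE from_unixtime("timestamp" / 1000) >= current_timestamp - INTERVAL '7' DAY
GROUP BY 1
ORDER BY 1;

-- Request volume by source country
SELECT httprequest.country AS country,
       COUNT(*) AS request_count
FROM waf_logs.waf_log_table
GROUP BY httprequest.country
ORDER BY request_count DESC
LIMIT 20;
```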

Rule Optimization

  • Evaluate Rule Effectiveness: Check if your WAF rules are blocking the right requests and allowing legitimate traffic. You can use queries to see how often a particular rule is triggered and if it is causing false positives.
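One way to approach this, sketched against the waf_logs.waf_log_table table from Common Practice: count how often each terminating rule fires, then sample the URIs a given rule blocked to look for legitimate traffic. The rule ID 'your-rule-id' is a placeholder, not a real rule name.

```sql
-- How often each rule terminated a request, and with which action
SELECT terminatingruleid,
       terminatingruletype,
       action,
       COUNT(*) AS request_count
FROM waf_logs.waf_log_table
GROUP BY terminatingruleid, terminatingruletype, action
ORDER BY request_count DESC;

-- URIs blocked by one rule, to review for possible false positives
SELECT httprequest.uri AS uri,
       COUNT(*) AS blocked_requests
FROM waf_logs.waf_log_table
WHERE action = 'BLOCK'
  AND terminatingruleid = 'your-rule-id'  -- placeholder rule ID
GROUP BY httprequest.uri
ORDER BY blocked_requests DESC
LIMIT 50;
```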

Common Practice

Step 1: Enable WAF Logging

First, enable logging for your AWS WAF web ACL. In the AWS WAF console, select your web ACL and configure the logging settings to send logs to an S3 bucket. The bucket name must start with aws-waf-logs-.

Step 2: Create an Athena Table

  1. Open the Athena console in the AWS Management Console.
  2. Create a new database if you haven't already. You can use the following SQL command:
CREATE DATABASE waf_logs;
  3. Define the table schema for the WAF logs. The following is an example of a table creation statement:
CREATE EXTERNAL TABLE IF NOT EXISTS waf_logs.waf_log_table (
    `timestamp` BIGINT,
    `formatVersion` INT,
    `webaclId` STRING,
    `terminatingRuleId` STRING,
    `terminatingRuleType` STRING,
    `action` STRING,
    `httpSourceName` STRING,
    `httpSourceId` STRING,
    `httpRequest` STRUCT<
        clientIp: STRING,
        country: STRING,
        headers: ARRAY<STRUCT<
            name: STRING,
            value: STRING
        >>,
        uri: STRING,
        args: STRING,
        httpVersion: STRING,
        httpMethod: STRING,
        requestId: STRING
    >
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
    'serialization.format' = '1'
)
LOCATION 's3://your-waf-logs-bucket/';

Step 3: Run Queries

Once the table is created, you can start running SQL queries on the WAF logs. WAF writes the timestamp field as epoch milliseconds, so the column is declared as BIGINT and converted with from_unixtime when compared against a time. Note that Athena queries quote identifiers with double quotes; backticks are only accepted in DDL statements. For example, to find the number of blocked requests in the last 24 hours:

SELECT COUNT(*) AS blocked_requests
FROM waf_logs.waf_log_table
WHERE action = 'BLOCK'
  AND from_unixtime("timestamp" / 1000) >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR;
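
Nested fields inside httpRequest are reached with dot notation. The query below is a small sketch that lists the most recent blocked requests; it assumes the timestamp column holds epoch milliseconds as a BIGINT, and the column aliases are illustrative.

```sql
-- Most recent blocked requests, with nested httpRequest fields flattened
SELECT from_unixtime("timestamp" / 1000) AS event_time,
       httprequest.clientip AS client_ip,
       httprequest.httpmethod AS http_method,
       httprequest.uri AS uri,
       terminatingruleid
FROM waf_logs.waf_log_table
WHERE action = 'BLOCK'
ORDER BY "timestamp" DESC
LIMIT 20;
```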

Best Practices

Partitioning

  • Partition your WAF logs in S3 based on time (e.g., by day or hour). This can significantly improve query performance as Athena can skip scanning unnecessary data partitions. To create a partitioned table, you need to adjust the table creation statement to include partition columns.
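
A minimal sketch of a day-partitioned table follows. The table name waf_log_partitioned, the partition column dt, the example date, and the S3 prefix layout are all assumptions for illustration; adjust the LOCATION values to match how your logs are actually laid out in the bucket. Only a subset of columns is declared to keep the example short.

```sql
-- Hypothetical partitioned variant of the WAF log table
CREATE EXTERNAL TABLE IF NOT EXISTS waf_logs.waf_log_partitioned (
    `timestamp` BIGINT,
    `action` STRING,
    `terminatingRuleId` STRING,
    `httpRequest` STRUCT<
        clientIp: STRING,
        country: STRING,
        uri: STRING,
        httpMethod: STRING
    >
)
PARTITIONED BY (`dt` STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://your-waf-logs-bucket/';

-- Register one day of logs as a partition (prefix layout is an assumption)
ALTER TABLE waf_logs.waf_log_partitioned ADD IF NOT EXISTS
PARTITION (dt = '2024-01-15')
LOCATION 's3://your-waf-logs-bucket/2024/01/15/';

-- Filtering on the partition column lets Athena read only that prefix
SELECT COUNT(*) AS blocked_requests
FROM waf_logs.waf_log_partitioned
WHERE dt = '2024-01-15'
  AND action = 'BLOCK';
```

With this manual approach, each new day needs its own ALTER TABLE ... ADD PARTITION statement before its data becomes visible to queries.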

Indexing

Although Athena does not support traditional indexing, you can use columnar storage formats like Parquet or ORC for your WAF logs. These formats can improve query performance by allowing Athena to read only the necessary columns.
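
One way to produce a columnar copy is an Athena CREATE TABLE AS SELECT (CTAS) statement. The sketch below assumes the waf_logs.waf_log_table table from Common Practice; the target table name, the output bucket, and the chosen columns are placeholders. The external_location prefix must be empty before the statement runs.

```sql
-- Write a Parquet copy of selected WAF log fields (names are illustrative)
CREATE TABLE waf_logs.waf_log_parquet
WITH (
    format = 'PARQUET',
    external_location = 's3://your-athena-output-bucket/waf-parquet/'
) AS
SELECT "timestamp" AS event_time_ms,
       action,
       terminatingruleid,
       httprequest.clientip AS client_ip,
       httprequest.country AS country,
       httprequest.uri AS uri,
       httprequest.httpmethod AS http_method
FROM waf_logs.waf_log_table;
```

Queries against the Parquet table that select only a few columns should then scan less data than the same queries against the raw JSON.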

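Query Optimization

  • Use filters in your queries to limit the amount of data that Athena needs to scan. For example, if you are only interested in requests from a specific country, add a filter for the httpRequest.country column. Keep in mind that on raw JSON logs Athena still reads every object in the table location; filters reduce scanned data mainly when they apply to partition columns or when the data is stored in a columnar format.
  • Select only the columns you need rather than using SELECT *.
  • Avoid LIKE patterns with leading wildcards (for example, '%admin'), as they can slow down the query.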
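
A short sketch that applies these points, assuming the waf_logs.waf_log_table table from Common Practice; the country code and the '/api/' prefix are arbitrary examples.

```sql
-- Narrow by country and action, match on a URI prefix (no leading wildcard),
-- and return only the columns needed
SELECT httprequest.uri AS uri,
       COUNT(*) AS blocked_requests
FROM waf_logs.waf_log_table
WHERE httprequest.country = 'US'
  AND action = 'BLOCK'
  AND httprequest.uri LIKE '/api/%'
GROUP BY httprequest.uri
ORDER BY blocked_requests DESC
LIMIT 50;
```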

Conclusion

Querying AWS WAF S3 logs with AWS Athena is a practical way to gain insight into your web application's security and traffic. With the core concepts, usage scenarios, and practices described above, software engineers can analyze the logs to enhance security, optimize rules, and monitor traffic.

FAQ

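Q1: How much does it cost to use Athena to query WAF logs?

Athena charges based on the amount of data scanned per query, so the cost depends on how much log data each query reads rather than how long it runs. Partitioning, columnar formats, and compression all reduce the data scanned and therefore the cost. See the Athena pricing page for current rates in your Region.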

Q2: Can I query WAF logs in real-time?

Athena is not designed for real-time querying. WAF delivers log files to S3 in batches, so there is a delay between the time a request is logged and when it is available for querying in Athena.

Q3: Do I need to have a large amount of technical knowledge to use Athena for WAF log analysis?

While some knowledge of SQL is required, AWS provides a user-friendly console and documentation to help you get started. You don't need to have in-depth knowledge of data processing infrastructure.

References