Create Tables with Glue

In this lab, we will use an AWS Glue crawler to scan sales data stored in Amazon S3 and create a new table in the AWS Glue Data Catalog. We will then query that table with Amazon Athena.

Step 1: Access AWS Glue Console

  1. Open the AWS Management Console.
  2. Type Glue in the search bar and select AWS Glue from the search results to open the AWS Glue Console.

Step 2: Create Crawler

  1. In the AWS Glue Console, select Crawlers from the left-hand menu.
  2. Click the Create Crawler button.

Step 3: Name and Configure Data Source

  1. Enter Athena Sales as the crawler name and click Next.

  2. When asked “Is your data already mapped to Glue tables?”, select Not yet.

  3. Click Add a Data Source.

  4. In the Data Source section, select S3.

  5. Click Browse S3, then follow these steps:

    • Select a bucket that starts with athena-workshop-.
    • Navigate to the Basics folder, then select the parquet folder.
    • Click the circle next to the sales folder.
    • Click Choose.
  6. Keep the remaining settings as is and click Add an S3 Data source.
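The console selection above amounts to picking one workshop bucket and handing the crawler a single S3 path. The following is a minimal Python sketch of that step, assuming you script it with the AWS SDK for Python (boto3). The helper names find_workshop_bucket and sales_s3_target are hypothetical, and the key prefix basics/parquet/sales/ is an assumption based on the folder names above; S3 keys are case-sensitive, so check the exact spelling in your bucket. The functions take an already-created client so they can be tried without AWS access.

```python
from typing import Any, Dict, List

WORKSHOP_PREFIX = "athena-workshop-"
# Assumed key prefix for the Basics > parquet > sales folders; S3 keys are
# case-sensitive, so check the exact spelling in your bucket.
SALES_PREFIX = "basics/parquet/sales/"


def find_workshop_bucket(s3: Any, prefix: str = WORKSHOP_PREFIX) -> str:
    """Return the one bucket whose name starts with `prefix`."""
    names = [bucket["Name"] for bucket in s3.list_buckets()["Buckets"]]
    matches = [name for name in names if name.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one bucket starting with {prefix!r}, found {matches}"
        )
    return matches[0]


def sales_s3_target(
    bucket: str, key_prefix: str = SALES_PREFIX
) -> Dict[str, List[Dict[str, str]]]:
    """Build the Targets argument that Glue's create_crawler accepts for one S3 path."""
    return {"S3Targets": [{"Path": f"s3://{bucket}/{key_prefix}"}]}
```

With boto3 installed and credentials configured, you would call it as sales_s3_target(find_workshop_bucket(boto3.client("s3"))). Raising an error when zero or several buckets match mirrors the manual check of choosing the single athena-workshop- bucket.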

Step 4: Create IAM Role

  1. Click Next.
  2. Click the Create new IAM Role button, enter AWSGlueServiceRole-salescrawler as the role name, and click the Create button.
  3. Click the Next button.
  4. In the Set output and scheduling section, follow these steps:
    • Target Database: Choose default.
    • Table Name Prefix (optional): Enter athena_glue_.
  5. Click Next.
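The Create new IAM Role button creates a role that AWS Glue is allowed to assume and attaches permissions to it. A rough boto3 equivalent is sketched below. It assumes the AWS managed policy AWSGlueServiceRole (taken here to cover bucket listing) plus a hypothetical inline policy, named sales-crawler-s3-read, that grants s3:GetObject under the sales prefix; the policy the console actually generates may differ. The helper names are illustrative only.

```python
import json
from typing import Any

ROLE_NAME = "AWSGlueServiceRole-salescrawler"
# AWS managed policy with the baseline permissions a Glue crawler needs.
GLUE_SERVICE_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"
# Hypothetical name for the inline S3 policy added below.
S3_POLICY_NAME = "sales-crawler-s3-read"

# Trust policy: allow the AWS Glue service to assume this role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def s3_read_policy(bucket: str, key_prefix: str) -> dict:
    """Inline policy granting read access to objects under s3://bucket/key_prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{key_prefix}*"],
            }
        ],
    }


def create_crawler_role(
    iam: Any, bucket: str, key_prefix: str, role_name: str = ROLE_NAME
) -> str:
    """Create the crawler role, attach its policies, and return the role ARN."""
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=GLUE_SERVICE_POLICY_ARN)
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=S3_POLICY_NAME,
        PolicyDocument=json.dumps(s3_read_policy(bucket, key_prefix)),
    )
    return role["Role"]["Arn"]
```

For example, create_crawler_role(boto3.client("iam"), bucket, "basics/parquet/sales/") would return the role ARN a crawler definition needs. Role names are unique per account, so a second run fails because the role already exists.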

Step 5: Review and Create Crawler

  1. On the Review and Create screen, click Create Crawler.
  2. After the crawler is created, click Run crawler. This process may take 2-4 minutes to complete.
  3. After the crawler completes, you will see a new table added.
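The crawler setup and run can also be expressed with the Glue API calls create_crawler, start_crawler, get_crawler and get_table. The sketch below is one way to do it with a boto3 Glue client, assuming the role ARN from Step 4 and the S3 path of the sales folder from Step 3 are passed in. The function names, polling interval and timeout are illustrative choices, and the sleep parameter exists only so the function can be exercised without waiting.

```python
import time
from typing import Any, Callable

# Values taken from the console steps above.
CRAWLER_NAME = "Athena Sales"
DATABASE = "default"
TABLE_PREFIX = "athena_glue_"


def run_sales_crawler(
    glue: Any,
    role_arn: str,
    s3_path: str,
    poll_seconds: float = 15.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Create the crawler, start it, wait for it, and return the last crawl status."""
    glue.create_crawler(
        Name=CRAWLER_NAME,
        Role=role_arn,
        DatabaseName=DATABASE,
        TablePrefix=TABLE_PREFIX,
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    glue.start_crawler(Name=CRAWLER_NAME)

    waited = 0.0
    while waited < timeout_seconds:
        # Sleep before each check so the first read is less likely to see
        # the idle state from before the run started.
        sleep(poll_seconds)
        waited += poll_seconds
        crawler = glue.get_crawler(Name=CRAWLER_NAME)["Crawler"]
        # During a run the state moves RUNNING -> STOPPING -> READY.
        if crawler["State"] == "READY":
            # LastCrawl.Status is SUCCEEDED, CANCELLED or FAILED.
            return crawler.get("LastCrawl", {}).get("Status", "UNKNOWN")
    raise TimeoutError(f"crawler {CRAWLER_NAME!r} did not finish within {timeout_seconds} s")


def get_sales_table_name(glue: Any) -> str:
    """Look up the table the crawler should have created for the sales folder."""
    expected = TABLE_PREFIX + "sales"  # the crawler names the table after the folder
    return glue.get_table(DatabaseName=DATABASE, Name=expected)["Table"]["Name"]
```

Calling run_sales_crawler(boto3.client("glue"), role_arn, s3_path) and then get_sales_table_name on the same client should report athena_glue_sales, the table name that appears in the Athena Query Editor later in this lab.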

Query the New Table with Athena

Once the table has been created, we will go back to the Athena Console and query it.

Step 6: Access Athena Console

  1. Open the AWS Management Console and type Athena in the search bar.
  2. Select Athena from the search results to open the Athena Console.

Step 7: Query the Table

  1. In the left-hand pane of the Query Editor, you will see the athena_glue_sales table in the list of tables.
  2. Click the three-dot icon (ellipsis) next to the table name and select Preview Table.
  3. The query will run and return sales data.
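Preview Table runs a short SELECT against the table for you. The sketch below runs a comparable query through the Athena API (start_query_execution, get_query_execution, get_query_results) with a boto3 Athena client. The SQL string is an assumed equivalent of what the console generates, the helper name run_athena_query is hypothetical, and the result location is left optional on the assumption that your workgroup may already define one.

```python
import time
from typing import Any, Callable, List, Optional

# Assumed equivalent of the console's "Preview Table" query.
PREVIEW_SQL = 'SELECT * FROM "default"."athena_glue_sales" LIMIT 10'


def run_athena_query(
    athena: Any,
    sql: str,
    output_location: Optional[str] = None,
    database: str = "default",
    poll_seconds: float = 1.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[str]]:
    """Run `sql`, wait for it to finish, and return rows as lists of strings."""
    request: dict = {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
    }
    if output_location is not None:
        # Only needed when the workgroup has no default result location.
        request["ResultConfiguration"] = {"OutputLocation": output_location}
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]

    waited = 0.0
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        state = status["State"]  # QUEUED, RUNNING, SUCCEEDED, FAILED or CANCELLED
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"query {query_id} {state}: {status.get('StateChangeReason', '')}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"query {query_id} still {state} after {timeout_seconds} s")
        sleep(poll_seconds)
        waited += poll_seconds

    # One page (up to 1,000 rows) is enough for a preview.
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    # NULL cells have no VarCharValue key, so they come back as empty strings here.
    return [[cell.get("VarCharValue", "") for cell in row["Data"]] for row in rows]
```

For a SELECT, the first row returned by get_query_results holds the column names, so run_athena_query(boto3.client("athena"), PREVIEW_SQL) would give a header row followed by up to ten rows of sales data.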

Congratulations! You have successfully created a new table using an AWS Glue crawler and queried it with Amazon Athena.