
Automating Image Metadata Extraction with AWS Lambda, Go, and PostgreSQL

9 min read
Huseyin BABAL
Software Developer

Introduction

In today's digital age, images play a crucial role in various applications and services. However, managing and extracting metadata from these images can be a challenging task, especially when dealing with large volumes of data. In this article, we'll explore how to leverage AWS Lambda, Go, and PostgreSQL to create an automated system for extracting EXIF data from images and storing it in a database.

What is AWS Lambda?

AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers. It automatically scales your applications in response to incoming requests, making it an ideal solution for event-driven architectures. With Lambda, you only pay for the compute time you consume, making it cost-effective for various use cases.

Use-cases

AWS Lambda can be employed in numerous scenarios, including:

  • Real-time file processing
  • Data transformations
  • Automated backups
  • Scheduled tasks
  • Webhooks and API backends

In our case, we'll use Lambda to process images as they're uploaded to an S3 bucket, extract their EXIF data, and store it in a PostgreSQL database.

In this article, we will use PostgreSQL as our database. You can host it in any database management system you like. For a convenient deployment option, consider cloud-based solutions like Rapidapp, which offers managed PostgreSQL databases and simplifies setup and maintenance.

tip

Create a free database with connection pooling support for serverless use-cases in Rapidapp in seconds here

Implementation

Project Initialization and Dependencies

In this project, we will implement a Go function that depends on the AWS Lambda and PostgreSQL client libraries, plus an EXIF decoder. You can initialize the Go project and install the dependencies as follows.

mkdir aws-lambda-go
cd aws-lambda-go
go mod init aws-lambda-go
go get -u github.com/aws/aws-lambda-go/lambda
go get -u github.com/aws/aws-sdk-go-v2/config
go get -u github.com/aws/aws-sdk-go-v2/service/s3
go get -u github.com/lib/pq
go get -u github.com/rwcarlsen/goexif/exif

Function Endpoint

main.go
package main
...
import "github.com/aws/aws-lambda-go/lambda"
...
func HandleRequest(ctx context.Context, event events.S3Event) (*string, error) {
	// Function logic goes here
}

func main() {
	lambda.Start(HandleRequest)
}

Line 5: As always, context is used to control execution flow, and since this function is triggered by an S3 event, we use the events.S3Event type. This means that once the function starts running, it receives a payload containing the S3 event that triggered it.

Line 10: In this part, the actual function logic is handed to the lambda.Start wrapper, which comes from the aws-lambda-go package.
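For reference, here is the complete import block the finished handler can use. It assumes the rwcarlsen/goexif package for EXIF decoding (matching the exif.Decode and exif.Get calls used later) and the lib/pq driver registered via a blank import:

main.go
import (
	"bytes"
	"context"
	"database/sql"
	"fmt"
	"log"
	"os"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/rwcarlsen/goexif/exif"

	_ "github.com/lib/pq" // registers the "postgres" driver for database/sql
)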

Let's deep dive into actual function logic.

Database Connection

We read the database connection URL from an environment variable and then connect to the database. It is also a good idea to ping the database to make sure the connection is healthy.

main.go
connStr := os.Getenv("DB_URL")
db, err := sql.Open("postgres", connStr)
if err != nil {
	return nil, fmt.Errorf("failed to open database: %s", err)
}
defer db.Close()

err = db.Ping()
if err != nil {
	return nil, fmt.Errorf("failed to ping database: %s", err)
}
fmt.Println("Successfully connected to the database!")

Retrieving Object from S3

Once the function is triggered by an S3 event, we retrieve the object from the S3 bucket as follows.

main.go
sdkConfig, err := config.LoadDefaultConfig(ctx)
if err != nil {
	return nil, fmt.Errorf("failed to load SDK config: %s", err)
}
s3Client := s3.NewFromConfig(sdkConfig)

var bucket string
var key string
for _, record := range event.Records {
	bucket = record.S3.Bucket.Name
	key = record.S3.Object.URLDecodedKey

	// Get the object
	getObjectOutput, err := s3Client.GetObject(ctx, &s3.GetObjectInput{
		Bucket: &bucket,
		Key:    &key,
	})
	if err != nil {
		return nil, fmt.Errorf("failed to get object %s/%s: %s", bucket, key, err)
	}
	defer getObjectOutput.Body.Close()
	...
}

Line 1: If you have ever used AWS SDKs before, you may have seen the credential-chaining behavior. The AWS SDK can use several methods to resolve credentials when creating a session to connect to AWS services. If you don't pass credentials explicitly, it first looks for them in environment variables; if it cannot find any, it falls back to the instance metadata to determine the identity. In the AWS Lambda environment, it knows how to resolve the function's identity to construct a session in Go.
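If you ever need to pin the region explicitly, for example when testing the handler outside Lambda, you can pass an option to the loader; a minimal sketch where the region value is illustrative:

sdkConfig, err := config.LoadDefaultConfig(ctx, config.WithRegion("eu-west-1")) // region value is illustrative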

Line 14: In this part, we get the object from the S3 bucket. We will use it to decode the image and extract its EXIF information.

Extracting EXIF Data

main.go
buf := new(bytes.Buffer)
_, err = buf.ReadFrom(getObjectOutput.Body)
if err != nil {
	return nil, fmt.Errorf("failed to read object %s/%s: %s", bucket, key, err)
}

// Check EXIF data
exifData, err := exif.Decode(buf)
if err != nil {
	return nil, fmt.Errorf("failed to decode EXIF data: %s", err)
}

log.Printf("successfully retrieved %s/%s with EXIF data: %v", bucket, key, exifData)

Line 2: Create a buffer from the S3 object contents to use for decoding the EXIF data.

Line 8: Extract the EXIF data from the image.
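The goexif package exposes more than raw tag access; for example, here is a hedged sketch that logs the capture time using its DateTime helper:

// DateTime returns the best-effort capture timestamp from the EXIF tags.
if taken, err := exifData.DateTime(); err == nil {
	log.Printf("image captured at %s", taken)
}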

Store in Postgres Database

Image headers contain a lot of information, but in our case we will use two fields: make and model.

main.go
// SQL statement
sqlStatement := `INSERT INTO images (bucket, key, model, company) VALUES ($1,$2,$3,$4)`

// Execute the insertion
model, err := exifData.Get(exif.Model)
if err != nil {
	return nil, fmt.Errorf("failed to get model: %s", err)
}
company, err := exifData.Get(exif.Make)
if err != nil {
	return nil, fmt.Errorf("failed to get company: %s", err)
}
_, err = db.Exec(sqlStatement, bucket, key, model.String(), company.String())
if err != nil {
	return nil, fmt.Errorf("failed to execute SQL statement: %s", err)
}

We simply read the EXIF data and insert it into the database. You can use the following statement to create the images table in your database.

CREATE TABLE images (
	bucket varchar(255),
	key varchar(255),
	model varchar(255),
	company varchar(255)
);

  • bucket - S3 bucket name
  • key - S3 object key
  • model - Model name of the camera used to take the image
  • company - Company name of the camera used to take the image
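Once a few images have been processed, you can inspect the collected metadata; for example, this illustrative query counts images per camera:

SELECT company, model, COUNT(*) AS image_count
FROM images
GROUP BY company, model;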

Now that we have implemented our image metadata extraction, let's take a look at how to deploy this function to AWS Lambda.

Deployment

Preparing Artifact

There is a reason our code has a main function: we are about to build an executable that is passed as the bootstrap entrypoint to the AWS Lambda environment. We need to build the executable, zip it, and upload it to AWS Lambda as a new function.

GOOS=linux GOARCH=arm64 go build -tags lambda.norpc -o bootstrap main.go

We build an executable for the Linux OS and ARM64 architecture using main.go as the entrypoint. The lambda.norpc tag excludes the RPC component from the executable; that component is only needed by the legacy go1.x runtime, so it can be left out when targeting the provided.al2023 runtime we use below. Also, we name the executable bootstrap because that is the entrypoint AWS Lambda looks for; the function will not start if you use another name. Finally, we zip the executable and upload it to AWS Lambda as a new function.

zip PhotoHandler.zip bootstrap

AWS Requirements

Once we deploy the function, it will require a set of permissions, such as:

  • Accessing S3 buckets
  • Being able to create log groups in CloudWatch
  • Being able to write to CloudWatch logs

We can create an AWS role with the following policy for this purpose and assign it to the Lambda function.
permissions-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "logs:CreateLogGroup",
      "Resource": "arn:aws:logs:<region>:<account-id>:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "lambda:InvokeFunction"
      ],
      "Resource": [
        "arn:aws:logs:<region>:<account-id>:log-group:/aws/lambda/PhotoHandler:*",
        "arn:aws:lambda:<region>:<account-id>:function:PhotoHandler"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "*"
    }
  ]
}

Line 7: This resource is used when creating the CloudWatch log group and log stream. Do not forget to use your own region and account ID. You can grab the account ID with the following command.

aws sts get-caller-identity

Lines 17-18: This section contains another set of permissions for creating log events and for invoking the specific function, PhotoHandler in our case. Again, do not forget to replace the region and account ID with your own.

Line 23: This section grants the permission to read objects from S3 buckets.

Note that the JSON above is a permissions policy, not a trust policy, so it cannot be passed directly to aws iam create-role, which expects a trust policy that allows the Lambda service to assume the role. Store the document above as permissions-policy.json, create the role with a trust policy as shown below, and then attach the permissions to it.
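A minimal trust policy for a Lambda execution role looks like this; save it as trust-policy.json so the create-role command below can reference it:

trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}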

aws iam create-role \
--role-name photo-handler \
--assume-role-policy-document \
file://trust-policy.json
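
Then attach the permissions policy to the role; the policy name photo-handler-permissions below is just an illustrative choice:

aws iam put-role-policy \
--role-name photo-handler \
--policy-name photo-handler-permissions \
--policy-document file://permissions-policy.json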

Remember this role name, since we will use it when creating the AWS Lambda function.

AWS Lambda Function Creation

You can create a new Lambda function as follows.

aws lambda create-function \
--function-name PhotoHandler \
--runtime provided.al2023 \
--handler bootstrap \
--architectures arm64 \
--role arn:aws:iam::<account-id>:role/photo-handler \
--zip-file fileb://PhotoHandler.zip

Line 3: provided.al2023 is an OS-only runtime environment; since we already have a binary executable, it can be provided to this environment as the entrypoint.

Line 6: Do not forget to replace <account-id> with your own account ID; this binds the role to this specific function. The execution environment will then be able to perform the operations granted by the permissions policy we attached to the role in the previous section.
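The handler reads the database connection string from the DB_URL environment variable, so set it on the function as well; a sketch in which the connection string is a placeholder you should replace with your own:

aws lambda update-function-configuration \
--function-name PhotoHandler \
--environment "Variables={DB_URL=postgres://user:password@host:5432/images?sslmode=require}"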

Adding S3 Events Trigger

In this section, we will add a trigger for S3 events so that this Lambda function is invoked whenever you upload a new image to a specific S3 bucket.

s3-notification.json
{
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "arn:aws:lambda:<region>:<account-id>:function:PhotoHandler",
      "Events": [
        "s3:ObjectCreated:*"
      ],
      "Filter": {
        "Key": {
          "FilterRules": [
            {
              "Name": "prefix",
              "Value": "acme-images/"
            },
            {
              "Name": "suffix",
              "Value": ".jpeg"
            }
          ]
        }
      }
    }
  ]
}
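
Before S3 can deliver events to the function, the function's resource policy must allow S3 to invoke it; otherwise the notification configuration will be rejected. A sketch, where the statement ID s3-invoke is arbitrary:

aws lambda add-permission \
--function-name PhotoHandler \
--statement-id s3-invoke \
--action lambda:InvokeFunction \
--principal s3.amazonaws.com \
--source-arn arn:aws:s3:::acme-images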

Now you can configure your bucket for notifications so that it triggers this Lambda function.

aws s3api put-bucket-notification-configuration \
--bucket acme-images \
--notification-configuration file://s3-notification.json

This configuration ensures that notifications about S3 object-creation events trigger the AWS Lambda function. The event can then be consumed inside the HandleRequest function.

Last Step

Now that we have added an S3 trigger to the Lambda function, whenever you upload a new JPEG file to the acme-images bucket, the function will be invoked, extract the EXIF data, and store it in the PostgreSQL database.
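You can verify the whole pipeline by uploading a JPEG that matches the notification filter (the key must start with the acme-images/ prefix and end with .jpeg) and tailing the function's logs; photo.jpeg here is just an example file:

aws s3 cp photo.jpeg s3://acme-images/acme-images/photo.jpeg
aws logs tail /aws/lambda/PhotoHandler --follow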

Conclusion

In this article, we explored how to automate image metadata extraction using AWS Lambda, Go, and PostgreSQL. We demonstrated how to use AWS Lambda to handle S3 events, extract EXIF data from images using the exif package in Go, and store the extracted metadata in a PostgreSQL database using Rapidapp, a managed PostgreSQL-as-a-service offering. More serverless use-cases are coming in the future, so do not forget to subscribe for new articles.

tip

You can find the complete source code for this project on GitHub.