Automating Image Metadata Extraction with AWS Lambda, Go, and PostgreSQL
Introduction
In today's digital age, images play a crucial role in various applications and services. However, managing and extracting metadata from these images can be a challenging task, especially when dealing with large volumes of data. In this article, we'll explore how to leverage AWS Lambda, Go, and PostgreSQL to create an automated system for extracting EXIF data from images and storing it in a database.
What is AWS Lambda?
AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers. It automatically scales your applications in response to incoming requests, making it an ideal solution for event-driven architectures. With Lambda, you only pay for the compute time you consume, making it cost-effective for various use cases.
Use Cases
AWS Lambda can be employed in numerous scenarios, including:
- Real-time file processing
- Data transformations
- Automated backups
- Scheduled tasks
- Webhooks and API backends
In our case, we'll use Lambda to process images as they're uploaded to an S3 bucket, extract their EXIF data, and store it in a PostgreSQL database.
In this article, we use PostgreSQL as our database. You can host it with any provider or on your own infrastructure. For a convenient deployment option, consider cloud-based solutions like Rapidapp, which offers managed PostgreSQL databases and simplifies setup and maintenance.
You can create a free database with connection pooling support for serverless use cases in Rapidapp in seconds.
Implementation
Project Initialization and Dependencies
In this project, we implement the function in Go. It depends on the AWS Lambda Go library, the AWS SDK for Go v2, and a PostgreSQL driver. You can initialize the Go project and install the dependencies as follows.
mkdir aws-lambda-go
cd aws-lambda-go
go mod init aws-lambda-go
go get -u github.com/aws/aws-lambda-go/lambda
go get -u github.com/aws/aws-sdk-go-v2/config
go get -u github.com/aws/aws-sdk-go-v2/service/s3
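go get -u github.com/lib/pq
# EXIF decoding: assumed to be goexif, whose API matches the exif.Decode, exif.Make and exif.Model calls used later
go get -u github.com/rwcarlsen/goexif/exif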
Function Endpoint
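package main
import (
	"context"
	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
	// ...plus the packages used by the handler logic in the following sections
)
// HandleRequest is called once per invocation with the S3 event that triggered it.
func HandleRequest(ctx context.Context, event events.S3Event) (*string, error) {
	// Function logic goes here
	return nil, nil
}
func main() {
	// lambda.Start hands HandleRequest to the Lambda runtime, which invokes it for each event.
	lambda.Start(HandleRequest)
}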
HandleRequest: The context.Context argument carries the invocation's deadline and cancellation signal, and since this function is triggered by an S3 event, the second argument uses the events.S3Event type.
This means that every invocation receives a payload describing the S3 event that triggered the function.
main: lambda.Start, from the aws-lambda-go lambda package, registers HandleRequest as the handler and hands control to the Lambda runtime, which calls it for each incoming event.
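To make the shape of that payload concrete, here is a minimal sketch that decodes a raw S3 notification with only the standard library. The s3Event type and the objectRefs function are hypothetical stand-ins for events.S3Event, covering just the fields this article uses; in the real handler, aws-lambda-go does this decoding for you.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// s3Event mirrors only the S3 notification fields used in this article.
// It is a hypothetical, simplified stand-in for events.S3Event.
type s3Event struct {
	Records []struct {
		EventName string `json:"eventName"`
		S3        struct {
			Bucket struct {
				Name string `json:"name"`
			} `json:"bucket"`
			Object struct {
				Key string `json:"key"`
			} `json:"object"`
		} `json:"s3"`
	} `json:"Records"`
}

// objectRefs returns "bucket/key" for every record in a raw S3 event payload.
// Keys are returned exactly as S3 sends them, i.e. still URL-encoded.
func objectRefs(payload []byte) ([]string, error) {
	var ev s3Event
	if err := json.Unmarshal(payload, &ev); err != nil {
		return nil, fmt.Errorf("decode S3 event: %w", err)
	}
	refs := make([]string, 0, len(ev.Records))
	for _, r := range ev.Records {
		refs = append(refs, r.S3.Bucket.Name+"/"+r.S3.Object.Key)
	}
	return refs, nil
}
```

A single event can carry several records, which is why the handler loops over event.Records. The keys in the payload are still URL-encoded, which is why the handler reads URLDecodedKey rather than Key.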
Let's dive into the actual function logic.
Database Connection
We read the database connection URL from the DB_URL environment variable and open a connection. Because sql.Open only validates its arguments and does not necessarily connect, we also ping the database to make sure it is reachable.
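// DB_URL is read from the function's environment variables.
// The driver must be registered with a blank import: _ "github.com/lib/pq"
connStr := os.Getenv("DB_URL")
db, err := sql.Open("postgres", connStr)
if err != nil {
	return nil, fmt.Errorf("failed to open database: %w", err)
}
defer db.Close()
// sql.Open does not connect; PingContext verifies that the database is reachable.
err = db.PingContext(ctx)
if err != nil {
	return nil, fmt.Errorf("failed to ping database: %w", err)
}
fmt.Println("Successfully connected to the database!")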
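Opening and pinging the database on every invocation adds latency. Lambda can reuse an execution environment across invocations, so a common alternative is to create the connection pool once and keep it in a package-level variable. The sketch below shows that lazy-initialization pattern with sync.Once; conn and the open callback are hypothetical stand-ins for *sql.DB and a sql.Open-plus-ping helper, so the sketch runs without a database driver.

```go
package main

import (
	"errors"
	"sync"
)

// conn is a hypothetical stand-in for *sql.DB so the sketch runs without a driver.
type conn struct{ dsn string }

// lazyConn opens a connection on first use and reuses it on later calls made
// in the same execution environment (warm invocations).
type lazyConn struct {
	once sync.Once
	c    *conn
	err  error
	// open is a hypothetical hook; in the handler it would wrap sql.Open and PingContext.
	open func(dsn string) (*conn, error)
}

// get returns the shared connection, opening it at most once.
// Note: sync.Once also caches a failed open, so that error is returned on every later call.
func (l *lazyConn) get(dsn string) (*conn, error) {
	l.once.Do(func() {
		if dsn == "" {
			l.err = errors.New("DB_URL is not set")
			return
		}
		l.c, l.err = l.open(dsn)
	})
	return l.c, l.err
}
```

One trade-off of sync.Once is that it also caches a failure: if the first open fails, later invocations in the same environment keep returning that error until Lambda recycles the environment. When the pool lives at package level, the handler should also stop calling db.Close() at the end of each invocation.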
Retrieving Object from S3
Once the function is triggered by an S3 event, we get the object from the S3 bucket as follows.
// Resolve credentials and region through the SDK's default configuration chain.
sdkConfig, err := config.LoadDefaultConfig(ctx)
if err != nil {
	return nil, fmt.Errorf("failed to load SDK config: %w", err)
}
s3Client := s3.NewFromConfig(sdkConfig)
var bucket string
var key string
for _, record := range event.Records {
	bucket = record.S3.Bucket.Name
	// Keys in S3 event payloads are URL-encoded; URLDecodedKey holds the decoded key.
	key = record.S3.Object.URLDecodedKey
	// Get the object
	getObjectOutput, err := s3Client.GetObject(ctx, &s3.GetObjectInput{
		Bucket: &bucket,
		Key:    &key,
	})
	if err != nil {
		return nil, fmt.Errorf("failed to get object %s/%s: %w", bucket, key, err)
	}
	// Deferred calls run when HandleRequest returns, not at the end of each iteration.
	defer getObjectOutput.Body.Close()
	...
}
config.LoadDefaultConfig: If you have used AWS SDKs before, you may have seen the credential provider chain. When you don't pass credentials explicitly, the SDK looks for them in a fixed order: environment variables first, then the shared configuration and credentials files, and then the container or EC2 instance metadata endpoints. In AWS Lambda, the temporary credentials of the function's execution role are exposed as environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN), and the region as AWS_REGION, so the default configuration resolves the function's identity without any extra code.
s3Client.GetObject: Here we fetch the uploaded object from the S3 bucket. Its Body is a stream that we read and decode to get the EXIF information.
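S3 URL-encodes object keys in event notifications (for example, a space becomes +), which is why the loop reads URLDecodedKey. A minimal standard-library sketch of that decoding, using a hypothetical decodeS3Key helper built on url.QueryUnescape, looks like this:

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeS3Key turns the URL-encoded key from an S3 event into the real object key.
// url.QueryUnescape decodes %XX escapes and converts '+' back into a space.
func decodeS3Key(raw string) (string, error) {
	key, err := url.QueryUnescape(raw)
	if err != nil {
		return "", fmt.Errorf("decode key %q: %w", raw, err)
	}
	return key, nil
}
```

For instance, the raw key acme-images/my+cat%281%29.jpeg decodes to acme-images/my cat(1).jpeg, which is the form GetObject expects. Passing the still-encoded key to GetObject would fail for any object whose name contains spaces or special characters.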
Extracting EXIF Data
// Read the whole object into memory so it can be decoded.
buf := new(bytes.Buffer)
_, err = buf.ReadFrom(getObjectOutput.Body)
if err != nil {
	return nil, fmt.Errorf("failed to read object %s/%s: %w", bucket, key, err)
}
// Decode the EXIF block embedded in the image
exifData, err := exif.Decode(buf)
if err != nil {
	return nil, fmt.Errorf("failed to decode EXIF data: %w", err)
}
log.Printf("successfully retrieved %s/%s with EXIF data: %v", bucket, key, exifData)
buf.ReadFrom: Reads the S3 object contents into a buffer that serves as the reader for EXIF decoding.
exif.Decode: Extracts the EXIF data from the image.
Store in Postgres Database
Image headers carry a lot of information, but in our case we use only two fields: Make and Model.
// SQL statement with positional parameters for the PostgreSQL driver
sqlStatement := `INSERT INTO images (bucket, key, model, company) VALUES ($1, $2, $3, $4)`
// Read the camera model and manufacturer from the EXIF data
model, err := exifData.Get(exif.Model)
if err != nil {
	return nil, fmt.Errorf("failed to get model: %w", err)
}
company, err := exifData.Get(exif.Make)
if err != nil {
	return nil, fmt.Errorf("failed to get company: %w", err)
}
// Execute the insertion
_, err = db.ExecContext(ctx, sqlStatement, bucket, key, model.String(), company.String())
if err != nil {
	return nil, fmt.Errorf("failed to execute SQL statement: %w", err)
}
We read the two EXIF fields and insert them, along with the bucket name and object key, into the database. You can use the following statement to create the images table in your database.
CREATE TABLE images (
bucket varchar(255),
key varchar(255),
model varchar(255),
company varchar(255)
);
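- bucket: S3 bucket name
- key: S3 object key (S3 keys can be up to 1,024 bytes long, so widen this column if your keys may exceed 255 characters)
- model: model name of the camera used to take the image
- company: manufacturer of the camera used to take the image (the EXIF Make field)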
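The Make and Model values stored in this table come from IFD0 of the TIFF block inside the EXIF data: Make is tag 0x010F and Model is tag 0x0110, both of ASCII type (2). Each IFD entry is 12 bytes (tag, type, count, value or offset), and values of 4 bytes or fewer are stored inline in the entry. The sketch below, built around a hypothetical readIFD0Strings helper, reads those tags by hand only to show where the values come from; the article itself relies on the exif package for this.

```go
package main

import (
	"encoding/binary"
	"errors"
	"strings"
)

// readIFD0Strings parses a TIFF block (the payload after "Exif\0\0") and returns
// the ASCII values from IFD0 for the requested tags, keyed by the given names.
func readIFD0Strings(tiff []byte, wanted map[uint16]string) (map[string]string, error) {
	if len(tiff) < 8 {
		return nil, errors.New("TIFF header too short")
	}
	var order binary.ByteOrder
	switch string(tiff[:2]) {
	case "II":
		order = binary.LittleEndian
	case "MM":
		order = binary.BigEndian
	default:
		return nil, errors.New("unknown byte order")
	}
	if order.Uint16(tiff[2:4]) != 42 {
		return nil, errors.New("bad TIFF magic number")
	}
	ifd := int(order.Uint32(tiff[4:8])) // offset of IFD0 from the TIFF header
	if ifd+2 > len(tiff) {
		return nil, errors.New("IFD0 offset out of range")
	}
	count := int(order.Uint16(tiff[ifd : ifd+2]))
	out := map[string]string{}
	for i := 0; i < count; i++ {
		e := ifd + 2 + i*12 // each entry: tag(2) type(2) count(4) value-or-offset(4)
		if e+12 > len(tiff) {
			return nil, errors.New("IFD entry out of range")
		}
		tag := order.Uint16(tiff[e : e+2])
		typ := order.Uint16(tiff[e+2 : e+4])
		n := int(order.Uint32(tiff[e+4 : e+8]))
		name, ok := wanted[tag]
		if !ok || typ != 2 { // only ASCII (type 2) values are handled
			continue
		}
		var raw []byte
		if n <= 4 {
			raw = tiff[e+8 : e+8+n] // short values are stored inline
		} else {
			off := int(order.Uint32(tiff[e+8 : e+12]))
			if off+n > len(tiff) {
				return nil, errors.New("value out of range")
			}
			raw = tiff[off : off+n]
		}
		out[name] = strings.TrimRight(string(raw), "\x00 ")
	}
	return out, nil
}
```

Calling it with map[uint16]string{0x010F: "make", 0x0110: "model"} on a TIFF payload returns both strings with their trailing NUL terminators removed. It ignores sub-IFDs and non-ASCII value types, which a full EXIF library handles.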
Now that we have implemented the image metadata extraction, let's look at how to deploy this function to AWS Lambda.
Deployment
Preparing Artifact
Our code has a main function because we build it as a standalone executable that the AWS Lambda environment runs as its bootstrap entry point. We build the executable, zip it, and upload it to AWS Lambda as a new function.
GOOS=linux GOARCH=arm64 go build -tags lambda.norpc -o bootstrap main.go
This builds an executable for the Linux OS and the ARM64 architecture from main.go. The lambda.norpc tag excludes the RPC component of the aws-lambda-go library from the executable; that component is only needed by the deprecated go1.x runtime, not by OS-only runtimes such as the provided.al2023 runtime used here. We also name the executable bootstrap, because OS-only runtimes look for an executable with exactly that name as the entry point; with any other name, the function will not start.
Finally, we zip the executable so we can upload it to AWS Lambda as a new function.
zip PhotoHandler.zip bootstrap
AWS Requirements
Once deployed, the function needs a set of permissions:
- Reading objects from S3 buckets
- Creating log groups in CloudWatch
- Writing to CloudWatch logs
We can create an AWS role with the following policy for this purpose and assign it to the Lambda function.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "logs:CreateLogGroup",
"Resource": "arn:aws:logs:<region>:<account-id>:*"
},
{
"Effect": "Allow",
"Action": [
"logs:CreateLogStream",
"logs:PutLogEvents",
"lambda:InvokeFunction"
],
"Resource": [
"arn:aws:logs:<region>:<account-id>:log-group:/aws/lambda/PhotoHandler:*",
"arn:aws:lambda:<region>:<account-id>:function:PhotoHandler"
]
},
{
"Effect": "Allow",
"Action": "s3:GetObject",
"Resource": "*"
}
]
}
The first statement allows the function to create its CloudWatch log group. Replace <region> and <account-id> with your own values; you can look up your account ID with the following command:
aws sts get-caller-identity --query Account --output text
The second statement allows creating log streams and writing log events in the PhotoHandler log group, and invoking the PhotoHandler function. Again, replace the region and account ID with your own. Note that S3 does not use this role to invoke the function; that permission is granted on the function itself.
The third statement allows reading objects from S3 buckets. You can narrow "*" to arn:aws:s3:::acme-images/* to limit access to the image bucket.
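Save this policy as permissions-policy.json. It is a permissions policy, not a trust policy: create-role expects a trust policy that allows the Lambda service to assume the role. Save the following as trust-policy.json:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
Then create the role and attach the permissions policy to it as an inline policy (the policy name is up to you):
aws iam create-role \
--role-name photo-handler \
--assume-role-policy-document file://trust-policy.json
aws iam put-role-policy \
--role-name photo-handler \
--policy-name photo-handler-permissions \
--policy-document file://permissions-policy.json
Remember this role name, since we will use it when creating the Lambda function.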
AWS Lambda Function Creation
You can create a new Lambda function as follows.
aws lambda create-function \
--function-name PhotoHandler \
--runtime provided.al2023 \
--handler bootstrap \
--architectures arm64 \
--role arn:aws:iam::<account-id>:role/photo-handler \
--zip-file fileb://PhotoHandler.zip \
--environment "Variables={DB_URL=<connection-url>}"
--runtime provided.al2023: This is an OS-only runtime. Since we ship a compiled binary named bootstrap, Lambda runs it directly as the entry point.
--role: Replace <account-id> with your account ID. This binds the role to the function, so the function runs with the permissions attached to the role created in the previous section.
--environment: Sets the DB_URL environment variable that the handler reads to connect to PostgreSQL. Replace <connection-url> with your database's connection URL.
Adding S3 Events Trigger
In this section, we add an S3 event trigger so that the Lambda function is invoked whenever you upload a new image to a specific S3 bucket. Save the following notification configuration as s3-notification.json:
{
"LambdaFunctionConfigurations": [
{
"LambdaFunctionArn": "arn:aws:lambda:<region>:<account-id>:function:PhotoHandler",
"Events": [
"s3:ObjectCreated:*"
],
"Filter": {
"Key": {
"FilterRules": [
{
"Name": "prefix",
"Value": "acme-images/"
},
{
"Name": "suffix",
"Value": ".jpeg"
}
]
}
}
}
]
}
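S3 applies the prefix and suffix rules to the object key inside the bucket; the bucket name is not part of the key. A simplified model of that matching, assuming plain case-sensitive string comparisons (filterRule and matchesFilter are hypothetical names), shows which uploads this configuration would catch:

```go
package main

import "strings"

// filterRule mirrors one entry of FilterRules in the notification configuration.
type filterRule struct {
	Name  string // "prefix" or "suffix"
	Value string
}

// matchesFilter reports whether an object key satisfies every rule.
// Simplified model: plain, case-sensitive matching on the object key only.
func matchesFilter(key string, rules []filterRule) bool {
	for _, r := range rules {
		switch r.Name {
		case "prefix":
			if !strings.HasPrefix(key, r.Value) {
				return false
			}
		case "suffix":
			if !strings.HasSuffix(key, r.Value) {
				return false
			}
		}
	}
	return true
}
```

With the rules above, the key acme-images/cat.jpeg in the acme-images bucket matches, while cat.jpeg at the bucket root and acme-images/cat.jpg do not. In other words, .jpg files will not trigger the function with this configuration.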
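Before S3 can deliver events to the function, the function's resource-based policy must allow the S3 service to invoke it; otherwise, S3 rejects the notification configuration when it validates the destination. You can grant this permission as follows:
aws lambda add-permission \
--function-name PhotoHandler \
--statement-id s3-invoke-photo-handler \
--action lambda:InvokeFunction \
--principal s3.amazonaws.com \
--source-arn arn:aws:s3:::acme-images \
--source-account <account-id>
Now you can configure your bucket to send notifications that trigger this Lambda function.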
aws s3api put-bucket-notification-configuration \
--bucket acme-images \
--notification-configuration file://s3-notification.json
This configuration makes S3 send a notification to the Lambda function for each object-creation event that matches the filter. The function receives the event as the events.S3Event argument of HandleRequest.
Last Step
Now that we have added an S3 trigger to the Lambda function, whenever you upload a new .jpeg file under the acme-images/ prefix in the acme-images bucket, the function is invoked, extracts the EXIF data, and stores it in the PostgreSQL database.
Conclusion
In this article, we explored how to automate image metadata extraction using AWS Lambda, Go, and PostgreSQL. We demonstrated how to use AWS Lambda to handle S3 events, extract EXIF data from images using the exif package in Go, and store the extracted metadata in a PostgreSQL database hosted on Rapidapp, a PostgreSQL-as-a-service offering. More serverless use cases are coming, so don't forget to subscribe for new articles.
You can find the complete source code for this project on GitHub.