Project Description

This project uses AWS Lambda and Amazon Rekognition to build an OCR application that extracts text from images.
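As a rough illustration of how these pieces can fit together, here is a minimal sketch of an S3-triggered handler that sends an uploaded image to Rekognition and stores the detected lines in DynamoDB. It is not the code from the repository: the attribute names image_name and text and the TABLE_NAME environment variable are assumptions made for this example.

```python
import urllib.parse
from typing import Any, Dict, List


def extract_lines(rekognition_response: Dict[str, Any]) -> List[str]:
    """Keep only LINE-level detections; Rekognition also returns each WORD."""
    return [
        detection["DetectedText"]
        for detection in rekognition_response.get("TextDetections", [])
        if detection.get("Type") == "LINE"
    ]


def process_record(record: Dict[str, Any], rekognition: Any, table: Any) -> Dict[str, Any]:
    """Run OCR on one S3 event record and store the text in DynamoDB."""
    bucket = record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    response = rekognition.detect_text(
        Image={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    # "image_name" and "text" are hypothetical attribute names.
    item = {"image_name": key, "text": extract_lines(response)}
    table.put_item(Item=item)
    return item


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Entry point for the S3-triggered Lambda function."""
    import os

    import boto3  # imported here so the helpers above can be tested without AWS

    rekognition = boto3.client("rekognition")
    # TABLE_NAME is an assumed environment variable, not taken from the repository.
    table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])
    items = [process_record(r, rekognition, table) for r in event.get("Records", [])]
    return {"processed": len(items)}
```

Passing the clients in as parameters keeps process_record easy to exercise with stand-in objects before anything is deployed.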

Environment

Operating System: Windows 10 (Home Edition)
Tools:
- CMD with administrator privileges
- Docker Desktop
- AWS SAM CLI
Language: Python 3.6

Preparation

Build Python virtual environment
One of Python's advantages is its rich set of libraries. To avoid pulling in redundant libraries and to keep the project easy to manage, we use a virtual environment.
python -m venv ~/.venvs/aws_sam
Use virtual environment
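First enter the Scripts folder, then run activate.bat to activate the virtual environment. CMD does not expand ~, so the environment lives in a folder literally named ~ under the directory where the previous command was run (here C:\Windows\System32).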
cd C:\Windows\System32\~\.venvs\aws_sam\Scripts
activate.bat
Once the prompt looks like this:
(aws_sam) C:\Windows\System32\~\.venvs\aws_sam\Scripts>
the virtual environment aws_sam is active.
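Besides reading the prompt, the interpreter itself can confirm that it is running inside the virtual environment. The following is a small sketch using only the standard library; the function name in_virtualenv is chosen for this example.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

When run from the activated environment, the printed prefix should end in .venvs\aws_sam.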
Update environment
python -m pip install -U pip setuptools wheel
Install packages
A fresh virtual environment contains only two packages, pip and setuptools, so the packages this project needs must be installed:
pip3 install boto3 botocore -U
pip3 install awscli -U
pip3 install aws-sam-cli -U
- boto3: Boto3 is the Amazon Web Services (AWS) SDK for Python. It enables Python developers to create, configure, and manage AWS services, such as EC2 and S3. It provides an easy-to-use, object-oriented API, as well as low-level access to AWS services.
- botocore: Botocore is a low-level interface to a growing number of Amazon Web Services. It serves as the foundation for both boto3 and the AWS CLI command-line utilities.
- awscli: The AWS Command Line Interface (CLI) is a unified tool to manage your AWS services. With just one tool to download and configure, you can control multiple AWS services from the command line and automate them through scripts.
- aws-sam-cli: Use this tool to build serverless applications that are defined by AWS SAM templates. The CLI provides commands that enable you to verify that AWS SAM template files are written according to the specification, invoke Lambda functions locally, step-through debug Lambda functions, package and deploy serverless applications to the AWS Cloud, and so on.
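To confirm that the four packages ended up in the virtual environment, their versions can be read from Python. This is a minimal sketch: installed_versions is a helper written for this example, and samcli is the import name of the aws-sam-cli package.

```python
import importlib
from typing import Dict, Iterable, Optional


def installed_versions(module_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each module name to its version, or None if it cannot be imported."""
    versions = {}  # type: Dict[str, Optional[str]]
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Not every module exposes __version__, so fall back to a marker.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(
        ["boto3", "botocore", "awscli", "samcli"]
    ).items():
        print(name, version if version else "NOT INSTALLED")
```

A line reading NOT INSTALLED usually means the pip command was run outside the activated environment.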

Build project

Download code from GitHub

sam init --location git@github.com:xiaokeliu666/AWS_OCR.git

Build project
sam build

Deploy

Package local environment
The Lambda runtime environment is hard to reproduce locally, so we build the project inside a Docker container. Docker Hub already hosts a docker-lambda image for this purpose.
sam build --use-container
Create an S3 bucket
aws s3 mb s3://my-lambda-ocr-repo
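The same bucket can also be created from Python. The sketch below assumes a boto3 S3 client is passed in; create_bucket is a helper named for this example. One detail worth knowing is that us-east-1 must be requested without a location constraint, while every other region needs one.

```python
from typing import Any, Dict


def create_bucket(s3_client: Any, name: str, region: str = "us-east-1") -> Dict[str, Any]:
    """Create an S3 bucket, adding a location constraint outside us-east-1."""
    kwargs = {"Bucket": name}  # type: Dict[str, Any]
    if region != "us-east-1":
        # us-east-1 is the default location and rejects an explicit constraint.
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return s3_client.create_bucket(**kwargs)
```

With boto3 installed, the client would come from boto3.client("s3", region_name="us-east-1"). Bucket names are globally unique, so my-lambda-ocr-repo may already be taken and a different name may be needed.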

Package

sam package --template-file template.yaml --output-template-file packaged.yaml --s3-bucket my-lambda-ocr-repo

After executing this command, the packaged file (the deployment artifact referenced by packaged.yaml) is created in the S3 bucket.

Deploy to Lambda
sam deploy --template-file packaged.yaml --stack-name aws-sam-ocr --capabilities CAPABILITY_IAM --region us-east-1
This step uses CloudFormation to create the resources defined in template.yaml, which is why a stack name is required. Executing it creates a DynamoDB table, an S3 bucket, and a Lambda function.
We can monitor the deployment progress in the AWS console under CloudFormation -> Stacks.
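The deployment can also be checked from Python instead of the console. The following sketch assumes a boto3 CloudFormation client is passed in; stack_status and resources_by_type are helper names invented for this example.

```python
from collections import defaultdict
from typing import Any, Dict, List


def stack_status(cloudformation: Any, stack_name: str) -> str:
    """Return the current status of the stack, e.g. CREATE_COMPLETE."""
    response = cloudformation.describe_stacks(StackName=stack_name)
    return response["Stacks"][0]["StackStatus"]


def resources_by_type(cloudformation: Any, stack_name: str) -> Dict[str, List[str]]:
    """Group the physical IDs of the stack's resources by resource type."""
    response = cloudformation.describe_stack_resources(StackName=stack_name)
    grouped = defaultdict(list)  # type: Dict[str, List[str]]
    for resource in response["StackResources"]:
        grouped[resource["ResourceType"]].append(
            resource.get("PhysicalResourceId", "")
        )
    return dict(grouped)
```

With boto3 installed, the client would come from boto3.client("cloudformation", region_name="us-east-1") and the stack name would be aws-sam-ocr. The entry for AWS::S3::Bucket in the result holds the generated name of the bucket that receives the pictures.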

Test

Upload picture to bucket
First, we prepare a folder called “pic” to store pictures.
There are two buckets related to this project. The first is the one we upload pictures to, and the second stores the packaged file created by sam package.

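So we upload the picture to the first bucket. The name below was generated by CloudFormation for the original deployment; substitute the bucket name from your own stack: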
aws s3 cp pic/news.jpg s3://aws-lambda-ocr-sourceimagebucket-dqrx1qpra9wr
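To upload every picture in the pic folder rather than one file at a time, a short Python helper can do the same job. This is a sketch under assumptions: upload_pictures is a name chosen for this example, a boto3 S3 client is passed in, and only JPEG and PNG files are sent because those are the image formats Rekognition accepts.

```python
from pathlib import Path
from typing import Any, List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # formats Rekognition accepts


def upload_pictures(s3_client: Any, folder: str, bucket: str) -> List[str]:
    """Upload each image in the folder, using the file name as the object key."""
    uploaded = []  # type: List[str]
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            # Same result as: aws s3 cp pic/<name> s3://<bucket>
            s3_client.upload_file(str(path), bucket, path.name)
            uploaded.append(path.name)
    return uploaded
```

With boto3 installed, the client would come from boto3.client("s3"), the folder would be pic, and the bucket would be the source image bucket of your own stack.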
Find the result in DynamoDB
news.jpg:

Result:
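The stored results can also be read from Python rather than the console. Because the table's key schema is not described here, the sketch below simply scans the whole table, following pagination; scan_all is a helper named for this example, and it assumes a boto3 DynamoDB Table resource is passed in.

```python
from typing import Any, Dict, List


def scan_all(table: Any) -> List[Dict[str, Any]]:
    """Return every item in the table, following scan pagination."""
    items = []  # type: List[Dict[str, Any]]
    kwargs = {}  # type: Dict[str, Any]
    while True:
        page = table.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Continue from where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

With boto3 installed, the table would come from boto3.resource("dynamodb").Table(...) with the table name created by your stack. A full scan is fine for a small test table; for larger tables a key lookup would be the better choice.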