Athena SQLite Driver

Using Athena's new Query Federation functionality, read SQLite databases from S3.

Install it from the Serverless Application Repository: AthenaSQLiteConnector.

Why?

I occasionally like to put together fun side projects over Thanksgiving and Christmas holidays.

I'd always joked that it would be a crazy idea to be able to read SQLite using Athena, so...here we are!

How?

I decided to use Python because I'm most familiar with it, and because of the next point.
Using APSW, we can implement a Virtual File System (VFS) for S3.
Using the Athena query federation example, we can see which calls need to be implemented.

The PyArrow library unfortunately weighs in at over 250MB, so we have to use a custom compilation step to build a Lambda Layer.

What?

Drop SQLite databases under a single prefix in S3, and Athena will list each file as a database and automatically detect tables and schemas.

Currently, all data types are strings. I'll fix this eventually. All good things in time.
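A minimal sketch of how tables and column names can be discovered from a SQLite file. It uses Python's built-in `sqlite3` module and a local connection rather than APSW over S3 (both assumptions for illustration), and it reports every column as a string to match the current behavior described above. The helper names `describe_database` and `athena_column_types` are hypothetical.

```python
import sqlite3
from typing import Dict, List


def describe_database(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Map each user table in the database to its list of column names."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' "
            "AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    schemas: Dict[str, List[str]] = {}
    for table in tables:
        # Quote the table name so unusual characters don't break the PRAGMA.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schemas[table] = [col[1] for col in columns]
    return schemas


def athena_column_types(columns: List[str]) -> Dict[str, str]:
    # For now every column is exposed as a string, whatever its SQLite type.
    return {name: "string" for name in columns}
```

For example, a database containing `CREATE TABLE records (id INTEGER, name TEXT)` yields `{'records': ['id', 'name']}`, and both columns would be typed as `string`.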

Status

This project is under active development and very much in its infancy.

Many things are hard-coded or broken into various pieces as I experiment and figure out how everything works.

Building

The documentation for this is a work in progress. It's currently in between me creating the resources manually and building the assets for the AWS SAR, and most of the docs will be automated away.

Requirements

Docker
Python 3.7

Lambda layer

First, you need to build the Lambda layer. There are two Dockerfiles and build scripts in the lambda-layer/ directory.

We'll execute each of the build scripts and copy the results to the target directory, which is referenced by the SAR template, athena-sqlite.yaml.

cd lambda-layer
./build.sh
./build-pyarrow.sh
cp -R layer/ ../target/
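Lambda limits the unzipped size of a function plus all of its layers to 250 MB, which is why PyArrow needs the custom build. A small sketch for checking how large the copied `target/` directory is before publishing; the script itself and its default path are assumptions, not part of the repository.

```python
import os
import sys

# Lambda's unzipped size limit for a function plus all of its layers (250 MB).
LAMBDA_UNZIPPED_LIMIT = 250 * 1024 * 1024


def directory_size(path: str) -> int:
    """Total size in bytes of all regular files under `path` (0 if missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # don't count symlink targets twice
                total += os.path.getsize(full)
    return total


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "target"
    size = directory_size(target)
    print(f"{target}: {size / (1024 * 1024):.1f} MB")
    if size > LAMBDA_UNZIPPED_LIMIT:
        sys.exit("Layer exceeds Lambda's 250 MB unzipped limit")
```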

Upload sample data

For the purpose of this test, we just have a sample SQLite database you can upload.

aws s3 cp sample-data/sample_data.sqlite s3://<TARGET_BUCKET>/<TARGET_PREFIX>/

Feel free to upload your own SQLite databases as well!
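If you don't have a SQLite database handy, Python's built-in `sqlite3` module can make one. This is a hypothetical example: the file name `my_data.sqlite` and the `records` columns are made up for illustration and don't describe the bundled sample database.

```python
import sqlite3


def populate(conn: sqlite3.Connection) -> None:
    """Create a small `records` table and insert a few rows."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS records "
            "(id INTEGER PRIMARY KEY, name TEXT, score REAL)"
        )
        conn.executemany(
            "INSERT INTO records (name, score) VALUES (?, ?)",
            [("alpha", 1.5), ("beta", 2.0), ("gamma", 3.25)],
        )


def create_example_db(path: str) -> None:
    conn = sqlite3.connect(path)
    try:
        populate(conn)
    finally:
        conn.close()
```

After `create_example_db("my_data.sqlite")`, upload it with the same command as above, e.g. `aws s3 cp my_data.sqlite s3://<TARGET_BUCKET>/<TARGET_PREFIX>/`, and it should appear as another database.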

Lambda function

There are three components to the Lambda code:

vfs.py - A SQLite Virtual File System implementation for S3
s3qlite.py - The actual Lambda function that handles Athena metadata/data requests
sqlite_db.py - Helper functions for accessing SQLite databases on S3
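SQLite performs all file I/O through its VFS layer, which is asked to read a number of bytes at a given offset. A minimal sketch of that idea in plain Python, assuming the whole object is downloaded once up front. The class name `WholeObjectFile` and the `fetch` callable are hypothetical; the method names mirror APSW's `VFSFile.xRead` and `xFileSize`, but this class does not subclass APSW and is not a working VFS on its own.

```python
from typing import Callable


class WholeObjectFile:
    """Serve SQLite-style (amount, offset) reads from an object fetched once.

    Hypothetical stand-in for an APSW VFSFile subclass; in a real connector
    `fetch` would wrap something like an S3 GetObject call.
    """

    def __init__(self, fetch: Callable[[], bytes]) -> None:
        self._data = fetch()  # download the entire object a single time

    def xRead(self, amount: int, offset: int) -> bytes:
        # Return up to `amount` bytes starting at `offset`; shorter at EOF.
        return self._data[offset:offset + amount]

    def xFileSize(self) -> int:
        return len(self._data)
```

Reading only the byte ranges SQLite requests (for example with S3's `Range` request header) would avoid downloading the whole file, which is one of the items on the TODO list.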

Create a function with the code in lambda-function/s3qlite.py that uses the previously created layer. The handler will be s3qlite.lambda_handler. Also include the vfs.py and sqlite_db.py files in your Lambda function.
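A sketch of how a single `lambda_handler` might route Athena's different request kinds. It assumes each federation request arrives as JSON with an `@type` field naming the request (such as `ListSchemasRequest` or `ListTablesRequest`, as in the Athena query federation example). The handler functions and response shapes below are simplified placeholders, not this connector's actual code.

```python
from typing import Any, Callable, Dict

Event = Dict[str, Any]


def list_schemas(event: Event) -> Event:
    # Placeholder: a real handler would list the SQLite files under the prefix.
    return {"@type": "ListSchemasResponse", "schemas": []}


def list_tables(event: Event) -> Event:
    # Placeholder: a real handler would read table names from the SQLite file.
    return {"@type": "ListTablesResponse", "tables": []}


# Hypothetical dispatch table keyed by the assumed `@type` value.
HANDLERS: Dict[str, Callable[[Event], Event]] = {
    "ListSchemasRequest": list_schemas,
    "ListTablesRequest": list_tables,
}


def lambda_handler(event: Event, context: Any) -> Event:
    request_type = event.get("@type")
    handler = HANDLERS.get(request_type)
    if handler is None:
        raise ValueError(f"Unsupported request type: {request_type!r}")
    return handler(event)
```

A dispatch table keeps each request kind in its own small function, so adding support for another request means adding one entry.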

Configure two environment variables for your Lambda function:

TARGET_BUCKET - The name of your S3 bucket where SQLite files live
TARGET_PREFIX - The prefix (e.g. data/sqlite) that you uploaded the sample SQLite database to
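A sketch of how the two variables might be read, and how each object directly under the prefix could map to an Athena database name. Stripping the file extension (so `sample_data.sqlite` becomes `sample_data`) matches the sample queries further down, but the exact naming rule is an assumption; the S3 listing call itself is left out so the example runs without AWS credentials. `load_config` and `databases_from_keys` are hypothetical names.

```python
import os
import posixpath
from typing import Dict, Iterable, Mapping, Optional


def load_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read TARGET_BUCKET and TARGET_PREFIX, trimming slashes off the prefix."""
    env = os.environ if env is None else env
    # Both variables are required; a missing one raises KeyError right away.
    return {
        "bucket": env["TARGET_BUCKET"],
        "prefix": env["TARGET_PREFIX"].strip("/"),
    }


def databases_from_keys(prefix: str, keys: Iterable[str]) -> Dict[str, str]:
    """Map database name -> S3 key for objects directly under `prefix`."""
    prefix = prefix.strip("/")
    base = prefix + "/" if prefix else ""
    databases: Dict[str, str] = {}
    for key in keys:
        if not key.startswith(base):
            continue
        name = key[len(base):]
        if not name or "/" in name:
            continue  # skip the prefix itself and nested "folders" (no recursion)
        databases[posixpath.splitext(name)[0]] = key
    return databases
```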

Note that the IAM role you associate the function with will also need s3:GetObject and s3:ListBucket access to wherever your lovely SQLite databases are stored.
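A sketch of an IAM policy document granting that access, built in Python so the bucket and prefix can be filled in. The detail worth noticing is that `s3:ListBucket` applies to the bucket ARN while `s3:GetObject` applies to object ARNs. The function name `connector_policy` is hypothetical, and you may want to scope listing further for your own setup.

```python
import json


def connector_policy(bucket: str, prefix: str) -> dict:
    """IAM policy allowing listing of `bucket` and reads of objects under `prefix`."""
    prefix = prefix.strip("/")
    objects = f"arn:aws:s3:::{bucket}/{prefix}/*" if prefix else f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            # ListBucket is a bucket-level action, so it targets the bucket ARN.
            {"Effect": "Allow", "Action": "s3:ListBucket", "Resource": f"arn:aws:s3:::{bucket}"},
            # GetObject is an object-level action, so it targets object ARNs.
            {"Effect": "Allow", "Action": "s3:GetObject", "Resource": objects},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(connector_policy("<BUCKET>", "data/sqlite"), indent=2))
```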

Configure Athena

Follow the Athena documentation for Connecting to a data source. The primary thing to note here is that you need to create a workgroup named AmazonAthenaPreviewFunctionality and use that for your testing. Some functionality will work in the primary workgroup, but you'll get weird errors when you try to query data.

I named my function s3qlite :)

Run queries!

Here are a couple of basic queries that should work:

SELECT * FROM "s3qlite"."sample_data"."records" limit 10;

SELECT COUNT(*) FROM "s3qlite"."sample_data"."records";

If you deploy the SAR app, the data catalog isn't registered automatically, but you can still run queries by referencing the function as a catalog with the special lambda: prefix:

SELECT * FROM "lambda:s3qlite".sample_data.records LIMIT 10;

Where s3qlite is the value you provided for the AthenaCatalogName parameter.
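To run the same queries from a script, boto3's Athena client provides `start_query_execution`. A sketch that builds the request arguments, double-quoting each identifier and pinning the AmazonAthenaPreviewFunctionality workgroup mentioned above; `quote_identifier`, `build_query_request`, and the output location are placeholders you'd adapt.

```python
from typing import Any, Dict


def quote_identifier(name: str) -> str:
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_query_request(catalog: str, database: str, table: str,
                        output_location: str, limit: int = 10) -> Dict[str, Any]:
    """Keyword arguments for Athena's StartQueryExecution call."""
    sql = (
        f"SELECT * FROM {quote_identifier(catalog)}."
        f"{quote_identifier(database)}.{quote_identifier(table)} LIMIT {int(limit)}"
    )
    return {
        "QueryString": sql,
        "WorkGroup": "AmazonAthenaPreviewFunctionality",
        "ResultConfiguration": {"OutputLocation": output_location},
    }
```

With AWS credentials configured, `boto3.client("athena").start_query_execution(**request)` submits the query and returns a response containing its `QueryExecutionId`. Passing `"lambda:s3qlite"` as the catalog produces the SAR-style query shown above.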

TODO

Move these into issues :)
Move vfs.py into its own module
- Maybe add write support to it someday 😱
Publish to SAR
Add tests...always tests
struct types, probably
Don't read the entire file every time :)
Escape column names with invalid characters
Implement recursive listing

Serverless App Repo

These are mostly notes I made while figuring out how to get SAR working.

You need to grant the SAR access to the bucket:

aws s3api put-bucket-policy --bucket <BUCKET> --region us-east-1 --policy '{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "serverlessrepo.amazonaws.com"
},
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::<BUCKET>/*"
}
]
}'

To publish to the SAR, we just execute two commands:

sam package --template-file athena-sqlite.yaml --s3-bucket <BUCKET> --output-template-file target/out.yaml
sam publish --template target/out.yaml --region us-east-1

If you want to deploy using CloudFormation, use this command:

sam deploy --template-file ./target/out.yaml --stack-name athena-sqlite --capabilities CAPABILITY_IAM --parameter-overrides 'DataBucket=<BUCKET> DataPrefix=tmp/sqlite' --region us-east-1

dacort/athena-sqlite
