Load Data from Amazon S3 using COPY
tip
Expected time: 5 minutes ⏱
Databend's COPY command can read object files (in CSV or Parquet format) from Amazon S3 buckets. To understand Amazon S3's core concepts, please read the Amazon S3 documentation.
Before you begin
- AWS Account: This Quickstart uses Amazon S3 and requires an AWS account's access key id and secret access key.
- Databend: You will connect to the database and use COPY to pull data from your Amazon S3 bucket; see How to deploy Databend and the example connection command below.
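Once Databend is deployed, you can connect to it with a MySQL-compatible client, just like the session shown later in this Quickstart. The command below is a minimal sketch assuming a default local deployment (127.0.0.1, Databend's default MySQL protocol port 3307, and the root user); adjust the host, port, and user to match your environment:
# Connect to a local Databend deployment with a MySQL-compatible client.
# Host, port, and user are assumptions for a default local setup.
mysql -h 127.0.0.1 -P 3307 -u root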
Part 1: Creating an Amazon S3 Bucket and Adding a File
- On your local machine, create a text file with the following CSV contents and name it books.csv:
books.csv
Transaction Processing,Jim Gray,1992
Readings in Database Systems,Michael Stonebraker,2004
The field delimiter of this CSV file is a comma (,) and the record delimiter is a newline (\n).
- In S3, create a bucket and upload books.csv to the bucket, for example with the AWS CLI as sketched below.
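If you prefer the command line to the S3 console, the following AWS CLI sketch creates a bucket and uploads the file. The bucket name databend-bohu and the data/ prefix are just the example values used later in this Quickstart; replace them with your own (bucket names must be globally unique):
# Create a bucket and upload books.csv to it with the AWS CLI.
# The bucket name and prefix below are example values; use your own.
aws s3 mb s3://databend-bohu
aws s3 cp books.csv s3://databend-bohu/data/books.csv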
Part 2: Creating a Database and Table using COPY
create database book_db;
use book_db;
create table books
(
title VARCHAR(255),
author VARCHAR(255),
date VARCHAR(255)
);
The database and table have now been created. In Part 1 of this Quickstart, you uploaded the books.csv file to your bucket.
To use the COPY data loading, you will need the following information:
- The S3 URI of the bucket path (s3://bucket/to/path/), such as: s3://databend-bohu/data/
- Your AWS account’s access keys, such as:
- Access Key ID: your-access-key-id
- Secret Access Key: your-secret-access-key
Using this URI and keys, execute the following statement, replacing the placeholder values with your own:
copy into books
from 's3://databend-bohu/data/'
credentials=(aws_key_id='<your-access-key-id>' aws_secret_key='<your-secret-access-key>')
pattern ='.*[.]csv'
file_format = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 0);
Now, let's query the table to make sure the data has actually loaded:
mysql> select * from books;
+------------------------------+----------------------+-------+
| title | author | date |
+------------------------------+----------------------+-------+
| Transaction Processing | Jim Gray | 1992 |
| Readings in Database Systems | Michael Stonebraker | 2004 |
+------------------------------+----------------------+-------+
tip
If the file(s) is large and you want to check whether the file format parses correctly before loading everything, you can use the SIZE_LIMIT option:
copy into books
from 's3://databend-bohu/data/'
credentials=(aws_key_id='<your-access-key-id>' aws_secret_key='<your-secret-access-key>')
pattern ='.*[.]csv'
file_format = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 0)
size_limit = 1; -- only load 1 row
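After this trial load succeeds, you will likely want to clear the sample row before running the full COPY again. One way is sketched below, assuming your Databend version supports TRUNCATE TABLE:
truncate table books; -- remove the trial row, then re-run COPY without size_limit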