
Data Analytics Pipeline

Kinesis → Firehose → S3 data lake → Glue ETL → Redshift, with Athena for ad-hoc queries.

What you can build with this

Ingest high-volume event streams in real time, store raw data in a data lake on S3, run ETL jobs to transform and catalog the data, then query it ad-hoc with Athena or load it into Redshift for BI dashboards and reporting.
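As a rough illustration of the ingest step, the sketch below shapes event dictionaries into entries for the Kinesis PutRecords API and sends them in batches. The stream name, the user_id partition-key field, and the default region are hypothetical; the trailing newline is an assumption that Firehose is writing records to S3 without adding its own delimiter.

```python
import json
from typing import Any, Dict, Iterable, List


def build_kinesis_records(
    events: Iterable[Dict[str, Any]], key_field: str = "user_id"
) -> List[Dict[str, Any]]:
    """Turn event dicts into entries shaped for the Kinesis PutRecords API."""
    records = []
    for event in events:
        # Newline-delimited JSON keeps objects separable after delivery to S3.
        data = (json.dumps(event, sort_keys=True) + "\n").encode("utf-8")
        # The partition key determines which shard receives the record.
        # key_field is a hypothetical field name; use whatever spreads load evenly.
        key = str(event.get(key_field, "unknown"))
        records.append({"Data": data, "PartitionKey": key})
    return records


def send_events(stream_name: str, events: Iterable[Dict[str, Any]],
                region: str = "us-east-1") -> int:
    """Send events in batches and return the number of failed records."""
    import boto3  # imported lazily so the builder above works without the AWS SDK

    client = boto3.client("kinesis", region_name=region)
    records = build_kinesis_records(events)
    failed = 0
    # PutRecords accepts at most 500 records per call.
    for start in range(0, len(records), 500):
        batch = records[start:start + 500]
        response = client.put_records(StreamName=stream_name, Records=batch)
        failed += response["FailedRecordCount"]
    return failed
```

A production sender would also retry the individual records that PutRecords reports as failed; this sketch only counts them.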

Deployment timing

After you run the deploy command, resources come online at different times. Check the validation steps in order.

Resource	Ready in
Kinesis Stream	1–2 min
Firehose	1–2 min
S3 Buckets	Immediate
First data in S3 (via Firehose)	60–120 sec after first record sent
Redshift Cluster	8–12 min after stack starts
Glue Crawler (schema discovery)	2–5 min per run
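Because each resource becomes ready on a different schedule, a validation script may want to poll rather than check once. The following is a minimal sketch of a polling helper that uses the upper bounds from the table above as timeouts. The dictionary keys and the 15-second interval are assumptions, and the check callable is left to the caller.

```python
import time
from typing import Callable

# Upper bounds from the timing table, in seconds. Key names are hypothetical.
READY_TIMEOUTS = {
    "kinesis_stream": 2 * 60,
    "firehose": 2 * 60,
    "first_s3_object": 120,      # measured from the first record sent
    "redshift_cluster": 12 * 60,  # measured from stack start
    "glue_crawler_run": 5 * 60,
}


def wait_until_ready(
    check: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call check() until it returns True or timeout_s elapses.

    Returns True if the resource became ready, False on timeout.
    sleep and clock are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For the Kinesis stream, for example, check could wrap a boto3 describe_stream_summary call and compare StreamStatus to "ACTIVE"; for Redshift, a describe_clusters call whose ClusterStatus is "available". In practice you may want to add some margin to the table values.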

CloudFormation parameters

These are the values you fill in when deploying the exported template. They are exposed as parameters so that you can change them without editing the template itself.

Parameter	What to provide	Example
RawBucketName	S3 bucket for raw ingested data	mycompany-data-raw
CuratedBucketName	S3 bucket for transformed/curated data	mycompany-data-curated
KinesisShardCount	Kinesis stream shard count (scales ingest capacity)	2
RedshiftMasterPassword	Redshift admin password	Admin123!
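The parameters above could be assembled in code before deployment. The sketch below validates the bucket names with a simplified version of the S3 naming rules, estimates a shard count from the per-shard write limits for provisioned streams (about 1 MB/s or 1,000 records/s), and produces the list-of-dicts shape that the CloudFormation API expects. The function names and the REDSHIFT_MASTER_PASSWORD environment variable are hypothetical, and the regex does not cover every S3 naming rule.

```python
import math
import os
import re
from typing import Dict, List

# Simplified check: 3-63 chars, lowercase letters, digits, hyphens, dots,
# beginning and ending with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def estimate_shards(records_per_sec: float, avg_record_kib: float) -> int:
    """Rough shard count from per-shard write limits (1,000 records/s, ~1 MiB/s)."""
    by_count = records_per_sec / 1000
    by_bytes = records_per_sec * avg_record_kib / 1024
    return max(1, math.ceil(max(by_count, by_bytes)))


def build_stack_parameters(
    raw_bucket: str, curated_bucket: str, shard_count: int, master_password: str
) -> List[Dict[str, str]]:
    """Return parameters in the ParameterKey/ParameterValue form CloudFormation uses."""
    for name in (raw_bucket, curated_bucket):
        if not _BUCKET_RE.match(name):
            raise ValueError(f"invalid S3 bucket name: {name!r}")
    if shard_count < 1:
        raise ValueError("KinesisShardCount must be at least 1")
    values = {
        "RawBucketName": raw_bucket,
        "CuratedBucketName": curated_bucket,
        "KinesisShardCount": str(shard_count),  # parameter values are strings
        "RedshiftMasterPassword": master_password,
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


def parameters_from_env(raw_bucket: str, curated_bucket: str,
                        shard_count: int) -> List[Dict[str, str]]:
    """Read the password from the environment instead of hard-coding it."""
    password = os.environ["REDSHIFT_MASTER_PASSWORD"]  # hypothetical variable name
    return build_stack_parameters(raw_bucket, curated_bucket, shard_count, password)
```

The resulting list can be passed as the Parameters argument of a boto3 CloudFormation create_stack call. Reading the password from the environment or a secrets store keeps it out of scripts and shell history.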