📊
Data Analytics Pipeline
Kinesis → Firehose → S3 data lake → Glue ETL → Redshift, with Athena for ad-hoc queries.
What you can build with this
Ingest high-volume event streams in real time, store raw data in a data lake on S3, run ETL jobs to transform and catalog the data, then query it ad-hoc with Athena or load it into Redshift for BI dashboards and reporting.
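As a rough illustration of the ingest step, the sketch below batches JSON events into Kinesis `PutRecords` calls. It assumes events are JSON-serializable dicts. The stream name, the `user_id` partition key field, and the helper names are hypothetical and not part of the template. The client is passed in, so any object with a boto3-style `put_records` method works.

```python
import json
from typing import Iterable, Iterator

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def to_entries(events: Iterable[dict], key_field: str = "user_id") -> Iterator[dict]:
    """Convert events into PutRecords entries (Data bytes plus PartitionKey)."""
    for event in events:
        yield {
            # Trailing newline so records that Firehose concatenates into one
            # S3 object remain one JSON document per line.
            "Data": (json.dumps(event) + "\n").encode("utf-8"),
            # Assumption: partition by a field of the event; adjust to your data.
            "PartitionKey": str(event.get(key_field, "unknown")),
        }


def batches(entries: Iterable[dict], size: int = MAX_RECORDS_PER_CALL) -> Iterator[list]:
    """Group entries into lists of at most `size` items."""
    batch = []
    for entry in entries:
        batch.append(entry)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_events(kinesis_client, stream_name: str, events: Iterable[dict]) -> int:
    """Send events in batches; return how many records Kinesis rejected."""
    failed = 0
    for batch in batches(to_entries(events)):
        response = kinesis_client.put_records(StreamName=stream_name, Records=batch)
        failed += response.get("FailedRecordCount", 0)
    return failed
```

With boto3 installed, a call might look like `send_events(boto3.client("kinesis"), "my-stream", events)`. A nonzero return value means some records were rejected (for example, throttled) and should be retried.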
Deployment timing
After you run the deploy command, resources come online at different times. Check the validation steps in order.
| Resource | Ready in |
|---|---|
| Kinesis Stream | 1–2 min |
| Firehose | 1–2 min |
| S3 Buckets | Immediate |
| First data in S3 (via Firehose) | 60–120 sec after first record sent |
| Redshift Cluster | 8–12 min after stack starts |
| Glue Crawler (schema discovery) | 2–5 min per run |
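Because resources become ready at different times, a validation script can poll each one with its own timeout instead of checking once. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the clients are passed in (boto3 `kinesis`, `firehose`, and `redshift` clients in practice), and the resource names depend on what your stack actually creates.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def wait_until(check: Callable[[], bool], timeout_s: float, interval_s: float = 15.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll check() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def kinesis_active(client, stream_name: str) -> bool:
    summary = client.describe_stream_summary(StreamName=stream_name)
    return summary["StreamDescriptionSummary"]["StreamStatus"] == "ACTIVE"


def firehose_active(client, delivery_stream_name: str) -> bool:
    desc = client.describe_delivery_stream(DeliveryStreamName=delivery_stream_name)
    return desc["DeliveryStreamDescription"]["DeliveryStreamStatus"] == "ACTIVE"


def redshift_available(client, cluster_id: str) -> bool:
    clusters = client.describe_clusters(ClusterIdentifier=cluster_id)["Clusters"]
    return bool(clusters) and clusters[0]["ClusterStatus"] == "available"


def run_in_order(steps: Iterable[Tuple[str, Callable[[], bool], float]],
                 **wait_kwargs) -> Optional[str]:
    """Run (name, check, timeout_s) steps in order.

    Returns the name of the first step that timed out, or None if all passed.
    """
    for name, check, timeout_s in steps:
        if not wait_until(check, timeout_s, **wait_kwargs):
            return name
    return None
```

The timeouts can be taken from the table with some margin, for example `("kinesis", lambda: kinesis_active(kinesis, "my-stream"), 180)` followed by a Redshift step with a timeout of around 900 seconds. Those figures are illustrative rather than guaranteed limits.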
CloudFormation parameters
These are the values you fill in when deploying the exported template. They are parameters precisely so that you can change them for each deployment without editing or breaking the template.
| Parameter | What to provide | Example |
|---|---|---|
| RawBucketName | S3 bucket for raw ingested data | mycompany-data-raw |
| CuratedBucketName | S3 bucket for transformed/curated data | mycompany-data-curated |
| KinesisShardCount | Kinesis stream shard count (scales ingest capacity) | 2 |
| RedshiftMasterPassword | Redshift admin password (8–64 characters, with at least one uppercase letter, one lowercase letter, and one digit) | Admin123! |
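Before deploying, it can help to sanity-check these values and convert them into the key/value list that CloudFormation expects. The sketch below is one possible approach, not part of the template: the function names are hypothetical, the bucket and password checks reflect the general S3 naming and Redshift password rules, and the shard estimate uses the per-shard write limits of 1 MB/s and 1,000 records/s.

```python
import math
import re

# S3 bucket names: 3-63 characters; lowercase letters, digits, dots, hyphens;
# must start and end with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
# Characters Redshift does not allow in the admin password.
_PASSWORD_FORBIDDEN = set("'\"\\/@ ")


def check_bucket_name(name: str) -> None:
    if not _BUCKET_RE.match(name) or ".." in name:
        raise ValueError(f"invalid S3 bucket name: {name!r}")


def check_redshift_password(password: str) -> None:
    ok = (
        8 <= len(password) <= 64
        and any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
        and not (_PASSWORD_FORBIDDEN & set(password))
    )
    if not ok:
        raise ValueError("password does not meet Redshift's requirements")


def shards_needed(mb_per_sec: float, records_per_sec: float) -> int:
    """Estimate shards from expected write load (1 MB/s or 1,000 records/s per shard)."""
    return max(1, math.ceil(mb_per_sec), math.ceil(records_per_sec / 1000))


def build_parameters(raw_bucket: str, curated_bucket: str,
                     shard_count: int, redshift_password: str) -> list:
    """Validate the four template parameters and return them in CloudFormation's format."""
    check_bucket_name(raw_bucket)
    check_bucket_name(curated_bucket)
    if raw_bucket == curated_bucket:
        raise ValueError("raw and curated buckets must be different")
    if not isinstance(shard_count, int) or shard_count < 1:
        raise ValueError("KinesisShardCount must be a positive integer")
    check_redshift_password(redshift_password)
    values = {
        "RawBucketName": raw_bucket,
        "CuratedBucketName": curated_bucket,
        "KinesisShardCount": str(shard_count),  # parameter values are sent as strings
        "RedshiftMasterPassword": redshift_password,
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]
```

The returned list can be passed as `Parameters=` to boto3's CloudFormation `create_stack`. In practice the password is better read from an environment variable or a secrets store than hard-coded in a script.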