Overview
AWS Glue is a serverless data integration service for discovering, preparing, and combining data for analytics and machine learning. It includes a Data Catalog, an Apache Spark-based ETL engine, and crawlers for automatic schema discovery.
Key Features
- ✓ Serverless: No infrastructure to manage
- ✓ Data Catalog: Centralized metadata repository
- ✓ Crawlers: Auto-discover schemas and register tables in the Data Catalog
- ✓ Visual ETL: Low-code job authoring
- ✓ Spark Engine: Scalable processing
- ✓ Job Bookmarks: Incremental processing (reruns skip data already processed)
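Job bookmarks let a rerun skip data that earlier runs already handled. The sketch below is a hypothetical, pure-Python model of the high-water-mark idea behind them (roughly how a bookmark on a JDBC source tracks a monotonically increasing key column). It does not use the Glue API: the function names, the `id` column, and the JSON state file are illustrative assumptions. In a real Glue job the service stores bookmark state, and the script advances it by calling `job.commit()`.

```python
import json
import os
import tempfile
from typing import Iterable


def load_bookmark(path: str) -> int:
    """Return the last committed high-water mark, or -1 if nothing was committed yet."""
    if not os.path.exists(path):
        return -1
    with open(path) as f:
        return json.load(f)["last_key"]


def commit_bookmark(path: str, last_key: int) -> None:
    """Persist the new high-water mark (similar in spirit to job.commit())."""
    with open(path, "w") as f:
        json.dump({"last_key": last_key}, f)


def run_incremental(rows: Iterable[dict], state_path: str) -> list[dict]:
    """Process only rows whose 'id' exceeds the stored mark, then advance the mark."""
    last_key = load_bookmark(state_path)
    new_rows = [r for r in rows if r["id"] > last_key]
    if new_rows:
        commit_bookmark(state_path, max(r["id"] for r in new_rows))
    return new_rows


if __name__ == "__main__":
    state = os.path.join(tempfile.mkdtemp(), "bookmark.json")
    source = [{"id": 1}, {"id": 2}]
    print(len(run_incremental(source, state)))  # 2: first run sees everything
    source.append({"id": 3})
    print(len(run_incremental(source, state)))  # 1: only the new row
    source.append({"id": 0})
    print(len(run_incremental(source, state)))  # 0: late row below the mark is skipped
```

The last run shows the main caveat of a high-water mark: a row that arrives with a key below the stored mark is never picked up, which is why bookmark keys should increase monotonically.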
Pros
- 👍 Deep AWS integration
- 👍 Serverless scaling
- 👍 Data Catalog is useful standalone (e.g., as the metastore for Athena or EMR)
- 👍 Visual editor for simple jobs
- 👍 Pay only for compute used (billed per DPU-hour)
Cons
- 👎 Can be expensive at scale
- 👎 Cold start latency
- 👎 Limited runtimes: Spark (Python or Scala), Python shell, and Ray
- 👎 Complex pricing model
- 👎 Debugging distributed Spark jobs can be painful
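Job cost scales with DPUs multiplied by billed time, so a quick estimate makes the pricing cons concrete. This back-of-the-envelope sketch assumes $0.44 per DPU-hour and per-second billing with a 1-minute minimum, the terms commonly listed for Glue 2.0+ Spark jobs in us-east-1; rates vary by region, job type, and over time, so check current AWS pricing. It covers only ETL job compute: crawlers, Data Catalog storage and requests, and other features are billed separately. The function name and defaults are hypothetical.

```python
def glue_job_cost(dpus: float, runtime_seconds: float,
                  rate_per_dpu_hour: float = 0.44,
                  min_billed_seconds: float = 60.0) -> float:
    """Estimate the compute charge (USD) for one Glue ETL job run.

    Assumed model: billed time is the runtime, rounded up to a minimum
    duration; charge = DPUs * billed hours * rate per DPU-hour.
    """
    billed_seconds = max(runtime_seconds, min_billed_seconds)
    dpu_hours = dpus * billed_seconds / 3600.0
    return dpu_hours * rate_per_dpu_hour


if __name__ == "__main__":
    per_run = glue_job_cost(dpus=10, runtime_seconds=15 * 60)
    print(f"one 15-minute run on 10 DPUs: ${per_run:.2f}")  # $1.10
    print(f"hourly for 30 days: ${per_run * 24 * 30:.2f}")  # $792.00
    print(f"a 20-second run on 10 DPUs: ${glue_job_cost(10, 20):.4f}")  # $0.0733 (billed as 60 s)
```

The 30-day figure shows how a modest per-run charge compounds for frequently scheduled jobs, one common reason Glue can become expensive at scale.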
Best For
AWS-native organizations needing serverless ETL. Good for teams without dedicated data engineers who need basic data integration.
Founded: 2017
HQ: Amazon Web Services