Great Expectations (GX) is one of the most widely used open-source data quality frameworks. It lets you define "expectations" (declarative assertions about your data) and validate them as part of your pipeline. Think of it as unit tests for data. GX also renders your expectations and validation results as human-readable documentation, called Data Docs.
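To make the "unit tests for data" idea concrete, here is a minimal sketch in plain Python. It is not the GX API: the function names mirror two of GX's built-in expectations, but the implementations, the `rows` sample data, and the shape of the result dictionaries are assumptions made for illustration.

```python
from typing import Any

Row = dict[str, Any]


def expect_column_values_to_not_be_null(rows: list[Row], column: str) -> dict[str, Any]:
    """Hypothetical stand-in: passes only if no row has a null in `column`."""
    unexpected = [row for row in rows if row.get(column) is None]
    return {
        "expectation": "expect_column_values_to_not_be_null",
        "column": column,
        "success": not unexpected,
        "unexpected_count": len(unexpected),
    }


def expect_column_values_to_be_between(
    rows: list[Row], column: str, min_value: float, max_value: float
) -> dict[str, Any]:
    """Hypothetical stand-in: passes only if every non-null value is in range."""
    unexpected = [
        row[column]
        for row in rows
        if row.get(column) is not None
        and not (min_value <= row[column] <= max_value)
    ]
    return {
        "expectation": "expect_column_values_to_be_between",
        "column": column,
        "success": not unexpected,
        "unexpected_count": len(unexpected),
    }


# Illustrative sample batch (made-up data).
rows = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": -5.00},     # negative amount: out of range
    {"order_id": None, "amount": 42.50},  # missing key: null check fails
]

results = [
    expect_column_values_to_not_be_null(rows, "order_id"),
    expect_column_values_to_be_between(rows, "amount", 0, 10_000),
]

for result in results:
    status = "PASS" if result["success"] else "FAIL"
    print(f'{status} {result["expectation"]}({result["column"]}): '
          f'{result["unexpected_count"]} unexpected')
```

As with unit tests, each check returns a structured pass/fail result instead of raising on the first bad row, so a pipeline step can decide whether to halt, warn, or quarantine the batch.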
Key Features
✓ Expectations: Declarative data assertions
✓ Data Docs: Auto-generated documentation
✓ Checkpoints: Validation orchestration
✓ Profiler: Auto-generate expectations from data
✓ Multi-backend: Works with Pandas, Spark, and SQL
✓ Extensible: Build custom expectations
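The Profiler idea, deriving candidate expectations from a sample of known-good data and then applying them to new batches, can be sketched roughly as follows. This is a conceptual illustration in plain Python, not GX's profiler: `profile_column`, `validate_column`, and the rule keys are hypothetical names, and the data is made up.

```python
from typing import Any, Optional


def profile_column(values: list[Optional[float]]) -> dict[str, Any]:
    """Suggest simple rules (nullability, numeric range) from reference data."""
    non_null = [v for v in values if v is not None]
    rules: dict[str, Any] = {"not_null": len(non_null) == len(values)}
    if non_null:
        rules["min_value"] = min(non_null)
        rules["max_value"] = max(non_null)
    return rules


def validate_column(
    values: list[Optional[float]], rules: dict[str, Any]
) -> list[tuple[int, str]]:
    """Return (row_index, reason) for each value that breaks a suggested rule."""
    failures = []
    for index, value in enumerate(values):
        if value is None:
            if rules.get("not_null"):
                failures.append((index, "null"))
            continue
        if "min_value" in rules and not (
            rules["min_value"] <= value <= rules["max_value"]
        ):
            failures.append((index, "out_of_range"))
    return failures


# Reference sample assumed to be clean (illustrative values).
reference = [12, 15, 11, 18]
suggested = profile_column(reference)

# A later batch is checked against the suggested rules.
new_batch = [13, None, 40]
failures = validate_column(new_batch, suggested)

print(suggested)
print(failures)
```

Rules generated this way are only as good as the reference sample, so they are best treated as a starting point to review and tighten by hand before they gate a pipeline.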
Pros
👍 Truly open source, with an active community
👍 Large library of built-in expectations
👍 Works anywhere Python runs
👍 Data Docs are genuinely useful
👍 Strong integration with Airflow and other orchestrators
👍 No vendor lock-in
Cons
👎 Significant setup effort and learning curve
👎 Configuration can be verbose
👎 GX Cloud is relatively new
👎 Rules-based: only catches issues you have written an expectation for
👎 Validation can slow down pipelines on large datasets
Best For
Teams that want data testing and validation they fully control. Ideal for data engineers who think in code and want to version-control their data quality rules alongside their pipelines.