Apache Griffin

Big Data Quality Solution For Batch and Streaming

ABOUT APACHE GRIFFIN

Apache Griffin is an open source data quality solution for big data that supports both batch and streaming modes. It offers a unified process for measuring your data quality from different perspectives, helping you build trusted data assets and boosting confidence in your business.


Apache Griffin offers a set of well-defined data quality domain models that cover most common data quality problems. It also defines a data quality DSL to help users specify their quality criteria. By extending the DSL, users can implement their own features and functions in Apache Griffin.

Apache Griffin was accepted as an Apache Incubator project on Dec 7, 2016.

Apache Griffin graduated as an Apache Top Level Project on Nov 21, 2018.

Apache Griffin handles data quality issues in three steps:

Step 1 Define Data Quality

Data scientists and analysts define their data quality requirements, such as accuracy, completeness, timeliness, and profiling.
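
To make the idea of a declared requirement concrete, here is a minimal sketch in plain Python. It does not use Apache Griffin's DSL or API; `QualityRule`, `fresh_within`, and the field names `email` and `updated_at` are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, Callable, Dict

Record = Dict[str, Any]


@dataclass(frozen=True)
class QualityRule:
    """Hypothetical rule type for illustration; not part of Apache Griffin."""
    name: str
    dimension: str                   # e.g. "completeness" or "timeliness"
    check: Callable[[Record], bool]  # returns True when a record passes


def fresh_within(now: datetime, max_age: timedelta) -> Callable[[Record], bool]:
    """Build a timeliness check: the record was updated at most max_age ago."""
    def check(record: Record) -> bool:
        updated = record.get("updated_at")
        return (isinstance(updated, datetime)
                and timedelta(0) <= now - updated <= max_age)
    return check


# A fixed reference time keeps the example deterministic.
NOW = datetime(2018, 11, 21, 12, 0, 0)

rules = [
    QualityRule("email_present", "completeness",
                lambda r: r.get("email") not in (None, "")),
    QualityRule("fresh_within_24h", "timeliness",
                fresh_within(NOW, timedelta(hours=24))),
]

sample = {"email": "a@example.com",
          "updated_at": datetime(2018, 11, 21, 6, 0, 0)}
results = {rule.name: rule.check(sample) for rule in rules}
print(results)
```

Keeping each requirement as a named rule tagged with its dimension means the same list can later drive both measurement and reporting.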

Step 2 Measure Data Quality

Source data is ingested into the Apache Griffin computing cluster, and Apache Griffin kicks off data quality measurement based on the defined data quality requirements.
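
As a rough illustration of what a measurement computes, the sketch below calculates two common measures over small in-memory lists. It is an assumption-laden toy, not Apache Griffin's implementation: the functions `accuracy` and `completeness` are hypothetical, accuracy is taken here to mean the fraction of source records that have an exact match in the target on the chosen fields, and empty input is arbitrarily scored as 1.0.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def accuracy(source: Iterable[Record], target: Iterable[Record],
             fields: List[str]) -> float:
    """Fraction of source records matched in target on `fields`.

    Field values must be hashable because they are stored in a set.
    """
    target_keys = {tuple(r.get(f) for f in fields) for r in target}
    total = 0
    matched = 0
    for record in source:
        total += 1
        if tuple(record.get(f) for f in fields) in target_keys:
            matched += 1
    # Assumption: an empty source is scored as fully accurate.
    return matched / total if total else 1.0


def completeness(records: Iterable[Record], field: str) -> float:
    """Fraction of records whose `field` is neither missing nor None."""
    total = 0
    present = 0
    for record in records:
        total += 1
        if record.get(field) is not None:
            present += 1
    return present / total if total else 1.0


source = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 20},
    {"id": 3, "amount": 30},
    {"id": 4, "amount": None},
]
target = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 20},
    {"id": 3, "amount": 99},   # value drifted, so this record does not match
    {"id": 4, "amount": None},
]

print(accuracy(source, target, ["id", "amount"]))  # 0.75
print(completeness(source, "amount"))              # 0.75
```

On a real cluster the same comparison would run as a distributed join rather than a set lookup, but the resulting ratio is the same kind of metric.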

Step 3 Metrics

Data quality reports are emitted as metrics to a designated destination.
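
The sketch below shows one way a computed metric could be written to a destination as a JSON line. The function `emit_metric`, the record layout (`name`, `timestamp_ms`, `value`), and the in-memory sink are all assumptions made for illustration; they do not describe Apache Griffin's actual output format or sinks.

```python
import io
import json
from typing import TextIO


def emit_metric(sink: TextIO, name: str, value: float,
                timestamp_ms: int) -> None:
    """Write one metric as a single JSON line to any file-like sink."""
    record = {"name": name, "timestamp_ms": timestamp_ms, "value": value}
    sink.write(json.dumps(record, sort_keys=True) + "\n")


# An in-memory buffer stands in for the real destination
# (a file, a message queue, a metrics store, and so on).
sink = io.StringIO()
emit_metric(sink, "accuracy", 0.75, 1542801600000)
emit_metric(sink, "completeness", 0.75, 1542801600000)

lines = sink.getvalue().splitlines()
for line in lines:
    print(line)
```

Writing one self-describing record per line keeps the destination swappable, since anything that accepts text can consume the output.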

Additional Bonus

Apache Griffin provides a front tier that lets users easily onboard new data quality requirements into the Apache Griffin platform and write comprehensive logic to define their data quality.

ARCHITECTURE

WHO USES APACHE GRIFFIN

COMMUNITY

Contribution

Get help using Apache Griffin or contribute to the project

Events

Learn more about Apache Griffin from conferences

Copyright © 2018 The Apache Software Foundation, Licensed under the Apache License, Version 2.0.
Apache Griffin, Griffin, Apache, the Apache feather logo and the Apache Griffin logo are trademarks of The Apache Software Foundation.