[DOCS] fix custom SQL Expectation approach for cloud #10844
base: develop
Changes from 20 commits
133262a
e33f962
61591ce
b334105
b40860f
69fe3a5
1fef7cf
61f5936
d2d86ed
8cd4acc
b66aa44
3c256cd
caf1228
f37031f
e0732c7
7fc9fac
ded380b
4d1d362
c6c5a2b
25a119b
83a2c0d
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,32 +35,35 @@ def set_up_context_for_example(context): | |
# <snippet name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - full code example"> | ||
import great_expectations as gx | ||
|
||
# Define your custom SQL query. | ||
# <snippet name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define query"> | ||
my_query = """ | ||
SELECT | ||
* | ||
FROM | ||
{batch} | ||
WHERE | ||
passenger_count > 6 or passenger_count < 0 | ||
""" | ||
# </snippet> | ||
|
||
# Define a custom Expectation that uses SQL by subclassing UnexpectedRowsExpectation | ||
# <snippet name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define a custom UnexpectedRowsExpectation"> | ||
# <snippet name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define the query for an UnexpectedRowsExpectation"> | ||
# <snippet name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define a more descriptive name for an UnexpectedRowsExpectation"> | ||
class ExpectPassengerCountToBeLegal(gx.expectations.UnexpectedRowsExpectation): | ||
# </snippet> | ||
unexpected_rows_query: str = ( | ||
"SELECT * FROM {batch} WHERE passenger_count > 6 or passenger_count < 0" | ||
) | ||
# </snippet> | ||
description: str = "There should be no more than **6** passengers." | ||
|
||
# Customize how the Expectation renders in Data Docs. | ||
# <snippet name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define description"> | ||
my_description = "There should be no more than **6** passengers." | ||
# </snippet> | ||
|
||
# Create an Expectation using the UnexpectedRowsExpectation class and your parameters. | ||
# <snippet name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - create Expectation"> | ||
ExpectPassengerCountToBeLegal = gx.expectations.UnexpectedRowsExpectation( | ||
Review comment: This is so small, but I'm going to throw a blocker on it.
||
unexpected_rows_query=my_query, description=my_description | ||
) | ||
Note for reviewers: I wanted the parameters to be on separate lines, but a CI automation keeps shoving them into one combined line. Is there any way I can preserve the line break?
Review comment: In this case, you can just throw a trailing comma after In other cases, if you really need an escape hatch, you can tell the formatter to skip a section with a
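The two options mentioned in this thread can be sketched as follows. The thread does not name the formatter or the exact skip directive, so this assumes a Black-style formatter (Black or Ruff's formatter), where a trailing comma after the last argument keeps one argument per line and `# fmt: off` / `# fmt: on` mark a region to leave untouched. `make_expectation` is a hypothetical stand-in for the Expectation constructor so the snippet runs without GX installed.

```python
def make_expectation(unexpected_rows_query, description):
    """Hypothetical stand-in for the Expectation constructor (not part of GX)."""
    return {
        "unexpected_rows_query": unexpected_rows_query,
        "description": description,
    }


my_query = "SELECT * FROM {batch} WHERE passenger_count > 6 or passenger_count < 0"
my_description = "There should be no more than **6** passengers."

# Option 1 (assumes a Black-style formatter): the trailing comma after the
# last argument asks the formatter to keep one argument per line.
with_trailing_comma = make_expectation(
    unexpected_rows_query=my_query,
    description=my_description,
)

# Option 2 (assumes Black/Ruff directives): an explicit escape hatch around
# the region the formatter should not rewrite.
# fmt: off
with_fmt_off = make_expectation(
    unexpected_rows_query=my_query,
    description=my_description
)
# fmt: on
```

Both calls build the same object; only the source layout differs, so the choice is purely about how the snippet reads in the rendered docs.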
||
# </snippet> | ||
|
||
# Test the Expectation. | ||
context = gx.get_context() | ||
# Hide this | ||
set_up_context_for_example(context) | ||
|
||
# Instantiate the custom Expectation | ||
# <snippet name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - instantiate the custom SQL Expectation"> | ||
expectation = ExpectPassengerCountToBeLegal() | ||
# </snippet> | ||
|
||
# Test the Expectation | ||
data_source_name = "my_sql_data_source" | ||
data_asset_name = "my_data_asset" | ||
batch_definition_name = "my_batch_definition" | ||
|
@@ -71,5 +74,5 @@ class ExpectPassengerCountToBeLegal(gx.expectations.UnexpectedRowsExpectation): | |
.get_batch() | ||
) | ||
|
||
batch.validate(expectation) | ||
batch.validate(ExpectPassengerCountToBeLegal) | ||
# </snippet> |
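For readers who want to see what the example query selects without setting up GX, here is a minimal sketch using only the standard library. It is not how GX evaluates the Expectation: the table name `taxi_trips`, the sample rows, and the helper `find_unexpected_rows` are assumptions made for illustration, and `{batch}` is replaced with a bare table name rather than a configured Batch. It mirrors the documented behavior that validation succeeds only when the query returns no rows.

```python
import sqlite3

# Query text from the example; {batch} is the placeholder GX fills in at validation time.
UNEXPECTED_ROWS_QUERY = """
SELECT *
FROM {batch}
WHERE passenger_count > 6 OR passenger_count < 0
"""


def find_unexpected_rows(conn, table_name, query_template):
    """Hypothetical helper: substitute {batch} with a table name and run the query."""
    sql = query_template.format(batch=table_name)
    return conn.execute(sql).fetchall()


# Assumed sample data: an in-memory table standing in for the Batch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxi_trips (trip_id INTEGER, passenger_count INTEGER)")
conn.executemany(
    "INSERT INTO taxi_trips VALUES (?, ?)",
    [(1, 2), (2, 7), (3, -1), (4, 6), (5, None)],
)

unexpected = find_unexpected_rows(conn, "taxi_trips", UNEXPECTED_ROWS_QUERY)

# Mirrors the documented default: the check passes only if no rows come back.
success = len(unexpected) == 0
print(unexpected, success)
```

In this sample, the trips with 7 and -1 passengers are returned, so the check fails. The row with a NULL `passenger_count` is not returned, because comparisons against NULL do not evaluate to true in SQL; a query that should also flag missing counts would need an explicit `IS NULL` condition.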
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,9 +9,7 @@ import PrereqGxInstalled from '../_core_components/prerequisites/_gx_installatio | |
import PrereqPreconfiguredDataContext from '../_core_components/prerequisites/_preconfigured_data_context.md'; | ||
import PrereqPreconfiguredDataSourceAndAsset from '../_core_components/prerequisites/_data_source_and_asset_connected_to_data.md'; | ||
|
||
Among the available Expectations, the `UnexpectedRowsExpectation` is designed to facilitate the execution of SQL or Spark-SQL queries as the core logic for an Expectation. By default, `UnexpectedRowsExpectation` considers validation successful when no rows are returned by the provided SQL query. | ||
|
||
Like any other Expectation, you can instantiate the `UnexpectedRowsExpectation` directly. You can also customize an `UnexpectedRowsExpectation` in essentially the same manner as you would [define a custom Expectation](/core/customize_expectations/define_a_custom_expectation_class.md), by subclassing `UnexpectedRowsExpectation` and providing customized default attributes and text for Data Docs. However, there are some caveats around the `UnexpectedRowsExpectation`'s `unexpected_rows_query` attribute that deserve further detail. | ||
Among the available Expectations, the `UnexpectedRowsExpectation` is designed to facilitate the execution of SQL queries as the core logic for an Expectation. By default, `UnexpectedRowsExpectation` considers validation successful when no rows are returned by the provided SQL query. | ||
|
||
<!-- TODO: Do we want to discuss custom `_validate(...)` logic here, or should that be held for a future topic on building custom Expectation classes from scratch? --> | ||
|
||
|
@@ -37,38 +35,35 @@ Like any other Expectation, you can instantiate the `UnexpectedRowsExpectation` | |
<TabItem value="instructions" label="Instructions"> | ||
|
||
1. Create a new Expectation class that inherits the `UnexpectedRowsExpectation` class. | ||
|
||
The class name `UnexpectedRowsExpectation` describes the functionality of the Expectation: it finds rows with unexpected values. When you create a customized Expectation class you can provide a class name that is more indicative of your specific use case. In this example, the customized subclass of `UnexpectedRowsExpectation` will be used to find invalid passenger counts in taxi trip data: | ||
|
||
```python title="Python" name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define a more descriptive name for an UnexpectedRowsExpectation" | ||
``` | ||
|
||
2. Override the Expectation's `unexpected_rows_query` attribute. | ||
1. Determine your custom SQL query. | ||
Note for reviewers: I re-flowed the instructions to be more like the "create an Expectation" page, where you determine parameters and then create an Expectation using them. The snippets were ending in awkward places when I tried to keep the old flow while switching from subclassing to using the class directly. Example of one iteration I didn't like:
||
|
||
The `unexpected_rows_query` attribute is a SQL or Spark-SQL query that returns a selection of rows from the Batch of data being validated. By default, rows that are returned have failed the validation check. | ||
The `UnexpectedRowsExpectation` class takes an `unexpected_rows_query` attribute, which is a SQL or Spark-SQL query that returns a selection of rows from the Batch of data being validated. By default, rows that are returned have failed the validation check. | ||
|
||
The `unexpected_rows_query` should be written in standard SQL or Spark-SQL syntax, except that it can also contain the special `{batch}` named query. When the Expectation is evaluated, the `{batch}` keyword will be replaced with the Batch of data that is configured for your Data Asset. | ||
The custom SQL query should be written in the SQL dialect your database uses, except that it can also contain the special `{batch}` named query. When the Expectation is evaluated, the `{batch}` keyword will be replaced with the Batch of data that is configured for your Data Asset. | ||
|
||
In this example, `unexpected_rows_query` will select any rows where the passenger count is greater than `6` or less than `0`. These rows will fail validation for this Expectation: | ||
In this example, the custom query will select any rows where the passenger count is greater than `6` or less than `0`: | ||
|
||
```python title="Python" name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define the query for an UnexpectedRowsExpectation" | ||
```python title="Python" name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define query" | ||
``` | ||
|
||
3. Customize the rendering of the new Expectation when displayed in Data Docs. | ||
2. Customize how the Expectation renders in Data Docs. | ||
|
||
As with other Expectations, the `description` attribute contains the text describing the customized Expectation when your results are rendered into Data Docs. It can be set when an Expectation class is defined or edited as an attribute of an Expectation instance. You can format the `description` string with Markdown syntax: | ||
As with other Expectations, the `description` attribute contains the text describing the Expectation when your results are rendered into Data Docs. You can format the `description` string with Markdown syntax: | ||
|
||
```python title="Python" name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define a custom UnexpectedRowsExpectation" | ||
```python title="Python" name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define description" | ||
``` | ||
|
||
4. Use the customized subclass as an Expectation. | ||
|
||
Once the customized Expectation subclass has been defined, instances of it can be created, added to Expectation Suites, and validated just like any other Expectation class: | ||
3. Create a new Expectation using the `UnexpectedRowsExpectation` class and your parameters. | ||
The class name `UnexpectedRowsExpectation` describes the functionality of the Expectation: it finds rows with unexpected values. When you create your Expectation, you can use a name that is more indicative of your specific use case. In this example, the customized Expectation will be used to find invalid passenger counts in taxi trip data: | ||
|
||
```python title="Python" name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - instantiate the custom SQL Expectation" | ||
```python title="Python" name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - create Expectation" | ||
``` | ||
|
||
4. Use your custom SQL Expectation. | ||
|
||
Now that you've created a custom SQL Expectation, you can [add it to an Expectation Suite](/core/define_expectations/organize_expectation_suites.md) and [validate it](/docs/core/run_validations/run_a_validation_definition.md) like any other Expectation. | ||
|
||
</TabItem> | ||
|
||
<TabItem value="sample_code" label="Sample code"> | ||
|
Review comment: Please don't take action on this now, but just throwing it out as a stylistic thing in case other folks feel similarly - I personally find it much more to take in when we spread out simple stuff across multiple snippets, i.e., I'd rather just see this inlined when creating the expectation. But again, that's just my 2 cents!