[DOCS] fix custom SQL Expectation approach for cloud #10844

Open
wants to merge 21 commits into
base: develop
Changes from 20 commits
@@ -35,32 +35,35 @@ def set_up_context_for_example(context):
# <snippet name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - full code example">
import great_expectations as gx

# Define your custom SQL query.
# <snippet name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define query">
my_query = """
SELECT
*
FROM
{batch}
WHERE
passenger_count > 6 or passenger_count < 0
"""
# </snippet>

# Define a custom Expectation that uses SQL by subclassing UnexpectedRowsExpectation
# <snippet name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define a custom UnexpectedRowsExpectation">
# <snippet name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define the query for an UnexpectedRowsExpectation">
# <snippet name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define a more descriptive name for an UnexpectedRowsExpectation">
class ExpectPassengerCountToBeLegal(gx.expectations.UnexpectedRowsExpectation):
# </snippet>
    unexpected_rows_query: str = (
        "SELECT * FROM {batch} WHERE passenger_count > 6 or passenger_count < 0"
    )
    # </snippet>
    description: str = "There should be no more than **6** passengers."

# Customize how the Expectation renders in Data Docs.
# <snippet name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define description">
my_description = "There should be no more than **6** passengers."
Contributor commented:

Please don't take action on this now, but just throwing it out as a stylistic thing in case other folks feel similarly - I personally find it much more to take in when we spread out simple stuff across multiple snippets, i.e., I'd rather just see this inlined when creating the expectation. But again, that's just my 2 cents!

# </snippet>

# Create an Expectation using the UnexpectedRowsExpectation class and your parameters.
# <snippet name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - create Expectation">
ExpectPassengerCountToBeLegal = gx.expectations.UnexpectedRowsExpectation(
Contributor commented:

This is so small, but I'm going to throw a blocker on it. `ExpectPassengerCountToBeLegal` follows the class-naming convention. This should be `expect_passenger_count_to_be_legal` or something like that in snake_case.

    unexpected_rows_query=my_query, description=my_description
)
Contributor Author commented:

note for reviewers: I wanted the parameters to be on separate lines, but a CI automation keeps shoving them into one combined line. Is there any way I can preserve the line break?

ExpectPassengerCountToBeLegal = gx.expectations.UnexpectedRowsExpectation(
    unexpected_rows_query = my_query,
    description = my_description
)

Contributor commented:

In this case, you can just throw a trailing comma after my_description. AFAIK ruff format uses black, which is pretty opinionated, and that's one of its demands.

In other cases, if you really need an escape hatch, you can tell the formatter to skip a section with a # fmt: off and then a # fmt: on where it should start again. Looks like we only use that in one other place in this repo; I'd be extremely judicious in using this. The philosophy is pretty much to just go with what the formatter says, even if you don't love it.
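
The two points raised in this thread (a snake_case name for the instance, and a trailing comma to keep one argument per line) can be sketched together. This is a minimal illustration, not the PR's final code: `build_expectation` is a hypothetical stand-in for `gx.expectations.UnexpectedRowsExpectation`, used only so the snippet runs without Great Expectations installed. Black-style formatters, including `ruff format`, treat a trailing comma after the last argument as a signal to keep the call split across lines.

```python
def build_expectation(**kwargs):
    """Hypothetical stand-in for gx.expectations.UnexpectedRowsExpectation."""
    return dict(kwargs)


my_query = "SELECT * FROM {batch} WHERE passenger_count > 6 or passenger_count < 0"
my_description = "There should be no more than **6** passengers."

# snake_case marks this as an instance, not a class.
# The trailing comma after the last argument asks the formatter
# to keep one argument per line instead of joining them.
expect_passenger_count_to_be_legal = build_expectation(
    unexpected_rows_query=my_query,
    description=my_description,
)
```

Without the trailing comma, the formatter is free to collapse the arguments onto one line whenever they fit within the configured line length, which matches the behavior described in the note above.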

# </snippet>

# Test the Expectation.
context = gx.get_context()
# Hide this
set_up_context_for_example(context)

# Instantiate the custom Expectation
# <snippet name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - instantiate the custom SQL Expectation">
expectation = ExpectPassengerCountToBeLegal()
# </snippet>

# Test the Expectation
data_source_name = "my_sql_data_source"
data_asset_name = "my_data_asset"
batch_definition_name = "my_batch_definition"
@@ -71,5 +74,5 @@ class ExpectPassengerCountToBeLegal(gx.expectations.UnexpectedRowsExpectation):
.get_batch()
)

batch.validate(expectation)
batch.validate(ExpectPassengerCountToBeLegal)
# </snippet>
@@ -9,9 +9,7 @@ import PrereqGxInstalled from '../_core_components/prerequisites/_gx_installatio
import PrereqPreconfiguredDataContext from '../_core_components/prerequisites/_preconfigured_data_context.md';
import PrereqPreconfiguredDataSourceAndAsset from '../_core_components/prerequisites/_data_source_and_asset_connected_to_data.md';

Among the available Expectations, the `UnexpectedRowsExpectation` is designed to facilitate the execution of SQL or Spark-SQL queries as the core logic for an Expectation. By default, `UnexpectedRowsExpectation` considers validation successful when no rows are returned by the provided SQL query.

Like any other Expectation, you can instantiate the `UnexpectedRowsExpectation` directly. You can also customize an `UnexpectedRowsExpectation` in essentially the same manner as you would [define a custom Expectation](/core/customize_expectations/define_a_custom_expectation_class.md), by subclassing `UnexpectedRowsExpectation` and providing customized default attributes and text for Data Docs. However, there are some caveats around the `UnexpectedRowsExpectation`'s `unexpected_rows_query` attribute that deserve further detail.
Among the available Expectations, the `UnexpectedRowsExpectation` is designed to facilitate the execution of SQL queries as the core logic for an Expectation. By default, `UnexpectedRowsExpectation` considers validation successful when no rows are returned by the provided SQL query.

<!-- TODO: Do we want to discuss custom `_validate(...)` logic here, or should that be held for a future topic on building custom Expectation classes from scratch? -->

@@ -37,38 +35,35 @@ Like any other Expectation, you can instantiate the `UnexpectedRowsExpectation`
<TabItem value="instructions" label="Instructions">

1. Create a new Expectation class that inherits the `UnexpectedRowsExpectation` class.

The class name `UnexpectedRowsExpectation` describes the functionality of the Expectation: it finds rows with unexpected values. When you create a customized Expectation class you can provide a class name that is more indicative of your specific use case. In this example, the customized subclass of `UnexpectedRowsExpectation` will be used to find invalid passenger counts in taxi trip data:

```python title="Python" name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define a more descriptive name for an UnexpectedRowsExpectation"
```

2. Override the Expectation's `unexpected_rows_query` attribute.
1. Determine your custom SQL query.
Contributor Author @klavavej commented on Jan 10, 2025:

Note for reviewers: I re-flowed the instructions to match the "create an Expectation" page, where you determine the parameters first and then create an Expectation from them. When I tried to keep the old flow while switching from subclassing to using the class directly, the snippets kept ending in awkward places. Example of one iteration I didn't like:

old flow


The `unexpected_rows_query` attribute is a SQL or Spark-SQL query that returns a selection of rows from the Batch of data being validated. By default, rows that are returned have failed the validation check.
The `UnexpectedRowsExpectation` class takes an `unexpected_rows_query` attribute, which is a SQL or Spark-SQL query that returns a selection of rows from the Batch of data being validated. By default, rows that are returned have failed the validation check.

The `unexpected_rows_query` should be written in standard SQL or Spark-SQL syntax, except that it can also contain the special `{batch}` named query. When the Expectation is evaluated, the `{batch}` keyword will be replaced with the Batch of data that is configured for your Data Asset.
The custom SQL query should be written in the SQL dialect your database uses, except that it can also contain the special `{batch}` named query. When the Expectation is evaluated, the `{batch}` keyword will be replaced with the Batch of data that is configured for your Data Asset.

In this example, `unexpected_rows_query` will select any rows where the passenger count is greater than `6` or less than `0`. These rows will fail validation for this Expectation:
In this example, the custom query will select any rows where the passenger count is greater than `6` or less than `0`:

```python title="Python" name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define the query for an UnexpectedRowsExpectation"
```python title="Python" name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define query"
```

3. Customize the rendering of the new Expectation when displayed in Data Docs.
2. Customize how the Expectation renders in Data Docs.

As with other Expectations, the `description` attribute contains the text describing the customized Expectation when your results are rendered into Data Docs. It can be set when an Expectation class is defined or edited as an attribute of an Expectation instance. You can format the `description` string with Markdown syntax:
As with other Expectations, the `description` attribute contains the text describing the Expectation when your results are rendered into Data Docs. You can format the `description` string with Markdown syntax:

```python title="Python" name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define a custom UnexpectedRowsExpectation"
```python title="Python" name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - define description"
```

4. Use the customized subclass as an Expectation.

Once the customized Expectation subclass has been defined, instances of it can be created, added to Expectation Suites, and validated just like any other Expectation class:
3. Create a new Expectation using the `UnexpectedRowsExpectation` class and your parameters.
The class name `UnexpectedRowsExpectation` describes the functionality of the Expectation: it finds rows with unexpected values. When you create your Expectation, you can use a name that is more indicative of your specific use case. In this example, the customized Expectation will be used to find invalid passenger counts in taxi trip data:

```python title="Python" name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - instantiate the custom SQL Expectation"
```python title="Python" name="docs/docusaurus/docs/core/customize_expectations/_examples/use_sql_to_define_a_custom_expectation.py - create Expectation"
```

4. Use your custom SQL Expectation.

Now that you've created a custom SQL Expectation, you can [add it to an Expectation Suite](/core/define_expectations/organize_expectation_suites.md) and [validate it](/docs/core/run_validations/run_a_validation_definition.md) like any other Expectation.
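
The pass/fail rule described in the steps above (validation succeeds only when the query returns no rows) can be sketched with the standard-library `sqlite3` module. This is an illustration of the semantics only, under stated assumptions: the `taxi_trips` table and its rows are made up, and replacing `{batch}` with `str.format` is a stand-in for whatever substitution Great Expectations performs internally.

```python
import sqlite3

# Hypothetical stand-in for a Batch: an in-memory table of taxi trips.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxi_trips (trip_id INTEGER, passenger_count INTEGER)")
conn.executemany(
    "INSERT INTO taxi_trips VALUES (?, ?)",
    [(1, 1), (2, 4), (3, 7), (4, -1), (5, 6)],
)

my_query = """
    SELECT
        *
    FROM
        {batch}
    WHERE
        passenger_count > 6 or passenger_count < 0
"""

# Illustration only: fill in the {batch} placeholder by hand.
rendered_query = my_query.format(batch="taxi_trips")

# Rows returned by the query are the "unexpected" rows.
unexpected_rows = conn.execute(rendered_query).fetchall()

# The Expectation passes only when no unexpected rows come back.
success = len(unexpected_rows) == 0
```

In this made-up data set, the trips with passenger counts of `7` and `-1` match the query, so `success` is `False`; a count of exactly `6` is within bounds and is not returned.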

</TabItem>

<TabItem value="sample_code" label="Sample code">