Skip to main content

About data tests property

models/<filename>.yml
version: 2

models:
- name: <model_name>
tests:
- <test_name>:
<argument_name>: <argument_value>
config:
<test_config>: <config-value>

columns:
- name: <column_name>
tests:
- <test_name>
- <test_name>:
<argument_name>: <argument_value>
config:
<test_config>: <config-value>

Description

The data tests property defines assertions about a column, tableIn simplest terms, a table is the direct storage of data in rows and columns. Think excel sheet with raw values in each of the cells., or viewA view (as opposed to a table) is a defined passthrough SQL query that can be run against a database (or data warehouse).. The property contains a list of generic tests, referenced by name, which can include the four built-in generic tests available in dbt. For example, you can add tests that ensure a column contains no duplicates and zero null values. Any arguments or configurations passed to those tests should be nested below the test name.

Once these tests are defined, you can validate their correctness by running dbt test.

Out-of-the-box data tests

There are four generic data tests that are available out of the box, for everyone using dbt.

not_null

This test validates that there are no null values present in a column.

models/<filename>.yml
version: 2

models:
- name: orders
columns:
- name: order_id
tests:
- not_null

unique

This test validates that there are no duplicate values present in a field.

The config and where clause are optional.

models/<filename>.yml
version: 2

models:
- name: orders
columns:
- name: order_id
tests:
- unique:
config:
where: "order_id > 21"

accepted_values

This test validates that all of the values in a column are present in a supplied list of values. If any values other than those provided in the list are present, then the test will fail.

The accepted_values test supports an optional quote parameter which, by default, will single-quote the list of accepted values in the test query. To test non-strings (like integers or boolean values) explicitly set the quote config to false.

schema.yml
version: 2

models:
- name: orders
columns:
- name: status
tests:
- accepted_values:
values: ['placed', 'shipped', 'completed', 'returned']

- name: status_id
tests:
- accepted_values:
values: [1, 2, 3, 4]
quote: false

relationships

This test validates that all of the records in a child tableIn simplest terms, a table is the direct storage of data in rows and columns. Think excel sheet with raw values in each of the cells. have a corresponding record in a parent table. This property is referred to as "referential integrity".

The following example tests that every order's customer_id maps back to a valid customer.

schema.yml
version: 2

models:
- name: orders
columns:
- name: customer_id
tests:
- relationships:
to: ref('customers')
field: id

The to argument accepts a Relation – this means you can pass it a ref to a model (e.g. ref('customers')), or a source (e.g. source('jaffle_shop', 'customers')).

Additional examples

Test an expression

Some data tests require multiple columns, so it doesn't make sense to nest them under the columns: key. In this case, you can apply the data test to the model (or source, seed, or snapshot) instead:

models/orders.yml
version: 2

models:
- name: orders
tests:
- unique:
column_name: "country_code || '-' || order_id"

Use custom generic test

If you've defined your own custom generic test, you can use that as the test_name:

models/<filename>.yml
version: 2

models:
- name: orders
columns:
- name: order_id
tests:
- primary_key # name of my custom generic test

Check out the guide on writing a custom generic test for more information.

Custom data test name

By default, dbt will synthesize a name for your generic test by concatenating:

  • test name (not_null, unique, etc)
  • model name (or source/seed/snapshot)
  • column name (if relevant)
  • arguments (if relevant, e.g. values for accepted_values)

It does not include any configurations for the test. If the concatenated name is too long, dbt will use a truncated and hashed version instead. The goal is to preserve unique identifiers for all resources in your project, including tests.

You may also define your own name for a specific test, via the name property.

When might you want this? dbt's default approach can result in some wonky (and ugly) test names. By defining a custom name, you get full control over how the test will appear in log messages and metadata artifacts. You'll also be able to select the test by that name.

models/<filename>.yml
version: 2

models:
- name: orders
columns:
- name: status
tests:
- accepted_values:
name: unexpected_order_status_today
values: ['placed', 'shipped', 'completed', 'returned']
config:
where: "order_date = current_date"
$ dbt test --select unexpected_order_status_today
12:43:41 Running with dbt=1.1.0
12:43:41 Found 1 model, 1 test, 0 snapshots, 0 analyses, 167 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics
12:43:41
12:43:41 Concurrency: 5 threads (target='dev')
12:43:41
12:43:41 1 of 1 START test unexpected_order_status_today ................................ [RUN]
12:43:41 1 of 1 PASS unexpected_order_status_today ...................................... [PASS in 0.03s]
12:43:41
12:43:41 Finished running 1 test in 0.13s.
12:43:41
12:43:41 Completed successfully
12:43:41
12:43:41 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

A test's name must be unique for all tests defined on a given model-column combination. If you give the same name to tests defined on several different columns, or across several different models, then dbt test --select <repeated_custom_name> will select them all.

When might you need this? In cases where you have defined the same test twice, with only a difference in configuration, dbt will consider these tests to be duplicates:

models/<filename>.yml
version: 2

models:
- name: orders
columns:
- name: status
tests:
- accepted_values:
values: ['placed', 'shipped', 'completed', 'returned']
config:
where: "order_date = current_date"
- accepted_values:
values: ['placed', 'shipped', 'completed', 'returned']
config:
# only difference is in the 'where' config
where: "order_date = (current_date - interval '1 day')" # PostgreSQL syntax
Compilation Error
dbt found two tests with the name "accepted_values_orders_status__placed__shipped__completed__returned" defined on column "status" in "models.orders".

Since these resources have the same name, dbt will be unable to find the correct resource
when running tests.

To fix this, change the name of one of these resources:
- test.testy.accepted_values_orders_status__placed__shipped__completed__returned.69dce9e5d5 (models/one_file.yml)
- test.testy.accepted_values_orders_status__placed__shipped__completed__returned.69dce9e5d5 (models/one_file.yml)

By providing a custom name, you help dbt differentiate tests:

models/<filename>.yml
version: 2

models:
- name: orders
columns:
- name: status
tests:
- accepted_values:
name: unexpected_order_status_today
values: ['placed', 'shipped', 'completed', 'returned']
config:
where: "order_date = current_date"
- accepted_values:
name: unexpected_order_status_yesterday
values: ['placed', 'shipped', 'completed', 'returned']
config:
where: "order_date = (current_date - interval '1 day')" # PostgreSQL
$ dbt test
12:48:03 Running with dbt=1.1.0-b1
12:48:04 Found 1 model, 2 tests, 0 snapshots, 0 analyses, 167 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics
12:48:04
12:48:04 Concurrency: 5 threads (target='dev')
12:48:04
12:48:04 1 of 2 START test unexpected_order_status_today ................................ [RUN]
12:48:04 2 of 2 START test unexpected_order_status_yesterday ............................ [RUN]
12:48:04 1 of 2 PASS unexpected_order_status_today ...................................... [PASS in 0.04s]
12:48:04 2 of 2 PASS unexpected_order_status_yesterday .................................. [PASS in 0.04s]
12:48:04
12:48:04 Finished running 2 tests in 0.21s.
12:48:04
12:48:04 Completed successfully
12:48:04
12:48:04 Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

If using store_failures: dbt uses each data test's name as the name of the table in which to store any failing records. If you have defined a custom name for one test, that custom name will also be used for its table of failures. You may optionally configure an alias for the test, to separately control both the name of the test (for metadata) and the name of its database table (for storing failures).

Alternative format for defining tests

When defining a generic data test with several arguments and configurations, the YAML can look and feel unwieldy. If you find it easier, you can define the same test properties as top-level keys of a single dictionary, by providing the test name as test_name instead. It's totally up to you.

This example is identical to the one above:

models/<filename>.yml
version: 2

models:
- name: orders
columns:
- name: status
tests:
- name: unexpected_order_status_today
test_name: accepted_values # name of the generic test to apply
values:
- placed
- shipped
- completed
- returned
config:
where: "order_date = current_date"
0