feat(pipeline): ensure correct zone_diffusion_code values for DROM/COM
In some cases the zone_diffusion_code was set to "97", which can't be
right: it would mean that a given service could be available across
the oceans.

Let's fix it and set the correct, 3-digit department number.
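
As a minimal sketch of the rule applied here (plain PostgreSQL over made-up sample rows, not pipeline data): the department code is the first three characters of the INSEE code when it starts with "97", and the first two otherwise. The `sample_adresses` CTE is hypothetical and only there to make the query self-contained.

```sql
-- Hypothetical sample INSEE codes, for illustration only.
WITH sample_adresses (code_insee) AS (
    VALUES
        ('59350'),  -- a commune in Nord (59)
        ('97105'),  -- a commune in Guadeloupe (971)
        ('97411')   -- a commune in La Réunion (974)
)

SELECT
    code_insee,
    -- Same CASE expression as in adresses_with_code_departement.
    CASE
        WHEN LEFT(code_insee, 2) = '97' THEN LEFT(code_insee, 3)
        ELSE LEFT(code_insee, 2)
    END AS code_departement
FROM sample_adresses;
-- 59350 -> 59, 97105 -> 971, 97411 -> 974
```

Note that the rule only special-cases the "97" prefix; any other prefix keeps the 2-character behaviour.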

This also makes these services searchable, since we now (since the
"new" communes) look for a match against commune.departement, which can
be 3 digits.

I would love to add a data test for this, but I'm not sure how useful
it would be. So far, CHECK_SERVICE_ERRORS() is only semantic: it does
not validate, for instance, that the zone_diffusion_code is a valid
department number when the zone_diffusion_type is "departement", and so on.

If we wanted to validate this, what should be the correct way to
proceed?
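
One possible approach, sketched here under assumptions rather than taken from the pipeline, is an anti-join that returns every department-scoped service whose code matches no known department. The two CTEs below are hypothetical sample data (including the `source-*` names); only the column names `zone_diffusion_type`, `zone_diffusion_code`, `code` and `nom` come from the model in this commit.

```sql
-- Hypothetical sample data; in the pipeline these would be the real models.
WITH departements (code, nom) AS (
    VALUES
        ('59', 'Nord'),
        ('971', 'Guadeloupe')
),

services (source, zone_diffusion_type, zone_diffusion_code) AS (
    VALUES
        ('source-a', 'departement', '59'),   -- valid
        ('source-b', 'departement', '971'),  -- valid
        ('source-c', 'departement', '97'),   -- invalid: truncated DROM code
        ('source-d', 'commune', '97105')     -- not checked by this query
)

-- Anti-join: keep department-scoped services whose code matches no department.
SELECT
    services.source,
    services.zone_diffusion_code
FROM services
LEFT JOIN departements
    ON services.zone_diffusion_code = departements.code
WHERE
    services.zone_diffusion_type = 'departement'
    AND departements.code IS NULL;
-- Returns only the 'source-c' row.
```

In a dbt project, the same SELECT could be saved as a singular test, with the sample CTEs replaced by `ref()` calls to the relevant models: a singular test fails when its query returns any rows. Whether this belongs in a dbt test or in CHECK_SERVICE_ERRORS() is left open here.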
vperron committed Sep 17, 2024
1 parent 3367584 commit ec38b53
Showing 1 changed file with 21 additions and 14 deletions.
35 changes: 21 additions & 14 deletions pipeline/dbt/models/intermediate/int__union_services__enhanced.sql
@@ -6,34 +6,40 @@ structures AS (
 SELECT * FROM {{ ref('int__union_structures__enhanced') }}
 ),
 
-adresses AS (
-SELECT * FROM {{ ref('int__union_adresses__enhanced') }}
-),
-
 departements AS (
 SELECT * FROM {{ ref('stg_decoupage_administratif__departements') }}
 ),
 
+-- TODO: Refactoring needed to be able to do geocoding per source and then use the result in the mapping
+adresses_with_code_departement AS (
+SELECT
+{{ dbt_utils.star(from=ref('int__union_adresses__enhanced'), relation_alias='adresses') }},
+CASE
+WHEN LEFT(adresses.code_insee, 2) = '97' THEN LEFT(adresses.code_insee, 3)
+ELSE LEFT(adresses.code_insee, 2)
+END AS "code_departement"
+FROM {{ ref("int__union_adresses__enhanced") }} AS adresses
+),
+
 services_with_zone_diffusion AS (
 SELECT
 {{ dbt_utils.star(from=ref('int__union_services'), relation_alias='services', except=["zone_diffusion_code", "zone_diffusion_nom"]) }},
 CASE
-WHEN services.source = ANY(ARRAY['monenfant', 'soliguide']) THEN adresses.code_insee
-WHEN services.source = ANY(ARRAY['reseau-alpha', 'action-logement']) THEN LEFT(adresses.code_insee, 2)
+WHEN services.zone_diffusion_type = 'commune' THEN adresses.code_insee
+WHEN services.zone_diffusion_type = 'departement' THEN adresses.code_departement
 ELSE services.zone_diffusion_code
 END AS "zone_diffusion_code",
 CASE
-WHEN services.source = ANY(ARRAY['monenfant', 'soliguide']) THEN adresses.commune
-WHEN services.source = ANY(ARRAY['reseau-alpha', 'action-logement']) THEN (SELECT departements."nom" FROM departements WHERE departements."code" = LEFT(adresses.code_insee, 2))
-WHEN services.source = 'mediation-numerique' THEN (SELECT departements."nom" FROM departements WHERE departements."code" = services.zone_diffusion_code)
+WHEN services.zone_diffusion_type = 'commune' THEN adresses.commune
+WHEN services.zone_diffusion_type = 'departement' THEN departements.nom
 ELSE services.zone_diffusion_nom
 END AS "zone_diffusion_nom"
-FROM
-services
-LEFT JOIN adresses ON services._di_adresse_surrogate_id = adresses._di_surrogate_id
+FROM services
+LEFT JOIN adresses_with_code_departement AS adresses
+ON services._di_adresse_surrogate_id = adresses._di_surrogate_id
+LEFT JOIN departements ON adresses.code_departement = departements.code
 ),
 
-
 services_with_valid_structure AS (
 SELECT services_with_zone_diffusion.*
 FROM services_with_zone_diffusion
@@ -96,7 +102,8 @@ final AS (
 adresses.result_score AS "_di_geocodage_score"
 FROM
 valid_services
-LEFT JOIN adresses ON valid_services._di_adresse_surrogate_id = adresses._di_surrogate_id
+LEFT JOIN adresses_with_code_departement AS adresses
+ON valid_services._di_adresse_surrogate_id = adresses._di_surrogate_id
 )
 
 SELECT * FROM final
