sql – Equivalent of string contains in google bigquery

The following is for BigQuery Standard SQL:

#standardSQL
SELECT 
  subject_id,
  SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'cortisol|cortisone|dexamethasone') THEN 1 ELSE 0 END) AS steroids,
  SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'peptide|paracetamol') THEN 1 ELSE 0 END) AS aspirin
FROM `db.Team01.Table_1`
GROUP BY subject_id   

Applied to the sample data from your question, the result is:

Row subject_id  steroids    aspirin  
1   1           3           1    
2   2           1           1     

Note: instead of a chain of simple LIKE conditions, which ends up lengthy and redundant, I am using "LIKE on steroids", that is, REGEXP_CONTAINS.

In Postgres, I would recommend using the filter clause:

select subject_id,
       count(*) filter (where lower(drug) ~ 'cortisol|cortisone|dexamethasone') as steroids,
       count(*) filter (where lower(drug) ~ 'peptide|paracetamol') as aspirin
from db.Team01.Table_1
group by subject_id;

In BigQuery, I would recommend countif():

select subject_id,
       countif(regexp_contains(drug, r'cortisol|cortisone|dexamethasone')) as steroids,
       countif(regexp_contains(drug, r'peptide|paracetamol')) as aspirin
from db.Team01.Table_1
group by subject_id;

You can use sum(case when . . . end) as a more general approach. However, each database has its own, more concise way of expressing this logic. By the way, the FILTER clause is standard SQL; it is just not widely adopted.
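As a rough illustration of the general sum(case when . . . end) form, here is a small Python sketch that runs it against an in-memory SQLite database. SQLite only stands in as a local engine so the query can be executed; the table name prescriptions and the sample rows are made up for this example, and plain LIKE is used because SQLite has no REGEXP_CONTAINS.

```python
import sqlite3

# In-memory database with a hypothetical table and invented sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prescriptions (subject_id INTEGER, drug TEXT)")
conn.executemany(
    "INSERT INTO prescriptions VALUES (?, ?)",
    [
        (1, "Cortisol 10mg"),
        (1, "dexamethasone"),
        (1, "paracetamol 500mg"),
        (2, "Cortisone cream"),
        (2, "ibuprofen"),
    ],
)

# Portable conditional aggregation: each CASE yields 1 for a matching row
# and 0 otherwise, so SUM counts the matches per subject.
query = """
SELECT subject_id,
       SUM(CASE WHEN LOWER(drug) LIKE '%cortisol%'
                  OR LOWER(drug) LIKE '%cortisone%'
                  OR LOWER(drug) LIKE '%dexamethasone%' THEN 1 ELSE 0 END) AS steroids,
       SUM(CASE WHEN LOWER(drug) LIKE '%peptide%'
                  OR LOWER(drug) LIKE '%paracetamol%' THEN 1 ELSE 0 END) AS aspirin
FROM prescriptions
GROUP BY subject_id
ORDER BY subject_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The chain of LIKE conditions is what a single regular expression with alternation replaces in the BigQuery and Postgres versions.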

Use conditional aggregation. This is a solution that works across most (if not all) RDBMS:

SELECT
    subject_id,
    MAX(CASE WHEN drug IN ('cortisol', 'cortisone', 'dexamethasone') THEN 1 END) steroids,
    MAX(CASE WHEN drug IN ('peptide', 'paracetamol') THEN 1 END) aspirin
FROM db.Team01.Table_1
GROUP BY subject_id

NB: it is unclear why you are using LIKE, since you seem to be matching exact values; I turned the LIKE conditions into equality checks.
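Whether equality is enough depends on the data. The sketch below, again using an in-memory SQLite database from Python with a hypothetical prescriptions table and invented rows, shows how an IN list and a substring match can disagree when the column holds more than the bare drug name.

```python
import sqlite3

# Hypothetical data: one exact drug name and one with a dosage suffix.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prescriptions (subject_id INTEGER, drug TEXT)")
conn.executemany(
    "INSERT INTO prescriptions VALUES (?, ?)",
    [(1, "cortisol"), (1, "cortisol 10mg"), (2, "paracetamol")],
)

# Exact matching: only rows whose value equals a listed name.
exact = conn.execute(
    "SELECT COUNT(*) FROM prescriptions "
    "WHERE drug IN ('cortisol', 'cortisone', 'dexamethasone')"
).fetchone()[0]

# Substring matching: any row that contains the name somewhere.
substring = conn.execute(
    "SELECT COUNT(*) FROM prescriptions WHERE drug LIKE '%cortisol%'"
).fetchone()[0]

print(exact, substring)
```

If the drug column can carry dosages or brand text, the substring or regular-expression forms are the safer choice; if it always holds a clean name, IN is simpler.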
