statistics – Using BigQuery to find outliers with standard deviation results combined with WHERE clause

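You can abuse a JOIN for this, though performance will suffer. Compute the threshold (mean plus four standard deviations) in a one-row subquery, give both sides a constant key, and join on that key so the threshold becomes available in the WHERE clause: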

SELECT n.state, n.year, n.month, COUNT(*) AS outlier_count
FROM (
  SELECT state, year, month, weight_pounds, 1 as key 
  FROM [publicdata:samples.natality]) as n
JOIN (
  SELECT (AVG(weight_pounds) + STDDEV(weight_pounds) * 4) as giant_baby, 
          1 as key 
  FROM [publicdata:samples.natality]) as o
ON n.key = o.key
WHERE
  (n.weight_pounds > o.giant_baby)
AND
  (n.state != '' AND n.state IS NOT NULL)
GROUP BY n.state, n.year, n.month 
ORDER BY outlier_count DESC;
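
To try the constant-key join without a BigQuery account, here is a minimal sketch that runs the same query shape locally. It is an illustration under assumptions, not the original setup: SQLite (through Python's sqlite3 module) stands in for BigQuery, the table holds whatever rows you pass in, the names count_outliers, SampleStddev and join_key are hypothetical, and STDDEV is registered by hand because SQLite does not ship one.

```python
import sqlite3
import statistics


class SampleStddev:
    """Hand-rolled sample standard deviation aggregate (SQLite has no STDDEV)."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Skip NULLs, as SQL aggregates normally do.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # statistics.stdev needs at least two data points.
        return statistics.stdev(self.values) if len(self.values) > 1 else None


def count_outliers(rows, sigmas=4.0):
    """Count outliers per (state, year, month).

    rows: list of (state, year, month, weight_pounds) tuples (made-up data).
    """
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("STDDEV", 1, SampleStddev)
    conn.execute(
        "CREATE TABLE natality "
        "(state TEXT, year INT, month INT, weight_pounds REAL)"
    )
    conn.executemany("INSERT INTO natality VALUES (?, ?, ?, ?)", rows)

    # Both subqueries carry the constant join_key = 1, so the join attaches
    # the single threshold row to every data row.
    query = """
        SELECT n.state, n.year, n.month, COUNT(*) AS outlier_count
        FROM (SELECT state, year, month, weight_pounds, 1 AS join_key
              FROM natality) AS n
        JOIN (SELECT AVG(weight_pounds) + STDDEV(weight_pounds) * ? AS giant_baby,
                     1 AS join_key
              FROM natality) AS o
        ON n.join_key = o.join_key
        WHERE n.weight_pounds > o.giant_baby
          AND n.state != '' AND n.state IS NOT NULL
        GROUP BY n.state, n.year, n.month
        ORDER BY outlier_count DESC
    """
    result = conn.execute(query, (sigmas,)).fetchall()
    conn.close()
    return result
```

For example, passing 100 rows at 7.0 pounds plus a single 20.0-pound row for TX returns one group for TX with a count of 1. Note that with fewer than 18 rows no value can sit more than four sample standard deviations above the mean, so lower sigmas when experimenting with tiny inputs.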
