I am trying to execute this SQL, but the query is really slow (almost 2 hours), It's actually querying 1 million records (property_ids), that's why it's slow. But can someone suggest me a strategy to make this query faster, may be doing this in batches etc.

WITH properties AS (SELECT sid FROM prod.vw_property WHERE status = 'active') SELECT p.sid, -- active list count means not expired count(*) FILTER (WHERE cl.expired_at IS NULL) AS list_count_active, -- 30d list count DOES include those that are now expired (ie. listed and expired quickly within 30d) count(*) FILTER (WHERE (cl.listed_on >= ((now())::date - '30 days'::interval))) AS list_count_30d, -- All other aggregates should only include active (not expired) rew listings avg(cl.list_price) FILTER (WHERE cl.expired_at IS NULL) AS list_price_avg, percentile_cont(0.50) WITHIN GROUP (ORDER BY cl.list_price) FILTER (WHERE cl.expired_at IS NULL)::numeric(12, 2) AS list_price_median, min(cl.list_price) FILTER (WHERE cl.expired_at IS NULL) AS list_price_min, max(cl.list_price) FILTER (WHERE cl.expired_at IS NULL) AS list_price_max, avg((cl.list_price / (cl.interior_size_sqft)::numeric)) FILTER (WHERE (cl.interior_size_sqft > 100 AND cl.expired_at IS NULL)) AS list_price_per_sqft_avg, percentile_cont((0.50)::double precision) WITHIN GROUP (ORDER BY (((cl.list_price / (cl.interior_size_sqft)::numeric))::double precision)) FILTER (WHERE (cl.interior_size_sqft > 100) AND cl.expired_at IS NULL)::numeric(12, 2) AS list_price_per_sqft_median, -- Show up to 10 closest listings that were used to get the aggregate numbers (array_agg(cl.rew_properties_id ORDER BY (st_distance(p.location, cl.location))) FILTER (WHERE cl.expired_at IS NULL))[1:10] AS closest_rew_properties_ids FROM prod.vw_property p INNER JOIN prod.vw_comparable_listing cl ON ((((COALESCE(upper(p.land_area_sqft), 0))::numeric >= cl.land_size_sqft_lower) AND ((COALESCE(upper(p.land_area_sqft), 0))::numeric <= cl.land_size_sqft_upper) AND (((upper(p.interior_area_finished_sqft))::numeric >= cl.interior_size_sqft_lower) AND ((upper(p.interior_area_finished_sqft))::numeric <= cl.interior_size_sqft_upper)) AND (((p.num_bed)::numeric >= cl.num_bed_lower) AND ((p.num_bed)::numeric <= cl.num_bed_upper)) -- Within 2km AND st_dwithin(p.location, cl.location, 2000) AND ( CASE WHEN (p.rew_property_type = ANY (ARRAY['duplex'::prod.rew_property_type_enum, 'fourplex'::prod.rew_property_type_enum, 'triplex'::prod.rew_property_type_enum])) THEN 'townhouse'::prod.rew_property_type_enum ELSE p.rew_property_type END = cl.property_type_group))) JOIN properties ps ON ps.sid = p.sid GROUP BY 1;

Any suggestion will be appreciated. Also would like to add that AWS RDS large instance was at 52% due to this query.

Here is the Query plan: