QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=9.32..22.02 rows=100 width=287)
Output: trips.vendor_id, (count(*)), l.trip_id, l.vendor_id, l.pickup_date, l.pickup_datetime, l.dropoff_date, l.dropoff_datetime, l.store_and_fwd_flag, l.rate_code_id, l.pickup_longitude, l.pickup_latitude, l.dropoff_longitude, l.dropoff_latitude, l.passenger_count, l.trip_distance, l.fare_amount, l.extra, l.mta_tax, l.tip_amount, l.tolls_amount, l.ehail_fee, l.improvement_surcharge, l.total_amount, l.payment_type, l.trip_type, l.pickup, l.dropoff, l.cab_type, l.pickup_nyct2010_gid, l.pickup_ctlabel, l.pickup_borocode, l.pickup_ct2010, l.pickup_boroct2010, l.pickup_cdeligibil, l.pickup_ntacode, l.pickup_ntaname, l.pickup_puma, l.dropoff_nyct2010_gid, l.dropoff_ctlabel, l.dropoff_borocode, l.dropoff_ct2010, l.dropoff_boroct2010, l.dropoff_cdeligibil, l.dropoff_ntacode, l.dropoff_ntaname, l.dropoff_puma
Merge Cond: (trips.vendor_id = l.vendor_id)
-> GroupAggregate (cost=1.00..12.50 rows=200 width=40)
Output: trips.vendor_id, count(*)
Group Key: trips.vendor_id
-> Foreign Scan on taxi.trips (cost=1.00..5.50 rows=1000 width=32)
Output: trips.trip_id, trips.vendor_id, trips.pickup_date, trips.pickup_datetime, trips.dropoff_date, trips.dropoff_datetime, trips.store_and_fwd_flag, trips.rate_code_id, trips.pickup_longitude, trips.pickup_latitude, trips.dropoff_longitude, trips.dropoff_latitude, trips.passenger_count, trips.trip_distance, trips.fare_amount, trips.extra, trips.mta_tax, trips.tip_amount, trips.tolls_amount, trips.ehail_fee, trips.improvement_surcharge, trips.total_amount, trips.payment_type, trips.trip_type, trips.pickup, trips.dropoff, trips.cab_type, trips.pickup_nyct2010_gid, trips.pickup_ctlabel, trips.pickup_borocode, trips.pickup_ct2010, trips.pickup_boroct2010, trips.pickup_cdeligibil, trips.pickup_ntacode, trips.pickup_ntaname, trips.pickup_puma, trips.dropoff_nyct2010_gid, trips.dropoff_ctlabel, trips.dropoff_borocode, trips.dropoff_ct2010, trips.dropoff_boroct2010, trips.dropoff_cdeligibil, trips.dropoff_ntacode, trips.dropoff_ntaname, trips.dropoff_puma
Remote SQL: SELECT vendor_id FROM taxi.trips ORDER BY vendor_id ASC NULLS LAST
-> Sort (cost=8.32..8.57 rows=100 width=247)
Output: l.trip_id, l.vendor_id, l.pickup_date, l.pickup_datetime, l.dropoff_date, l.dropoff_datetime, l.store_and_fwd_flag, l.rate_code_id, l.pickup_longitude, l.pickup_latitude, l.dropoff_longitude, l.dropoff_latitude, l.passenger_count, l.trip_distance, l.fare_amount, l.extra, l.mta_tax, l.tip_amount, l.tolls_amount, l.ehail_fee, l.improvement_surcharge, l.total_amount, l.payment_type, l.trip_type, l.pickup, l.dropoff, l.cab_type, l.pickup_nyct2010_gid, l.pickup_ctlabel, l.pickup_borocode, l.pickup_ct2010, l.pickup_boroct2010, l.pickup_cdeligibil, l.pickup_ntacode, l.pickup_ntaname, l.pickup_puma, l.dropoff_nyct2010_gid, l.dropoff_ctlabel, l.dropoff_borocode, l.dropoff_ct2010, l.dropoff_boroct2010, l.dropoff_cdeligibil, l.dropoff_ntacode, l.dropoff_ntaname, l.dropoff_puma
Sort Key: l.vendor_id
-> Seq Scan on taxi.trips_local l (cost=0.00..5.00 rows=100 width=247)
Output: l.trip_id, l.vendor_id, l.pickup_date, l.pickup_datetime, l.dropoff_date, l.dropoff_datetime, l.store_and_fwd_flag, l.rate_code_id, l.pickup_longitude, l.pickup_latitude, l.dropoff_longitude, l.dropoff_latitude, l.passenger_count, l.trip_distance, l.fare_amount, l.extra, l.mta_tax, l.tip_amount, l.tolls_amount, l.ehail_fee, l.improvement_surcharge, l.total_amount, l.payment_type, l.trip_type, l.pickup, l.dropoff, l.cab_type, l.pickup_nyct2010_gid, l.pickup_ctlabel, l.pickup_borocode, l.pickup_ct2010, l.pickup_boroct2010, l.pickup_cdeligibil, l.pickup_ntacode, l.pickup_ntaname, l.pickup_puma, l.dropoff_nyct2010_gid, l.dropoff_ctlabel, l.dropoff_borocode, l.dropoff_ct2010, l.dropoff_boroct2010, l.dropoff_cdeligibil, l.dropoff_ntacode, l.dropoff_ntaname, l.dropoff_puma
(14 rows)
Time: 33.099 ms
Which seems wild, frankly. Figure out whether it's possible to convince PostgreSQL to push down the list of l.vendor_id values to query directly against the trips table in ClickHouse.
taxi=# explain (verbose) WITH agg AS (select vendor_id, count(*) num FROM trips group by vendor_id) select agg.vendor_id from agg;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on agg (cost=1.00..12.00 rows=200 width=32)
Output: agg.vendor_id
-> GroupAggregate (cost=1.00..10.00 rows=200 width=40)
Output: trips.vendor_id, NULL::bigint
Group Key: trips.vendor_id
-> Foreign Scan on taxi.trips (cost=1.00..5.50 rows=1000 width=32)
Output: trips.trip_id, trips.vendor_id, trips.pickup_date, trips.pickup_datetime, trips.dropoff_date, trips.dropoff_datetime, trips.store_and_fwd_flag, trips.rate_code_id, trips.pickup_longitude, trips.pickup_latitude, trips.dropoff_longitude, trips.dropoff_latitude, trips.passenger_count, trips.trip_distance, trips.fare_amount, trips.extra, trips.mta_tax, trips.tip_amount, trips.tolls_amount, trips.ehail_fee, trips.improvement_surcharge, trips.total_amount, trips.payment_type, trips.trip_type, trips.pickup, trips.dropoff, trips.cab_type, trips.pickup_nyct2010_gid, trips.pickup_ctlabel, trips.pickup_borocode, trips.pickup_ct2010, trips.pickup_boroct2010, trips.pickup_cdeligibil, trips.pickup_ntacode, trips.pickup_ntaname, trips.pickup_puma, trips.dropoff_nyct2010_gid, trips.dropoff_ctlabel, trips.dropoff_borocode, trips.dropoff_ct2010, trips.dropoff_boroct2010, trips.dropoff_cdeligibil, trips.dropoff_ntacode, trips.dropoff_ntaname, trips.dropoff_puma
Remote SQL: SELECT vendor_id FROM taxi.trips ORDER BY vendor_id ASC NULLS LAST
(8 rows)
#58 improved quite a few subquery joins, specifically SEMI JOINs, but others still don't push down consistently. An example using the taxi data from the tutorial:
Produces this plan:
Which seems wild, frankly. Figure out whether it's possible to convince PostgreSQL to push down the list of
l.vendor_idvalues to query directly against thetripstable in ClickHouse.Another
Another variant:
LIMIT Subquery
Here's another one:
Could
LIMIT 10be pushed down? Maybe it's enough to tell users to put theLIMITin the subquery: