Update sample tax bills for 2024 by jeancochrane · Pull Request #79 · ccao-data/ptaxsim

jeancochrane · 2026-03-26T23:47:38Z

This PR updates our set of sample tax bills in the data-raw/sample_tax_bills/ subdirectory to include sample bills for 2024. It builds off of #78, since sample tax bills are an important part of some unit tests.

jeancochrane · 2026-04-14T22:56:35Z

+#      rolling up funds and transit TIF distributions into their parent agency.
+#      In these cases, it's important that the parent agency have priority 1.


Am I using "parent agency" correctly here, and elsewhere in the PR?

This makes sense to me! The only thing I was slightly confused by was the transit TIF having a "parent" agency. For those cases, is the fund with priority 1 just the name of the transit TIF, with no specification of CPS or other agency distribution?

That's a great question! I believe your summary is correct. To verify, here are all the rows containing transit TIFs (I added the agencies with all caps names, which is the new 2024 format):

| Agency Name | Agency Number | Priority |
| --- | --- | --- |
| City of Chicago - TIF Transit RPM1 | 030210900 | 1 |
| TIF TRANSIT CITY OF CHICAGO-RPM1 | 030210900 | 1 |
| City of Chicago - TIF RPM1 Distribution | 030210900 | 2 |
| Board of Education - from Transit TIF | 030210900 | 2 |
| Board of Education - TIF Transit RPM1 | 030210900 | 3 |
| BOARD OF EDUCATION - TIF RPM1 | 030210900 | 3 |

I tweaked the docs to clarify this point in ad73e8b.
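
To make the priority ordering concrete, here is a minimal base R sketch that picks out the priority-1 names for the transit TIF agency number. The data frame `agency_names` and its column names are assumptions made for illustration, and the rows are copied from the list above; this is not the code in `sample_tax_bills_detail.R`.

```r
# Hypothetical data frame mirroring the transit TIF rows listed above.
# The column names are assumptions, not the package's actual schema.
agency_names <- data.frame(
  agency_name = c(
    "City of Chicago - TIF Transit RPM1",
    "TIF TRANSIT CITY OF CHICAGO-RPM1",
    "City of Chicago - TIF RPM1 Distribution",
    "Board of Education - from Transit TIF",
    "Board of Education - TIF Transit RPM1",
    "BOARD OF EDUCATION - TIF RPM1"
  ),
  agency_num = "030210900",
  priority = c(1L, 1L, 2L, 2L, 3L, 3L),
  stringsAsFactors = FALSE
)

# Keep only the names with priority 1, i.e. the names used for the parent
parent_names <- agency_names$agency_name[agency_names$priority == 1L]
print(parent_names)
```

Both priority-1 names refer to the transit TIF itself (one in the older mixed-case format, one in the 2024 all-caps format), and neither mentions the Board of Education distribution.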

jeancochrane · 2026-04-15T16:25:17Z

+# Helper function to roll up funds and subagencies into their parent
+# agency for the purposes of reporting tax bill totals. This is useful
+# because the Clerk and the Treasurer do not fully agree on which funds
+# and subagencies should get their own line items starting in 2024, so we
+# want to ignore those differences and make sure the overall totals match
+# at the level of parent agencies
+rollup_agencies <- function(df) {
+  df %>%
+    # Filter out PINs in transit TIFs because we already know that the
+    # Treasurer calculates agency distributions differently than we do
+    filter(!pin %in% transit_tif_pins) %>%
+    # Filter out agencies with $0 bill amounts, since they seem to be
+    # especially susceptible to being left off of one side of the comparison
+    filter(final_tax > 0L) %>%
+    # Order by agency num so that the parent agency (whose number ends in 0)
+    # sorts first within its group and supplies the `agency_name` for the
+    # group
+    arrange(year, pin, agency_num) %>%
+    # Group by the first 8 digits of the agency number, which identify the
+    # parent agency
+    mutate(agency_num = substr(agency_num, 1, 8)) %>%
+    group_by(year, pin, agency_num) %>%
+    summarize(
+      agency_name = first(agency_name),
+      final_tax = sum(final_tax, na.rm = TRUE),
+      .groups = "drop"
+    ) %>%
+    # Convert output to dataframe because it is the simplest possible data
+    # structure for the purposes of comparison
+    as.data.frame()
+}


I'm curious to get your feedback on this approach! I'm not 100% confident that I'm doing this correctly, or that we should even do it at all. Here's a motivating example explaining why rolling up agencies is important to get the bill comparison tests passing:

Bill calculated using tax_bill (Clerk data)

    all_bills_actual %>%
      filter(pin == '11302010040000', str_detect(agency_num, "^03038")) %>%
      select(year, pin, agency_num, agency_name, agency_tax_rate, final_tax)

        year            pin agency_num      agency_name agency_tax_rate final_tax
       <int>         <char>     <char>           <char>           <num>     <num>
    1:  2024 11302010040000  030380000 CITY OF EVANSTON      0.01519095   1047.01

Bill extracted from Treasurer PDF

    all_bills_expected %>%
      filter(pin == '11302010040000', str_detect(agency_num, "^03038")) %>%
      select(year, pin, agency_num, agency_name, rate, final_tax)

       year pin            agency_num agency_name                          rate final_tax
      <int> <chr>          <chr>      <chr>                               <dbl>     <dbl>
    1  2024 11302010040000 030380000  CITY OF EVANSTON                    1.27      874.
    2  2024 11302010040000 030380001  CITY OF EVANSTON LIBRARY FUND       0.231     159.
    3  2024 11302010040000 030380002  CITY OF EVANSTON GENERAL ASSISTANCE 0.02       13.8

Note that the rate and tax totals match up when you roll up these Treasurer agencies. Hence, this function groups agencies by the first 8 digits of the agency num (everything before the last digit) and selects the agency with the lowest agency_num as the "parent" (in practice, this winds up being the agency whose number ends in 0). That seemed reasonable to me, but I'm curious what you think as someone who has had more experience with the new fund reporting -- is there a more comprehensive way to resolve these discrepancies using e.g. the agency crosswalk?
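
As a rough check of the arithmetic, the roll-up can be reproduced on the three Treasurer rows from the example. This is a base R sketch of the same idea rather than the `rollup_agencies()` function itself; the column `parent_prefix` is a name introduced here for illustration, and the input values are the rounded figures as printed above.

```r
# Treasurer rows from the example above, using the values as printed
# (the tibble output is rounded to three significant figures)
treasurer <- data.frame(
  year = 2024L,
  pin = "11302010040000",
  agency_num = c("030380000", "030380001", "030380002"),
  agency_name = c(
    "CITY OF EVANSTON",
    "CITY OF EVANSTON LIBRARY FUND",
    "CITY OF EVANSTON GENERAL ASSISTANCE"
  ),
  rate = c(1.27, 0.231, 0.02),
  final_tax = c(874, 159, 13.8),
  stringsAsFactors = FALSE
)

# Sort so the lowest agency number (the parent) comes first in each group
treasurer <- treasurer[
  order(treasurer$year, treasurer$pin, treasurer$agency_num),
]

# The first 8 digits identify the parent agency (hypothetical column name)
treasurer$parent_prefix <- substr(treasurer$agency_num, 1, 8)

# Sum the rate and the tax within each year / PIN / parent group
rolled_up <- aggregate(
  cbind(rate, final_tax) ~ year + pin + parent_prefix,
  data = treasurer,
  FUN = sum
)

# Take the name from the first (lowest-numbered) row for each prefix.
# Matching on the prefix alone is enough for this single-PIN example.
rolled_up$agency_name <- treasurer$agency_name[
  match(rolled_up$parent_prefix, treasurer$parent_prefix)
]

print(rolled_up)
```

The rolled-up rate comes to about 1.521 and the tax to about 1046.8, against 0.01519095 (about 1.519 when expressed as a percentage) and 1047.01 on the Clerk side. The small gaps are consistent with the rounding in the printed Treasurer values.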

I like this approach a lot, actually! It is more intuitively clear to me than the agency crosswalk. The only flag I'd raise is that this pattern might not hold true in the future for some reason. Perhaps that's another question we could pose to the Clerk?
But as long as it's working for the current agency structure, I think this approach is worth keeping!

OK cool! Out of curiosity, do you think we could use the crosswalk to accomplish this same goal? If so, I think it might be preferable, just so that we can stay consistent with the abstractions that we use to handle this type of problem.

I actually don't think the crosswalk is the right fit for this operation; see #79 (comment).

kyrasturgill

This is incredible work! I don't really have much to add here, besides affirming that I think your parent agency roll-up method makes sense. I echo your concern about that pattern maybe not holding in future years but if it works this year, then I think it's fine to keep it.
Thank you for slogging through these changes!
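
One way to notice if the "parent number ends in 0" pattern stops holding would be a small guard that lists any 8-digit prefix with no member ending in 0. The helper below is a hypothetical base R sketch, not something in this PR, and the name `find_groups_without_parent` is made up for illustration.

```r
# Hypothetical helper (not part of the PR): return the 8-digit prefixes
# that have no agency number ending in "0"
find_groups_without_parent <- function(agency_nums) {
  agency_nums <- unique(agency_nums)
  prefixes <- substr(agency_nums, 1, 8)
  # For each prefix, check whether any member's number ends in "0"
  has_parent <- tapply(endsWith(agency_nums, "0"), prefixes, any)
  names(has_parent)[!has_parent]
}

# Agency numbers from the Evanston example: the parent 030380000 is present
ok <- find_groups_without_parent(
  c("030380000", "030380001", "030380002")
)

# Made-up input where only fund-level rows appear for the prefix
missing <- find_groups_without_parent(c("030380001", "030380002"))

print(ok)
print(missing)
```

A check like this would flag the second input because no parent row exists for the prefix. One case it might catch in practice, as an assumption about the data, is a parent row dropped by the `final_tax > 0L` filter while its funds remain.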

kyrasturgill · 2026-04-21T21:29:17Z

+# We maintain this file by hand. When adding a new year of sample bills, you
+# will likely encounter agencies that are not yet present in this list. To


[nitpick, non-blocking] I wonder if this is likely in future years, when it's not as weird as 2024? Maybe this could be "it is possible"?

Good point! Tweaked in 6df55bb.

kyrasturgill · 2026-04-22T17:52:39Z

    variant = "lookup_agency_over_time"
  )
  expect_snapshot_value(
+    # We expect this to change every year as we add new bills to the summary DF


This specific test I think I just need to talk through with you to make sure I understand, but I trust your judgement here!

jeancochrane

Thanks for the review @kyrasturgill! I resolved all comments that don't need further discussion. I think the only remaining question is whether it would be better to use the agency crosswalk in place of the rollup_agencies() function in the test, so as to use a consistent interface for comparing agencies across years. I think that is a desirable change for us to make, but I'm curious to get your take as a test of my understanding.

jeancochrane · 2026-04-22T20:15:59Z

    variant = "lookup_agency_over_time"
  )
  expect_snapshot_value(
+    # We expect this to change every year as we add new bills to the summary DF


We talked this through in person and decided that it would be better to restrict the set of years for this test so that it doesn't fail every time we add new years of sample bills. I updated the test to do that in 7b3104d and added an extra comment clarifying the purpose of these snapshot tests.
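
The idea of restricting years can be sketched in base R as follows. The data frame, its columns, and the year range are all hypothetical and chosen only for illustration; the actual change is in 7b3104d.

```r
# Hypothetical summary data frame; the real one is built from sample bills
bills_summary <- data.frame(
  year = c(2021L, 2022L, 2023L, 2024L),
  n_bills = c(10L, 12L, 11L, 9L)
)

# Fixed set of years the snapshot is allowed to cover (an assumed range)
snapshot_years <- 2021:2023

# Only rows inside the fixed range feed the snapshot
snapshot_input <- bills_summary[bills_summary$year %in% snapshot_years, ]

# Simulate adding a new year of sample bills later on
with_new_year <- rbind(
  bills_summary,
  data.frame(year = 2025L, n_bills = 8L)
)
snapshot_after <- with_new_year[with_new_year$year %in% snapshot_years, ]

print(snapshot_input)
print(snapshot_after)
```

Because the filter uses a fixed set of years, the rows that reach the snapshot are the same before and after the new year is appended, so the snapshot would only change when data for the covered years changes.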

jeancochrane · 2026-04-28T16:47:15Z

Replying to my question from above:

> I think the only remaining question is whether it would be better to use the agency crosswalk in place of the rollup_agencies() function in the test, so as to use a consistent interface for comparing agencies across years.

After poking around at it, I think this is actually not a feasible idea. Though some of the mismatches could be resolved with the crosswalk, ~49 of the mismatches are for years before 2024, so the agency-to-fund change is not the primary driver of these differences.

jeancochrane added 3 commits March 25, 2026 16:40

Update unit tests for new 2024 data and functions

9a4f8a4

Update sample tax bills for 2024

f05e905

Fix a few final lines of test-tax_bill.R

ef48dc8

jeancochrane mentioned this pull request Mar 27, 2026

Update unit tests for new 2024 data and functions #78

Merged

jeancochrane added 3 commits March 30, 2026 11:34

Add to vetdis calc in tests

28caaab

Merge branch '2024-data-update' into jeancochrane/72-update-unit-tests-to-include-2024-data-and-new-functions

dadbe77

Merge branch 'jeancochrane/72-update-unit-tests-to-include-2024-data-and-new-functions' into jeancochrane/update-sample-tax-bills-for-2024

b0684df

Base automatically changed from jeancochrane/72-update-unit-tests-to-include-2024-data-and-new-functions to 2024-data-update April 2, 2026 16:29

jeancochrane added 3 commits April 2, 2026 11:59

Further updates to sample tax bills, and test contents of sample_tax_bills_summary

dac3bb4

Update tax_bill test to account for fund-level reporting differencies in 2024

cf2a2a0

Update test-tax_bill.R to finish 2024 sample bill tests

e0dcd36

jeancochrane mentioned this pull request Apr 14, 2026

Upgrade to testthat 3rd edition #81

Closed

jeancochrane added 7 commits April 14, 2026 17:14

Make sure test-sample_tax_bills_summary.R always runs relative to the root of the project

ea1c91c

Split agency lookup snapshot tests out into dedicated test

3f707fc

Merge branch '2024-data-update' into jeancochrane/update-sample-tax-bills-for-2024

0746750

Update lookup_agency snapshot tests for 2024 data

d636e00

Fail loudly if sample_tax_bills directory doesn't exist in tests

b22a117

Update test-sample_tax_bills_summary.R so that tests can run on built package

e984b74

Small tweak and add docs to tax_bill comparison test

cc0752c

jeancochrane commented Apr 15, 2026

jeancochrane marked this pull request as ready for review April 15, 2026 17:10

jeancochrane requested a review from kyrasturgill as a code owner April 15, 2026 17:10

kyrasturgill approved these changes Apr 22, 2026

jeancochrane added 3 commits April 22, 2026 14:12

Add some uncertainty to language about future agencies in `sample_tax_bills_detail.R`

6df55bb

Clarify the way name_priority works for transit TIFs in `sample_tax_bills_detail.R`

ad73e8b

Better comments on lookup_agency() snapshot tests, plus make them future-proof

7b3104d

jeancochrane commented Apr 22, 2026

jeancochrane merged commit ceb577e into 2024-data-update Apr 28, 2026
7 checks passed

jeancochrane deleted the jeancochrane/update-sample-tax-bills-for-2024 branch April 28, 2026 16:48
