
Update sample tax bills for 2024 #79

Merged
jeancochrane merged 19 commits into 2024-data-update from jeancochrane/update-sample-tax-bills-for-2024
Apr 28, 2026

Conversation

Member

@jeancochrane jeancochrane commented Mar 26, 2026

This PR updates our set of sample tax bills in the data-raw/sample_tax_bills/ subdirectory to include sample bills for 2024. It builds off of #78, since sample tax bills are an important part of some unit tests.

Base automatically changed from jeancochrane/72-update-unit-tests-to-include-2024-data-and-new-functions to 2024-data-update April 2, 2026 16:29
Comment thread data-raw/sample_tax_bills/agency_name_match.csv
Comment thread data-raw/sample_tax_bills/sample_tax_bills_detail.csv
Comment thread data-raw/sample_tax_bills/sample_tax_bills_detail.R
Comment on lines +129 to +130
# rolling up funds and transit TIF distributions into their parent agency.
# In these cases, it's important that the parent agency have priority 1.
Member Author

Am I using "parent agency" correctly here, and elsewhere in the PR?

Member

This makes sense to me! The only thing I was slightly confused by was the transit TIF having a "parent" agency. For those cases, is the fund with priority 1 just the name of the transit TIF (with no specification of CPS or another agency's distribution)?

Member Author

That's a great question! I believe your summary is correct. To verify, here are all the rows containing transit TIFs (I added the agencies with all caps names, which is the new 2024 format):

| Agency Name | Agency Number | Priority |
| --- | --- | --- |
| City of Chicago - TIF Transit RPM1 | 030210900 | 1 |
| TIF TRANSIT CITY OF CHICAGO-RPM1 | 030210900 | 1 |
| City of Chicago - TIF RPM1 Distribution | 030210900 | 2 |
| Board of Education - from Transit TIF | 030210900 | 2 |
| Board of Education - TIF Transit RPM1 | 030210900 | 3 |
| BOARD OF EDUCATION - TIF RPM1 | 030210900 | 3 |

I tweaked the docs to clarify this point in ad73e8b.

Comment thread data-raw/sample_tax_bills/sample_tax_bills_detail.R
Comment on lines +170 to +199
# Helper function to roll up funds and subagencies into their parent
# agency for the purposes of reporting tax bill totals. This is useful
# because the Clerk and the Treasurer do not fully agree on which funds
# and subagencies should get their own line items starting in 2024, so we
# want to ignore those differences and make sure the overall totals match
# at the level of parent agencies
rollup_agencies <- function(df) {
  df %>%
    # Filter out PINs in transit TIFs because we already know that the
    # Treasurer calculates agency distributions differently than we do
    filter(!pin %in% transit_tif_pins) %>%
    # Filter out agencies with $0 bill amounts, since they seem to be
    # especially susceptible to being left off of one side of the comparison
    filter(final_tax > 0L) %>%
    # Order by agency num so that we can always be sure that the parent agency
    # number (which ends in 0) will get used as the final `agency_num`
    # for the group
    arrange(year, pin, agency_num) %>%
    # Group by parent agency number
    mutate(agency_num = substr(agency_num, 1, 8)) %>%
    group_by(year, pin, agency_num) %>%
    summarize(
      agency_name = first(agency_name),
      final_tax = sum(final_tax, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    # Convert output to dataframe because it is the simplest possible data
    # structure for the purposes of comparison
    as.data.frame()
}
Member Author

I'm curious to get your feedback on this approach! I'm not 100% confident that I'm doing this correctly, or that we should even do it at all. Here's a motivating example explaining why rolling up agencies is important to get the bill comparison tests passing:

Bill calculated using tax_bill (Clerk data)

all_bills_actual %>%
  filter(pin == '11302010040000', str_detect(agency_num, "^03038")) %>%
  select(year, pin, agency_num, agency_name, agency_tax_rate, final_tax)
    year            pin agency_num      agency_name agency_tax_rate final_tax
   <int>         <char>     <char>           <char>           <num>     <num>
1:  2024 11302010040000  030380000 CITY OF EVANSTON      0.01519095   1047.01

Bill extracted from Treasurer PDF

all_bills_expected %>%
  filter(pin == '11302010040000', str_detect(agency_num, "^03038")) %>%
  select(year, pin, agency_num, agency_name, rate, final_tax)
   year pin            agency_num agency_name                          rate final_tax
  <int> <chr>          <chr>      <chr>                               <dbl>     <dbl>
1  2024 11302010040000 030380000  CITY OF EVANSTON                    1.27      874. 
2  2024 11302010040000 030380001  CITY OF EVANSTON LIBRARY FUND       0.231     159. 
3  2024 11302010040000 030380002  CITY OF EVANSTON GENERAL ASSISTANCE 0.02      13.8

Note that the rate and tax totals match up when you roll up these Treasurer agencies. Hence, this function groups agencies by the first 8 digits of the agency num (everything before the last digit) and selects the agency with the lowest agency_num as the "parent" (in practice, this winds up being the agency whose number ends in 0). That seemed reasonable to me, but I'm curious what you think as someone who has had more experience with the new fund reporting -- is there a more comprehensive way to resolve these discrepancies using e.g. the agency crosswalk?
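
A minimal sketch of that rollup applied to the three Treasurer rows printed above, assuming `dplyr` is available. The `final_tax` values are the rounded figures from the printout, so the rolled-up total only approximates the Clerk-based amount of 1047.01. The column name `parent_num` is introduced here for illustration and is not part of the PR.

```r
library(dplyr)

# Treasurer line items for PIN 11302010040000, copied from the printout
# above. final_tax is rounded as displayed, so sums are approximate.
treasurer_rows <- data.frame(
  year = 2024L,
  pin = "11302010040000",
  agency_num = c("030380000", "030380001", "030380002"),
  agency_name = c(
    "CITY OF EVANSTON",
    "CITY OF EVANSTON LIBRARY FUND",
    "CITY OF EVANSTON GENERAL ASSISTANCE"
  ),
  final_tax = c(874, 159, 13.8)
)

rolled_up <- treasurer_rows %>%
  # Sort so the lowest agency number (ending in 0) comes first in each group
  arrange(year, pin, agency_num) %>%
  # parent_num is an illustrative column: the first 8 digits of agency_num
  mutate(parent_num = substr(agency_num, 1, 8)) %>%
  group_by(year, pin, parent_num) %>%
  summarize(
    agency_name = first(agency_name),
    final_tax = sum(final_tax),
    .groups = "drop"
  )

print(rolled_up)
```

The three line items collapse into a single row named CITY OF EVANSTON with a total of about 1046.8, which is within rounding distance of the 1047.01 calculated from Clerk data.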

Member

I like this approach a lot actually! It is more intuitively clear to me than the agency crosswalk. The only flag I'd raise is if this pattern does not for some reason hold true in the future - perhaps another question we could pose to the Clerk?
But as long as it's working for the current agency structure, then I think this approach is worth keeping!
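
One way to notice early if the pattern stops holding would be a guard that flags any 8-digit prefix with no agency number ending in 0. This is a sketch only: `find_groups_missing_parent()` is a hypothetical helper, not something in the PR, and it assumes agency numbers are 9-character strings.

```r
# Hypothetical helper: return the 8-digit prefixes for which no agency
# number in the group ends in "0" (i.e. no apparent parent agency)
find_groups_missing_parent <- function(agency_num) {
  is_parent <- endsWith(agency_num, "0")
  prefix <- substr(agency_num, 1, 8)
  has_parent <- vapply(split(is_parent, prefix), any, logical(1))
  names(has_parent)[!has_parent]
}

# Group with a parent (030380000): nothing is flagged
with_parent <- find_groups_missing_parent(
  c("030380000", "030380001", "030380002")
)

# Same group without the parent row: the prefix is flagged
without_parent <- find_groups_missing_parent(c("030380001", "030380002"))
```

A check like this could run inside the test before the rollup, so that a change in the numbering scheme fails loudly instead of silently picking the wrong agency name.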

Member Author

OK cool! Out of curiosity, do you think we could use the crosswalk to accomplish this same goal? If so, I think it might be preferable, just so that we can stay consistent with the abstractions that we use to handle this type of problem.

Member Author

I actually don't think the crosswalk is the right fit for this operation; see #79 (comment).

Comment thread tests/testthat/test-tax_bill.R
Comment thread tests/testthat/test-tax_bill.R
Comment thread tests/testthat/test-tax_bill.R
Comment thread tests/testthat/test-tax_bill.R
@jeancochrane jeancochrane marked this pull request as ready for review April 15, 2026 17:10
Member

@kyrasturgill kyrasturgill left a comment

This is incredible work! I don't really have much to add here, besides affirming that I think your parent agency roll-up method makes sense. I echo your concern about that pattern maybe not holding in future years but if it works this year, then I think it's fine to keep it.
Thank you for slogging through these changes!

Comment on lines +102 to +103
# We maintain this file by hand. When adding a new year of sample bills, you
# will likely encounter agencies that are not yet present in this list. To
Member

[nitpick, non-blocking] I wonder if this is likely in future years, when it's not as weird as 2024? Maybe this could be "it is possible"?

Member Author

Good point! Tweaked in 6df55bb.

Comment thread tests/testthat/test-lookup.R Outdated
variant = "lookup_agency_over_time"
)
expect_snapshot_value(
# We expect this to change every year as we add new bills to the summary DF
Member

This specific test I think I just need to talk through with you to make sure I understand, but I trust your judgement here!

Comment thread tests/testthat/test-sample_tax_bills_summary.R
Comment thread tests/testthat/test-tax_bill.R
Comment thread tests/testthat/test-tax_bill.R
Member Author

@jeancochrane jeancochrane left a comment

Thanks for the review @kyrasturgill! I resolved all comments that don't need further discussion. I think the only remaining question is whether it would be better to use the agency crosswalk in place of the rollup_agencies() function in the test, so as to use a consistent interface for comparing agencies across years. I think that is a desirable change for us to make, but I'm curious to get your take as a test of my understanding.

Comment thread tests/testthat/test-lookup.R Outdated
variant = "lookup_agency_over_time"
)
expect_snapshot_value(
# We expect this to change every year as we add new bills to the summary DF
Member Author

We talked this through in person and decided that it would be better to restrict the set of years for this test so that it doesn't fail every time we add new years of sample bills. I updated the test to do that in 7b3104d and added an extra comment clarifying the purpose of these snapshot tests.
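
The idea of pinning the years can be sketched as follows. Everything named here (`summary_df`, `snapshot_years`, `restrict_years()`, and the toy values) is hypothetical and only illustrates why a fixed year filter keeps the snapshot input stable; it is not the code from 7b3104d.

```r
# Hypothetical fixed set of years covered by the snapshot
snapshot_years <- 2021:2023

# Toy stand-in for the summary data frame; values are illustrative only
summary_df <- data.frame(
  year = c(2021L, 2022L, 2023L),
  n_bills = c(10L, 12L, 11L)
)

# Simulate adding a new year of sample bills
summary_df_next <- rbind(
  summary_df,
  data.frame(year = 2024L, n_bills = 9L)
)

# Keep only the pinned years before handing the data to the snapshot
restrict_years <- function(df) {
  df[df$year %in% snapshot_years, , drop = FALSE]
}

# Compare the snapshot input before and after the new year is added
stable <- isTRUE(all.equal(
  restrict_years(summary_df),
  restrict_years(summary_df_next)
))
```

Because the filter drops the new year, the value passed to the snapshot expectation is unchanged, so the test would only fail when data for the pinned years changes.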

@jeancochrane
Member Author

Replying to my question from above:

I think the only remaining question is whether it would be better to use the agency crosswalk in place of the rollup_agencies() function in the test, so as to use a consistent interface for comparing agencies across years.

After poking around at it, I think this is actually not a feasible idea. Though some of the mismatches could be resolved with the crosswalk, ~49 of the mismatches are for years before 2024, so the agency-to-fund change is not the primary driver of these differences.

@jeancochrane jeancochrane merged commit ceb577e into 2024-data-update Apr 28, 2026
7 checks passed
@jeancochrane jeancochrane deleted the jeancochrane/update-sample-tax-bills-for-2024 branch April 28, 2026 16:48