Update sample tax bills for 2024 #79
…s-to-include-2024-data-and-new-functions
…and-new-functions' into jeancochrane/update-sample-tax-bills-for-2024
…he root of the project
    # rolling up funds and transit TIF distributions into their parent agency.
    # In these cases, it's important that the parent agency have priority 1.
Am I using "parent agency" correctly here, and elsewhere in the PR?
This makes sense to me! The only thing I was slightly confused by was the transit TIF having a "parent" agency. For those cases, is the fund with priority 1 just the name of the transit TIF (no specification of CPS or other agency distribution?)
That's a great question! I believe your summary is correct. To verify, here are all the rows containing transit TIFs (I added the agencies with all caps names, which is the new 2024 format):
| Agency Name | Agency Number | Priority |
|---|---|---|
| City of Chicago - TIF Transit RPM1 | 030210900 | 1 |
| TIF TRANSIT CITY OF CHICAGO-RPM1 | 030210900 | 1 |
| City of Chicago - TIF RPM1 Distribution | 030210900 | 2 |
| Board of Education - from Transit TIF | 030210900 | 2 |
| Board of Education - TIF Transit RPM1 | 030210900 | 3 |
| BOARD OF EDUCATION - TIF RPM1 | 030210900 | 3 |
I tweaked the docs to clarify this point in ad73e8b.
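As a rough illustration of the priority rule, the rows above can be loaded into a data frame and checked so that every agency number has at least one priority 1 entry. This is only a sketch: the column names `agency_name`, `agency_num`, and `priority` are assumptions made for the example and may not match the real file.

```r
# Rows copied from the table above; column names are assumed for illustration
crosswalk <- data.frame(
  agency_name = c(
    "City of Chicago - TIF Transit RPM1",
    "TIF TRANSIT CITY OF CHICAGO-RPM1",
    "City of Chicago - TIF RPM1 Distribution",
    "Board of Education - from Transit TIF",
    "Board of Education - TIF Transit RPM1",
    "BOARD OF EDUCATION - TIF RPM1"
  ),
  agency_num = "030210900",
  priority = c(1L, 1L, 2L, 2L, 3L, 3L),
  stringsAsFactors = FALSE
)

# Names that act as the "parent" for this agency number: the priority 1 rows
parent_names <- crosswalk$agency_name[crosswalk$priority == 1L]

# For each agency number, is there at least one priority 1 row?
has_parent <- tapply(crosswalk$priority == 1L, crosswalk$agency_num, any)
```

Here both priority 1 names refer to the transit TIF itself (the older mixed-case name and the 2024 all-caps name), which is consistent with the summary in the question.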
    # Helper function to roll up funds and subagencies into their parent
    # agency for the purposes of reporting tax bill totals. This is useful
    # because the Clerk and the Treasurer do not fully agree on which funds
    # and subagencies should get their own line items starting in 2024, so we
    # want to ignore those differences and make sure the overall totals match
    # at the level of parent agencies
    rollup_agencies <- function(df) {
      df %>%
        # Filter out PINs in transit TIFs because we already know that the
        # Treasurer calculates agency distributions differently than we do
        filter(!pin %in% transit_tif_pins) %>%
        # Filter out agencies with $0 bill amounts, since they seem to be
        # especially susceptible to being left off of one side of the comparison
        filter(final_tax > 0L) %>%
        # Order by agency num so that we can always be sure that the parent agency
        # number (which ends in 0) will get used as the final `agency_num`
        # for the group
        arrange(year, pin, agency_num) %>%
        # Group by parent agency number
        mutate(agency_num = substr(agency_num, 1, 8)) %>%
        group_by(year, pin, agency_num) %>%
        summarize(
          agency_name = first(agency_name),
          final_tax = sum(final_tax, na.rm = TRUE),
          .groups = "drop"
        ) %>%
        # Convert output to dataframe because it is the simplest possible data
        # structure for the purposes of comparison
        as.data.frame()
    }
I'm curious to get your feedback on this approach! I'm not 100% confident that I'm doing this correctly, or that we should even do it at all. Here's a motivating example explaining why rolling up agencies is important to get the bill comparison tests passing:
Bill calculated using tax_bill (Clerk data)

    all_bills_actual %>%
      filter(pin == '11302010040000', str_detect(agency_num, "^03038")) %>%
      select(year, pin, agency_num, agency_name, agency_tax_rate, final_tax)

        year            pin agency_num      agency_name agency_tax_rate final_tax
       <int>         <char>     <char>           <char>           <num>     <num>
    1:  2024 11302010040000  030380000 CITY OF EVANSTON      0.01519095   1047.01

Bill extracted from Treasurer PDF

    all_bills_expected %>%
      filter(pin == '11302010040000', str_detect(agency_num, "^03038")) %>%
      select(year, pin, agency_num, agency_name, rate, final_tax)

       year pin            agency_num agency_name                          rate final_tax
      <int> <chr>          <chr>      <chr>                               <dbl>     <dbl>
    1  2024 11302010040000 030380000  CITY OF EVANSTON                    1.27      874.
    2  2024 11302010040000 030380001  CITY OF EVANSTON LIBRARY FUND       0.231     159.
    3  2024 11302010040000 030380002  CITY OF EVANSTON GENERAL ASSISTANCE 0.02       13.8
Note that the rate and tax totals match up when you roll up these Treasurer agencies. Hence, this function groups agencies by the first 8 digits of the agency num (everything before the last digit) and selects the agency with the lowest agency_num as the "parent" (in practice, this winds up being the agency whose number ends in 0). That seemed reasonable to me, but I'm curious what you think as someone who has had more experience with the new fund reporting -- is there a more comprehensive way to resolve these discrepancies using e.g. the agency crosswalk?
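To make the grouping concrete, here is a minimal base R sketch of the roll-up applied to the three Treasurer rows above. It is not the PR's implementation: `rollup_by_prefix()` is a hypothetical stand-in for the grouping step of `rollup_agencies()`, it skips the transit TIF and $0 filters, and the `final_tax` values are the rounded figures from the printed output rather than exact amounts.

```r
# Treasurer rows from the example above, using the rounded printed values
treasurer <- data.frame(
  year = 2024L,
  pin = "11302010040000",
  agency_num = c("030380000", "030380001", "030380002"),
  agency_name = c(
    "CITY OF EVANSTON",
    "CITY OF EVANSTON LIBRARY FUND",
    "CITY OF EVANSTON GENERAL ASSISTANCE"
  ),
  final_tax = c(874, 159, 13.8),
  stringsAsFactors = FALSE
)

# Hypothetical base R equivalent of the grouping step in rollup_agencies()
rollup_by_prefix <- function(df) {
  # Sort so that the lowest agency_num in each group comes first
  df <- df[order(df$year, df$pin, df$agency_num), ]
  # The first 8 digits identify the parent agency
  df$parent_prefix <- substr(df$agency_num, 1, 8)
  groups <- split(df, list(df$year, df$pin, df$parent_prefix), drop = TRUE)
  rows <- lapply(groups, function(g) {
    data.frame(
      year = g$year[1],
      pin = g$pin[1],
      agency_num = g$parent_prefix[1],
      # The first row is the lowest agency_num, i.e. the presumed parent
      agency_name = g$agency_name[1],
      final_tax = sum(g$final_tax, na.rm = TRUE),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

rolled <- rollup_by_prefix(treasurer)
```

With these inputs the three rows collapse into a single CITY OF EVANSTON row whose total is close to the 1047.01 on the Clerk side; the small gap is plausibly display rounding in the printed values.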
I like this approach a lot actually! It is more intuitively clear to me than the agency crosswalk. The only flag I'd raise is if this pattern does not for some reason hold true in the future - perhaps another question we could pose to the Clerk?
But as long as it's working for the current agency structure, then I think this approach is worth keeping!
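One way to notice if the pattern stops holding would be a small check that every 8-digit prefix has an agency whose number ends in 0. The sketch below is an assumption about how such a guard could look, not something in the PR; `find_orphan_prefixes()` and the example agency numbers starting with 99999 are made up for illustration.

```r
# Hypothetical helper: return the 8-digit prefixes that have no agency whose
# 9-digit number ends in 0 (i.e. no obvious "parent" row)
find_orphan_prefixes <- function(agency_nums) {
  agency_nums <- unique(agency_nums)
  prefixes <- substr(agency_nums, 1, 8)
  ends_in_zero <- substr(agency_nums, 9, 9) == "0"
  has_parent <- tapply(ends_in_zero, prefixes, any)
  unname(names(has_parent)[!has_parent])
}

# The first prefix has a parent row; the second, invented one does not
example_nums <- c("030380000", "030380001", "999990001", "999990002")
orphans <- find_orphan_prefixes(example_nums)
```

An empty result would mean the "parent ends in 0" assumption holds for the agencies checked; any returned prefix would be worth raising with the Clerk.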
OK cool! Out of curiosity, do you think we could use the crosswalk to accomplish this same goal? If so, I think it might be preferable, just so that we can stay consistent with the abstractions that we use to handle this type of problem.
I actually don't think the crosswalk is the right fit for this operation; see #79 (comment).
kyrasturgill
left a comment
This is incredible work! I don't really have much to add here, besides affirming that I think your parent agency roll-up method makes sense. I echo your concern about that pattern maybe not holding in future years but if it works this year, then I think it's fine to keep it.
Thank you for slogging through these changes!
    # We maintain this file by hand. When adding a new year of sample bills, you
    # will likely encounter agencies that are not yet present in this list. To
[nitpick, non-blocking] I wonder if this is likely in future years, when it's not as weird as 2024? Maybe this could be "it is possible"?
        variant = "lookup_agency_over_time"
      )
      expect_snapshot_value(
        # We expect this to change every year as we add new bills to the summary DF
This specific test I think I just need to talk through with you to make sure I understand, but I trust your judgement here!
jeancochrane
left a comment
Thanks for the review @kyrasturgill! I resolved all comments that don't need further discussion. I think the only remaining question is whether it would be better to use the agency crosswalk in place of the rollup_agencies() function in the test, so as to use a consistent interface for comparing agencies across years. I think that is a desirable change for us to make, but I'm curious to get your take as a test of my understanding.
        variant = "lookup_agency_over_time"
      )
      expect_snapshot_value(
        # We expect this to change every year as we add new bills to the summary DF
We talked this through in person and decided that it would be better to restrict the set of years for this test so that it doesn't fail every time we add new years of sample bills. I updated the test to do that in 7b3104d and added an extra comment clarifying the purpose of these snapshot tests.
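The idea of pinning the snapshot to a fixed set of years can be sketched as follows. This is only an illustration under assumptions: `snapshot_years`, `summary_df`, and its columns are hypothetical, and the actual years and data used in 7b3104d are not shown here.

```r
# Hypothetical fixed set of years for the snapshot; the real test may differ
snapshot_years <- 2019:2023

# Toy summary data frame standing in for the real bill summary
summary_df <- data.frame(
  year = c(2019L, 2021L, 2023L, 2024L),
  n_bills = c(10L, 12L, 11L, 9L)
)

# Keep only the pinned years so that adding bills for a new year does not
# change the value being snapshotted
snapshot_input <- summary_df[summary_df$year %in% snapshot_years, ]
```

Passing something like `snapshot_input` to the snapshot expectation, instead of the full summary, keeps the stored snapshot stable when new years of sample bills are added.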
Replying to my question from above:
After poking around at it, I think this is actually not a feasible idea. Though some of the mismatches could be resolved with the crosswalk, ~49 of the mismatches are for years before 2024, so the agency-to-fund change is not the primary driver of these differences.
This PR updates our set of sample tax bills in the data-raw/sample_tax_bills/ subdirectory to include sample bills for 2024. It builds off of #78, since sample tax bills are an important part of some unit tests.