Re-work multi-card values and add multi-card pins to challenge groups #328

wagnerlmichael · 2025-01-14T21:39:05Z

This PR makes two changes:

- Reworks the multi-card assessment value calculation
- Adds multi-card pins to the challenge groups report

Previously, we used either a simple aggregation of the cards' predicted values or a single model prediction for the card with the largest sqft. The choice between the two was based on a cap on the YoY % change in value. We tended to over-predict these values. My hypothesis is that, when we predict on multiple cards that share the same location data, we are in effect counting the value of that location data more than once. Since location data drives the bulk of the model's value, it is hard not to over-predict.

The new strategy relies on choosing the single card with the highest square footage from a multi-card property, but it then adjusts that card’s square footage by adding the square footage of the remaining cards. By folding the entire property’s building area into one card, the model produces a single prediction, predicting on the location data a single time. The values with this method look much better. For more information on the results and the different specifications tested, see this issue, or this report.
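As a rough illustration of the folding step described above, here is a minimal base R sketch. The data frame, the `card` column, and the helper name `fold_sqft_into_largest_card` are hypothetical and are not taken from the pipeline; only `char_bldg_sf` appears in this PR. It keeps the largest card per PIN and replaces its square footage with the PIN's total building area, so the model would only need to predict once per PIN.

```r
# Hypothetical card-level data: one row per card, keyed by PIN
cards <- data.frame(
  pin = c("A", "A", "B", "B", "B", "C"),
  card = c(1, 2, 1, 2, 3, 1),
  char_bldg_sf = c(1200, 800, 900, 1500, 600, 2000),
  stringsAsFactors = FALSE
)

# Keep the largest card on each PIN, then overwrite its square footage
# with the summed square footage of every card on that PIN
fold_sqft_into_largest_card <- function(df) {
  pieces <- lapply(split(df, df$pin), function(g) {
    largest <- g[which.max(g$char_bldg_sf), , drop = FALSE]
    largest$char_bldg_sf <- sum(g$char_bldg_sf, na.rm = TRUE)
    largest
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

folded <- fold_sqft_into_largest_card(cards)
print(folded)
```

Each PIN ends up with exactly one row, which is the row that would be passed to the model for prediction under this strategy.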

We are going to implement this strategy for multi-card pins with 2-3 cards. These represent the vast majority of multi-card properties. For pins with 4+ cards, due to data problems and wacky pin shapes, we are going to keep the prior method of predicting each card individually and then summing the predictions. For example, a multi-card sale could have a sale price representing a single card, whereas in our data it is attached to a full pin.
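The split by card count could be sketched roughly as follows. This is an illustration only: the strategy labels, the `n_cards` column, and the `choose_strategy` helper are made-up names, and the real logic lives in `pipeline/02-assess.R`.

```r
# Hypothetical initial card-level predictions
preds <- data.frame(
  pin = c("A", "A", "B", "B", "B", "B", "C", "C", "C"),
  pred_card_initial_fmv = c(
    300000, 250000,
    100000, 120000, 90000, 150000,
    200000, 180000, 60000
  ),
  stringsAsFactors = FALSE
)

# Number of cards on each PIN, repeated on every card row
preds$n_cards <- ave(seq_along(preds$pin), preds$pin, FUN = length)

# 2-3 cards: fold sqft into the largest card and predict once
# 4+ cards: keep the prior method of summing per-card predictions
choose_strategy <- function(n_cards) {
  ifelse(
    n_cards >= 4, "sum_card_predictions",
    ifelse(n_cards >= 2, "fold_sqft_predict_once", "single_card")
  )
}
preds$strategy <- choose_strategy(preds$n_cards)

# For 4+ card PINs, the PIN value is the sum of the card predictions
is_4plus <- preds$strategy == "sum_card_predictions"
pin_value_4plus <- tapply(
  preds$pred_card_initial_fmv[is_4plus],
  preds$pin[is_4plus],
  sum
)
print(unique(preds[, c("pin", "n_cards", "strategy")]))
print(pin_value_4plus)
```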

We tested applying this square footage aggregation strategy to the training data in addition to the assess stage. The multi-card predictions were better, but at the cost of worse overall model performance. Next steps for multi-cards might include debugging this.

However, with the assess-stage aggregation strategy alone, we still see substantial performance gains. More details on these figures are in the report.

MdAPE: Drops 57% (67.02 -> 28.8)
RMSE: Drops 42% (879077 -> 508793)
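For readers who want to check the arithmetic, the sketch below uses the standard textbook definitions of MdAPE and RMSE and recomputes the percentage drops from the figures quoted above. The helper names are hypothetical, and the report may compute these metrics with different filters or weights.

```r
# Median absolute percentage error, expressed in percent
mdape <- function(actual, predicted) {
  median(abs(predicted - actual) / actual, na.rm = TRUE) * 100
}

# Root mean squared error, in the same units as the values
rmse <- function(actual, predicted) {
  sqrt(mean((predicted - actual)^2, na.rm = TRUE))
}

# Relative improvement between two metric values, in percent
pct_drop <- function(before, after) {
  (before - after) / before * 100
}

# Figures quoted in this PR
mdape_drop <- round(pct_drop(67.02, 28.8))
rmse_drop <- round(pct_drop(879077, 508793))
print(c(mdape_drop = mdape_drop, rmse_drop = rmse_drop))

# Toy example with made-up sale prices and predictions
toy_actual <- c(100, 200, 400)
toy_predicted <- c(110, 180, 400)
print(mdape(toy_actual, toy_predicted))
print(rmse(toy_actual, toy_predicted))
```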

wagnerlmichael · 2025-01-16T20:25:12Z

pipeline/02-assess.R

+    total_fmv = sum(pred_card_initial_fmv, na.rm = TRUE),
+    total_bldg_sf_pin = sum(char_bldg_sf, na.rm = TRUE),
+    share_bldg_sf = char_bldg_sf / total_bldg_sf_pin,
+    pred_card_initial_fmv = total_fmv * share_bldg_sf


There are a small number of multi-card pins where a card has a sqft of 0, which results in the value for that card being 0. I'm not sure how to handle this. We could just make some fixed cut if that is the case. I'm not super clear on the downstream exposure of these. Open to ideas.
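To make the zero-sqft case concrete, here is a small base R sketch that mirrors the share calculation in the diff above on made-up data. The output column `pred_card_realloc_fmv` is a hypothetical name used here so the original prediction stays visible; the diff itself overwrites `pred_card_initial_fmv`. The sketch only reproduces and flags the behavior; it does not propose a fix.

```r
# Hypothetical three-card PIN where one card has 0 sqft
pin_cards <- data.frame(
  pin = c("A", "A", "A"),
  char_bldg_sf = c(1500, 500, 0),
  pred_card_initial_fmv = c(400000, 150000, 50000),
  stringsAsFactors = FALSE
)

sum_no_na <- function(x) sum(x, na.rm = TRUE)

# PIN-level totals, repeated on every card row
total_fmv <- ave(pin_cards$pred_card_initial_fmv, pin_cards$pin, FUN = sum_no_na)
total_bldg_sf_pin <- ave(pin_cards$char_bldg_sf, pin_cards$pin, FUN = sum_no_na)

# Each card's share of the PIN's building area
share_bldg_sf <- pin_cards$char_bldg_sf / total_bldg_sf_pin

# Reallocate the PIN total to cards in proportion to square footage
pin_cards$pred_card_realloc_fmv <- total_fmv * share_bldg_sf

# Cards with 0 sqft receive a share of 0 and therefore a value of 0
zero_sf_cards <- pin_cards[pin_cards$char_bldg_sf == 0, ]
print(pin_cards)
print(zero_sf_cards)
```

Note that the PIN total is preserved by the reallocation; only its distribution across cards changes.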

wagnerlmichael · 2025-01-17T16:36:32Z

reports/challenge_groups/challenge_groups.qmd

-  run_id: "2024-03-17-stupefied-maya"
-  year: "2024"
+  run_id: "2025-01-10-serene-boni"
+  year: "2025"


Maybe I'll leave this just because it is the current baseline?

wagnerlmichael · 2025-01-24T17:09:07Z

reports/challenge_groups/challenge_groups.qmd

+The sales data we use to measure accuracy is the most recent sale per multi-card
+pin if there was one after 2020.
+
+```{r _decile_ratio_graph}


Ratio decile graph based on sale_recent_1_price after 2020
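A minimal sketch of how a ratio-by-decile summary like this could be computed, assuming the data has already been filtered to sales after 2020. Binning by sale price and taking the median ratio per bin are assumptions made for illustration, and `pred_pin_final_fmv` is a hypothetical column name; only `sale_recent_1_price` comes from the report. The data is made up.

```r
# Hypothetical multi-card PINs with a recent sale and a predicted value
sale_recent_1_price <- seq(100000, 1050000, by = 50000)
assumed_ratio <- seq(1.3, 0.8, length.out = length(sale_recent_1_price))
sales <- data.frame(
  sale_recent_1_price = sale_recent_1_price,
  pred_pin_final_fmv = sale_recent_1_price * assumed_ratio
)

# Sales ratio: predicted value divided by sale price
sales$ratio <- sales$pred_pin_final_fmv / sales$sale_recent_1_price

# Assign each sale to a sale-price decile
breaks <- quantile(sales$sale_recent_1_price, probs = seq(0, 1, by = 0.1))
sales$decile <- cut(
  sales$sale_recent_1_price,
  breaks = breaks,
  include.lowest = TRUE,
  labels = 1:10
)

# Median ratio within each decile
median_ratio <- as.numeric(tapply(sales$ratio, sales$decile, median))
decile_counts <- as.numeric(table(sales$decile))
print(data.frame(decile = 1:10, n = decile_counts, median_ratio = median_ratio))
```

A ratio above 1 in the low deciles and below 1 in the high deciles, as in this toy data, would indicate over-prediction of cheaper properties and under-prediction of more expensive ones.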

wagnerlmichael · 2025-01-24T17:09:16Z

reports/challenge_groups/challenge_groups.qmd

+p_deciles
+```
+
+```{r _scatterplot_pred_vs_sale}


Interactive scatter

dfsnow

The method and results here look sound. I double-checked everything by comparing the predictions here against the most recent baseline run; everything looks good. Nice work @wagnerlmichael.

…#328)

* Add some additive solutions and checks for single card values
* Remove other methods
* Clean up multi-card edit and replace paths
* Remove spaces
* De-aggregate card preds
* Format
* Shorten ariable name
* Format variable name length
* Format variable name length
* Add space back
* Remove strings
* Add back group by
* Add multi-card analysis for challenge groups
* Format
* Format
* Clean up comment
* Fix pin aggregation
* Switch strategy between 2-3 cards and 4+ cards
* Lint
* Fix decile calculation
* Simplify new multi-card method (#335)
* Revert MC data munging
* Simplify multi-card handling code

---------

Co-authored-by: Dan Snow <dan@sno.ws>
Co-authored-by: Dan Snow <31494343+dfsnow@users.noreply.github.com>

Add some additive solutions and checks for single card values

d489c41

wagnerlmichael linked an issue Jan 15, 2025 that may be closed by this pull request

Improve modeling multi-cards #228

Closed

wagnerlmichael added 8 commits January 15, 2025 22:15

Remove other methods

f0784a5

Clean up multi-card edit and replace paths

05e19ce

Remove spaces

4e138e7

De-aggregate card preds

b6e6bf0

Format

4864530

Shorten ariable name

d63f5d9

Format variable name length

7597b49

Format variable name length

3c5ba1f

wagnerlmichael commented Jan 16, 2025

wagnerlmichael added 3 commits January 16, 2025 20:25

Add space back

62b08ec

Remove strings

dd08c0f

Add back group by

cce9520

wagnerlmichael changed the title ~~Add some additive solutions and checks for single card values~~ Re-work multi-card values and add multi-card pins to challenge groups Jan 17, 2025

Add multi-card analysis for challenge groups

5de1919

wagnerlmichael commented Jan 17, 2025

wagnerlmichael marked this pull request as ready for review January 17, 2025 16:37

wagnerlmichael requested review from dfsnow, wrridgeway and jeancochrane as code owners January 17, 2025 16:37

wagnerlmichael added 7 commits January 17, 2025 16:40

Format

c4f51e2

Format

0478ac6

Clean up comment

6e5c909

Fix pin aggregation

9234a31

Switch strategy between 2-3 cards and 4+ cards

4ce1912

Lint

344e76f

Fix decile calculation

cd06984

wagnerlmichael commented Jan 24, 2025

dfsnow and others added 2 commits January 27, 2025 21:17

Merge branch '2025-assessment-year' into try-additive-multi-card-solution

7fb14d4

Simplify new multi-card method (#335)

9c47e1b

* Revert MC data munging * Simplify multi-card handling code

dfsnow approved these changes Jan 28, 2025

dfsnow merged commit d45bb26 into 2025-assessment-year Jan 28, 2025
4 checks passed

dfsnow deleted the try-additive-multi-card-solution branch January 28, 2025 16:46

dfsnow mentioned this pull request Feb 20, 2025

Update README and default run_id for 2025 #352

Merged

jeancochrane mentioned this pull request Mar 5, 2025

Fix SHAPs and comps for multicard properties #358

Open
