The *40 routine feels a bit questionable. Looks like it'll lose some bits. 24*40 (=960) is a 10-bit quantity, so you'll need to track both of the last 2 shifted-out bits. Throwing one away and using the other to add back into the LSB is probably not the best idea.
This sort of thing is written out three times in the code: https://github.com/dexmac221/C64AIToolChain/blob/a7bbc568c0e..., https://github.com/dexmac221/C64AIToolChain/blob/a7bbc568c0e..., https://github.com/dexmac221/C64AIToolChain/blob/a7bbc568c0e...
I'm not going to spend any time doing the computer's job for it by carefully checking every single one against the other, something it can do with perfect accuracy and no mistakes. Nor am I going to double check whether I missed any, which I probably did, because, ditto.
But, looking at the one at line 541, I might be inclined to suggest the following instead. Please step through this in the debugger or whatever though, rather than just taking my word for it.
(to save line count, I've put multiple instructions on a line, separated by ':')
lda tmp_hi:asl:asl:asl:sta ptr_lo ; y*8, an 8-bit quantity
asl:rol ptr_hi ; y*16, a 9-bit quantity
asl:rol ptr_hi ; y*32, a 10-bit quantity
clc:adc ptr_lo:sta ptr_lo ; <(y*32)+y*8=<(y*40)
lda ptr_hi:and #3:adc #0:sta ptr_hi ; >(y*32)+carry out=>(y*40)
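
If single-stepping in a debugger is a hassle, here is a rough Python model of the five lines above that checks every row. It is only a sketch under some assumptions: tmp_hi holds y in the range 0..24, decimal mode is off, and ptr_hi may contain anything on entry (the and #3 is what clears the leftovers). The names mul40 and check_all are made up for this example and are not from the repo.

```python
def mul40(y, ptr_hi_garbage=0):
    """Model the 6502 routine with 8-bit registers; returns (ptr_lo, ptr_hi)."""
    a = y & 0xFF                      # lda tmp_hi
    ptr_hi = ptr_hi_garbage & 0xFF    # whatever was left in ptr_hi beforehand
    carry = 0

    # asl : asl : asl : sta ptr_lo  -> y*8, still fits in 8 bits for y <= 24
    for _ in range(3):
        carry = (a >> 7) & 1
        a = (a << 1) & 0xFF
    ptr_lo = a

    # asl : rol ptr_hi, twice -> y*16 then y*32
    # each bit shifted out of A is rotated into the bottom of ptr_hi
    for _ in range(2):
        carry = (a >> 7) & 1
        a = (a << 1) & 0xFF
        carry_out = (ptr_hi >> 7) & 1
        ptr_hi = ((ptr_hi << 1) | carry) & 0xFF
        carry = carry_out

    # clc : adc ptr_lo : sta ptr_lo  -> low byte of y*32 + y*8
    carry = 0
    total = a + ptr_lo + carry
    a = total & 0xFF
    carry = total >> 8
    ptr_lo = a

    # lda ptr_hi : and #3 : adc #0 : sta ptr_hi
    # lda/and leave the carry alone, so adc #0 adds the carry from the low byte
    a = ptr_hi & 3
    total = a + 0 + carry
    ptr_hi = total & 0xFF
    return ptr_lo, ptr_hi


def check_all():
    """Compare against y*40 for every row and every initial ptr_hi value."""
    for garbage in range(256):
        for y in range(25):
            lo, hi = mul40(y, garbage)
            assert (hi << 8) | lo == y * 40, (y, garbage, lo, hi)
    return True
```

Calling check_all() raises an AssertionError naming the failing row if the model ever disagrees with y*40. A model like this only checks the arithmetic, not the real code, so a pass through the actual routine in an emulator is still worth doing.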