3/27/2025 at 1:39:05 PM
Maybe something worth exploring: run it like an inverted generational garbage collector. Separate the problematic domains and run them in a lower-priority background crawl; then do the opposite as well and keep a higher-priority list, so that sites with regular updates get crawled more often, based on their last update gap.
Basically like a generational garbage collector, but in this case it's a generational content collector, although arguably still mostly garbage.
by keyle
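For illustration, the idea above could be sketched roughly like this. This is only a minimal sketch, not how the crawler actually works: the `CrawlJob` type, the `crawl_priority` function, the `BACKGROUND_PENALTY` constant, and the example domain names are all hypothetical, and it assumes the last update gap is already known in days.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class CrawlJob:
    priority: float                    # lower value = crawled sooner
    domain: str = field(compare=False)


# Hypothetical constant: large enough to push problematic domains
# behind every well-behaved domain (the "background" generation).
BACKGROUND_PENALTY = 10_000.0


def crawl_priority(update_gap_days: float, problematic: bool) -> float:
    """Short update gaps sort first; problematic domains sort last."""
    return update_gap_days + (BACKGROUND_PENALTY if problematic else 0.0)


def crawl_order(domains):
    """domains: iterable of (name, update_gap_days, problematic) tuples."""
    queue = [CrawlJob(crawl_priority(gap, bad), name) for name, gap, bad in domains]
    heapq.heapify(queue)
    return [heapq.heappop(queue).domain for _ in range(len(queue))]


if __name__ == "__main__":
    example = [
        ("slow.example", 90.0, False),  # rarely updated
        ("news.example", 0.5, False),   # updated twice a day
        ("trap.example", 1.0, True),    # flagged as problematic
    ]
    print(crawl_order(example))
```

A single priority queue with a penalty keeps the sketch short; two separate queues (foreground and background) would match the generational analogy more literally.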
3/27/2025 at 3:54:40 PM
Yeah, there's definitely more that can be done in this regard. Right now I have fairly limited statistics on the rate of change of the domains. There are probably improvements to be made by making more informed choices based on that sort of data.
by marginalia_nu