Quick answer: Is Pygame's collide_mask 40x slower than collide_rect on a moderate-sized scene? The cost is per-call mask creation, so build each mask once and reuse it.
The scenario: an action game with 50 enemies. Mask collision adds 8 ms per frame; rect collision adds 0.2 ms.
Pre-create masks
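self.mask = pygame.mask.from_surface(image)
Do this once, in __init__. collide_mask uses a sprite's mask attribute when it exists and only builds a new mask from the image when it does not, so there is no per-call allocation.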
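A minimal sketch of that setup. The Enemy class and the make_disc helper are hypothetical names used only for illustration; the one part that matters is that the mask is assigned in __init__.

```python
import pygame


def make_disc(radius):
    """Illustrative helper: a transparent square holding an opaque disc."""
    surf = pygame.Surface((radius * 2, radius * 2), pygame.SRCALPHA)
    pygame.draw.circle(surf, (255, 255, 255, 255), (radius, radius), radius)
    return surf


class Enemy(pygame.sprite.Sprite):
    """Hypothetical sprite that builds its mask once, at construction."""

    def __init__(self, image, pos):
        super().__init__()
        self.image = image
        self.rect = image.get_rect(topleft=pos)
        # Built once here; collide_mask finds and reuses this attribute.
        self.mask = pygame.mask.from_surface(image)


a = Enemy(make_disc(16), (0, 0))
b = Enemy(make_disc(16), (20, 0))    # discs overlap
c = Enemy(make_disc(16), (100, 0))   # far away

mask_before = a.mask
hit = pygame.sprite.collide_mask(a, b)   # a point tuple when the masks overlap
miss = pygame.sprite.collide_mask(a, c)  # None when they do not
```

If the sprite's image changes (animation frames, rotation), the mask has to be rebuilt at that point; caching one mask per frame image is a common way to keep that off the per-collision path.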
Broad-phase with rect
Run collide_rect first and call collide_mask only on pairs whose rects overlap. 95% of pairs are rejected cheaply.
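One way to sketch the two-phase check is a small callback passed to spritecollide. The names rect_then_mask, Blob and make_disc are hypothetical; collide_rect, collide_mask and spritecollide's collided argument are the Pygame pieces being combined.

```python
import pygame


def rect_then_mask(a, b):
    """Broad phase with rects, narrow phase with masks."""
    if not pygame.sprite.collide_rect(a, b):
        return False  # cheap rejection: bounding boxes do not touch
    return pygame.sprite.collide_mask(a, b) is not None


def make_disc(radius):
    """Illustrative helper: a transparent square holding an opaque disc."""
    surf = pygame.Surface((radius * 2, radius * 2), pygame.SRCALPHA)
    pygame.draw.circle(surf, (255, 255, 255, 255), (radius, radius), radius)
    return surf


class Blob(pygame.sprite.Sprite):
    """Hypothetical sprite with a precomputed mask."""

    def __init__(self, pos, radius=16):
        super().__init__()
        self.image = make_disc(radius)
        self.rect = self.image.get_rect(topleft=pos)
        self.mask = pygame.mask.from_surface(self.image)


player = Blob((0, 0))
touching = Blob((20, 0))      # rects overlap and discs overlap
corner_only = Blob((28, 28))  # rects overlap, discs do not
far_away = Blob((200, 200))   # rejected by the rect test alone

enemies = pygame.sprite.Group(touching, corner_only, far_away)
hits = pygame.sprite.spritecollide(player, enemies, False, collided=rect_then_mask)
```

Only pairs that pass the rect test reach the mask comparison, so the per-pixel work is limited to sprites that are actually near each other.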
Or use Mask.overlap directly
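collide_mask uses Mask.overlap underneath: it works out the offset between the two sprites' rects and passes it to overlap. Calling Mask.overlap directly with your own masks and offset avoids the sprite-handling overhead.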
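A small sketch of the direct call, assuming you track the top-left position of each object yourself. masks_overlap is a hypothetical helper name; the offset passed to Mask.overlap is the position of the second mask relative to the first.

```python
import pygame


def masks_overlap(mask_a, pos_a, mask_b, pos_b):
    """Return the first overlapping point (in mask_a coordinates) or None."""
    offset = (pos_b[0] - pos_a[0], pos_b[1] - pos_a[1])
    return mask_a.overlap(mask_b, offset)


# Two fully set 8x8 masks stand in for real sprite masks.
block_a = pygame.mask.Mask((8, 8))
block_a.fill()
block_b = pygame.mask.Mask((8, 8))
block_b.fill()

overlapping = masks_overlap(block_a, (0, 0), block_b, (4, 4))
separate = masks_overlap(block_a, (0, 0), block_b, (8, 0))
```

overlap returns a point rather than a boolean, so compare the result against None when you only need a yes/no answer.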
“Mask collision is fast if you don't build the mask each call.”
If your game has many small entities, broad-phase + narrow-phase is the standard architecture. Pygame doesn't enforce it; do it yourself.