Quick answer: Pygame collide_mask about 40x slower than collide_rect on a moderate-sized scene? Per-call mask creation is the cost. Pre-create each mask once and reuse it.

The scene: an action game with 50 enemies. Mask collision adds 8 ms per frame; rect collision adds 0.2 ms.

Pre-create masks

self.mask = pygame.mask.from_surface(image)

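Do this once in __init__. collide_mask reuses a sprite's mask attribute when it exists and only builds a new mask from the image when the attribute is missing, so there is no per-call allocation. Rebuild the mask whenever the image changes (new animation frame, rotation, scaling).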
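A minimal sketch of the pattern, assuming a sprite class named Enemy and a helper make_disc that draws a test image (both names are hypothetical, not from the original game):

```python
import pygame


class Enemy(pygame.sprite.Sprite):
    """Hypothetical sprite that builds its mask once, at construction."""

    def __init__(self, image, topleft):
        super().__init__()
        self.image = image
        self.rect = image.get_rect(topleft=topleft)
        # Built once here; collide_mask uses sprite.mask when it is present.
        self.mask = pygame.mask.from_surface(image)


def make_disc(radius):
    """Return a transparent square surface with an opaque filled circle."""
    size = radius * 2
    surface = pygame.Surface((size, size), pygame.SRCALPHA)
    pygame.draw.circle(surface, (255, 255, 255, 255), (radius, radius), radius)
    return surface


a = Enemy(make_disc(16), (0, 0))
b = Enemy(make_disc(16), (10, 10))

# Returns the first overlapping point (in a's coordinates) or None.
hit = pygame.sprite.collide_mask(a, b)
```

No display is needed for this to run, since the surfaces are never blitted to a window.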

Broad-phase with rect

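Run collide_rect first and only run collide_mask on pairs whose rects overlap. In a scene like this one, about 95% of pairs are rejected by the cheap rect test and never reach the mask test.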
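The rect-then-mask check could be sketched like this. The function pixel_hits and the helper make_disc_sprite are hypothetical names for illustration, and the positions are chosen so that one enemy overlaps in pixels, one overlaps only at the rect corners, and one is far away:

```python
import pygame


def make_disc_sprite(radius, topleft):
    """Hypothetical helper: a sprite with a circular image and a cached mask."""
    size = radius * 2
    image = pygame.Surface((size, size), pygame.SRCALPHA)
    pygame.draw.circle(image, (255, 255, 255, 255), (radius, radius), radius)
    sprite = pygame.sprite.Sprite()
    sprite.image = image
    sprite.rect = image.get_rect(topleft=topleft)
    sprite.mask = pygame.mask.from_surface(image)  # built once
    return sprite


def pixel_hits(player, enemies):
    """Return the enemies whose opaque pixels overlap the player's."""
    hits = []
    for enemy in enemies:
        # Broad phase: cheap bounding-box test rejects most pairs.
        if not pygame.sprite.collide_rect(player, enemy):
            continue
        # Narrow phase: pixel-level test only for rect hits.
        if pygame.sprite.collide_mask(player, enemy) is not None:
            hits.append(enemy)
    return hits


player = make_disc_sprite(16, (0, 0))
near = make_disc_sprite(16, (10, 10))    # circles overlap
corner = make_disc_sprite(16, (28, 28))  # rects overlap, circles do not
far = make_disc_sprite(16, (200, 200))   # rejected in the broad phase

hits = pixel_hits(player, [near, corner, far])
```

The corner case is the reason the narrow phase exists: a rect test alone would report it as a collision.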

Or use Mask.overlap directly

collide_mask calls Mask.overlap underneath. Calling Mask.overlap(other, offset) directly, with the offset taken from the two top-left positions, avoids the sprite-handling overhead.
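A small sketch of the direct call, assuming you keep each mask and its top-left position yourself. The function name masks_overlap is hypothetical, and filled square masks stand in for real sprite masks:

```python
import pygame


def masks_overlap(mask_a, pos_a, mask_b, pos_b):
    """Return the first overlap point in mask_a's coordinates, or None."""
    # The offset is mask_b's top-left corner relative to mask_a's.
    offset = (pos_b[0] - pos_a[0], pos_b[1] - pos_a[1])
    return mask_a.overlap(mask_b, offset)


# Two fully set 8x8 masks used as stand-ins for sprite masks.
mask_a = pygame.mask.Mask((8, 8))
mask_a.fill()
mask_b = pygame.mask.Mask((8, 8))
mask_b.fill()

touching = masks_overlap(mask_a, (0, 0), mask_b, (5, 5))  # overlapping region
apart = masks_overlap(mask_a, (0, 0), mask_b, (8, 0))     # side by side, no shared pixel
```

Whether this is measurably faster than collide_mask depends on the scene, so it is worth profiling before restructuring code around it.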

“Mask collision is fast if you don't build the mask each call.”

If your game has many small entities, broad-phase + narrow-phase is the standard architecture. Pygame doesn't enforce it; do it yourself.
