Quick answer: Cache masks per-image (not per-sprite, never per-frame), broad-phase with rect collision first, and only run collide_mask on pairs that pass the rect test. 100 bullets vs 50 enemies goes from 15 FPS to 60.

You want pixel-perfect bullet-to-enemy collision in your bullet hell. You use pygame.sprite.groupcollide with collide_mask. It works for 20 bullets. At 200, FPS drops to single digits. Mask collision is pixel-accurate but expensive; without a broad phase, it’s O(N×M×pixels) and scales terribly.

Mask Cost

In the worst case, a mask overlap has to examine every pixel in the overlap region. Two 64×64 sprites overlapping fully = 4096 pixel tests per pair. 200 bullets vs 50 enemies = 10,000 pairs. Without rect-filtering, that’s up to roughly 41 million pixel tests per frame. You cannot hit 60 FPS at that rate.
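The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes the worst case where every pair overlaps fully.

```python
def worst_case_pixel_tests(bullets, enemies, width, height):
    """Upper bound on pixel tests per frame with no broad phase."""
    pairs = bullets * enemies           # every bullet against every enemy
    return pairs * width * height       # full overlap assumed for each pair

per_frame = worst_case_pixel_tests(200, 50, 64, 64)
print(f"{per_frame:,} pixel tests per frame")        # 40,960,000
print(f"{per_frame * 60:,} pixel tests per second")  # 2,457,600,000 at 60 FPS
```

Real frames rarely reach this bound, but the product of the two group sizes is what makes the unfiltered approach scale so badly.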

Cache Masks

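import pygame

# One mask per unique Surface, keyed by the Surface object itself.
_mask_cache = {}

def get_mask(image):
    # Build the mask the first time an image is seen, then reuse it.
    if image not in _mask_cache:
        _mask_cache[image] = pygame.mask.from_surface(image)
    return _mask_cache[image]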

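pygame.mask.from_surface is not free. Call it once per unique image asset and cache the result. All sprites sharing an image share a mask reference. Assign that mask to each sprite’s mask attribute: pygame.sprite.collide_mask uses sprite.mask when it exists and otherwise builds a fresh mask from sprite.image on every call.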

Broad-Phase First

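def collide_rect_then_mask(a, b):
    # Both sprites need a .rect and a pre-built .mask attribute.
    # Cheap broad phase: bail out unless the bounding rects overlap.
    if not a.rect.colliderect(b.rect):
        return False
    # Narrow phase: position of b's mask relative to a's mask.
    offset = (b.rect.x - a.rect.x, b.rect.y - a.rect.y)
    return a.mask.overlap(b.mask, offset) is not None

# Use as collided= callback: bullets are removed on hit, enemies are kept.
hits = pygame.sprite.groupcollide(bullets, enemies, True, False,
                                collided=collide_rect_then_mask)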

In a typical frame most bullets are nowhere near most enemies, so the rect test rules out the vast majority of pairs (often 90%+) almost for free. Only pairs with overlapping rects pay the mask cost.

When You Don't Need Mask

If your sprites are rectangular (crates, tiles, most platformer characters), skip mask entirely. For circular sprites, collide_circle is faster. Use mask only when sprite shapes are genuinely irregular (bullets with trails, custom hitboxes, destructible geometry).
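For the circular case, a minimal sketch might look like the following. The Orb class and the circle_hits helper are hypothetical names; pygame.sprite.collide_circle reads an optional radius attribute from each sprite and compares the distance between rect centers.

```python
import pygame

class Orb(pygame.sprite.Sprite):
    """Hypothetical round sprite; collide_circle reads .radius if present."""

    def __init__(self, center, radius):
        super().__init__()
        self.radius = radius
        self.rect = pygame.Rect(0, 0, radius * 2, radius * 2)
        self.rect.center = center

def circle_hits(bullets, enemies):
    # No masks involved: one distance comparison per pair.
    return pygame.sprite.groupcollide(
        bullets, enemies, True, False,
        collided=pygame.sprite.collide_circle,
    )
```

For rectangular sprites, leaving collided unset makes groupcollide fall back to a plain rect test.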

Spatial Partitioning

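For 1000+ sprites, even rect-vs-rect becomes expensive. Add a uniform grid or quadtree. Pygame has no built-in spatial index, but a simple dict of (grid_x, grid_y) -> [sprites] means each sprite is only tested against sprites in the cells it touches, which turns collision from O(N²) to near O(N) when sprites are reasonably spread out.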
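A uniform grid along those lines could be sketched as follows. The cell size, function names, and the choice to bucket only the enemies are all assumptions made for illustration; the code relies only on each sprite having a rect with left, top, right and bottom, which pygame.Rect provides.

```python
from collections import defaultdict

CELL_SIZE = 64  # assumed value; roughly the size of your largest sprite

def cells_for(rect, cell_size=CELL_SIZE):
    """Yield every (grid_x, grid_y) cell that the rect touches."""
    for gx in range(rect.left // cell_size, rect.right // cell_size + 1):
        for gy in range(rect.top // cell_size, rect.bottom // cell_size + 1):
            yield gx, gy

def build_grid(sprites, cell_size=CELL_SIZE):
    """Map each cell to the list of sprites overlapping it."""
    grid = defaultdict(list)
    for sprite in sprites:
        for cell in cells_for(sprite.rect, cell_size):
            grid[cell].append(sprite)
    return grid

def candidate_pairs(bullets, enemies, cell_size=CELL_SIZE):
    """Yield each (bullet, enemy) pair that shares at least one cell, once."""
    grid = build_grid(enemies, cell_size)
    seen = set()
    for bullet in bullets:
        for cell in cells_for(bullet.rect, cell_size):
            for enemy in grid.get(cell, ()):
                key = (id(bullet), id(enemy))
                if key not in seen:
                    seen.add(key)
                    yield bullet, enemy
```

The pairs this yields are only candidates: they still go through the rect test and, where needed, the mask test. The grid is rebuilt each frame here for simplicity.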

Verifying

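pygame.time.Clock reports overall frame time and FPS (Clock.tick, Clock.get_fps). To isolate collision cost, wrap the collision call in time.perf_counter() and log the difference each frame. With the optimization in place, total mask cost for workloads like the ones above should be around 1 ms per frame or less. If it is much higher, check that the broad phase is actually filtering pairs and that masks are not being rebuilt.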
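A small timing helper is one way to take that measurement. The name timed_ms is made up for this sketch; it uses time.perf_counter from the standard library and works with any callable.

```python
import time

def timed_ms(func, *args, **kwargs):
    """Call func and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

# Inside the game loop this might look like (names taken from the text above):
# hits, ms = timed_ms(pygame.sprite.groupcollide, bullets, enemies,
#                     True, False, collided=collide_rect_then_mask)
# print(f"collision: {ms:.2f} ms")
```

Single-frame numbers are noisy, so averaging over a second or so of frames gives a more trustworthy figure.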

“Pixel-perfect collision isn’t expensive — running it on every pair is. Filter cheaply, refine expensively.”

Related Issues

For broader Pygame performance, see Pygame performance tips. For simpler rect collision bugs, see Pygame sprite collision not detected between groups.

Print the cache hit rate for masks during dev. If you’re building new masks every frame, you’re shipping a 15 FPS game.
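A counting wrapper around the cache is one way to get that hit rate. CountingCache is a hypothetical helper, not part of pygame; in a game the factory would be pygame.mask.from_surface and the keys would be the image Surfaces.

```python
class CountingCache:
    """Dict-backed cache that records how often a lookup was a hit."""

    def __init__(self, factory):
        self._factory = factory   # e.g. pygame.mask.from_surface
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        try:
            value = self._store[key]
        except KeyError:
            # First time this key is seen: build and remember the value.
            self.misses += 1
            value = self._store[key] = self._factory(key)
        else:
            self.hits += 1
        return value

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A miss count that keeps climbing after the assets have loaded suggests that new Surfaces (and therefore new masks) are being created during play.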