Steerable filter

Example

A common example of a steerable filter is the first derivative of a two-dimensional Gaussian function, which responds strongly to oriented image features such as edges. It is constructed from two basis filters: the partial derivatives of the Gaussian with respect to the horizontal ($x$) and vertical ($y$) directions.

If $G(x,y)$ is the Gaussian function and $G_{x}$ and $G_{y}$ are its partial derivatives (the rates of change of $G$ in the $x$ and $y$ directions, respectively), a new filter $G_{\theta}$ oriented at an angle $\theta$ can be synthesized with the formula:

$$G_{\theta} = \cos(\theta)\,G_{x} + \sin(\theta)\,G_{y}$$
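The following is a minimal NumPy sketch of the synthesis formula. It assumes an unnormalized Gaussian with $\sigma = 1.5$ sampled on an $11 \times 11$ grid; both values are illustrative choices, and the helper names are hypothetical. The code checks that $\cos(\theta)G_{x} + \sin(\theta)G_{y}$ matches $G_{x}$ evaluated on a coordinate grid rotated by $\theta$. That match is what it means for the filter to be steerable.

```python
import numpy as np

SIGMA = 1.5   # assumed filter scale
RADIUS = 5    # kernel is (2 * RADIUS + 1) x (2 * RADIUS + 1)

def gaussian_derivative_basis(sigma=SIGMA, radius=RADIUS):
    """Sample G, G_x and G_y on a square grid centred at the origin."""
    coords = np.arange(-radius, radius + 1, dtype=float)
    x, y = np.meshgrid(coords, coords)           # x varies along columns, y along rows
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))  # unnormalized Gaussian
    gx = -x / sigma**2 * g                       # dG/dx
    gy = -y / sigma**2 * g                       # dG/dy
    return x, y, g, gx, gy

def steer(gx, gy, theta):
    """Synthesize G_theta = cos(theta) * G_x + sin(theta) * G_y."""
    return np.cos(theta) * gx + np.sin(theta) * gy

def rotated_gx(x, y, theta, sigma=SIGMA):
    """Evaluate G_x directly after rotating it by theta (for comparison only)."""
    # The rotated filter's "x" coordinate is u = x cos(theta) + y sin(theta);
    # the Gaussian envelope is unchanged because G is isotropic.
    u = x * np.cos(theta) + y * np.sin(theta)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return -u / sigma**2 * g

x, y, g, gx, gy = gaussian_derivative_basis()
theta = np.deg2rad(30)
print(np.allclose(steer(gx, gy, theta), rotated_gx(x, y, theta)))  # True
```

Because $G$ is rotationally symmetric, rotating $G_{x}$ only replaces the coordinate $x$ with $x\cos\theta + y\sin\theta$. That substitution is exactly the weighted sum in the formula.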

Here, the basis filters $G_{x}$ and $G_{y}$ are weighted by $\cos(\theta)$ and $\sin(\theta)$ to "steer" the filter's sensitivity to the desired orientation. This is equivalent to taking the dot product of the unit direction vector $(\cos\theta, \sin\theta)$ with the gradient of the Gaussian, $\nabla G = (G_{x}, G_{y})$. $G_{\theta}$ is therefore the directional derivative of $G$ in the direction $\theta$.^[1]
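In practice, the benefit shows up in filter responses. Filtering is linear, so an image can be filtered once with $G_{x}$ and once with $G_{y}$. The response at any angle is then the same $\cos/\sin$ combination of those two results.

The sketch below makes several assumptions:

- It requires NumPy 1.20 or later, for `sliding_window_view`.
- It uses cross-correlation rather than convolution; the linearity argument holds for either.
- It uses a synthetic step-edge image chosen only for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def correlate_valid(image, kernel):
    """2-D cross-correlation with 'valid' boundaries, using only NumPy."""
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

# Basis filters G_x and G_y (sigma = 1.5 on an 11 x 11 grid, assumed values)
sigma, radius = 1.5, 5
coords = np.arange(-radius, radius + 1, dtype=float)
x, y = np.meshgrid(coords, coords)
g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
gx, gy = -x / sigma**2 * g, -y / sigma**2 * g

# Hypothetical test image: a vertical step edge (dark left half, bright right half)
image = np.zeros((32, 32))
image[:, 16:] = 1.0

# Filter the image once with each basis filter...
rx = correlate_valid(image, gx)
ry = correlate_valid(image, gy)

def steered_response(theta):
    # ...then, by linearity, the response at any angle is a weighted sum.
    return np.cos(theta) * rx + np.sin(theta) * ry

# Compare with filtering by the explicitly steered kernel
theta = np.deg2rad(60)
direct = correlate_valid(image, np.cos(theta) * gx + np.sin(theta) * gy)
print(np.allclose(steered_response(theta), direct))  # True

# Dot-product view: the response peaks when (cos θ, sin θ) points along (rx, ry),
# and the peak value equals the gradient magnitude at each pixel.
orientation = np.arctan2(ry, rx)
magnitude = np.hypot(rx, ry)
peak = np.cos(orientation) * rx + np.sin(orientation) * ry
print(np.allclose(peak, magnitude))  # True
```

The result is a per-pixel orientation estimate (`orientation`) and edge strength (`magnitude`) from just two filtering passes.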

Generalization in deep learning: Equivariant neural networks

The concept of steerability is foundational to equivariant neural networks, a class of deep learning models designed to respect symmetries in data.^[5] A network is equivariant to a transformation (such as a rotation) if transforming the input and then passing it through the network gives the same result as passing the input through the network and then transforming the output. Formally, for a transformation $T$ and a network $f$, this property is $f(T(x)) = T'(f(x))$, where $T'$ is the same transformation acting on the output space. In the simplest case, such as rotating both an input image and an output image, $T' = T$. Invariance is the special case in which $T'$ leaves the output unchanged.
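The definition can be checked numerically. The sketch below is a toy under stated assumptions:

- $T$ is a 90-degree rotation (`np.rot90`), which maps a pixel grid exactly onto itself.
- $f$ is a single "valid" cross-correlation with a fixed kernel.
- The random test image is arbitrary.

A kernel that is unchanged by the rotation gives an equivariant $f$. An oriented kernel, such as an $x$-derivative, does not.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def correlate_valid(image, kernel):
    """2-D 'valid' cross-correlation in plain NumPy."""
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def is_equivariant(f, transform, x):
    """Numerically test f(T(x)) == T(f(x)) for a single input x."""
    return np.allclose(f(transform(x)), transform(f(x)))

coords = np.arange(-3, 4, dtype=float)
xx, yy = np.meshgrid(coords, coords)
isotropic = np.exp(-(xx**2 + yy**2) / 2.0)       # unchanged by a 90-degree rotation
oriented = -xx * np.exp(-(xx**2 + yy**2) / 2.0)  # x-derivative: has a preferred direction

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))
rot = np.rot90  # T: rotate the array by 90 degrees

print(is_equivariant(lambda im: correlate_valid(im, isotropic), rot, image))  # True
print(is_equivariant(lambda im: correlate_valid(im, oriented), rot, image))   # False

# Rotating the input is equivalent to filtering with a rotated kernel
# (np.rot90(kernel, -1)) and then rotating the output.
lhs = correlate_valid(rot(image), oriented)
rhs = rot(correlate_valid(image, np.rot90(oriented, -1)))
print(np.allclose(lhs, rhs))  # True
```

The last check shows what goes wrong in the oriented case. Rotating the input amounts to using a rotated kernel, and a steerable basis makes such rotated kernels cheap to produce.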

Building the symmetry into the architecture makes models more data-efficient. For example, a rotation-equivariant network does not need to be shown an object in multiple orientations to learn to recognize it. Its response to a rotated object is determined by its response to the unrotated one. This tends to improve generalization, particularly in scientific applications where the data obey known physical symmetries.^[3]

Mathematical foundation

Equivariant neural networks use principles from group theory to create operations that respect geometric symmetries, such as the group SO(3) of 3D rotations or the Euclidean group E(3) of rotations, translations, and reflections.^[3]

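Instead of learning unconstrained filter kernels, these networks learn the coefficients of a linear combination of a fixed set of basis kernels, much as $G_{\theta}$ in the example above is a weighted combination of $G_{x}$ and $G_{y}$. The basis functions are chosen to transform predictably under the symmetry group, so every learned combination does as well.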
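A minimal sketch of "learning coefficients over a fixed basis". It assumes a 2D basis made of the Gaussian $G$ (a rotation-invariant kernel) and the pair $(G_{x}, G_{y})$ from the example above. The coefficients `a` and `w` stand in for learned parameters and are hypothetical. Rotating the resulting kernel gives the same result as leaving `a` unchanged and rotating `w` as an ordinary 2D vector. In other words, the coefficients themselves carry well-defined types.

```python
import numpy as np

SIGMA = 1.5  # assumed filter scale
coords = np.arange(-6, 7, dtype=float)
X, Y = np.meshgrid(coords, coords)

def basis(x, y):
    """Fixed basis: one invariant kernel G and the vector pair (G_x, G_y)."""
    g = np.exp(-(x**2 + y**2) / (2 * SIGMA**2))
    return g, -x / SIGMA**2 * g, -y / SIGMA**2 * g

def kernel(a, w, x=X, y=Y):
    """Kernel from coefficients: a * G + w[0] * G_x + w[1] * G_y."""
    g, gx, gy = basis(x, y)
    return a * g + w[0] * gx + w[1] * gy

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

a, w = 0.7, np.array([1.0, -2.0])  # hypothetical "learned" coefficients
R = rotation(np.deg2rad(40))

# Rotating a kernel by R means evaluating it at R^{-1} p for every grid point p
Rinv = R.T
rotated = kernel(a, w, Rinv[0, 0] * X + Rinv[0, 1] * Y, Rinv[1, 0] * X + Rinv[1, 1] * Y)

# Same kernel obtained by transforming only the coefficients:
# the scalar coefficient a is unchanged, and the vector coefficient w becomes R @ w
print(np.allclose(rotated, kernel(a, R @ w)))  # True
```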

Spherical harmonics are frequently used as basis functions because they form a complete set of functions on the sphere, and under a rotation the harmonics of each degree $l$ transform only among themselves. This makes them well suited to building steerable 3D kernels.^[6]
Features within the network are treated as geometric tensors, which are mathematical objects (like scalars or vectors) that are "typed" by their behavior under transformations. These types correspond to the irreducible representations (irreps) of the group.^[3]
The tensor product is the fundamental operation for combining these typed features while preserving equivariance. Every layer is built from equivariant operations, and a composition of equivariant maps is itself equivariant, so the network as a whole respects the desired symmetry.^[3]
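The claim that spherical harmonics of each degree transform among themselves can be checked numerically. This sketch assumes three things:

- the five degree-2 real spherical harmonics, written as polynomials with normalization constants omitted;
- an arbitrary test rotation;
- 200 random points on the unit sphere.

It fits a $5 \times 5$ matrix $D$ that maps the harmonics at each point $p$ to the harmonics at $Rp$, and confirms that the fit is exact.

```python
import numpy as np

def real_sh_l2(p):
    """Degree-2 real spherical harmonics (constants omitted) at unit vectors p, shape (N, 3)."""
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    # On the unit sphere, 3z^2 - 1 equals 3z^2 - r^2
    return np.stack([x * y, y * z, 3 * z**2 - 1, x * z, x**2 - y**2], axis=1)

def rotation_zx(alpha, beta):
    """Rotation about z by alpha, then about x by beta (an arbitrary test rotation)."""
    ca, sa, cb, sb = np.cos(alpha), np.sin(alpha), np.cos(beta), np.sin(beta)
    rz = np.array([[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cb, -sb], [0.0, sb, cb]])
    return rx @ rz

rng = np.random.default_rng(0)
p = rng.standard_normal((200, 3))
p /= np.linalg.norm(p, axis=1, keepdims=True)  # random points on the unit sphere

R = rotation_zx(0.4, 1.1)
A = real_sh_l2(p)        # harmonics at the original points
B = real_sh_l2(p @ R.T)  # harmonics at the rotated points R p

# Solve B ≈ A @ D. A near-zero residual means every rotated degree-2 harmonic
# is an exact linear combination of the unrotated degree-2 harmonics.
D, *_ = np.linalg.lstsq(A, B, rcond=None)
print(np.max(np.abs(A @ D - B)) < 1e-10)  # True
```

Up to the omitted constants, $D$ plays the role of the Wigner D-matrix for $l = 2$. This closure under rotation is what lets kernels built from spherical harmonics be steered.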
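Consider the simplest case: two vector-typed ($l = 1$) features $u$ and $v$. Their tensor product $u \otimes v$ (the $3 \times 3$ outer product) splits into parts of type $l = 0$, $1$ and $2$, with $1 + 3 + 5 = 9$ components. The NumPy sketch below checks that each part transforms according to its own type. It assumes a random proper rotation built by QR decomposition. Frameworks such as e3nn perform this kind of decomposition for general irreps; the code neither uses nor reproduces their API.

```python
import numpy as np

def random_rotation(rng):
    """Random 3x3 proper rotation (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q *= np.sign(np.diag(r))   # fix column signs so the factorization is unique
    if np.linalg.det(q) < 0:   # flip one column to exclude reflections
        q[:, 0] *= -1
    return q

def tensor_product_1x1(u, v):
    """Split the outer product of two l=1 features into l=0, l=1 and l=2 parts."""
    scalar = u @ v              # l = 0: dot product (1 component)
    vector = np.cross(u, v)     # l = 1: antisymmetric part (3 components)
    outer = np.outer(u, v)
    sym_traceless = (outer + outer.T) / 2 - scalar / 3 * np.eye(3)  # l = 2 (5 independent)
    return scalar, vector, sym_traceless

rng = np.random.default_rng(1)
R = random_rotation(rng)
u, v = rng.standard_normal(3), rng.standard_normal(3)

s0, v1, t2 = tensor_product_1x1(u, v)
s0_r, v1_r, t2_r = tensor_product_1x1(R @ u, R @ v)

print(np.isclose(s0_r, s0))             # True: the scalar part is invariant
print(np.allclose(v1_r, R @ v1))        # True: the cross product rotates like a vector
print(np.allclose(t2_r, R @ t2 @ R.T))  # True: the l=2 part rotates as a rank-2 tensor
```

The cross product check relies on $\det R = +1$. Under reflections the cross product picks up an extra sign, which is why libraries track parity alongside $l$.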

Frameworks like e3nn simplify the construction of these networks by automating the complex mathematics of irreducible representations and tensor products.^[3]

Applications

Steerable and equivariant models are well suited to problems with inherent geometric symmetries. Examples include:

Protein structure analysis: SE(3)-equivariant networks can process 3D molecular structures while respecting their rotational and translational symmetries.^[6]
3D point cloud processing: Rotation-equivariant filters built from steerable spherical functions can perform tasks such as 3D shape classification.^[7]
Computational chemistry: E(3)-equivariant graph neural networks are used to model interatomic potentials for molecular dynamics simulations, creating highly accurate and data-efficient models of physical systems.^[8]
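A trained E(3)-equivariant network is beyond a short example. The sketch below instead shows the property that such interatomic-potential models are built to respect: the energy is invariant under E(3), and the forces transform equivariantly. It uses a classical Lennard-Jones pair potential on a small hand-picked cluster. Both the potential and the coordinates are assumptions, chosen only because their symmetry is easy to check.

```python
import numpy as np

def lj_energy(pos, eps=1.0, sigma=1.0):
    """Toy Lennard-Jones energy; it depends only on pairwise distances."""
    diff = pos[:, None, :] - pos[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    i, j = np.triu_indices(len(pos), k=1)  # each pair counted once
    sr6 = (sigma / r[i, j]) ** 6
    return np.sum(4 * eps * (sr6**2 - sr6))

def lj_forces(pos, eps=1.0, sigma=1.0):
    """Analytic forces F = -dE/dpos for the same toy potential."""
    diff = pos[:, None, :] - pos[None, :, :]  # r_i - r_j
    r2 = np.sum(diff**2, axis=-1)
    np.fill_diagonal(r2, np.inf)              # exclude self-interaction
    sr6 = (sigma**2 / r2) ** 3
    coef = 24 * eps * (2 * sr6**2 - sr6) / r2  # -(dE/dr) / r for each pair
    return np.sum(coef[:, :, None] * diff, axis=1)

def random_rotation(rng):
    """Random 3x3 proper rotation via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q *= np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

# Hypothetical 5-atom cluster with well-separated atoms
pos = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.2, 0.0],
                [0.0, 0.0, 1.3], [1.0, 1.0, 1.0]])

rng = np.random.default_rng(2)
Q = random_rotation(rng) @ np.diag([1.0, 1.0, -1.0])  # rotation combined with a reflection
t = np.array([0.5, -2.0, 1.0])                        # translation
moved = pos @ Q.T + t                                 # apply x -> Q x + t to every atom

print(np.isclose(lj_energy(moved), lj_energy(pos)))         # True: energy is invariant
print(np.allclose(lj_forces(moved), lj_forces(pos) @ Q.T))  # True: forces are equivariant
```

An E(3)-equivariant network aims to reproduce this behaviour by construction, whatever energy function it learns from data.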
