How Gaussian Splatting Works: A Technical Foundation
From spherical harmonics to loss functions: understanding the science behind real-time neural 3D reconstruction.
The Evolution from NeRFs to Gaussian Splatting
Neural Radiance Fields (NeRFs), introduced by Mildenhall et al. in 2020, changed how we think about 3D reconstruction. Instead of modeling surfaces explicitly, NeRFs learn a continuous function that maps any 3D point and viewing direction to a color and volumetric density.
In practice, NeRFs don't reconstruct a mesh. They learn a volumetric field that assigns each point in space a density and a color value. By casting rays through this field and accumulating samples along each ray, the network learns to reproduce the appearance of the scene from any viewpoint. The result behaves as if the model had learned how light behaves within the scene, though what it truly captures is how radiance varies with position and direction.
The advantage is clear: view-dependent appearance and photometric accuracy. However, NeRFs are computationally expensive. Each frame is rendered through volumetric ray marching, integrating hundreds of samples per pixel. Even fast variants like Instant-NGP reduce training time but still fall short of full real-time rendering for large, unbounded scenes.
NeRF proved that radiance, not just geometry, could be reconstructed. Yet for real-world use, the method remained too slow and too opaque to integrate into practical pipelines.
How NeRFs Generate New Views
NeRFs are multi-layer perceptrons trained to represent a specific scene. Training optimizes the network from known camera views; inference uses the trained MLP to render new ones.
Source: Mildenhall et al. (2020) - NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
The rendering process follows four steps (a minimal code sketch follows the list):
- Ray casting: For every pixel, the camera projects a ray through the scene
- Sampling: Points are taken along that ray (typically 64-128)
- Network query: Each point and direction pair is fed into the network to obtain color and density values
- Volume integration: Results are accumulated along the ray using standard volume-rendering equations to determine the pixel's final color
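A minimal sketch of this loop in NumPy, with a toy radiance_field() standing in for the trained MLP. The function and its fake shading are placeholders for illustration, not NeRF's actual network:

```python
import numpy as np

def radiance_field(points, view_dirs):
    """Stand-in for the trained MLP: returns (rgb, density) per sample point.
    The shading here is fake; a real NeRF evaluates its network instead."""
    rgb = np.clip(points * 0.5 + 0.5, 0.0, 1.0)            # placeholder color from position
    density = np.exp(-np.linalg.norm(points, axis=-1))     # placeholder density falloff
    return rgb, density

def render_pixel(ray_origin, ray_dir, near=0.0, far=4.0, n_samples=128):
    # 1. Ray casting: the ray is origin + t * direction.
    t_vals = np.linspace(near, far, n_samples)              # 2. Sampling along the ray
    points = ray_origin + t_vals[:, None] * ray_dir
    view_dirs = np.broadcast_to(ray_dir, points.shape)

    rgb, sigma = radiance_field(points, view_dirs)           # 3. Network query

    # 4. Volume integration: C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i
    deltas = np.diff(t_vals, append=far)
    alpha = 1.0 - np.exp(-sigma * deltas)
    transmittance = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
    weights = transmittance * alpha
    return (weights[:, None] * rgb).sum(axis=0)

pixel_color = render_pixel(np.zeros(3), np.array([0.0, 0.0, 1.0]))
```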

The Gaussian Splatting Breakthrough
To address NeRF's speed and scalability limits, Bernhard Kerbl et al. (2023) introduced 3D Gaussian Splatting, designed to make radiance fields both efficient and operationally accessible.
Their idea was deceptively simple: replace the implicit volumetric field of NeRF with an explicit, differentiable "cloud" of anisotropic 3D Gaussians. Each Gaussian carries a position, a covariance matrix (defining its shape and orientation in space), an opacity value, and a set of spherical harmonic coefficients describing color that varies with view direction. This explicit, mathematically defined shape is what makes the primitive easy to project and rasterize.
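As a rough mental model, a single splat can be pictured as a small record of exactly those parameters. The field names below are illustrative, not the memory layout of any particular implementation; the covariance factorization into a rotation and per-axis scales (R S Sᵀ Rᵀ in the paper) is what keeps the matrix valid during optimization:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian3D:
    position: np.ndarray    # (3,) center of the splat in world space
    rotation: np.ndarray    # (4,) unit quaternion (w, x, y, z): ellipsoid orientation
    scale: np.ndarray       # (3,) per-axis extent of the ellipsoid
    opacity: float          # scalar alpha in [0, 1]
    sh_coeffs: np.ndarray   # (16, 3) degree-3 spherical-harmonic color coefficients

    def covariance(self) -> np.ndarray:
        """3x3 covariance built as R S S^T R^T, which stays positive semi-definite."""
        w, x, y, z = self.rotation
        R = np.array([
            [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
        ])
        S = np.diag(self.scale)
        return R @ S @ S.T @ R.T
```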
Because Gaussians are explicit primitives, the rendering stage no longer depends on volumetric ray marching. Each Gaussian can be rasterized directly onto the image plane, producing photorealistic frames at interactive rates, now measured in frames per second rather than seconds per frame.
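The projection step is what replaces ray marching. Under the EWA-splatting approximation, a Gaussian's 3D covariance is mapped into a 2D screen-space covariance via the camera rotation and the Jacobian of the perspective projection. A hedged sketch; the function name and argument layout are just for illustration:

```python
import numpy as np

def project_covariance(cov3d, mean_cam, fx, fy, view_rot):
    """cov3d: (3,3) world-space covariance; mean_cam: (3,) splat center in camera space;
    fx, fy: focal lengths; view_rot: (3,3) rotation part of the world-to-camera transform.
    Returns the (2,2) screen-space covariance Sigma' = J W Sigma W^T J^T."""
    x, y, z = mean_cam
    # Jacobian of the perspective projection, linearized at the splat's center.
    J = np.array([
        [fx / z, 0.0,    -fx * x / (z * z)],
        [0.0,    fy / z, -fy * y / (z * z)],
    ])
    cov_cam = view_rot @ cov3d @ view_rot.T    # rotate the covariance into camera space
    return J @ cov_cam @ J.T
```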
Visually, it's quite astonishing. The output resembles a probabilistic point cloud that coalesces into reality when rasterized. These Gaussians overlap and blend according to their opacity, approximating light transport through alpha compositing. The system achieves over 90 frames per second at 1080p while maintaining competitive image quality against state-of-the-art NeRFs such as Mip-NeRF 360.
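The blending itself is ordinary front-to-back alpha compositing: splats are sorted by depth, and each contributes its color weighted by its own alpha and by the transmittance left over from the splats in front of it. A minimal per-pixel sketch, assuming the per-splat alphas have already been evaluated from the projected 2D Gaussians:

```python
import numpy as np

def composite_front_to_back(colors, alphas):
    """colors: (N, 3) RGB per splat, sorted nearest-first; alphas: (N,) values in [0, 1].
    Returns the blended pixel: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)."""
    pixel = np.zeros(3)
    transmittance = 1.0
    for color, alpha in zip(colors, alphas):
        pixel += transmittance * alpha * color
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:   # early exit once the pixel is effectively opaque
            break
    return pixel
```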
Spherical Harmonics: Encoding View-Dependent Color
Each Gaussian stores a compact set of spherical harmonic (SH) coefficients that describe how its color varies with view direction. At render time, these coefficients are evaluated against the camera's orientation to produce a view-dependent RGB value.
Originally used to encode complex lighting environments efficiently, SH here serve a different role: they define a directional color field around each splat. This allows non-Lambertian behavior such as specular reflection and anisotropic highlights, effects that static photogrammetric textures simply cannot reproduce.
The SH order determines angular detail:
- Order 0 gives a constant, diffuse color
- Higher orders resolve finer directional variation but increase memory and compute cost
Most practical Gaussian Splatting systems adopt third-order SH, balancing performance with the ability to model complex materials and subtle view-dependent lighting.
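For intuition, the sketch below evaluates only the first two SH bands (the constant term plus three direction-dependent terms) for one splat; real renderers evaluate all 16 basis functions of degree-3 SH per color channel. The helper name and the trailing offset/clamp are implementation conventions assumed here, not requirements of the math:

```python
import numpy as np

SH_C0 = 0.28209479177387814   # band-0 constant: 1 / (2 * sqrt(pi))
SH_C1 = 0.4886025119029199    # band-1 constant: sqrt(3) / (2 * sqrt(pi))

def sh_to_rgb(sh_coeffs, view_dir):
    """sh_coeffs: (4, 3) coefficients for bands 0-1, one column per RGB channel.
    view_dir: (3,) unit vector along the viewing direction."""
    x, y, z = view_dir
    basis = np.array([SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x])   # band-0 and band-1 basis
    rgb = basis @ sh_coeffs
    return np.clip(rgb + 0.5, 0.0, 1.0)   # offset and clamp, as many implementations do

color = sh_to_rgb(np.random.randn(4, 3) * 0.1, np.array([0.0, 0.0, 1.0]))
```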

Spherical Harmonics Visualization: Higher orders = more memory & compute cost
Loss Functions and Training Objectives
The performance of the training process depends on the loss function. This is an objective measure that tells the model how far its current result is from the desired outcome. During training, the system compares its generated view of a scene with the real captured images and calculates this difference as "loss." The smaller the loss, the closer the reconstruction is to the input.
The process is cyclical: the optimizer adjusts millions of parameters (in Gaussian Splatting, the attributes of the Gaussians themselves rather than network weights), measures the new loss, and keeps refining until the difference is minimized. In Gaussian Splatting, training typically combines an L1 loss for overall color accuracy with a Differentiable SSIM (D-SSIM) term. The D-SSIM prioritizes structural fidelity, preventing the model from blurring sharp details. Together, these loss functions determine how efficiently and accurately the model learns.
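Concretely, the original paper blends the two terms with a weight λ = 0.2. A hedged sketch in PyTorch, assuming a differentiable ssim() helper is supplied by a library or written separately:

```python
import torch

def gaussian_splatting_loss(rendered, target, ssim, lam=0.2):
    """rendered, target: (3, H, W) images in [0, 1]; ssim: differentiable SSIM callable.
    Implements L = (1 - lam) * L1 + lam * (1 - SSIM), with lam = 0.2 as in the paper."""
    l1 = torch.abs(rendered - target).mean()              # overall color accuracy
    d_ssim = 1.0 - ssim(rendered[None], target[None])     # structural fidelity (D-SSIM)
    return (1.0 - lam) * l1 + lam * d_ssim
```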
Challenges and Artifacts
Despite its promise, Gaussian Splatting is not a perfect window into reality. The representation assumes that the cumulative opacity along a ray sums neatly to one, which isn't always true in practice. When it doesn't, subtle "ghosting" or transparency artifacts can appear, especially during motion.
Highly specular or transparent materials also remain problematic. Because Gaussian Splatting approximates light transport through blending rather than full path tracing, refractive surfaces can exhibit blurred or incorrect reflections.
However, these are not fundamental limits; they're engineering challenges. Ongoing research in anti-aliasing, adaptive filtering, and hybrid rasterization-path tracing aims to close the gap between splats and physically-based rendering.
Where Research Is Heading
Two years after the original paper, Kerbl's follow-up "The Impact and Outlook of 3D Gaussian Splatting" surveyed the field's rapid expansion. The community has grown far beyond the original implementation, with entire branches focused on efficiency, dynamics, and theoretical grounding.
Efficiency and Portability: Compression and pruning methods now reduce memory footprints dramatically. Current 4D implementations (see below) require up to 13 million Gaussians for complex dynamic scenes, consuming gigabytes of storage. Research like MEGA (Memory-Efficient 4D Gaussian Splatting) demonstrates that comparable quality is achievable with under 1 million Gaussians, making mobile and web deployment increasingly practical. WebGPU-based viewers are already emerging for browser-based splat visualization.
4D Gaussian Splatting for Dynamic Scenes: The original method assumes static environments. 4D Gaussian Splatting lifts primitives into a four-dimensional representation where time is treated as an explicit dimension alongside x, y, and z. Instead of maintaining independent Gaussian clouds per frame, these methods model a single dynamic field whose Gaussians deform over time, capturing nonrigid motion without duplicating millions of primitives across frames.
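The details vary widely between papers (deformation networks, full 4D covariances, polynomial motion models), but a loose way to picture the idea is a splat that also carries a temporal center, a temporal extent, and a simple motion model. Everything below is an illustrative assumption, not the formulation of any specific method:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian4D:
    position: np.ndarray   # (3,) spatial center at the splat's reference time
    velocity: np.ndarray   # (3,) toy linear motion model (illustrative only)
    t_center: float        # time at which the splat is most present
    t_extent: float        # temporal "width" of the splat
    opacity: float

    def at_time(self, t: float):
        """Position and effective opacity of the splat at frame time t."""
        w = np.exp(-0.5 * ((t - self.t_center) / self.t_extent) ** 2)   # 1D Gaussian in time
        return self.position + (t - self.t_center) * self.velocity, self.opacity * w
```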
Early 4D implementations achieved 82 FPS on commodity GPUs. Recent optimizations push past 1000 FPS by identifying and eliminating temporal redundancy: short-lifespan Gaussians that represent transient content, and inactive Gaussians that don't contribute to any given frame. Hybrid approaches use 3D Gaussians for static regions and reserve full 4D representation only for dynamic elements.
The applications are substantial: volumetric video, digital human capture, telepresence, and time-varying digital twins become practical when dynamic scenes render at interactive rates.

Performance comparison: NeRF, Instant-NGP, 3D Gaussian Splatting, and 4D Gaussian Splatting rendering speeds
Human Reconstruction: A dedicated branch of research applies Gaussian Splatting to animatable human avatars. Recent methods achieve up to 361 FPS for human rendering, fast enough for real-time VR applications. Diffusion-based approaches now synthesize multi-view images from single inputs, then reconstruct 3D Gaussians with geometric constraints, reducing the need for dense camera arrays.
Mathematical Refinement: Researchers continue dissecting the assumptions of the original model, tackling aliasing, projection distortion, and color accumulation errors. These theoretical advances translate to practical improvements in rendering stability and gradient propagation during training.
Together, these directions have transformed Gaussian Splatting from a single paper into a new family of methods: dynamic, hierarchical, and efficient.
The Key Distinction
Photogrammetry gave us the geometry of the world. Neural Radiance Fields gave us its light. Gaussian Splatting gives us both, efficiently enough to explore in real time.
Gaussian Splatting doesn't replace photogrammetry outright; it extends it. Camera poses derived from Structure-from-Motion (SfM) remain the backbone of most pipelines. What changes is how we represent the result: from a mesh that stores shape and texture, to a field that stores radiance and appearance.
In that sense, Gaussian Splatting is not just another rendering technique. It's a shift in representation, from surfaces to probabilities, from geometry to perception. And in that shift, we may be witnessing the next logical step in how digital twins evolve: not as static reconstructions, but as living, view-dependent realities.
How 4D Pipeline Can Help
Understanding these technical foundations is essential for evaluating where Gaussian Splatting fits in your production pipeline. At 4D Pipeline, we help teams navigate the complexity of emerging 3D technologies, from feasibility studies through production deployment.
Whether you're evaluating neural reconstruction for digital twins, immersive visualization, or real-time applications, we combine deep technical knowledge with hands-on implementation experience to help you make informed decisions.
You can see examples of our work in our project portfolio.
Let's Connect:
- Sign up for our Newsletter
- Already have a project in mind? Click here to schedule a consultation
References
- Mildenhall, B., et al. (2020). NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. arXiv:2003.08934
- Kerbl, B., et al. (2023). 3D Gaussian Splatting for Real-Time Radiance Field Rendering. arXiv:2308.04079
- Kerbl, B. (2025). The Impact and Outlook of 3D Gaussian Splatting. arXiv:2510.26694
- Wu, G., et al. (2024). 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering. CVPR 2024.
- Zhang, X., et al. (2025). MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes. ICCV 2025.
- Yuan, Y., et al. (2025). 1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering. arXiv:2503.16422
- Oh, S., et al. (2025). Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation. arXiv:2505.13215
- Barron, J. T., et al. (2022). Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields. arXiv:2111.12077
- Müller, T., et al. (2022). Instant Neural Graphics Primitives. arXiv:2201.05989
- Green, R. (2003). Spherical Harmonic Lighting: The Gritty Details. GDC 2003.