Wednesday, January 7, 2026

Reinforcement Learning Speeds Optical AI Learning

Optical devices can now learn directly from real experiments, solving tasks, fixing errors and recognizing patterns without needing simulations or detailed models.

Optical computing offers fast and energy-efficient data processing. Diffractive optical networks use passive phase masks and light propagation to perform parallel computation. However, models trained in simulations often underperform in real setups due to misalignment, noise, and modelling errors that are hard to predict.

Researchers from the University of California, Los Angeles (UCLA) present a model-free, in-situ training method for diffractive optical processors using proximal policy optimization (PPO). Instead of depending on simulated models, the system learns directly from real optical measurements and updates the diffractive elements on the hardware itself.

Rather than trying to model complex optical behavior precisely, the approach lets the device learn from experimental feedback. PPO keeps this in-situ learning fast, stable, and scalable under real experimental conditions.

To show that PPO can train an optical processor without prior knowledge of the system’s physics, extensive experiments were carried out across multiple optical tasks. In one example, the system learned to focus optical energy through a random, unknown diffuser. It achieved this faster than standard policy-gradient methods, showing efficient exploration of the optical parameter space.
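
To make the idea of learning from measurements alone more concrete, here is a minimal Python sketch of the kind of standard policy-gradient baseline that PPO is compared against. It is not the UCLA implementation, and it runs on a simulated toy diffuser rather than on hardware. The function names (make_diffuser, focus_intensity, train_policy_gradient), the 16-pixel phase mask, the noise level, and all hyperparameters are assumptions made for illustration. The learner never sees the diffuser itself, only noisy intensity readings at the target spot.

```python
import numpy as np

def make_diffuser(n_pixels, rng):
    """Hypothetical stand-in for the unknown diffuser: one random phase per pixel."""
    return np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, n_pixels))

def focus_intensity(phases, diffuser):
    """Normalized intensity at the target spot, in [0, 1].

    Accepts a single phase vector (n,) or a batch of them (batch, n).
    """
    field = np.sum(diffuser * np.exp(1j * phases), axis=-1)
    return np.abs(field) ** 2 / diffuser.size ** 2

def train_policy_gradient(diffuser, rng, iters=400, batch=32,
                          sigma=0.3, lr=0.05, noise=0.01):
    """Plain policy gradient with a Gaussian policy over the phase values.

    Only measured rewards are used; the diffuser's phases are never read.
    """
    n = diffuser.size
    mean = np.zeros(n)  # policy mean = current phase mask
    for _ in range(iters):
        eps = rng.standard_normal((batch, n))
        actions = mean + sigma * eps  # candidate phase masks to try
        # Simulated noisy detector reading for each candidate mask.
        rewards = focus_intensity(actions, diffuser) \
            + noise * rng.standard_normal(batch)
        # Normalize rewards into advantages to reduce gradient variance.
        adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        # d/d(mean) of log N(a; mean, sigma^2) is (a - mean) / sigma^2 = eps / sigma.
        grad = (adv[:, None] * eps / sigma).mean(axis=0)
        mean += lr * grad  # gradient ascent on expected reward
    return mean

rng = np.random.default_rng(0)
diffuser = make_diffuser(16, rng)
before = focus_intensity(np.zeros(16), diffuser)
after = focus_intensity(train_policy_gradient(diffuser, rng), diffuser)
print(f"focus intensity: {before:.3f} -> {after:.3f}")
```

Each batch of measurements is used for a single update here and then discarded, which is the inefficiency that PPO is reported to reduce.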

The same method was also used for hologram generation and aberration correction. In another experiment, a diffractive optical processor was trained directly on hardware to classify handwritten digits using only optical measurements. As training progressed, the output patterns became clearer and more distinct for each digit, leading to correct classification without any digital post-processing.

PPO reuses measured data across multiple update steps while limiting abrupt changes in the control policy. This reduces the number of experimental samples needed and avoids unstable training, making it well suited for noisy optical systems. The method is not limited to diffractive optics and can be extended to other physical systems that provide feedback and allow real-time adjustment.
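
The mechanism that limits abrupt policy changes can be sketched with the standard PPO clipped surrogate objective. The Python snippet below is an illustrative sketch, not the researchers' code. The function name ppo_clipped_objective and the clip range of 0.2 are assumptions (0.2 is a commonly used default). The probability ratio between the updated policy and the policy that collected the data is clipped, so repeated updates on the same measurements gain nothing from pushing the policy far from where the data came from.

```python
import numpy as np

def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (to be maximized).

    logp_new:   log-probabilities of the sampled actions under the current policy
    logp_old:   log-probabilities under the policy that collected the data
    advantages: how much better each action did than the baseline
    """
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the element-wise minimum gives a pessimistic estimate, so there is
    # no incentive to move the ratio outside [1 - clip_eps, 1 + clip_eps].
    return np.mean(np.minimum(unclipped, clipped))

# Two stored samples: the new policy doubles the probability of the first
# action and halves the probability of the second.
logp_old = np.log(np.array([0.5, 0.5]))
logp_new = np.log(np.array([1.0, 0.25]))
advantages = np.array([1.0, -1.0])

objective = ppo_clipped_objective(logp_new, logp_old, advantages)
print(objective)
```

In this example both ratios (2.0 and 0.5) fall outside the clip range, so the objective is computed from the clipped values 1.2 and 0.8 and no further gain is credited for moving the policy beyond them.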

The work points toward intelligent physical systems that can learn, adapt, and perform computation without relying on detailed physical models. The same approach could be applied to photonic accelerators, nanophotonic processors, adaptive imaging systems, and real-time optical AI hardware.

Nidhi Agarwal
Nidhi Agarwal is a Senior Technology Journalist at EFY with a deep interest in embedded systems, development boards and IoT cloud solutions.
