Achuta Kadambi is an Assistant Professor at UCLA, where he directs the Visual Machines Group. He is also one of the founding faculty at Univ.AI.
Powered by artificial intelligence, the Visual Machines Group at UCLA creates imaging systems that can see the unseen. Put simply, the group works on computer vision, but with a distinctive approach: a symbiosis between the camera sensor and a deep learning model. In practice, this means combining a physical model of the process with a data-driven model of inference.
Let's start with what it means to be an Artificial Physicist. We see a person tossing a ball. What we see below are the first three frames of the video. Given only those three frames, can we predict the future of the shot: will it go long, fall short, or hit the bullseye?
In principle, we can make these predictions using high-school physics. We have the data from the first three frames and two unknown variables, so we can get a physical estimate of the throw by predicting the next few frames until we reach the frame of interest. In practice, however, there is a mismatch between the physical prediction and the actual trajectory. This is attributed to various factors including, but not limited to, air resistance and the spin on the ball.
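To make the high-school-physics prediction concrete, here is a minimal sketch. It assumes ideal projectile motion (constant acceleration, no air resistance or spin), in which case the second difference between consecutive frames is constant and three frames are enough to roll the trajectory forward. The function names, frame rate, and launch velocity are illustrative assumptions, not values from the talk.

```python
import numpy as np

def extrapolate_ballistic(first_frames, n_future):
    """Predict future ball positions from three observed (x, y) positions.

    Assumes ideal projectile motion (constant acceleration, no drag, no spin),
    so the second difference between consecutive frames stays constant.
    """
    p = np.asarray(first_frames, dtype=float)
    if p.shape != (3, 2):
        raise ValueError("expected three (x, y) positions")
    # Constant per-frame acceleration term, equal to a * dt**2.
    second_diff = p[2] - 2.0 * p[1] + p[0]
    track = [p[1], p[2]]
    for _ in range(n_future):
        # p[n+1] = 2 * p[n] - p[n-1] + a * dt**2
        track.append(2.0 * track[-1] - track[-2] + second_diff)
    return np.array(track[2:])

def ideal_toss(n_frames, dt=1.0 / 30.0, v0=(3.0, 4.0), g=9.81):
    """Hypothetical drag-free toss, sampled once per video frame."""
    t = np.arange(n_frames) * dt
    x = v0[0] * t
    y = v0[1] * t - 0.5 * g * t**2
    return np.stack([x, y], axis=1)

truth = ideal_toss(10)
predicted = extrapolate_ballistic(truth[:3], n_future=7)
print(np.abs(predicted - truth[3:]).max())  # tiny for an ideal toss
```

On an ideal toss the extrapolation matches the true path up to floating-point error. On a real toss with drag and spin, the same code would drift away from the observed frames, which is the mismatch described above.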
So, we appeal to the engine of our generation, the ResNet. We create a training dataset and feed it into a pattern recognition algorithm like ResNet.
The input might be the initial three-frame trajectory, and the output in the training phase would be labels of where the ball will be next, expressed as pixel coordinates, as in a standard computer vision approach. But here, too, there is a mismatch between the actual trajectory and the ideal trajectory predicted by the ResNet model.
This isn’t the fault of ResNet alone.
To be fair, we gave the model terrible training data: only one ideal type of toss. As a result, the model overfits to this training data and does not adapt to the new types of trajectories that we might see in the real world.
What this hints at is a new kind of playbook, the Scholarly Playbook: working at the seamline of physics and neural networks. If we combine our physical understanding with neural networks, we might end up with predictions that closely match actual real-world results.
This playbook doesn’t apply to tossing balls alone. It can be applied to many different problems, including the imaging sciences.
We can form a plot to look at how good the data is, versus how good the physics is.
From this, we can infer that if we have a lot of data and know nothing about the physics, then sure, we might want to use ResNet. On the other hand, if we have no data but a very good physical model, we can apply the physical solution directly. The truly interesting engineering problems lie in the middle region, where we have partial data and a partial understanding of the physics. That's where physics-based learning comes into play, working at the seam between the two.
However, the answer is not as simple as just blending physics and learning. Unfortunately, neural networks are inherently unstructured. There's no plug-and-play package that says, “go put this equation in and a neural network will try to solve for this output”. There are many ways to overcome this hurdle, and the Visual Machines Group is looking at quite a number of them: residual learning, multi-stream inputs, constrained networks (which constrain the manifold that the neural network searches so that it finds physically realizable solutions), student-teacher models, and so on.
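As a rough illustration of the first idea in that list, residual learning can be read as "let physics make the first guess and let a learned model predict only the error". The sketch below is a toy under stated assumptions, not the group's implementation: the "real world" is a simulated toss with a made-up linear drag coefficient, and an ordinary least-squares fit on hand-picked polynomial features stands in for the neural network. All names and constants are hypothetical.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def physics_landing(v0, angle):
    """Partial physics: drag-free range of a toss launched from the ground."""
    return v0**2 * np.sin(2.0 * angle) / G

def simulate_landing(v0, angle, drag=0.3, dt=1e-3):
    """Stand-in for real data: Euler integration with an assumed linear drag."""
    pos = np.array([0.0, 0.0])
    vel = v0 * np.array([np.cos(angle), np.sin(angle)])
    while True:
        acc = np.array([0.0, -G]) - drag * vel
        new_pos = pos + vel * dt
        vel = vel + acc * dt
        if new_pos[1] < 0.0:
            # Interpolate linearly to the point where the ball hits the ground.
            frac = pos[1] / (pos[1] - new_pos[1])
            return pos[0] + frac * (new_pos[0] - pos[0])
        pos = new_pos

def features(v0, angle):
    """Hand-picked polynomial features; a neural network would replace these."""
    v0, angle = np.asarray(v0, dtype=float), np.asarray(angle, dtype=float)
    return np.stack(
        [np.ones_like(v0), v0, angle, v0**2, v0 * angle, angle**2], axis=1
    )

rng = np.random.default_rng(0)
v0 = rng.uniform(5.0, 10.0, size=60)
angle = rng.uniform(0.4, 1.1, size=60)
observed = np.array([simulate_landing(v, a) for v, a in zip(v0, angle)])
train, test = slice(0, 40), slice(40, 60)

# Learn only what physics gets wrong: residual = observation - physics.
residual = observed[train] - physics_landing(v0[train], angle[train])
weights, *_ = np.linalg.lstsq(
    features(v0[train], angle[train]), residual, rcond=None
)

physics_only = physics_landing(v0[test], angle[test])
hybrid = physics_only + features(v0[test], angle[test]) @ weights

physics_mae = np.mean(np.abs(physics_only - observed[test]))
hybrid_mae = np.mean(np.abs(hybrid - observed[test]))
print(f"physics only: {physics_mae:.3f} m, physics + residual: {hybrid_mae:.3f} m")
```

The design choice worth noting is that the learned part never has to rediscover the parabola. It only models the gap left by drag, which is a smaller and smoother target than the full trajectory.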
Light is also a very difficult problem to analyze. In the plot below, with the goodness of the physics on the Y axis, we see that many of the models we have for how light behaves actually misbehave and are not terribly accurate.
What this means for us is that we're playing in a very dangerous regime, where our physics is truly partial, unlike a simple kinematics problem, where the physics is fairly good.
This is where machine learning, if done correctly, can have a lot more impact: on very difficult physical problems. This has been an active field of work for many researchers, with significant progress made in recent years.
Dr. Kadambi hopes that more interested students will join the field!
In the upcoming blog post, we will discuss this in further detail and talk about a new approach to 3D imaging that was published at the ICCV conference and is also the foundation for Akasha Imaging, a fully funded startup in the Bay Area.