Evaluating agents based on reasoning models for process optimization

With agentic AI seemingly everywhere, there is a need to understand how good these agents really are for research purposes, and in particular for materials synthesis.

In this preprint on arXiv, I explored how agents based on reasoning models perform in atomic layer deposition (ALD) process optimization tasks. The results are both impressive and somewhat mixed. On one hand, reasoning models were able to understand and execute elementary ALD process optimization tasks. On the other, they sometimes either fail or struggle to find the optimal conditions, which makes them borderline usable for real-world applications.

For this work I didn’t use a fine-tuned model, but a commercially available one. The motivation is that these models are widely used, and therefore there is a lot of value in understanding how they perform. The other, more prosaic reason is that there are limitations on the types of models that we can access. This left out some good open-weight models such as DeepSeek-R1.

For the test, I designed a few ideal and non-ideal self-limited processes that are representative of the type of ALD processes a researcher is likely to encounter in the wild. For each of these processes, the task was very simple: find the optimal dose times for the precursor and co-reactant that lead to a saturated growth per cycle with a process time that is as low as possible. This is very much an idealized version of the type of optimization that is carried out over and over again.
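
To make the task concrete, here is a minimal Python sketch of the objective. It is not the code used in the preprint. The exponential saturation model, the time constants, the 98% saturation threshold, and all function names are assumptions chosen for illustration. Purge times are ignored, so the process time is simply the sum of the two dose times.

```python
import math


def growth_per_cycle(t_prec, t_coreact, gpc_sat=1.0, tau_prec=0.05, tau_co=0.2):
    """Hypothetical ideal self-limited process.

    Each half-cycle approaches saturation exponentially with dose time (s).
    gpc_sat, tau_prec and tau_co are illustrative values, not measured data.
    """
    coverage_prec = 1.0 - math.exp(-t_prec / tau_prec)
    coverage_co = 1.0 - math.exp(-t_coreact / tau_co)
    return gpc_sat * coverage_prec * coverage_co


def shortest_saturating_doses(process, candidates, gpc_sat, threshold=0.98):
    """Brute-force search over a grid of dose times.

    Returns (total_time, t_prec, t_coreact) for the shortest pair of doses
    whose growth per cycle reaches threshold * gpc_sat, or None if no pair
    on the grid saturates.
    """
    best = None
    for t_prec in candidates:
        for t_coreact in candidates:
            if process(t_prec, t_coreact) < threshold * gpc_sat:
                continue  # not saturated, so not an acceptable condition
            total = t_prec + t_coreact
            if best is None or total < best[0]:
                best = (total, t_prec, t_coreact)
    return best


if __name__ == "__main__":
    grid = [0.05 * i for i in range(1, 41)]  # dose times from 0.05 s to 2.0 s
    result = shortest_saturating_doses(growth_per_cycle, grid, gpc_sat=1.0)
    print(result)
```

An exhaustive grid like this is only practical because the simulated process is cheap to evaluate. An agent, like a researcher at a real tool, has to reach a comparable answer from a small number of experiments, which is what makes the task a meaningful test.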

The work is currently under review. As soon as the paper is published I will release the code required to run these challenges. Hopefully someone can take on the challenge of designing performant agents based on open-weight models that can be run locally.