For each model (Gemma-3-27b-It, Gemini-2.5-Flash, and Gemini-3.1-Flash-Lite) and each benchmarking user prompt (basic, intermediate, advanced), the relative number of (a) valid geometrical configurations among ten successful syntacticalAI workflow executions and (b) valid postprocessing routine executions among the configurations counted in (a). Light-gray and gray numbers indicate results from the first and second attempts, respectively.
A comparison table presents basic, intermediate, and advanced performance results for Gemma-3-27b-It, Gemini-2.5-Flash, and Gemini-3.1-Flash-Lite.