For a given model (Gemma-3-1b-it, Gemma-3-27b-it, Gemini-2.5-Flash-Lite, Gemini-2.5-Flash, and Gemini-3.1-Flash-Lite) and a given benchmarking user prompt (basic, intermediate, or advanced), the table records the number of tries until the first syntactically successful AI workflow execution. Light-gray and gray numbers indicate results from the first and second attempts, respectively. An “x” indicates that no execution succeeded within ten tries.
Table comparing models across Basic, Intermediate, and Advanced levels.
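The recording rule described in the caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' benchmarking harness: `run_once` is a hypothetical callable that performs one workflow execution and returns whether it succeeded syntactically, and the cutoff of ten tries follows the caption.

```python
from typing import Callable, Union

MAX_TRIES = 10  # cutoff from the caption; no success within ten tries is reported as "x"


def tries_until_success(
    run_once: Callable[[], bool], max_tries: int = MAX_TRIES
) -> Union[int, str]:
    """Return the 1-based index of the first successful try, or "x" if none succeed.

    `run_once` is a hypothetical stand-in for a single AI workflow execution
    that reports whether the execution was syntactically successful.
    """
    for attempt in range(1, max_tries + 1):
        if run_once():
            return attempt  # stop at the first success
    return "x"  # every try within the cutoff failed


if __name__ == "__main__":
    # Simulated outcomes: fail, fail, succeed -> recorded value is 3.
    outcomes = iter([False, False, True])
    print(tries_until_success(lambda: next(outcomes)))
    # A workflow that never succeeds is recorded as "x".
    print(tries_until_success(lambda: False))
```

Under this sketch, each of the two attempts per model and prompt would be one call to `tries_until_success`, giving the light-gray and gray values respectively.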