Reinforcement learning-driven decision support for target-oriented branch pruning on urban trees Open Access

Appendix

1. Definitions for the action space

2. Detailed Result Data

2025

Qiguan Shu, Kai Zhe Boey and Ferdinand Ludwig

Figure 1

The four panels, labeled “(a)”, “(b)”, “(c)”, and “(d)”, are arranged in a two-by-two layout. The charts summarize reinforcement learning episode statistics, including played rounds, computation time, action frequencies, and parameter usage.

Panel (a) is titled “Played Rounds per Episode”. The horizontal axis is labeled “Episode” and ranges from 0 to 60 in increments of 5. The vertical axis is labeled “Total Round” and ranges from 0 to 20 in increments of 2. Blue vertical bars represent the number of rounds played in each episode. Early episodes show low values between 1 and 8 rounds. From episode 6 onward, many episodes reach between 15 and 20 rounds. Several peaks reach the maximum value of 20 rounds, including around episodes 7, 9, 25, 31, 35, 36, 39, 40, 43, 45, 46, 47, 49, and 52. Lower values appear intermittently near episodes 23, 28, 30, 44, 53, 55, 56, and 58.

Panel (b) is titled “Time Consumption by Episode”. The horizontal axis is labeled “Episode” and ranges from 0 to 60 in increments of 5. The vertical axis is labeled “Total Time Taken (seconds)” and ranges from 0 to 16000 in increments of 2000. Blue vertical bars represent computation time for each episode. Early episodes mostly remain below 1000 seconds. Larger spikes begin after episode 30. Major peaks occur near episode 31 (about 6400 seconds), episode 35 (about 5600 seconds), episode 40 (about 5200 seconds), episode 44 (15200 seconds, the highest value), episode 46 (about 7800 seconds), episode 47 (about 5200 seconds), episode 51 (about 10100 seconds), episode 52 (about 8800 seconds), and episode 55 (about 4100 seconds). Most remaining episodes remain below 3000 seconds.

Panel (c) is titled “Frequency of the Chosen Actions by Episode”. The horizontal axis is labeled “Episode” and ranges from 0 to 60 in increments of 5. The vertical axis is labeled “Frequency (times)” and ranges from 0 to 16 in increments of 2. Multiple colored line graphs represent frequencies of actions labeled “Action 0” through “Action 8”. A vertical black line near episode 8 separates the regions labeled “Random Actions” on the left and “Chosen by the P-DQN Network” on the right. “Action 0”, shown in blue, becomes the dominant action after episode 10 and frequently ranges between 4 and 13 occurrences, with peaks near episodes 38 and 46. “Action 1”, shown in orange, shows several high peaks between episodes 10 and 18, including a maximum near 16 around episode 16, but decreases afterward. “Action 3”, shown in red, fluctuates mostly between 1 and 5, with a large spike near episode 51 reaching 11. The remaining actions (Actions 2, 4, 5, 6, 7, and 8) generally remain below 5 occurrences across most episodes.

Panel (d) is titled “Used Action Parameters”. The vertical axis is labeled “Frequency (times)” and ranges from 0 to 50 in increments of 10. Overlapping histograms display parameter usage frequencies for “Action 0” (blue bars), “Action 1” (yellow bars), and “Action 3” (red bars). The horizontal axis contains two parameter scales. The lower scale, labeled “Parameters for Action 0”, ranges from 0 to 0.05 in increments of 0.01. The upper scale, labeled “Parameters for Action 1 and 3”, ranges from 0 to 10 in increments of 1. For “Action 0”, the highest frequency occurs near parameter value 0 with 50 occurrences, and frequencies decrease gradually as parameter values approach 0.05. For “Action 1”, the highest frequency occurs near parameter value 10 with 43 occurrences. For “Action 3”, frequencies are distributed more broadly across the range from 0 to 10, with larger concentrations near parameter values 0 and 10.

Note: All numerical data values are approximated.

Scenarios of a possible street tree case

Figure 2

Four charts show played rounds, time consumption, action frequencies, and the action parameters used across training episodes.

The four panels, labeled “(a)”, “(b)”, “(c)”, and “(d)”, are arranged in a two-by-two layout. The charts summarize reinforcement learning episode statistics, including played rounds, computation time, action frequencies, and parameter usage.

Panel (a) is titled “Played Rounds per Episode”. The horizontal axis is labeled “Episode” and ranges from 0 to 1000 in increments of 100. The vertical axis is labeled “Total Round” and ranges from 0 to 30 in increments of 2. Blue vertical bars represent the number of rounds played in each episode. Episodes before about 170 mostly remain below 12 rounds, with a peak near 27 for episode 90. After about episode 180, many episodes rapidly increase and frequently reach between 20 and 30 rounds. From about episode 300 onward, most bars remain near the maximum value of 30 rounds, with only occasional drops below 20.

Panel (b) is titled “Time Consumption by Episode”. The horizontal axis is labeled “Episode” and ranges from 0 to 1000 in increments of 100. The vertical axis is labeled “Total Time Taken (seconds)” and ranges from 0 to 10. Early episodes before about 170 remain close to 0 seconds. Between episodes 180 and 300, the values fluctuate widely between about 1 and 10 seconds. After about episode 300, most episodes stabilize between 8 and 10 seconds, with occasional decreases below 5 seconds. Several peaks slightly exceed 10 seconds near episodes 400, 620, and 820.

Panel (c) is titled “Frequency of the Chosen Actions by Episode”. The horizontal axis is labeled “Episode” and ranges from 0 to 1000 in increments of 100. The vertical axis is labeled “Frequency (times)” and ranges from 0 to 30 in increments of 2. Multiple colored line graphs represent frequencies of actions labeled “Action 0” through “Action 8”. A vertical black line near episode 180 separates the regions labeled “Random Actions” on the left and “Chosen by the P-DQN Network” on the right. “Action 0”, shown in blue, becomes the dominant action after episode 250 and frequently reaches values between 20 and 30 occurrences, with many peaks at the maximum value of 30. “Action 3”, shown in red, shows strong activity between episodes 200 and 260, with peaks between 18 and 24, before decreasing sharply afterward. “Action 7”, shown in gray, becomes prominent between episodes 240 and 300, with frequencies reaching about 22. “Action 8”, shown in yellow-green, briefly rises near episode 190, with frequencies around 15. The remaining actions mostly remain below 5 occurrences across most episodes.

Panel (d) is titled “Used Action Parameters”. The vertical axis is labeled “Frequency (times)” and ranges from 0 to 8000 in increments of 2000. Overlapping histograms display parameter usage frequencies for “Action 0” (blue bars), “Action 3” (red bars), and “Action 7” (gray bars). The horizontal axis contains two parameter scales. The lower scale, labeled “Parameters for Action 0”, ranges from 0 to 0.5 in increments of 0.1. The upper scale, labeled “Parameters for Action 3 and 7”, ranges from 0 to 10 in increments of 1. For “Action 0”, the highest frequency occurs near parameter value 0 with 8000 occurrences; frequencies near parameter value 0.1 are about 3000, and near parameter value 0.5 are about 3500. For “Action 3”, the highest frequencies occur near parameter values between 0 and 1, with bars reaching about 600 occurrences near 0 and about 300 near 1, and frequencies decrease sharply beyond parameter value 2. For “Action 7”, the largest frequencies also occur near parameter values between 0 and 1, with bars near 0 reaching about 500 occurrences, and very few occurrences appear beyond parameter value 2.

Note: All numerical data values are approximated.

Overview of the workflow proposed in this study
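Panel (c) of the chart described above separates an initial phase of random actions from a later phase in which the P-DQN network chooses. A minimal sketch of such a two-phase selection rule for a hybrid action (a discrete action index plus one continuous parameter) might look as follows. The function names, the single parameter per action, and the parameter bounds (read off the axis ranges of panel (d)) are assumptions made for illustration, not the authors' implementation.

```python
import random

NUM_ACTIONS = 9  # actions 0..8, as listed in the figure legend


def param_range(action):
    """Assumed parameter bounds, taken from the axis ranges in panel (d)."""
    return (0.0, 0.5) if action == 0 else (0.0, 10.0)


def select_action(episode, warmup_episodes, q_values, proposed_params, rng):
    """Return (action, parameter) for one round.

    q_values[a] and proposed_params[a] stand in for network outputs
    (hypothetical inputs): the value of action a and the continuous
    parameter proposed for it.
    """
    if episode < warmup_episodes:
        # Warm-up phase: uniformly random action and parameter.
        action = rng.randrange(NUM_ACTIONS)
        low, high = param_range(action)
        return action, rng.uniform(low, high)
    # Learned phase: greedy over the discrete actions, parameter clipped to bounds.
    action = max(range(NUM_ACTIONS), key=lambda a: q_values[a])
    low, high = param_range(action)
    return action, min(max(proposed_params[action], low), high)
```

With `warmup_episodes=180`, a call at episode 0 returns a random pair, while a call at episode 200 returns the action with the largest value and its clipped parameter.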

Figure 3

Sequential voxel canopy growth states across multiple rounds are shown with scores, actions, and parameters.

The two horizontal sections show sequential rounds of voxelized canopy growth represented in three-dimensional coordinate plots. Each panel is labeled “Leaf State in LAD” and displays a green voxel canopy structure inside a three-dimensional coordinate grid. Below each plot are the corresponding “Score”, “Action”, and “Parameter” values. The final panel on the lower right is labeled “Target” and shows the target voxel canopy configuration.

The top section displays rounds 0 through 14 at intervals of two:
Round 0: The voxel canopy is very small and compact near the center of the coordinate space. Score: 0.1.
Round 2: The voxel canopy becomes slightly larger and wider with a rounded triangular shape. Score: 0.1.
Round 4: The canopy expands further upward and outward, forming a denser mound-like structure. Score: 0.1.
Round 6: The voxel canopy increases in width and height with a smoother dome-like form. Score: 0.1.
Round 8: The canopy becomes denser and more elevated with a broad rounded surface. Score: 0.14.
Round 10: The canopy enlarges significantly and forms a larger hemispherical structure. Score: 0.82.
Round 12: The voxel structure becomes taller and fuller with increased density across the canopy surface. Score: 2.15.
Round 14: The canopy grows into a broad dome-like structure occupying most of the coordinate space. Score: 4.48.

The lower section displays rounds 16 through 30 and the target state:
Round 16: The canopy becomes larger and more compact with a smoother curved upper surface. Score: 7.81.
Round 18: The voxel canopy continues expanding upward and outward with increased density. Score: 10.84.
Round 20: The canopy develops into a larger dome-like structure with broad horizontal spread. Score: 14.20.
Round 22: The voxel canopy becomes taller and denser with a rounded upper surface. Score: 15.87.
Round 24: The canopy reaches one of the densest and largest states in the sequence. Score: 15.90.
Round 26: The canopy remains broad and dense with slight irregularities along the upper edge. Score: 14.73.
Round 28: The voxel structure becomes slightly flatter along the top while maintaining a large spread. Score: 13.13.
Round 30: The canopy remains large and dense with a rounded upper form occupying most of the coordinate space. Score: 11.99.
Target: A compact rounded voxel canopy occupying a smaller region near the center-left of the coordinate grid.

Curved rightward arrows between consecutive plots indicate the progression from one canopy state to the next. The action and parameters are labeled below each arrow:
Rounds 0 to 2: Action 0. Parameter: 0.035 and 0.1.
Rounds 2 to 4: Action 0. Parameter: 0.15 and 0.4.
Rounds 4 to 6: Action 0. Parameter: 0.3 and 0.2.
Rounds 6 to 8: Action 0. Parameter: 0.05 and 0.45.
Rounds 8 to 10: Action 0. Parameter: 0.5 and 0.45.
Rounds 10 to 12: Action 0. Parameter: 0.25 and 0.5.
Rounds 12 to 14: Action 0. Parameter: 0.15 and 0.05.
Rounds 14 to 16: Action 0. Parameter: 0.05 and 0.1.
Rounds 16 to 18: Action 0. Parameter: 0.05 and 0.5.
Rounds 18 to 20: Action 0. Parameter: 0.05 and 0.05.
Rounds 20 to 22: Action 0. Parameter: 0.05 and 0.05.
Rounds 22 to 24: Action 0. Parameter: 0.4 and 0.05.
Rounds 24 to 26: Action 0. Parameter: 0.05 and 0.05.
Rounds 26 to 28: Action 0. Parameter: 0.05 and 0.05.
Rounds 28 to 30: Action 0. Parameter: 0.05 and 0.05.

Plotted tree states displayed to players in an example episode: (a) Initial branch state in QSM of a young plane tree; (b) Estimated LAD of this young plane tree at its initial state; (c) A randomly populated target LAD in the game

Figure 4

Four tree pruning method diagrams compare thinning, raising, reduction, and topping operations.

The four side-by-side tree diagrams are labeled “(a)”, “(b)”, “(c)”, and “(d)” beneath each panel. Each panel shows a leafless tree structure with branches color-coded into red “Cut Branches” and gray “Kept Branches”. A legend below each diagram indicates the two branch categories.

Panel (a) is titled “Thinning” with the subtitle “minimum distance 0.05 meters”. Numerous interior and overlapping branches throughout the tree canopy are highlighted in red to indicate branches removed during thinning. Gray branches remain distributed throughout the structure.

Panel (b) is titled “Raising” with the subtitle “raised height 0.4 meters”. Two semi-transparent horizontal planes appear near the base of the canopy, with a vertical dimension marker labeled “0.4 meters”. Lower branches beneath the raised height threshold are highlighted in red, while upper branches remain gray.

Panel (c) is titled “Reduction” with the subtitle “distance 0.8 meters from west”. Two semi-transparent vertical planes are positioned on the left side of the tree, with a horizontal dimension marker labeled “0.8 meters”. Branches extending beyond the reduction boundary toward the left side are shown in red, while interior branches remain gray.

Panel (d) is titled “Topping” with the subtitle “cylinder depth 3”. A semi-transparent polygonal or cylindrical boundary surrounds the upper canopy region. Branches extending beyond the upper boundary are highlighted in red, indicating topped branches, while interior branches are retained in gray.

Four predefined pruning strategies for the players to choose from
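As a rough illustration of two of these strategies, the sketch below applies raising and reduction from the west to a short list of branches, using the parameter values shown in panels (b) and (c). It is a simplification under stated assumptions, not the game's implementation: each branch is reduced to one representative point, `x` is assumed to increase eastward, and the crown start is taken as the lowest branch point. The names `Branch`, `raise_crown`, and `reduce_from_west` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Branch:
    """One branch reduced to a representative point in meters (assumption)."""
    x: float  # assumed to increase toward the east
    y: float
    z: float  # height above ground


def raise_crown(branches: List[Branch], height: float) -> Tuple[List[Branch], List[Branch]]:
    """Return (kept, cut); cut branches lie less than `height` above the crown start."""
    crown_start = min(b.z for b in branches)  # assumption: lowest branch point
    cut = [b for b in branches if b.z - crown_start < height]
    kept = [b for b in branches if b.z - crown_start >= height]
    return kept, cut


def reduce_from_west(branches: List[Branch], distance: float) -> Tuple[List[Branch], List[Branch]]:
    """Return (kept, cut); cut branches lie less than `distance` from the western edge."""
    west_edge = min(b.x for b in branches)  # assumption: smallest x is the western outreach
    cut = [b for b in branches if b.x - west_edge < distance]
    kept = [b for b in branches if b.x - west_edge >= distance]
    return kept, cut


# Toy data, not taken from the paper.
example = [
    Branch(0.0, 0.0, 1.0),
    Branch(0.5, 0.0, 1.2),
    Branch(1.0, 0.0, 2.0),
    Branch(2.0, 0.0, 3.0),
]
kept_after_raising, cut_by_raising = raise_crown(example, 0.4)
kept_after_reduction, cut_by_reduction = reduce_from_west(example, 0.8)
```

Both functions use a strict comparison, so a parameter of 0 cuts nothing; whether the game treats the boundary this way is not stated in the source.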

Figure 5

A workflow diagram illustrates a deep reinforcement learning model for tree maintenance decision-making.

The diagram presents a workflow connecting a “Tree Growth Game” environment with a “DRL Model” for evaluating tree maintenance strategies. The diagram is divided into two large colored regions. The left pink region is labeled “Tree Growth Game” and the right blue-gray region is labeled “DRL Model”.

On the left side, a rounded rectangular panel titled “Environment” contains a three-dimensional visualization in a three-dimensional coordinate plane labeled “Tree Growth Simulation”. A rightward arrow points to a curved panel labeled “Future State of The Tree” under “DRL Model”. Inside this panel is a green voxel-style three-dimensional tree structure. Purple text below the panel labels it as “State $S_{t+1}$”. To the lower right of this panel is another curved white panel labeled “Targeted State of the Tree”. Inside this panel is a flattened green voxel-style tree canopy structure. The purple text below labels it as “Goal $S_{TAR}$”.

Between the two state panels is a circular comparison symbol containing an “X”. The text above the comparison node reads “$R(S_{t+1}, S_{TAR})$”. A rightward arrow from “Future State of The Tree” and an upward arrow from “Targeted State of the Tree” connect to this node. Two dashed horizontal arrows extend rightward from the comparison node toward a vertical divider line. The upper dashed line is labeled “Reward $R_{t+1}$”. The lower dashed line is labeled “State $S_{t+1} - S_{TAR}$”. To the right of the divider, the corresponding arrows labeled “$R_{t}$” and “$S_{t} - S_{TAR}$” point right to another panel labeled “Evaluation”.

On the far right is the rounded rectangular panel labeled “Evaluation”. Inside the panel is a neural network-style diagram composed of connected circular nodes. The text below the panel labels it as “Agent”. A feedback arrow extends upward from the “Evaluation” panel and points toward a white box near the upper-left area labeled “Decision in Strategies for Tree Maintenance”. Above the feedback arrow is the purple expression “$Q(S_{t}, A_{t})$”. Purple text below the decision box reads “Action $A_{t}$”. This arrow continues leftward toward the “Tree Growth Simulation” under “Environment”.

Structure of the DRL model in decision-making
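The data flow in Figure 5 can be sketched as a simple interaction loop: the agent observes the difference between the current state and the target, chooses an action, the environment returns the next state, and a reward compares that next state with the target. The code below is a minimal sketch under assumptions. The state is a flat list of voxel values, and `placeholder_reward` (negative absolute difference) is a stand-in, since the reward function itself is not given here. `step` and `policy` are hypothetical callables standing in for the tree growth simulation and the trained network.

```python
from typing import Callable, List, Tuple

State = List[float]  # flattened voxel values, e.g. leaf area density per voxel


def state_difference(state: State, target: State) -> State:
    """Element-wise S_t - S_TAR, the quantity passed to the agent in Figure 5."""
    return [s - t for s, t in zip(state, target)]


def placeholder_reward(next_state: State, target: State) -> float:
    """Hypothetical R(S_{t+1}, S_TAR): larger (closer to 0) when nearer the target."""
    return -sum(abs(d) for d in state_difference(next_state, target))


def run_episode(
    initial: State,
    target: State,
    step: Callable[[State, float], State],
    policy: Callable[[State], float],
    max_rounds: int,
) -> List[Tuple[float, float]]:
    """Play one episode and return the (action, reward) pair of every round."""
    state = list(initial)
    history = []
    for _ in range(max_rounds):
        action = policy(state_difference(state, target))  # A_t from S_t - S_TAR
        next_state = step(state, action)                  # environment gives S_{t+1}
        reward = placeholder_reward(next_state, target)   # R_{t+1}
        history.append((action, reward))
        state = next_state
    return history


# Toy environment: every voxel grows by 1 per round and the action removes that amount.
def toy_step(state: State, action: float) -> State:
    return [v + 1.0 - action for v in state]


# Toy policy: prune by 2 only when the canopy exceeds the target overall.
def toy_policy(diff: State) -> float:
    return 2.0 if sum(diff) > 0 else 0.0


history = run_episode([0.0, 0.0], [2.0, 2.0], toy_step, toy_policy, max_rounds=4)
```

In this toy run the canopy grows to the target, overshoots, and is pruned back, which loosely mirrors the grow-then-cut pattern visible in the rendered episodes.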


Figure 6

Three-dimensional tree growth and voxel canopy states are shown across multiple rounds with scores, actions, and parameters.

The two horizontal sections show sequential rounds of a tree growth and reinforcement learning process. Each round contains two three-dimensional coordinate plots. The upper plot in each round is labeled “Branch State in QSM” and displays a detailed branch structure of a tree. The lower plot is labeled “Leaf State in LAD” and displays a voxelized green canopy structure. Below each pair of plots are corresponding “Score”, “Action”, and “Parameter” values. The final column on the lower-right is labeled “Target” and shows the target voxel canopy configuration.

The top section displays rounds 0 through 7. Round 0: The branch structure is compact with a small canopy. The leaf voxel structure forms a rounded compact cluster. Score: 6.43. Round 1: The branch canopy becomes denser and slightly larger. The voxel canopy expands upward and outward. Score: 10.03. Round 2: The branch structure further increases in density and width. The voxel canopy becomes larger and more spherical. Score: 13.56. Round 3: The branch structure grows slightly taller and fuller. The voxel canopy enlarges further with greater density. Score: 15.33. Round 4: The branch structure reaches one of the densest canopy states in the sequence. The voxel canopy appears broad and compact. Score: 19.10. Round 5: The branch structure becomes narrow with a tall trunk-like form and sparse upper branches. The voxel canopy changes into a vertically elongated cluster. Score: 6.04. Round 6: The branch structure remains slender and vertical with sparse branching. The voxel canopy becomes thinner and more irregular. Score: 5.62. Round 7: The branch structure becomes slightly fuller than the previous round while remaining vertically narrow. The voxel canopy becomes taller and denser. Score: 10.07.

The lower section displays rounds 8 through 14 and the target state. Round 8: The branch structure is vertically elongated with sparse branches. The voxel canopy forms a tall, dense column. Score: 12.27. Round 9: The branch structure becomes denser and slightly wider. The voxel canopy expands upward and outward. Score: 14.19. Round 10: The branch structure shifts into a bent upper canopy form. The voxel canopy becomes asymmetrical with an enlarged upper section. Score: 6.20. Round 11: The branch structure remains asymmetrical and sparse. The voxel canopy becomes irregular with separated lower voxels. Score: 4.71. Round 12: The branch structure thickens slightly and extends upward. The voxel canopy becomes denser with a broader upper section. Score: 8.39. Round 13: The branch structure becomes narrow and tall again. The voxel canopy forms a vertical elongated mass with a side extension near the top. Score: 6.42. Round 14: The branch structure remains vertically narrow with moderate branching. The voxel canopy becomes irregular and asymmetrical with detached lower sections. Score: 4.57. The “Target” panel displays the desired voxel canopy state as a dense, horizontally spread green voxel cluster occupying a compact low-height region within the coordinate space.

The Action and Parameter are labeled below a rightward curved arrow between two consecutive plots as follows: Between rounds 0 and 1: Action: 0. Parameter: 0.016. Between rounds 1 and 2: Action: 0. Parameter: 0.037. Between rounds 2 and 3: Action: 0. Parameter: 0.016. Between rounds 3 and 4: Action: 0. Parameter: 0.015. Between rounds 4 and 5: Action: 3. Parameter: 8.93. Between rounds 5 and 6: Action: 1. Parameter: 9.88. Between rounds 6 and 7: Action: 0. Parameter: 0.047. Between rounds 7 and 8: Action: 0. Parameter: 0.008. Between rounds 8 and 9: Action: 0. Parameter: 0.003. Between rounds 9 and 10: Action: 1. Parameter: 9.30. Between rounds 10 and 11: Action: 0. Parameter: 0.019. Between rounds 11 and 12: Action: 0. Parameter: 0.029. Between rounds 12 and 13: Action: 3. Parameter: 9.34. Between rounds 13 and 14: Action: 0. Parameter: 0.011. Between round 14 and “Target”: Action: 0. Parameter: 0.016, shown below a dashed curved arrow. Curved arrows between consecutive rounds indicate the transition sequence from one round to the next. Dashed arrows near the final rounds indicate progression toward the target state.

Rendered records of the tree growth game at Episode 55 with the maximum rewards at round 4


Figure 7

A multi-line chart compares maximum, average, and cumulative rewards across training episodes.

The chart titled “Reward by Episode” displays reward values across training episodes using three line graphs and their corresponding linear regression trend lines. The horizontal axis is labeled “Episode” and ranges from 0 to 60 in increments of 5. The left vertical axis is labeled “Reward in Single Round” and ranges from 0 to 20 in increments of 2. The right vertical axis is labeled “Cumulative Reward” and ranges from 0 to 200 in increments of 20. Three primary data series are shown. A green line with circular markers represents “Max Reward per Round”. A blue line with square markers represents “Average Reward per Round”. An orange line with triangular markers represents “Cumulative Reward”. Dashed trend lines of matching colors indicate linear regression trends for each reward type. The green “Max Reward per Round” series fluctuates strongly across the episodes. It begins near 0 at episode 0, rises above 5 by episode 2, and continues with multiple peaks and drops throughout the chart. Major peaks occur near episode 12 at about 12.5, episode 20 at about 17.8, episode 33 at about 12.6, episode 44 at about 15.8, episode 46 at about 15.9, episode 48 at about 15.5, episode 51 at about 15.5, episode 54 at about 15.7, and episode 55 at 19.0, which is the highest value in the series. Several episodes such as 0, 4, 23, 28, 45, 49, and 56 show values near 0. The green dashed regression line shows a gradual upward trend from about 4.5 at episode 0 to about 9.5 at episode 60. The blue “Average Reward per Round” series varies within a narrower range. It begins near 0 at episode 0 and mostly fluctuates between 1 and 6. Higher values occur near episode 20 at about 9.6, episode 44 at about 9.8, episode 48 at about 10.0, and episode 55 at about 9.0. Several dips to 0 appear near episodes 0, 23, 28, 45, 49, and 56. The blue dashed regression line shows a modest upward trend from about 2.3 at episode 0 to about 4.3 at episode 60. 
The orange “Cumulative Reward” series displays the largest fluctuations and corresponds to the right vertical axis. It begins near 0 at episode 0 and varies widely across the chart. Moderate peaks occur near episode 11 at about 70, episode 31 at about 115, and episode 35 at about 120. The highest peaks occur near episode 44 at approximately 195 and episode 48 at 200. Additional large peaks occur near episode 46 at about 150 and episode 55 at about 135. Multiple episodes show cumulative rewards close to 0, including episodes 0, 4, 23, 24, 28, 29, 45, 49, and 56. The orange dashed regression line trends upward gradually from about 25 at episode 0 to about 68 at episode 60. Three legends appear on the right side of the chart, identifying the solid data lines, the cumulative reward series, and the dashed regression lines, respectively. Note: All numerical data values are approximated.

The trend of the gained reward using linear regression
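The three plotted series and their dashed trend lines can be reproduced from per-round reward logs with a few lines of code. The sketch below assumes the log is available as one list of round rewards per episode. `episode_summary` and `linear_trend` are hypothetical helper names, and the trend is an ordinary least-squares line fitted against the episode index.

```python
from typing import List, Sequence, Tuple


def episode_summary(round_rewards: Sequence[float]) -> Tuple[float, float, float]:
    """Return (max, average, cumulative) reward for one episode's rounds."""
    total = sum(round_rewards)
    return max(round_rewards), total / len(round_rewards), total


def linear_trend(values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through (episode index, value); returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two episodes to fit a trend")
    mean_x = (n - 1) / 2  # mean of the indices 0 .. n-1
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy log with three episodes (values are illustrative, not taken from the paper).
log: List[List[float]] = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
max_series = [episode_summary(rounds)[0] for rounds in log]
slope, intercept = linear_trend(max_series)
```

A positive slope corresponds to the upward-tilting dashed lines described for the figure; with rewards this noisy, the slope summarizes the direction of learning rather than its stability.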


Figure 8

A multi-line chart compares maximum, average, and cumulative rewards across one thousand training episodes.

The chart titled “Reward by Episode” displays reward values across one thousand training episodes using three line graphs and their corresponding linear regression trend lines. The horizontal axis is labeled “Episode” and ranges from 0 to 1000 in increments of 100. The left vertical axis is labeled “Reward in Single Round” and ranges from negative 100 to 50 in increments of 10. The right vertical axis is labeled “Cumulative Reward” and ranges from negative 100 to 550 in increments of 50. Three primary data series are shown. A green line with circular markers represents “Max Reward per Round”. A blue line with square markers represents “Average Reward per Round”. An orange line with triangular markers represents “Cumulative Reward”. Dashed trend lines of matching colors indicate linear regression trends for each reward type. The green “Max Reward per Round” series begins near negative 100 during the early episodes and fluctuates strongly. Between episodes 0 and 180, many values remain between negative 100 and 0. After episode 180, the series rises sharply and stabilizes mostly between 10 and 20. Several prominent peaks occur near episodes 270 and 370 at 40 and 42, respectively. Additional peaks between 20 and 30 appear throughout episodes 450 to 800. The green dashed regression line shows a steady upward trend from about negative 15 near episode 0 to 30 near episode 1000. The blue “Average Reward per Round” series also begins near negative 100 and shows large fluctuations during the early episodes. From episode 0 to 180, many values remain between negative 100 and 0. After episode 180, the average reward gradually improves and fluctuates mostly between 0 and 10. Occasional drops to negative 100 continue near episodes 400, 600, and 650. The highest average reward values occur near episodes 260 and 380 at 18. The blue dashed regression line increases gradually from about negative 20 near episode 0 to 20 near episode 1000. 
The orange “Cumulative Reward” series corresponds to the right vertical axis and shows the largest variation. During the early episodes, many cumulative reward values remain near negative 100. Between episodes 200 and 500, the series fluctuates heavily between negative 100 and positive 250. Large peaks occur near episode 270 at 530 and near episode 380 at 550, which is the highest cumulative reward value in the chart. After episode 500, the cumulative reward stabilizes mostly between 200 and 250 with smaller fluctuations. The orange dashed regression line rises steadily from near 0 at episode 0 to 300 at episode 1000. Three legends appear on the right side of the chart, identifying the solid data lines, the cumulative reward series, and the dashed regression lines, respectively. Note: All numerical data values are approximated.

The trend of the gained reward in the simplified tree growth game with binary voxels


Figure A1

Statistics from the log file regarding the game round, consumed time, and actions chosen in the experiments


Figure A2

The four panels labeled “(a)”, “(b)”, “(c)”, and “(d)” are arranged in a two-by-two layout. The charts summarize reinforcement learning episode statistics, including played rounds, computation time, action frequencies, and parameter usage.

Panel (a) is titled “Played Rounds per Episode”. The horizontal axis is labeled “Episode” and ranges from 0 to 1000 in increments of 100. The vertical axis is labeled “Total Round” and ranges from 0 to 30 in increments of 2. Blue vertical bars represent the number of rounds played in each episode. Episodes before about 170 mostly remain below 12 rounds, with a peak near 27 for episode 90. After about episode 180, many episodes rapidly increase and frequently reach between 20 and 30 rounds. From about episode 300 onward, most bars remain near the maximum value of 30 rounds with only occasional drops below 20.

Panel (b) is titled “Time Consumption by Episode”. The horizontal axis is labeled “Episode” and ranges from 0 to 1000 in increments of 100. The vertical axis is labeled “Total Time Taken (seconds)” and ranges from 0 to 10. Early episodes before about 170 remain close to 0 seconds. Between episodes 180 and 300, the values fluctuate widely between about 1 and 10 seconds. After about episode 300, most episodes stabilize between 8 and 10 seconds with occasional decreases below 5 seconds. Several peaks slightly exceed 10 seconds near episodes 400, 620, and 820.

Panel (c) is titled “Frequency of the Chosen Actions by Episode”. The horizontal axis is labeled “Episode” and ranges from 0 to 1000 in increments of 100. The vertical axis is labeled “Frequency (times)” and ranges from 0 to 30 in increments of 2. Multiple colored line graphs represent frequencies of actions labeled “Action 0” through “Action 8”. A vertical black line near episode 180 separates the regions labeled “Random Actions” on the left and “Chosen by the P-DQN Network” on the right. “Action 0”, shown in blue, becomes the dominant action after episode 250 and frequently reaches values between 20 and 30 occurrences, with many peaks at the maximum value of 30. “Action 3”, shown in red, shows strong activity between episodes 200 and 260 with peaks between 18 and 24 before decreasing sharply afterward. “Action 7”, shown in gray, becomes prominent between episodes 240 and 300 with frequencies reaching about 22. “Action 8”, shown in yellow-green, briefly rises near episode 190 with frequencies around 15. The remaining actions mostly remain below 5 occurrences across most episodes.

Panel (d) is titled “Used Action Parameters”. The vertical axis is labeled “Frequency (times)” and ranges from 0 to 8000 in increments of 2000. Overlapping histograms display parameter usage frequencies for “Action 0”, “Action 3”, and “Action 7”. Blue bars represent “Action 0”, red bars represent “Action 3”, and gray bars represent “Action 7”. The horizontal axis contains two parameter scales. The lower scale labeled “Parameters for Action 0” ranges from 0 to 0.5 in increments of 0.1. The upper scale labeled “Parameters for Action 3 and 7” ranges from 0 to 10 in increments of 1. For “Action 0”, the highest frequency occurs near parameter value 0 with 8000 occurrences. Frequencies near parameter value 0.1 are about 3000, and near parameter value 0.5 are about 3500. For “Action 3”, the highest frequencies occur near parameter values between 0 and 1, with bars reaching about 600 occurrences near 0 and about 300 near 1. Frequencies decrease sharply beyond parameter value 2. For “Action 7”, the largest frequencies also occur near parameter values between 0 and 1, with bars near 0 reaching about 500 occurrences. Very few occurrences appear beyond parameter value 2. Note: All numerical data values are approximated.

Statistics from the log file for the simplified tree growth game using binary voxels, summarizing the total rounds, time consumed, and actions chosen in each episode


Figure A3

Rendered records of tree growth in the binary tree growth game at Episode 800, by which point the pruning strategy had stabilized. The starting state at round 0 and the target state were the same throughout training


Table A1

Action space for agents to decide in the tree growth game and its simplified binary version

Action no.	Pruning type $K$	Range $M_{k}$	Meaning of the parameter	Range $M_{k}$ (binary version)	Meaning of the parameter (binary version)
0	Thinning	(float) [0, 0.05]	Branches closer than this distance in meters to any other branch will be cut	(float) [0, 0.5)	The fraction of solid voxels to be deleted
1	Raising	(float) [0, 10]	Branches within this height in meters from the crown start will be cut	(integer) [1, 10]	The number of solid voxel layers to be deleted from the bottom
2	Reduction east	(float) [0, 10]	Branches within this distance in meters from the crown’s eastern outreach will be cut	(integer) [1, 10]	The number of solid voxel layers to be deleted from the east
3	Reduction south	(float) [0, 10]	Branches within this distance in meters from the crown’s southern outreach will be cut	(integer) [1, 10]	The number of solid voxel layers to be deleted from the south
4	Reduction west	(float) [0, 10]	Branches within this distance in meters from the crown’s western outreach will be cut	(integer) [1, 10]	The number of solid voxel layers to be deleted from the west
5	Reduction north	(float) [0, 10]	Branches within this distance in meters from the crown’s northern outreach will be cut	(integer) [1, 10]	The number of solid voxel layers to be deleted from the north
6	Reduction top	(float) [0, 5]	Branches within this distance in meters below the crown’s top will be cut	(integer) [1, 10]	The number of solid voxel layers to be deleted from the top
7	Topping	(integer) [0, 5]	Cylinders within this number from the end of a branch will be cut	(float) (0, 5]	Voxels whose distance from their center to the mean center of all solid voxels is among the furthest within this range in meters will be deleted
8	No action (generic growth only)	–	No manual pruning is conducted	–	No manual pruning is conducted


Note(s): The parameters have different meanings in the binary version of the game

Source(s): Authors’ own work
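The action space in Table A1 pairs a discrete action number with one bounded parameter, which maps naturally onto a small lookup structure. The sketch below encodes the ranges of the full game (not the binary version) and shows how a raw network output could be clipped into range and how a random action could be drawn. The names `ParamRange`, `ACTION_SPACE`, `clip_parameter`, and `sample_random_action` are hypothetical, and clipping is one assumed way of enforcing the bounds rather than the authors' documented method.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union

Number = Union[int, float]


@dataclass(frozen=True)
class ParamRange:
    low: float
    high: float
    integer: bool = False  # True when the parameter is a whole-number count


# Ranges transcribed from Table A1 (full game). None marks the parameter-free action.
ACTION_SPACE: Dict[int, Optional[ParamRange]] = {
    0: ParamRange(0.0, 0.05),            # thinning: minimum branch distance (m)
    1: ParamRange(0.0, 10.0),            # raising: height from crown start (m)
    2: ParamRange(0.0, 10.0),            # reduction east (m)
    3: ParamRange(0.0, 10.0),            # reduction south (m)
    4: ParamRange(0.0, 10.0),            # reduction west (m)
    5: ParamRange(0.0, 10.0),            # reduction north (m)
    6: ParamRange(0.0, 5.0),             # reduction top (m)
    7: ParamRange(0, 5, integer=True),   # topping: cylinders from a branch end
    8: None,                             # no action, growth only
}


def clip_parameter(action_no: int, raw: float) -> Optional[Number]:
    """Clamp a raw parameter into the action's range; None for the no-op action."""
    spec = ACTION_SPACE[action_no]
    if spec is None:
        return None
    value = min(max(raw, spec.low), spec.high)
    return int(round(value)) if spec.integer else value


def sample_random_action(rng: random.Random) -> Tuple[int, Optional[Number]]:
    """Draw a uniformly random action and a parameter within its range."""
    action_no = rng.randrange(len(ACTION_SPACE))
    spec = ACTION_SPACE[action_no]
    if spec is None:
        return action_no, None
    if spec.integer:
        return action_no, rng.randint(int(spec.low), int(spec.high))
    return action_no, rng.uniform(spec.low, spec.high)
```

The binary version of the game would need a second table with its own ranges, since the same action numbers carry different parameter meanings there.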


Pan

and

Jakubiec

(

2022

), “Simulating the impact of deciduous trees on energy, daylight, and visual comfort: impact analysis and a practical framework for implementation”,

Proceedings of eSim Building Simulation Conference 2022: 12th Conference of IBPSA-Canada

June 22-23, 2022 at Ottawa, Canada

https://doi.org/10.1109/TII.2017.2780060

Popović

N.D.

Popović

D.S.

and

Seskar

(

2018

), “

A novel cloud-based advanced distribution management system solution

”,

IEEE Transactions on Industrial Informatics

, Vol.

No.

, pp.

3469

3476

IEEE Transactions on Industrial Informatics

, doi:

https://doi.org/10.48550/arXiv.2003.04664

Portelas

Colas

Weng

Hofmann

and

Oudeyer

P.-Y.

(

2020

Automatic Curriculum Learning for Deep RL: A Short Survey

(No. arXiv:2003.04664), arXiv

, doi:

https://doi.org/10.1007/978-1-4613-8476-2

Prusinkiewicz

and

Lindenmayer

(

1996

The Algorithmic Beauty of Plants

Springer

New York, NY

, doi:

Puterman

M.L.

(

2014

Markov Decision Processes: Discrete Stochastic Dynamic Programming

John Wiley & Sons

Hoboken, NJ

, ISBN:

978-1-118-62587-3

https://doi.org/10.1016/j.buildenv.2019.106606

Rahman

M.A.

Stratopoulos

L.M.F.

Moser-Reischl

Zölch

Häberle

K.-H.

Rötzer

Pretzsch

and

Pauleit

(

2020

), “

Traits of trees for cooling urban heat islands: a meta-analysis

”,

Building and Environment

, Vol.

170

, 106606, doi:

https://doi.org/10.1016/j.ufug.2023.127868

Rambhia

Volk

Rismanchi

Winter

and

Schultmann

(

2023

), “

Supporting decision-makers in estimating irrigation demand for urban street trees

”,

Urban Forestry and Urban Greening

, Vol.

, 127868, doi:

https://doi.org/10.1109/ICOASE.2018.8548937

Rashid

Z.N.

Zebari

S.R.M.

Sharif

K.H.

and

Jacksi

(

2018

), “

Distributed cloud computing and distributed parallel computing: a review

”,

2018 International Conference on Advanced Science and Engineering (ICOASE)

, pp.

167

172

, doi:

https://doi.org/10.3390/rs5020491

Raumonen

Kaasalainen

Åkerblom

Kaasalainen

Kaartinen

Vastaranta

Holopainen

Disney

and

Lewis

(

2013

), “

Fast automatic precision tree models from terrestrial laser scanner data

”,

Remote Sensing

, Vol.

No.

, doi:

https://github.com/QiguanShu/Branch-Pruning-Game-on-Urban-Trees

Shu

and

Boey

K.Z.

(

2024

), “

QiguanShu/Branch-pruning-game-on-urban-trees: this is one of the research project named Urban Green System 4.0 funded by DFG-DACH (LU2505/2-1). Please find more details in our publication

”,

GitHub

available at:

https://doi.org/10.3390/f13111955

Shu

Rötzer

Detter

and

Ludwig

(

2022

), “

Tree information modeling: a data exchange platform for tree design and management

”,

Forests

, Vol.

No.

, doi:

https://doi.org/10.2139/ssrn.4855810

Shu

Rötzer

Yazdi

Moser-Reischl

and

Ludwig

(

2024a

Can Leaf Area Density Be Estimated from Quantitative Structure Models of Trees?

(SSRN Scholarly Paper No. 4855810)

, doi:

https://doi.org/10.3389/fpls.2024.1297390

Shu

Yazdi

Rötzer

and

Ludwig

(

2024b

), “

Predicting resprouting of Platanus × hispanica following branch pruning by means of machine learning

”,

Frontiers in Plant Science

, Vol.

, 1297390, doi:

https://doi.org/10.23919/SpliTech49282.2020.9243756

Silva

Cardoso

Barros

Ribeiro

Carvalho

and

Rito Lima

(

2020

), “

A flexible system for optimising green spaces irrigation

”,

2020 5th International Conference on Smart and Sustainable Technologies (SpliTech)

, pp.

, doi:

https://doi.org/10.1038/nature16961

Silver

Huang

Maddison

C.J.

Guez

Sifre

van den Driessche

Schrittwieser

Antonoglou

Panneershelvam

Lanctot

Dieleman

Grewe

Nham

Kalchbrenner

Sutskever

Lillicrap

Leach

Kavukcuoglu

Graepel

and

Hassabis

(

2016

), “

Mastering the game of go with deep neural networks and tree search

”,

Nature

, Vol.

529

No.

7587

, pp.

484

489

, doi:

https://doi.org/10.1016/j.ufug.2022.127810

Speak

A.F.

and

Salbitano

(

2023

), “

The impact of pruning and mortality on urban tree canopy volume

”,

Urban Forestry and Urban Greening

, Vol.

, 127810, doi:

https://doi.org/10.48550/arXiv.1708.04782

Vinyals

Ewalds

Bartunov

Georgiev

Vezhnevets

A.S.

Yeo

Makhzani

Küttler

Agapiou

Schrittwieser

Quan

(

2017

StarCraft II: A New Challenge for Reinforcement Learning (No. arXiv:1708.04782), arXiv

, doi:

Viswanadhapalli

J.K.

Elumalai

V.K.S.S.

Shah

and

Mahajan

(

2024

), “

Deep reinforcement learning with reward shaping for tracking control and vibration suppression of flexible link manipulator

”,

Applied Soft Computing

, Vol.

152

, 110756, doi:

https://doi.org/10.1016/j.asoc.2023.110756

https://doi.org/10.1016/j.envpol.2012.10.021

Vos

P.E.J.

Maiheu

Vankerkom

and

Janssen

(

2013

), “

Improving local air quality in cities: to tree or not to tree?

”,

Environmental Pollution

, Vol.

183

, pp.

113

122

, doi:

https://doi.org/10.48550/arXiv.1511.06581

Wang

Schaul

Hessel

van Hasselt

Lanctot

and

de Freitas

(

2016

Dueling Network Architectures for Deep Reinforcement Learning

(No. arXiv:1511.06581), arXiv

, doi:

https://doi.org/10.1109/TNNLS.2022.3207346

Wang

Liang

Zhao

Huang

Dai

and

Miao

(

2024

), “

Deep reinforcement learning: a survey

”,

IEEE Transactions on Neural Networks and Learning Systems

, Vol.

No.

, pp.

5064

5078

, doi:

https://doi.org/10.1007/BF00992698

Watkins

C.J.C.H.

and

Dayan

(

1992

), “

Q-learning

”,

Machine Learning

, Vol.

No.

, pp.

279

292

, doi:

https://doi.org/10.1038/s43017-020-00129-5

Wong

N.H.

Tan

C.L.

Kolokotsa

D.D.

and

Takebayashi

(

2021

), “

Greenery as a mitigation and adaptation strategy to urban heat

”,

Nature Reviews Earth and Environment

, Vol.

No.

, pp.

166

181

, doi:

https://doi.org/10.48550/arXiv.2409.15315

(

2024

An Efficient Recommendation Model Based on Knowledge Graph Attention-Assisted Network (KGATAX)

(No. arXiv:2409.15315), arXiv

, doi:

https://doi.org/10.48550/arXiv.1810.06394

Xiong

Wang

Yang

Sun

Han

Zheng

Zhang

Liu

and

Liu

(

2018

Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space

(No. arXiv:1810.06394), arXiv

, doi:

https://doi.org/10.14627/537740019

Yazdi

Shu

and

Ludwig

(

2023

), “

A target-driven tree planting and maintenance approach for next generation urban green infrastructure (UGI)

”,

JoDLA – Journal of Digital Landscape Architecture

, Vols

8-2023

, p.

178

, doi:

https://doi.org/10.1038/s41597-023-02873-x

Yazdi

Shu

Rötzer

Petzold

and

Ludwig

(

2024

), “

A multilayered urban tree dataset of point clouds, quantitative structure and graph models

”,

Scientific Data

, Vol.

No.

, 1, doi:

https://doi.org/10.1109/JIOT.2021.3078462

Qin

Zhang

Shen

Jiang

and

Guan

(

2021

), “

A review of deep reinforcement learning for smart building energy management

”,

IEEE Internet of Things Journal

, Vol.

No.

, pp.

12046

12063

, doi: