XML semantic query optimization (XSQO) is an important area of eXtensible Markup Language (XML) query processing. However, experiments that evaluate semantic optimization methods often suffer from a lack of suitable datasets. Evaluating XSQO methods requires datasets with specific, controllable characteristics: the selectivity of embedded elements, the selectivity of element values, depth, fan-out and size. The aim of this paper is to describe the requirements for a generator of such datasets and the challenges of building it.
The paper takes the view that no existing generator offers this flexibility, and therefore discusses the design and construction of one.
The main characteristic of the generator is that it can adapt existing XML documents, including XML benchmarks, for experiments that evaluate XSQO methods. With the generator, users can quickly and directly modify not only the structure of XML documents but also their content.
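As an illustration only, the following minimal Python sketch shows how three of the characteristics named above (depth, fan-out and the selectivity of element values) might be exposed as parameters. It is not the generator described in the paper: the function name build_tree, the element names level1, level2, ... and the value "hit" are all hypothetical, and the sketch builds a document from scratch rather than adapting an existing one. It also does not model the selectivity of embedded elements.

```python
import random
import xml.etree.ElementTree as ET


def build_tree(depth, fan_out, value_selectivity, target_value="hit", seed=0):
    """Build a synthetic XML tree (hypothetical helper, for illustration).

    depth             -- number of element levels below the root
    fan_out           -- number of children of every non-leaf element
    value_selectivity -- fraction of leaf elements whose text is target_value
    """
    if depth < 1 or fan_out < 1:
        raise ValueError("depth and fan_out must be at least 1")
    if not 0.0 <= value_selectivity <= 1.0:
        raise ValueError("value_selectivity must lie in [0, 1]")

    root = ET.Element("root")
    level = [root]
    # Grow the tree one level at a time; every element gets fan_out children.
    for d in range(1, depth + 1):
        next_level = []
        for parent in level:
            for _ in range(fan_out):
                next_level.append(ET.SubElement(parent, f"level{d}"))
        level = next_level

    # The last level holds the leaves. Pick a fixed number of them to carry
    # the target value so the requested selectivity is met up to rounding.
    leaves = level
    n_hits = round(value_selectivity * len(leaves))
    rng = random.Random(seed)
    hits = set(rng.sample(range(len(leaves)), n_hits))
    for i, leaf in enumerate(leaves):
        leaf.text = target_value if i in hits else f"other{i}"
    return root


if __name__ == "__main__":
    tree = build_tree(depth=3, fan_out=2, value_selectivity=0.25)
    print(ET.tostring(tree, encoding="unicode"))
```

In this sketch the document size is not set independently: it follows from depth and fan-out, since the tree has fan_out ** depth leaves. A generator that sets size separately, as the requirements above call for, would need a different construction.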
The paper provides information of value to information technology professionals.
