Review of how the generalized regression estimators contribute to estimating the financial and economic data with missing observations under unequal probability sampling

Lawson, Nuanpan

doi:10.1108/AJEB-12-2023-0134

Purpose

Advance knowledge of financial and economic information supports planning and policy development in every country, especially developing countries such as Thailand and other Asian countries. Unfortunately, missing data or nonresponse is a common problem in many areas of study, including finance and economics. Handling missing data properly before further analysis can improve the results and make policy planning more effective. This review covers the generalized regression estimators for the population total, which can be applied to financial, economic and other data when missing data are present.

Design/methodology/approach

The generalized regression estimators for estimating population total, including the variance estimators under unequal probability sampling without replacement with missing data are explored under the reverse framework. Applications to financial and economic data in Thailand are also reviewed.

Findings

The reviewed literature shows that the proposed estimators perform best, giving smaller variances in all scenarios considered.

Originality/value

The generalized regression estimators can assist in estimating financial and economic data that contain missing values under different missingness mechanisms, and they can be used in other applications to obtain better estimators.

1. Introduction

Generalized regression (GREG) estimation is suited to design-based estimation of population totals in survey sampling. It is often applied to financial data, which are seldom complete, so missing data become an inherent issue requiring a solution. Economic advancement is imperative in every country to maintain infrastructure and citizens’ quality of life, and it calls for statistical analysis of data, where the problems of missing data and of choosing suitable estimators arise. Many measures have been put in place to support economic development in Thailand, as seen in the sustainable development plans of “Thailand 4.0”. Because Thailand is highly dependent on revenue from tourism, its economy is liable to fluctuations, most recently due to the coronavirus pandemic. After revenue from foreign tourists fell away, the economy became more focused on citizens’ assets, income, and cash flow within the country. Numerous policies have been introduced to support individuals’ financial stability and their capability to manage their assets during a pandemic. Analysis of the population’s financial issues is vital for responding properly to the crisis and for devising solutions and support for citizens in need throughout the pandemic. Data on the population’s expenses are required to understand the financial obstacles being faced and to analyze and address these concerns suitably.

Furthermore, the government has introduced many measures to stimulate tourism within the country, such as the “Thai Travel Together” campaign, which keeps cash flowing within the country and mitigates the hardships inflicted on the economy by the COVID-19 crisis. Other factors also affect the economy, including insubstantial investment, which harms the economy on a large scale. Sustainable development plans target ten industries and seek to improve the production efficiency and competitiveness of Thailand’s industrial economic structure.

However, missing data or nonresponse often occurs in real-world data and can obscure the facts used for decision making in business and economics, so opportunities are lost because the data are incomplete. Missing data arise, for instance, when participants do not respond or choose not to answer specific questions. When the missingness depends on neither the missing values nor the observed values, the mechanism is called missing completely at random (MCAR) or uniform nonresponse; when the missingness is related to the observed values but not to the missing values, it is called missing at random (MAR). Difficulties in acquiring accurate data can result from a lack of records or from survey nonresponse, so resolving nonresponse is imperative for appropriate financial planning, and statistical methods that handle nonresponse are vital to solving this problem. The nonresponse issue was first addressed by Hansen and Hurwitz (1946) in the context of mail surveys. They introduced an unbiased estimator for the population mean that used data from a sample survey on both respondents and non-respondents under unequal probability sampling without replacement (UPWOR). Horvitz and Thompson (1952) suggested using weights to create an unbiased population total estimator under unequal probability sampling with and without replacement. The inverse of the first order inclusion probability is used as the weight to correct the bias. Unfortunately, the Horvitz and Thompson variance is difficult to calculate because it requires joint inclusion probabilities, which are hard to find in some complex survey designs. Later, Hajek (1964) proposed a new estimator of the population total to address this issue with the variance estimator; it produces a smaller variance than the Horvitz and Thompson (1952) estimator, but only when there is no relationship between the study variable and the inclusion probabilities. The new estimator is an approximately unbiased ratio estimator, formed as the ratio of the sample means of two random variables.

The GREG estimator is a special type of calibration estimator that improves on this method of estimation by using auxiliary information. It takes the form of the Horvitz and Thompson (1952) estimator combined with a weighting approach, which can help reduce nonresponse bias. Bethlehem and Keller (1987) introduced a new weighting method based on linear models that can be used in person-based estimation. Many works have built on GREG to exploit the relationship between the study and auxiliary variables and so improve the efficiency of population total or population mean estimators and of their variance estimators (see, e.g. Montanari, 1987; Särndal et al., 1992; Estevao and Särndal, 2003; Särndal and Lundström, 2005; Särndal, 2007). Under nonresponse, the two-phase framework treats sample selection as the first phase and nonresponse as the second phase. It is a popular technique for studying the variance of GREG estimators (see, e.g. Rao, 1990; Särndal, 1992; Deville and Särndal, 1994; Särndal and Lundström, 2005).

Fay (1991) proposed an alternative to the two-phase framework, the reverse framework. The name comes from reversing the order of the two phases: nonresponse is considered in the first phase and sampling in the second phase (see, e.g. Shao and Steel, 1999; Haziza and Rao, 2006; Haziza, 2010). Under this reverse framework, population total estimators and GREG estimators, along with their variance estimators, have been investigated under the MCAR and MAR nonresponse mechanisms and under different assumptions on the response probabilities and the sampling fractions (Lawson, 2017; Lawson and Ponkaew, 2019; Lawson and Siripanich, 2022; Ponkaew and Lawson, 2023).

In this paper, the GREG estimators under the reverse framework will be reviewed. The structure of this paper is as follows. The literature review is shown in section 2. The basic setup and the generalized regression estimators with missing data are reviewed in sections 3 and 4, respectively. Examples of the application related to financial and economic data in Bangkok, Thailand are displayed in section 5. Lastly, some conclusions and discussions are presented in section 6.

2. Literature review

We first consider how the generalized regression estimators have been developed and how they can be useful for estimating financial, economic, and other data. The generalized regression estimator can estimate the population mean or total. It takes the form of the Horvitz and Thompson (1952) estimator, a well-known population total estimator under unequal probability sampling both with and without replacement. Nevertheless, the Horvitz and Thompson variance estimator requires known joint inclusion probabilities, also called second order inclusion probabilities, which are the probabilities that two different population units are both selected in the sample. These values are difficult to calculate in complex survey designs, so the Horvitz and Thompson variance estimator is not easy to use in practice. Under unequal probability sampling with replacement, the variance estimators have simple forms because these probabilities are not needed, unlike the variance formula under UPWOR, which requires joint inclusion probabilities.

Some researchers attempted to solve this issue in variance estimation (Sen, 1953; Yates and Grundy, 1953), but their estimators still require joint inclusion probabilities, which are unknown or hard to find. Therefore, several methods have been suggested for approximating the joint inclusion probabilities (Hartley and Rao, 1962; Hajek, 1964, 1981; Brewer, 2002; Brewer and Donadio, 2003).
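To make the role of the joint inclusion probabilities concrete, the following minimal Python sketch computes the Horvitz and Thompson total together with a variance estimator in the commonly cited Sen-Yates-Grundy form for fixed-size designs. That formula is not reproduced in this review, and the function names and the two-unit data are hypothetical; the sketch only illustrates that every pair of sampled units needs its own joint inclusion probability.

```python
def ht_total(y, pi):
    """Horvitz-Thompson total: each observation is weighted by 1 / pi_i."""
    return sum(yi / p for yi, p in zip(y, pi))


def syg_variance(y, pi, pij):
    """Sen-Yates-Grundy variance estimator for a fixed-size design.

    pij[i][j] is the joint inclusion probability of sampled units i and j.
    This sketch assumes it is available, which is the practical hurdle.
    """
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            weight = (pi[i] * pi[j] - pij[i][j]) / pij[i][j]
            total += weight * (y[i] / pi[i] - y[j] / pi[j]) ** 2
    return total


# Hypothetical sample of two units (illustrative values only)
y = [1.0, 3.0]
pi = [0.5, 0.5]
pij = [[0.5, 0.2], [0.2, 0.5]]
print(ht_total(y, pi), syg_variance(y, pi, pij))
```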

The GREG estimators help estimate the population mean and total when information is available on an auxiliary variable related to the study variable. The GREG estimator has the structure of the Horvitz and Thompson (1952) estimator with an additional adjustment calculated from the auxiliary variable. Optimal GREG estimators were developed using the known value of the regression coefficient in the population (Montanari, 1987; Berger et al., 2003) under different sampling plans such as stratified two-stage cluster sampling. Because the GREG estimator is nonlinear, the Taylor linearization method is used to transform it into a linear form in order to study its variance and the associated variance estimator. A drawback of the GREG variance estimator in this situation is that it is complex to calculate under UPWOR because it requires known joint inclusion probabilities, just as the Horvitz and Thompson (1952) method does. With nonresponse, Särndal and Lundström (2005) introduced an almost unbiased GREG estimator of the population total and a variance estimator under the two-phase framework, which requires nonresponse propensities. Under the reverse framework, some studies have explored GREG estimators with missing data. Lawson and Ponkaew (2019) suggested a GREG estimator of the population total for unit nonresponse in the study variable under unstratified, one-stage unequal probability sampling with a negligible sampling fraction when the nonresponse mechanism is MCAR. The MCAR assumption, under which the response probability is constant, is quite restrictive and tends not to hold in practice, and the estimator is nonlinear. However, they proposed using the modified automated linearization method to deal with the nonlinearity and showed that their estimator is unbiased and does not require the response probability.

More recently, under the same assumptions as the previous work, Ponkaew and Lawson (2023) applied the ratio method of estimation to create new GREG estimators. Their estimators are more efficient than those of the previous work, giving smaller relative bias and root mean square error. In an application to the maize industry in Thailand in 2019, based on data from the Office of the Agricultural Economics, their estimators also gave a smaller variance when estimating the total yield of maize, which could help in planning economic policy for Thailand’s agriculture in the future.

A more flexible nonresponse mechanism such as MAR is more practical in realistic situations. Under MAR, an approximately unbiased GREG estimator and its variance under UPWOR have been suggested for less restrictive circumstances: the response probabilities may be known or unknown, the nonresponse mechanism is non-uniform, and the sampling fraction may be small or arbitrary. This type of nonresponse mechanism is called MAR or the ignorable nonresponse mechanism. Because it is less restrictive, this estimator can help in acquiring data vital for financial and economic projects in many areas where the study variable has missing values. For example, farm profitability and resilience, which bring in revenue for the country, can be studied with the GREG estimators by estimating liabilities and net worth using auxiliary variables such as farm type, farm size, region, tenure, and economic performance. Likewise, the GREG estimator can be applied to economic data from the agricultural industry, such as total yield, total profit, and total income, to estimate these values in advance for effective decision making that can develop economic wealth for the whole nation. Handling missingness appropriately improves the reliability of the data used for planning in Thailand and other countries around the world (Lawson and Siripanich, 2022).

3. Basic setup

The notation and basic concepts under the reverse framework are introduced here. Let $y$ be a study variable with population total $Y = \sum_{i \in U} y_{i}$, where $U = \{1, 2, \ldots, N\}$ and $N$ is the population size. Let $x$ be an auxiliary variable with population total $X = \sum_{i \in U} x_{i}$. The paired values of the study variable $y$ and the auxiliary variable $x$ for the $i$th unit are $(y_{i}, x_{i})$, $i = 1, 2, \ldots, N$. The auxiliary variables $k$ and $w$ are used, respectively, to define the first and joint inclusion probabilities under UPWOR and to construct the ratio estimator. A sample $s$ of size $n$ is drawn using UPWOR. The known, nonzero probability of selecting population unit $i$ in $U$ is $P_{i} = X_{i} / X$, where $\sum_{i = 1}^{N} P_{i} = 1$. Let $π_{i} = P (i \in s) = \sum_{s \ni i} P (s)$ be the first order inclusion probability and $π_{i j} = P (i \land j \in s) = \sum_{s \supset \{i, j\}} P (s)$ be the second order inclusion probability. Assume that the $n \times (q + 1)$ matrix of $x$ values, $X_{n} = {(x_{1}, x_{2}, \ldots, x_{n})}^{'}$, is known, that is, $x_{i}$ is known for all $i \in s$. The expectation and variance with respect to UPWOR sampling are denoted by $E_{S}$ and $V_{S}$, respectively.
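The definitions of the first and second order inclusion probabilities can be illustrated by brute-force enumeration on a very small design. The sketch below is only an illustration: the three-unit design and the function name are hypothetical, and real UPWOR designs are rarely small enough to enumerate, which is why the joint inclusion probabilities are hard to obtain in practice.

```python
from itertools import combinations


def inclusion_probabilities(design):
    """design maps each possible sample (a frozenset of unit labels) to P(s)."""
    units = sorted(set().union(*design))
    # First order: add P(s) over all samples that contain unit i
    pi = {i: sum(p for s, p in design.items() if i in s) for i in units}
    # Second order: add P(s) over all samples that contain both i and j
    pij = {(i, j): sum(p for s, p in design.items() if i in s and j in s)
           for i, j in combinations(units, 2)}
    return pi, pij


# Hypothetical design with N = 3, n = 2 and unequal sample probabilities
design = {
    frozenset({1, 2}): 0.5,
    frozenset({1, 3}): 0.3,
    frozenset({2, 3}): 0.2,
}
pi, pij = inclusion_probabilities(design)
print(pi)
print(pij)
```

For a fixed sample size $n$, the first order inclusion probabilities sum to $n$, which gives a quick sanity check on any enumerated design.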

The population total GREG estimator is

{\hat{Y}}_{G REG} = \sum_{i \in s} \frac{y_{i}}{π_{i}} + {[\sum_{i \in U} x_{i} - \sum_{i \in s} \frac{x_{i}}{π_{i}}]}^{'} {(\sum_{i \in s} \frac{q_{i} x_{i} x_{i}^{'}}{π_{i}})}^{- 1} (\sum_{i \in s} \frac{q_{i} x_{i} y_{i}}{π_{i}}) = {\hat{Y}}_{H T} + {[X - {\hat{X}}_{H T}]}^{'} {\hat{β}}_{r}

where $x_{i} = {(x_{i 1}, \ldots, x_{i j}, \ldots, x_{i m})}^{'}$, $i = 1, 2, \ldots, n$, are the column vectors of the auxiliary variables with $m \geq 1$, ${\hat{Y}}_{H T} = \sum_{i \in s} \frac{y_{i}}{π_{i}}$, ${\hat{X}}_{H T} = \sum_{i \in s} \frac{x_{i}}{π_{i}}$, $X = \sum_{i \in U} x_{i}$, ${\hat{β}}_{r} = {(\sum_{i \in s} \frac{q_{i} x_{i} x_{i}^{'}}{π_{i}})}^{- 1} (\sum_{i \in s} \frac{q_{i} x_{i} y_{i}}{π_{i}})$, and $q_{i}$ is determined by the linear assisting model $ξ$: $E_{ξ} (y_{i}) = β^{'} x_{i}$ and $V_{ξ} (y_{i}) = σ_{i}^{2}$, that is, $q_{i} = 1 / σ_{i}^{2}$.
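A minimal sketch of the full-response GREG total may help fix ideas. It assumes a single auxiliary variable ($m = 1$), so that the matrix inverse reduces to a scalar division, and takes $q_{i} = 1$ unless weights are supplied; the function names and data are hypothetical.

```python
def ht_total(values, pi):
    """Horvitz-Thompson total of a variable observed on the sample."""
    return sum(v / p for v, p in zip(values, pi))


def greg_total(y, x, pi, x_total, q=None):
    """GREG total with one auxiliary variable (m = 1, an assumption here)."""
    q = q if q is not None else [1.0] * len(y)
    num = sum(qi * xi * yi / p for qi, xi, yi, p in zip(q, x, y, pi))
    den = sum(qi * xi * xi / p for qi, xi, p in zip(q, x, pi))
    beta = num / den  # weighted regression coefficient through the origin
    # HT total plus a regression adjustment for the error in estimating X
    return ht_total(y, pi) + (x_total - ht_total(x, pi)) * beta


# Hypothetical sample in which y is exactly proportional to x
x = [1.0, 2.0, 4.0]
y = [2.0, 4.0, 8.0]
pi = [0.2, 0.5, 0.8]
print(greg_total(y, x, pi, x_total=20.0))
```

When $y_{i}$ is exactly proportional to $x_{i}$, the adjustment term removes all sampling error and the estimate equals the proportionality constant times $X$, whatever the inclusion probabilities are.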

Under nonresponse, $R$ and $r_{i}$ denote the response mechanism and the $y_{i}$ response indicator variable, respectively.

r_{i} = \begin{cases} 1, & \text{if } y_{i} \text{ is observed} \\ 0, & \text{if } y_{i} \text{ is missing}. \end{cases}

Let $p_{i}$ be the response probability, $p_{i} = P (r_{i} = 1)$. Let $E_{R}$ and $V_{R}$ be the expectation and variance operators with respect to the response mechanism, and let $E$ and $V$ be the overall expectation and variance operators, respectively. Therefore, $E_{R} (r_{i}) = P (r_{i} = 1) = p_{i}$ and $V_{R} (r_{i}) = p_{i} (1 - p_{i})$.

The variance of the GREG estimator ${\hat{Y}}_{G R E G}$ under the reverse framework is

V ({\hat{Y}}_{G R E G}) = E_{R} V_{S} ({\hat{Y}}_{G R E G} | R) + V_{R} E_{S} ({\hat{Y}}_{G R E G} | R) .

4. Generalized regression estimators with missing data

Numerous works have investigated GREG estimators with missing data under the two-phase framework, in which sample selection is considered in the first phase and nonresponse in the second phase, in order to study the variance of the GREG estimators. Under the two-phase framework, the GREG estimator and its variance were studied in the presence of nonresponse (Särndal and Lundström, 2005). They also recommended an automated linearization method for finding the variance of the GREG estimator, which does not require the partial derivatives needed in Taylor series linearization (see, e.g. Estevao and Särndal, 2003; Särndal and Lundström, 2005; Särndal, 2007).

A GREG estimator for population total with nonresponse using the two-phase framework is (Särndal and Lundström, 2005)

{\hat{Y}}_{G R E G . S L} = \sum_{i \in s} \frac{r_{i} y_{i}}{π_{i} p_{i}} + {[X - \sum_{i \in s} \frac{r_{i} x_{i}}{π_{i} p_{i}}]}^{'} {(\sum_{i \in s} \frac{r_{i} q_{i} x_{i} x_{i}^{'}}{π_{i} p_{i}})}^{- 1} (\sum_{i \in s} \frac{r_{i} q_{i} x_{i} y_{i}}{π_{i} p_{i}}) = {\hat{Y}}_{r} + {[X - {\hat{X}}_{r}]}^{'} {\hat{β}}_{r},

where ${\hat{Y}}_{r} = \sum_{i \in s} \frac{r_{i} y_{i}}{π_{i} p_{i}}$, ${\hat{X}}_{r} = \sum_{i \in s} \frac{r_{i} x_{i}}{π_{i} p_{i}}$, and ${\hat{β}}_{r} = {(\sum_{i \in s} \frac{r_{i} q_{i} x_{i} x_{i}^{'}}{π_{i} p_{i}})}^{- 1} (\sum_{i \in s} \frac{r_{i} q_{i} x_{i} y_{i}}{π_{i} p_{i}})$.

The variance of ${\hat{Y}}_{G R E G . S L}$ is

V ({\hat{Y}}_{G R E G . S L}) = \sum_{i \in U} D_{i} e_{i}^{2} + \sum_{i \in U} \sum_{j \neq i \in U} D_{i j} e_{i} e_{j} + \sum_{i \in U} \frac{(1 - p_{i})}{p_{i}} e_{i}^{2},

where $e_{i} = (y_{i} - x_{i}^{'} β)$ ⁠, $β = {(\sum_{i \in U} q_{i} x_{i} x_{i}^{'})}^{- 1} (\sum_{i \in U} q_{i} x_{i} y_{i})$ ⁠, $D_{i} = \frac{1 - π_{i}}{π_{i}}$ ⁠, $D_{i j} = \frac{π_{i j} - π_{i} π_{j}}{π_{i} π_{j}}$ ⁠.

When $p_{i}$ is known for all $i \in s$ under the reverse framework, the estimator of $V ({\hat{Y}}_{G R E G . S L})$ is

\hat{V} ({\hat{Y}}_{G R E G . S L}) = \sum_{i \in s} {\hat{D}}_{i} \frac{r_{i} {\hat{e}}_{i}^{2}}{p_{i}} + \sum_{i \in s} \sum_{j \neq i \in s} {\hat{D}}_{i j} \frac{r_{i} {\hat{e}}_{i}}{p_{i}} \frac{r_{j} {\hat{e}}_{j}}{p_{j}} + \sum_{i \in s} \frac{(1 - p_{i})}{π_{i} p_{i}^{2}} {\hat{e}}_{i}^{2},

where ${\hat{e}}_{i} = (y_{i} - x_{i}^{'} {\hat{β}}_{r})$ ⁠, ${\hat{β}}_{r} = {(\sum_{i \in s} \frac{r_{i} q_{i} x_{i} x_{i}^{'}}{π_{i} p_{i}})}^{- 1} (\sum_{i \in s} \frac{r_{i} q_{i} x_{i} y_{i}}{π_{i} p_{i}})$ ⁠, ${\hat{D}}_{i} = \frac{1 - π_{i}}{π_{i}^{2}}$ ⁠, ${\hat{D}}_{i j} = \frac{π_{i j} - π_{i} π_{j}}{π_{i j} π_{i} π_{j}}$ ⁠.
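The three terms of this variance estimator can be sketched in a few lines of Python. The sketch assumes one auxiliary variable ($m = 1$), known response probabilities, and joint inclusion probabilities supplied as a nested list; the function name and data are hypothetical. As a further assumption, the third term is evaluated over respondents only, because the residual ${\hat{e}}_{i}$ needs an observed $y_{i}$.

```python
def greg_sl_variance(y, x, pi, pij, p, r, q=None):
    """Estimated variance of the nonresponse-adjusted GREG total (m = 1)."""
    n = len(x)
    q = q if q is not None else [1.0] * n
    resp = [i for i in range(n) if r[i] == 1]
    w = {i: 1.0 / (pi[i] * p[i]) for i in resp}  # weights 1 / (pi_i p_i)
    beta = (sum(w[i] * q[i] * x[i] * y[i] for i in resp)
            / sum(w[i] * q[i] * x[i] ** 2 for i in resp))
    e = {i: y[i] - x[i] * beta for i in resp}  # residuals for respondents
    # Term 1: D_i-hat = (1 - pi_i) / pi_i^2
    term1 = sum((1 - pi[i]) / pi[i] ** 2 * e[i] ** 2 / p[i] for i in resp)
    # Term 2: D_ij-hat = (pi_ij - pi_i pi_j) / (pi_ij pi_i pi_j), over i != j
    term2 = sum((pij[i][j] - pi[i] * pi[j]) / (pij[i][j] * pi[i] * pi[j])
                * (e[i] / p[i]) * (e[j] / p[j])
                for i in resp for j in resp if i != j)
    # Term 3: nonresponse component (respondents only, see lead-in)
    term3 = sum((1 - p[i]) / (pi[i] * p[i] ** 2) * e[i] ** 2 for i in resp)
    return term1 + term2 + term3


# Hypothetical two-unit sample with full response (p_i = 1)
v = greg_sl_variance(y=[1.0, 3.0], x=[1.0, 1.0], pi=[0.5, 0.5],
                     pij=[[0.5, 0.2], [0.2, 0.5]], p=[1.0, 1.0], r=[1, 1])
print(v)
```

With $p_{i} = 1$ the third term vanishes and only the two sampling terms remain, which makes small cases easy to check by hand.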

When $p_{i}$ is unknown for all $i \in s$ ⁠, let ${\hat{p}}_{i}$ be the estimator of $p_{i}$ ⁠, then the estimator of $V ({\hat{Y}}_{G R E G . S L})$ is

\hat{V} ({\hat{Y}}_{G R E G . S L}) = \sum_{i \in s} {\hat{D}}_{i} \frac{r_{i} {\hat{e}}_{i}^{2}}{{\hat{p}}_{i}} + \sum_{i \in s} \sum_{j \neq i \in s} {\hat{D}}_{i j} \frac{r_{i} {\hat{e}}_{i}}{{\hat{p}}_{i}} \frac{r_{j} {\hat{e}}_{j}}{{\hat{p}}_{j}} + \sum_{i \in s} \frac{(1 - {\hat{p}}_{i})}{π_{i} {\hat{p}}_{i}^{2}} {\hat{e}}_{i}^{2},

where ${\hat{e}}_{i} = (y_{i} - x_{i}^{'} {\hat{β}}_{r}), {\hat{β}}_{r} = {(\sum_{i \in s} \frac{r_{i} q_{i} x_{i} x_{i}^{'}}{π_{i} {\hat{p}}_{i}})}^{- 1} (\sum_{i \in s} \frac{r_{i} q_{i} x_{i} y_{i}}{π_{i} {\hat{p}}_{i}}) .$

Apart from the two-phase framework, the reverse framework of Fay (1991) has also been used to study the variance of GREG estimators, with the order of sample selection and nonresponse reversed. Again, the estimator is nonlinear, so it must be transformed into a linear function before its variance can be estimated. Under the reverse framework, a new GREG estimator has been suggested under MCAR, the uniform nonresponse mechanism in which the response probability is constant. Most researchers (Lawson and Ponkaew, 2019; Ponkaew and Lawson, 2023) adopted this assumption for simplicity. This GREG estimator for nonresponse under UPWOR was developed from the concept of Lawson (2017), a nonlinear, almost unbiased estimator of the population total or mean under probability proportional to size sampling with replacement. The benefit of the Lawson (2017) estimator is that it does not require the response probability, although it assumes that the response probabilities are the same for all units and that the sampling fraction is negligible. Lawson’s (2017) population mean estimator is

{\hat{\bar{Y}}}_{r} = \frac{\sum_{i \in s} \frac{r_{i} y_{i}}{π_{i} p_{i}}}{\sum_{i \in s} \frac{r_{i}}{π_{i} p_{i}}} .

When $p_{i} = p$ for all units $i$ in $U$ ⁠, then

{\hat{\bar{Y}}}_{r} = \frac{\sum_{i \in s} \frac{r_{i} y_{i}}{π_{i} p}}{\sum_{i \in s} \frac{r_{i}}{π_{i} p}} = \frac{\sum_{i \in s} \frac{r_{i} y_{i}}{π_{i}}}{\sum_{i \in s} \frac{r_{i}}{π_{i}}} .

Additionally, the Lawson (2017) estimator for estimating the population total is

{\hat{Y}}_{r} = N {\hat{\bar{Y}}}_{r} = N \frac{\sum_{i \in s} \frac{r_{i} y_{i}}{π_{i}}}{\sum_{i \in s} \frac{r_{i}}{π_{i}}} .
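Because the constant $p$ cancels between numerator and denominator, the estimator can be computed from the respondents and their inclusion probabilities alone. The following is a minimal sketch with hypothetical function names and data; missing values are stored as `None` and are never read.

```python
def lawson_mean(y, pi, r):
    """Ratio-type mean over respondents; the constant p has cancelled."""
    num = sum(yi / a for yi, a, ri in zip(y, pi, r) if ri == 1)
    den = sum(1.0 / a for a, ri in zip(pi, r) if ri == 1)
    return num / den


def lawson_total(y, pi, r, N):
    """Population total as N times the estimated mean (N assumed known)."""
    return N * lawson_mean(y, pi, r)


# Hypothetical sample of three units, the second of which did not respond
y = [10.0, None, 30.0]
pi = [0.5, 0.25, 0.5]
r = [1, 0, 1]
print(lawson_mean(y, pi, r))        # 20.0
print(lawson_total(y, pi, r, N=6))  # 120.0
```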

The variance of ${\hat{\bar{Y}}}_{r}$ is

V ({\hat{\bar{Y}}}_{r}) = \frac{1}{N^{2} p^{2}} [n \sum_{i \in U} \frac{1}{π_{i}^{2}} p {(y_{i} - \bar{Y})}^{2} P_{i} - \frac{1}{n} {(\sum_{i \in U} \frac{p}{π_{i}} (y_{i} - \bar{Y}) P_{i})}^{2}] .

The estimator of $V ({\hat{\bar{Y}}}_{r})$ is

\hat{V} ({\hat{\bar{Y}}}_{r}) = \frac{1}{{(\sum_{i \in s} \frac{r_{i}}{π_{i}})}^{2}} \frac{n}{n - 1} \sum_{i \in s} \frac{r_{i}}{π_{i}^{2}} {(y_{i} - {\hat{\bar{Y}}}_{r})}^{2} .

The variance of ${\hat{Y}}_{r}$ is

V ({\hat{Y}}_{r}) = \frac{1}{p^{2}} [n \sum_{i \in U} \frac{1}{π_{i}^{2}} p {(y_{i} - \bar{Y})}^{2} P_{i} - \frac{1}{n} {(\sum_{i \in U} \frac{p}{π_{i}} (y_{i} - \bar{Y}) P_{i})}^{2}],

and the estimator of $V ({\hat{Y}}_{r})$ is

\hat{V} ({\hat{Y}}_{r}) = \frac{N^{2}}{{(\sum_{i \in s} \frac{r_{i}}{π_{i}})}^{2}} \frac{n}{n - 1} \sum_{i \in s} \frac{r_{i}}{π_{i}^{2}} {(y_{i} - {\hat{\bar{Y}}}_{r})}^{2} .
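The estimated variance of the total needs no joint inclusion probabilities, which is its practical appeal. The sketch below evaluates it over the sampled respondents, with $n$ taken as the full sample size; the function name and data are hypothetical.

```python
def lawson_total_variance(y, pi, r, N):
    """Estimated variance of the Lawson (2017) total estimator."""
    n = len(pi)  # sample size, counting nonrespondents as well
    resp = [(yi, a) for yi, a, ri in zip(y, pi, r) if ri == 1]
    n_hat = sum(1.0 / a for _, a in resp)          # sum of r_i / pi_i
    mean = sum(yi / a for yi, a in resp) / n_hat   # ratio-type mean
    ss = sum((yi - mean) ** 2 / a ** 2 for yi, a in resp)
    return (N / n_hat) ** 2 * n / (n - 1) * ss


# Hypothetical sample of three units, the second of which did not respond
print(lawson_total_variance([10.0, None, 30.0], [0.5, 0.25, 0.5], [1, 0, 1], N=6))
```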

Under the same assumptions, namely that the nonresponse mechanism is MCAR and the sampling fraction is negligible under UPWOR, a new GREG estimator based on the Lawson (2017) estimator has been suggested as follows (Lawson and Ponkaew, 2019):

{\hat{\bar{Y}}}_{G R E G . L P} = {\hat{\bar{Y}}}_{r} + {(\bar{X} - {\hat{\bar{X}}}_{r})}^{'} {\hat{β}}_{r},

where ${\hat{\bar{Y}}}_{r} = \frac{\sum_{i \in s} \frac{r_{i} y_{i}}{π_{i} p}}{\sum_{i \in s} \frac{r_{i}}{π_{i} p}} = \frac{\sum_{i \in s} \frac{r_{i} y_{i}}{π_{i}}}{\sum_{i \in s} \frac{r_{i}}{π_{i}}}$ ⁠, ${\hat{\bar{X}}}_{r} = \frac{\sum_{i \in s} \frac{r_{i} x_{i}}{π_{i}}}{\sum_{i \in s} \frac{r_{i}}{π_{i}}}$ ⁠, $\bar{X} = \sum_{i \in U} x_{i} / N$ ⁠,

{\hat{β}}_{r} = {(\sum_{i \in s} \frac{r_{i} q_{i} x_{i} x_{i}^{'}}{π_{i} p})}^{- 1} (\sum_{i \in s} \frac{r_{i} q_{i} x_{i} y_{i}}{π_{i} p}) = {(\sum_{i \in s} \frac{r_{i} q_{i} x_{i} x_{i}^{'}}{π_{i}})}^{- 1} (\sum_{i \in s} \frac{r_{i} q_{i} x_{i} y_{i}}{π_{i}}),

and $q_{i} = 1 / σ_{i}^{2}$.

When the population size $N$ is known, the population total GREG estimator is

{\hat{Y}}_{G R E G . L P} = N {\hat{\bar{Y}}}_{G R E G . L P} = N [{\hat{\bar{Y}}}_{r} + {(\bar{X} - {\hat{\bar{X}}}_{r})}^{'} {\hat{β}}_{r}] .

They also assumed that ${\hat{β}}_{r} - β = O_{p} ({n_{r}}^{- \frac{1}{2}})$ and $r_{n} \to 0$ as $n \to \infty$, where $\{r_{n}\}$ is a sequence of positive real numbers. For the variance of the GREG estimator, they considered two approaches. In the first, $\sum_{i \in s} \frac{r_{i}}{π_{i}}$ is replaced by $\sum_{i \in U} r_{i}$, which gives

V_{1} ({\hat{Y}}_{G R E G . L P}) \approx \frac{1}{p} \sum_{i \in U} \frac{(1 - π_{i})}{π_{i}} {(y_{i} - x_{i}^{'} β)}^{2} + \sum_{i \in U} \sum_{j \neq i \in U} D_{i j} (y_{i} - x_{i}^{'} β) (y_{j} - x_{j}^{'} β) .

In the second, the Taylor linearization approach is used, which gives

V_{2} ({\hat{Y}}_{G R E G . L P}) \approx \frac{1}{p} \sum_{i \in U} \frac{(1 - π_{i})}{π_{i}} {(e_{i} - \bar{e})}^{2} + \sum_{i \in U} \sum_{j \neq i \in U} D_{i j} (e_{i} - \bar{e}) (e_{j} - \bar{e}) .

The estimators of these variances are, respectively,

{\hat{V}}_{1} ({\hat{Y}}_{G R E G . L P}) \approx {(\frac{N}{\sum_{i \in s} \frac{r_{i}}{π_{i}}})}^{2} [\sum_{i \in s} \frac{(1 - π_{i})}{π_{i}^{2}} r_{i} {(y_{i} - x_{i}^{'} {\hat{β}}_{r})}^{2} + \sum_{i \in s} \sum_{j \neq i \in s} {\breve{D}}_{i j} r_{i} (y_{i} - x_{i}^{'} {\hat{β}}_{r}) r_{j} (y_{j} - x_{j}^{'} {\hat{β}}_{r})],

{\hat{V}}_{2} ({\hat{Y}}_{G R E G . L P}) \approx {(\frac{N}{\sum_{i \in s} \frac{r_{i}}{π_{i}}})}^{2} [\sum_{i \in s} \frac{(1 - π_{i})}{π_{i}^{2}} r_{i} {({\hat{e}}_{i} - {\hat{\bar{e}}}_{r})}^{2} + \sum_{i \in s} \sum_{j \neq i \in s} {\breve{D}}_{i j} r_{i} ({\hat{e}}_{i} - {\hat{\bar{e}}}_{r}) r_{j} ({\hat{e}}_{j} - {\hat{\bar{e}}}_{r})],

where ${\hat{e}}_{i} = (y_{i} - x_{i}^{'} {\hat{β}}_{r})$.

They also showed theoretically that ${\hat{V}}_{1} ({\hat{Y}}_{G R E G . L P})$ and ${\hat{V}}_{2} ({\hat{Y}}_{G R E G . L P})$ are almost unbiased estimators.

Later, a new GREG estimator derived from the ratio method was proposed, building on the work of Lawson and Ponkaew (2019) under the same MCAR assumption and extending it to the situation where the sampling fraction is large and therefore cannot be neglected. The authors also covered the cases where the response probabilities are known and unknown, making use of the known auxiliary variable under nonresponse. Under the reverse framework the second variance component is usually omitted, but they considered the case where this component, $V_{2} = V_{R} E_{S} ({\hat{Y}}_{G R E G . L P} | R)$, cannot be ignored (Ponkaew and Lawson, 2023). Again, they used the automated linearization approach to transform ${\hat{Y}}_{G R E G . L P}$ into a less complex form. They made three assumptions in their study: the response mechanism is MCAR, ${\hat{β}}_{r} - β = O_{p} (n_{r}^{- \frac{1}{2}})$, and $V (\sum_{i \in s} \frac{b_{i}}{π_{i}}) \to 0$ as $n \to \infty$, where $b_{i} = w_{i}$ or $r_{i}$.

Their GREG estimators for the population mean and total are, respectively,

{\hat{\bar{Y}}}_{G R E G . R}^{*} = {\hat{\bar{Y}}}_{R}^{*} + {(\bar{X} - {\hat{\bar{X}}}_{r})}^{'} {\hat{β}}_{r},

{\hat{Y}}_{G R E G . R}^{*} = N {\hat{\bar{Y}}}_{G R E G . R}^{*} = N [{\hat{\bar{Y}}}_{R}^{*} + {(\bar{X} - {\hat{\bar{X}}}_{r})}^{'} {\hat{β}}_{r}],

where ${\hat{\bar{Y}}}_{R}^{*} = \frac{{\hat{\bar{Y}}}_{r}^{(1)}}{{\hat{\bar{w}}}_{H T}} \bar{W}$, ${\hat{\bar{X}}}_{r} = \sum_{i \in s} \frac{r_{i} x_{i}}{π_{i}} / \sum_{i \in s} \frac{r_{i}}{π_{i}}$, ${\hat{β}}_{r} = {(\sum_{i \in s} \frac{r_{i} q_{i} x_{i} x_{i}^{'}}{π_{i}})}^{- 1} (\sum_{i \in s} \frac{r_{i} q_{i} x_{i} y_{i}}{π_{i}})$, and $\bar{X} = \frac{1}{N} \sum_{i \in U} x_{i}$.

Under the reverse framework, $V ({\hat{\bar{Y}}}_{G R E G . R}^{*})$ is obtained from

V ({\hat{\bar{Y}}}_{G R E G . R}^{*}) = E_{R} V_{S} ({\hat{\bar{Y}}}_{G R E G . R}^{*} | R) + V_{R} E_{S} ({\hat{\bar{Y}}}_{G R E G . R}^{*} | R) = V_{1} + V_{2},

where $V_{1} = E_{R} V_{S} ({\hat{\bar{Y}}}_{G R E G . R}^{*} | R)$ and $V_{2} = V_{R} E_{S} ({\hat{\bar{Y}}}_{G R E G . R}^{*} | R)$.

The variances of the Ponkaew and Lawson (2023) estimator are

(1) $V_{1} ({\hat{Y}}_{G R E G . R}^{*})$ is

V_{1} ({\hat{Y}}_{G R E G . R}^{*}) \approx \sum_{i \in U} (D_{i} {(N p e_{i})}^{2} + (1 - p) p^{- 1} {(e_{i} + {\bar{X}}^{'} β)}^{2}) + {(N p)}^{2} \sum_{i \in U} \sum_{j \neq i \in U} D_{i j} e_{i} e_{j} .

(2) $V_{2} ({\hat{Y}}_{G R E G . R}^{*})$ is

V_{2} ({\hat{Y}}_{G R E G . R}^{*}) \approx \sum_{i \in U} (D_{i} {(e_{i} - o_{i})}^{2} + (1 - p) p^{- 1} {(e_{i} + {\bar{X}}^{'} β)}^{2}) + \sum_{i \in U} \sum_{j \neq i \in U} D_{i j} (e_{i} - o_{i}) (e_{j} - o_{j}) .

(1) The estimators of $V_{1} ({\hat{Y}}_{G R E G . R}^{*})$ are

{\hat{V}}_{1} ({\hat{Y}}_{G R E G . R}^{*}) = \begin{cases} {\hat{E}}_{1 p} + \frac{(1 - p)}{p^{2}} \sum_{i \in s} \frac{r_{i}}{π_{i}} {({\hat{e}}_{i} + \frac{1}{N p} \sum_{i \in s} \frac{r_{i} x_{i}^{'} {\hat{β}}_{r}}{π_{i}})}^{2}, & \text{when } p \text{ is known} \\ {\hat{E}}_{1 \hat{p}} + \frac{(1 - \hat{p})}{{\hat{p}}^{2}} \sum_{i \in s} \frac{r_{i}}{π_{i}} {({\hat{e}}_{i} + \frac{1}{N \hat{p}} \sum_{i \in s} \frac{r_{i} x_{i}^{'} {\hat{β}}_{r}}{π_{i}})}^{2}, & \text{when } p \text{ is unknown} \end{cases}

where $\hat{p} = \sum_{i \in s} \frac{r_{i}}{π_{i}} {(\sum_{i \in s} \frac{1}{π_{i}})}^{- 1}$, ${\hat{Z}}_{1 i p} = r_{i} (\frac{{\hat{N}}_{r} y_{i}}{N p} - x_{i}^{'} {\hat{β}}_{r})$, ${\hat{Z}}_{1 i \hat{p}} = r_{i} (\frac{{\hat{N}}_{r} y_{i}}{N \hat{p}} - x_{i}^{'} {\hat{β}}_{r})$, ${\hat{e}}_{i} = y_{i} - x_{i}^{'} {\hat{β}}_{r}$, ${\hat{N}}_{r} = \sum_{i \in s} \frac{r_{i}}{π_{i}}$,

{\hat{E}}_{1 p} = N^{2} [\sum_{i \in s} {\hat{D}}_{i} {\hat{Z}}_{1 i p}^{2} + \sum_{i \in s} \sum_{j \neq i \in s} {\hat{D}}_{i j} {\hat{Z}}_{1 i p} {\hat{Z}}_{1 j p}],

{\hat{E}}_{1 \hat{p}} = N^{2} [\sum_{i \in s} {\hat{D}}_{i} {\hat{Z}}_{1 i \hat{p}}^{2} + \sum_{i \in s} \sum_{j \neq i \in s} {\hat{D}}_{i j} {\hat{Z}}_{1 i \hat{p}} {\hat{Z}}_{1 j \hat{p}}] .

(2) The estimators of $V_{2} ({\hat{Y}}_{G R E G . R}^{*})$ are

{\hat{V}}_{2} ({\hat{Y}}_{G R E G . R}^{*}) = \begin{cases} {\hat{E}}_{2 p} + \frac{(1 - p)}{p^{2}} \sum_{i \in s} \frac{r_{i}}{π_{i}} {({\hat{e}}_{i} + \frac{1}{N} \sum_{i \in s} \frac{r_{i} x_{i}^{'} {\hat{β}}_{r}}{π_{i}})}^{2}, & \text{when } p \text{ is known} \\ {\hat{E}}_{2 \hat{p}} + \frac{(1 - \hat{p})}{{\hat{p}}^{2}} \sum_{i \in s} \frac{r_{i}}{π_{i}} {({\hat{e}}_{i} + \frac{1}{N} \sum_{i \in s} \frac{r_{i} x_{i}^{'} {\hat{β}}_{r}}{π_{i}})}^{2}, & \text{when } p \text{ is unknown} \end{cases}

where $\hat{p} = \sum_{i \in s} \frac{r_{i}}{π_{i}} {(\sum_{i \in s} \frac{1}{π_{i}})}^{- 1}$,

{\hat{Z}}_{2 i p} = \frac{1}{N} (\frac{r_{i} y_{i}}{p} - \frac{\frac{1}{N} \sum_{i \in s} \frac{r_{i} y_{i}}{π_{i}}}{\bar{W}} w_{i}) - \frac{r_{i}}{{\hat{N}}_{r}} (x_{i}^{'} {\hat{β}}_{r} - \frac{1}{{\hat{N}}_{r}} \sum_{i \in s} \frac{r_{i} x_{i}^{'} {\hat{β}}_{r}}{π_{i}}),

{\hat{Z}}_{2 i \hat{p}} = \frac{1}{N} (\frac{r_{i} y_{i}}{\hat{p}} - \frac{\frac{1}{N} \sum_{i \in s} \frac{r_{i} y_{i}}{π_{i}}}{\bar{W}} w_{i}) - \frac{r_{i}}{{\hat{N}}_{r}} (x_{i}^{'} {\hat{β}}_{r} - \frac{1}{{\hat{N}}_{r}} \sum_{i \in s} \frac{r_{i} x_{i}^{'} {\hat{β}}_{r}}{π_{i}}),

{\hat{E}}_{2 p} = N^{2} (\sum_{i \in s} {\hat{D}}_{i} {\hat{Z}}_{2 i p}^{2} + \sum_{i \in s} \sum_{j \neq i \in s} {\hat{D}}_{i j} {\hat{Z}}_{2 i p} {\hat{Z}}_{2 j p}),

{\hat{E}}_{2 \hat{p}} = N^{2} (\sum_{i \in s} {\hat{D}}_{i} {\hat{Z}}_{2 i \hat{p}}^{2} + \sum_{i \in s} \sum_{j \neq i \in s} {\hat{D}}_{i j} {\hat{Z}}_{2 i \hat{p}} {\hat{Z}}_{2 j \hat{p}}) .
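When the constant response probability is unknown, both sets of estimators plug in $\hat{p}$, the ratio of the weighted respondent count to the weighted sample count. A minimal sketch, with a hypothetical function name and data:

```python
def estimate_response_probability(pi, r):
    """p-hat = sum(r_i / pi_i) / sum(1 / pi_i) over the whole sample."""
    weighted_respondents = sum(ri / a for a, ri in zip(pi, r))
    weighted_sample = sum(1.0 / a for a in pi)
    return weighted_respondents / weighted_sample


# Hypothetical sample of three units, the second of which did not respond
print(estimate_response_probability([0.5, 0.25, 0.5], [1, 0, 1]))  # 0.5
```

The nonrespondent here carries a large design weight, so $\hat{p}$ is 0.5 even though two of the three sampled units responded.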

Unfortunately, the works mentioned above rely on the strong assumption that the nonresponse mechanism is MCAR, under which the response probability is constant. Novel GREG estimators for the population mean and total under the more flexible and more practical situation where nonresponse is missing at random (MAR) were proposed, building on the previous works and using a known auxiliary variable to improve the efficiency of the estimators (Lawson and Siripanich, 2022). In their study, they assumed $C_{1}$: $r_{n} \to 0$ as $n \to \infty$, where $\{r_{n}\}$ is a sequence of positive real numbers, and $C_{2}$: ${\hat{β}}_{r} - β = O_{p} ({n_{r}}^{- \frac{1}{2}})$ and $V (\sum_{i \in s} \frac{r_{i}}{π_{i} p_{i}}) \to 0$ as $n \to \infty$. The sampling fraction may be negligible or non-negligible.

The Lawson and Siripanich (2022) estimators of the population mean and total are

{\hat{\bar{Y}}}_{G R E G . L S}^{*} = \frac{\sum_{i \in s} r_{i} y_{i} / π_{i} p_{i}}{\sum_{i \in s} r_{i} / π_{i} p_{i}} + {[\bar{X} - \frac{\sum_{i \in s} r_{i} x_{i} / π_{i} p_{i}}{\sum_{i \in s} r_{i} / π_{i} p_{i}}]}^{'} {(\sum_{i \in s} \frac{r_{i} q_{i} x_{i} x_{i}^{'}}{π_{i} p_{i}})}^{- 1} (\sum_{i \in s} \frac{r_{i} q_{i} x_{i} y_{i}}{π_{i} p_{i}}) = {\hat{\bar{Y}}}_{r} + {(\bar{X} - {\hat{\bar{X}}}_{r})}^{'} {\hat{β}}_{r},

where ${\hat{\bar{Y}}}_{r} = \sum_{i \in s} \frac{r_{i} y_{i}}{π_{i} p_{i}} / \sum_{i \in s} \frac{r_{i}}{π_{i} p_{i}}$, ${\hat{\bar{X}}}_{r} = \sum_{i \in s} \frac{r_{i} x_{i}}{π_{i} p_{i}} / \sum_{i \in s} \frac{r_{i}}{π_{i} p_{i}}$, $\bar{X} = \sum_{i \in U} x_{i} / N$ and

{\hat{β}}_{r} = {(\sum_{i \in s} \frac{r_{i} q_{i} x_{i} x_{i}^{'}}{π_{i} p_{i}})}^{- 1} (\sum_{i \in s} \frac{r_{i} q_{i} x_{i} y_{i}}{π_{i} p_{i}}) .

The corresponding estimator of the population total is

{\hat{Y}}_{G R E G . L S}^{*} = N {\hat{\bar{Y}}}_{G R E G . L S}^{*} = N [{\hat{\bar{Y}}}_{r} + {(\bar{X} - {\hat{\bar{X}}}_{r})}^{'} {\hat{β}}_{r}] .
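To make the formulas above concrete, here is a minimal Python sketch of the mean and total estimators. It is not the authors' code. It assumes a single (scalar) auxiliary variable so that ${\hat{β}}_{r}$ is a ratio of two sums rather than a matrix expression, takes the response probabilities $p_{i}$ as given, and defaults to $q_{i} = 1$. The function name and the toy data are hypothetical.

```python
def greg_ls_estimate(y, x, r, pi, p, x_bar_pop, N, q=None):
    """Sketch of the MAR GREG estimator with one auxiliary variable.

    y, x      : study and auxiliary values for the sampled units
                (y may be None where the unit did not respond)
    r         : response indicators (1 = responded, 0 = missing)
    pi, p     : inclusion probabilities and response probabilities
    x_bar_pop : known population mean of x
    N         : population size
    q         : optional positive weights q_i (assumed 1 if omitted)
    Returns (estimated total, estimated mean, beta_hat).
    """
    n = len(y)
    q = [1.0] * n if q is None else q
    resp = [i for i in range(n) if r[i] == 1]
    # Weight 1 / (pi_i * p_i) for every responding unit.
    w = {i: 1.0 / (pi[i] * p[i]) for i in resp}
    sum_w = sum(w.values())
    y_bar_r = sum(w[i] * y[i] for i in resp) / sum_w
    x_bar_r = sum(w[i] * x[i] for i in resp) / sum_w
    # Scalar version of beta_hat_r: weighted sum of x*y over weighted sum of x*x.
    beta_hat = (sum(w[i] * q[i] * x[i] * y[i] for i in resp)
                / sum(w[i] * q[i] * x[i] * x[i] for i in resp))
    mean_hat = y_bar_r + (x_bar_pop - x_bar_r) * beta_hat
    return N * mean_hat, mean_hat, beta_hat


# Hypothetical data in which y is exactly 2 * x for every respondent.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, None, 8.0]
r = [1, 1, 0, 1]
pi = [0.2, 0.4, 0.6, 0.8]
p = [0.9, 0.8, 0.7, 0.6]
total_hat, mean_hat, beta_hat = greg_ls_estimate(y, x, r, pi, p, x_bar_pop=2.5, N=10)
print(total_hat, mean_hat, beta_hat)
```

In this toy case the study variable is exactly proportional to the auxiliary variable, so the regression adjustment recovers $N \cdot 2 \bar{X}$ regardless of which units are missing. This is the intuition behind using a well-correlated auxiliary variable.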

Because the estimator is nonlinear, they suggested two variance estimation techniques, called the modified automated linearization approaches. They replaced $\sum_{i \in s} \frac{r_{i}}{π_{i} p_{i}}$ by $\sum_{i \in U} \frac{r_{i}}{p_{i}}$ in their estimators and used the Taylor linearization approach to transform the nonlinear estimator into a linear form.

The resulting approximate variances are

V_{1} ({\hat{Y}}_{G R E G . L S}^{*}) \approx \sum_{i \in U} \frac{1}{p_{i}} (D_{i} e_{i}^{2} + (1 - p_{i}) {(e_{i} - \bar{e})}^{2}) + \sum_{i \in U} \sum_{j \in U, j \neq i} D_{i j} e_{i} e_{j} .

V_{2} ({\hat{Y}}_{G R E G . L S}^{*}) \approx \sum_{i \in U} \frac{1}{π_{i} p_{i}} {(e_{i} - \bar{e})}^{2} + \sum_{i \in U} \sum_{j \in U, j \neq i} D_{i j} e_{i} e_{j} .

The estimators of $V_{1} ({\hat{Y}}_{G R E G . L S}^{*})$ are

{\hat{V}}_{1} ({\hat{Y}}_{G R E G . L S}^{*}) = \begin{cases} \frac{N^{2}}{{(\sum_{i \in s} \frac{r_{i}}{π_{i} p_{i}})}^{2}} {\hat{E}}_{1 p_{i}} + \sum_{i \in s} \frac{(1 - p_{i})}{π_{i} p_{i}^{2}} r_{i} {({\hat{e}}_{i} - {\hat{\bar{e}}}_{r})}^{2}, & \text{when } p_{i} \text{ is known for all } i \in s \\ \frac{N^{2}}{{(\sum_{i \in s} \frac{r_{i}}{π_{i} {\hat{p}}_{i}})}^{2}} {\hat{E}}_{1 {\hat{p}}_{i}} + \sum_{i \in s} \frac{(1 - {\hat{p}}_{i})}{π_{i} {\hat{p}}_{i}^{2}} r_{i} {({\hat{e}}_{i} - {\hat{\bar{e}}}_{r})}^{2}, & \text{when } p_{i} \text{ is unknown for all } i \in s \end{cases}

where ${\hat{E}}_{1 p_{i}} = \sum_{i \in s} {\hat{D}}_{i} \frac{r_{i} {\hat{e}}_{i}^{2}}{p_{i}^{2}} + \sum_{i \in s} \sum_{j \in s, j \neq i} {\hat{D}}_{i j} \frac{r_{i} {\hat{e}}_{i}}{p_{i}} \frac{r_{j} {\hat{e}}_{j}}{p_{j}}$, ${\hat{E}}_{1 {\hat{p}}_{i}} = \sum_{i \in s} {\hat{D}}_{i} \frac{r_{i} {\hat{e}}_{i}^{2}}{{\hat{p}}_{i}^{2}} + \sum_{i \in s} \sum_{j \in s, j \neq i} {\hat{D}}_{i j} \frac{r_{i} {\hat{e}}_{i}}{{\hat{p}}_{i}} \frac{r_{j} {\hat{e}}_{j}}{{\hat{p}}_{j}}$,

${\hat{p}}_{i}$ is the estimator of $p_{i}$ for all $i \in s$, ${\hat{e}}_{i} = y_{i} - x_{i}^{'} {\hat{β}}_{r}$, ${\hat{β}}_{r} = {(\sum_{i \in s} \frac{r_{i} q_{i} x_{i} x_{i}^{'}}{π_{i} p_{i}})}^{- 1} (\sum_{i \in s} \frac{r_{i} q_{i} x_{i} y_{i}}{π_{i} p_{i}})$ if $p_{i}$ is known for all $i \in s$, otherwise ${\hat{β}}_{r} = {(\sum_{i \in s} \frac{r_{i} q_{i} x_{i} x_{i}^{'}}{π_{i} {\hat{p}}_{i}})}^{- 1} (\sum_{i \in s} \frac{r_{i} q_{i} x_{i} y_{i}}{π_{i} {\hat{p}}_{i}})$, and ${\hat{\bar{e}}}_{r} = N^{- 1} \sum_{i \in s} \frac{r_{i} {\hat{e}}_{i}}{p_{i}}$ if $p_{i}$ is known for all $i \in s$, otherwise ${\hat{\bar{e}}}_{r} = N^{- 1} \sum_{i \in s} \frac{r_{i} {\hat{e}}_{i}}{{\hat{p}}_{i}}$.
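The first variance estimator can be transcribed fairly directly into code. The Python sketch below is an illustration under stated assumptions rather than the authors' implementation: the residuals ${\hat{e}}_{i}$ are taken as already computed, and the design quantities ${\hat{D}}_{i}$ and ${\hat{D}}_{i j}$ are passed in as inputs (`D_diag` and `D_off`, both hypothetical names) because their definitions depend on the sampling design. The same function applies when $p_{i}$ is replaced by ${\hat{p}}_{i}$.

```python
def v1_hat(e_hat, r, pi, p, N, D_diag, D_off):
    """Sketch of V1-hat for the MAR GREG total estimator.

    e_hat  : residuals y_i - x_i' beta_hat_r (only used where r_i = 1)
    r      : response indicators
    pi, p  : inclusion and (known or estimated) response probabilities
    N      : population size
    D_diag : values of D-hat_i, supplied by the user (assumption)
    D_off  : matrix of D-hat_ij, supplied by the user (assumption)
    """
    n = len(r)
    resp = [i for i in range(n) if r[i] == 1]
    # z_i = r_i * e_i / p_i, the response-adjusted residual.
    z = {i: e_hat[i] / p[i] for i in resp}
    sum_w = sum(1.0 / (pi[i] * p[i]) for i in resp)
    # e-bar as written in the text: N^{-1} * sum(r_i * e_i / p_i).
    e_bar = sum(z.values()) / N
    # Sampling component E1: diagonal terms plus cross terms over i != j.
    E1 = sum(D_diag[i] * z[i] ** 2 for i in resp)
    E1 += sum(D_off[i][j] * z[i] * z[j] for i in resp for j in resp if j != i)
    # Nonresponse component.
    nonresponse = sum((1.0 - p[i]) / (pi[i] * p[i] ** 2) * (e_hat[i] - e_bar) ** 2
                      for i in resp)
    return N ** 2 / sum_w ** 2 * E1 + nonresponse


# Hypothetical two-unit example with full response (p_i = 1),
# so the nonresponse component vanishes.
v = v1_hat(e_hat=[1.0, -1.0], r=[1, 1], pi=[0.5, 0.5], p=[1.0, 1.0], N=4,
           D_diag=[2.0, 2.0], D_off=[[0.0, 1.0], [1.0, 0.0]])
print(v)
```

With $p_{i} = 1$ for every unit the second term is zero and only the sampling component remains, which gives a quick sanity check for an implementation.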

The estimators of $V_{2} ({\hat{Y}}_{G R E G . L S}^{*})$ are

{\hat{V}}_{2} ({\hat{Y}}_{G R E G . L S}^{*}) = \begin{cases} \frac{N^{2}}{{(\sum_{i \in s} \frac{r_{i}}{π_{i} p_{i}})}^{2}} {\hat{E}}_{2 p_{i}} + \sum_{i \in s} \frac{(1 - p_{i})}{π_{i} p_{i}^{2}} r_{i} {({\hat{e}}_{i} - {\hat{\bar{e}}}_{r})}^{2}, & \text{when } p_{i} \text{ is known for all } i \in s \\ \frac{N^{2}}{{(\sum_{i \in s} \frac{r_{i}}{π_{i} {\hat{p}}_{i}})}^{2}} {\hat{E}}_{2 {\hat{p}}_{i}} + \sum_{i \in s} \frac{(1 - {\hat{p}}_{i})}{π_{i} {\hat{p}}_{i}^{2}} r_{i} {({\hat{e}}_{i} - {\hat{\bar{e}}}_{r})}^{2}, & \text{when } p_{i} \text{ is unknown for all } i \in s \end{cases}

where

{\hat{E}}_{2 p_{i}} = \sum_{i \in s} {\hat{D}}_{i} \frac{r_{i} {({\hat{e}}_{i} - {\hat{\bar{e}}}_{r})}^{2}}{p_{i}^{2}} + \sum_{i \in s} \sum_{j \in s, j \neq i} {\hat{D}}_{i j} \frac{r_{i} ({\hat{e}}_{i} - {\hat{\bar{e}}}_{r})}{p_{i}} \frac{r_{j} ({\hat{e}}_{j} - {\hat{\bar{e}}}_{r})}{p_{j}},

{\hat{E}}_{2 {\hat{p}}_{i}} = \sum_{i \in s} {\hat{D}}_{i} \frac{r_{i} {({\hat{e}}_{i} - {\hat{\bar{e}}}_{r})}^{2}}{{\hat{p}}_{i}^{2}} + \sum_{i \in s} \sum_{j \in s, j \neq i} {\hat{D}}_{i j} \frac{r_{i} ({\hat{e}}_{i} - {\hat{\bar{e}}}_{r})}{{\hat{p}}_{i}} \frac{r_{j} ({\hat{e}}_{j} - {\hat{\bar{e}}}_{r})}{{\hat{p}}_{j}} .

These GREG estimators can be calculated using any statistical package, e.g. R, which was used in the reviewed studies. Because these GREG estimators were only recently proposed for missing data under unequal probability sampling, there is unfortunately no ready-made R function for them. However, they are not complex to compute.

5. Examples of application to financial and economic data

The GREG estimator was applied to estimate the total monthly household income of five communities in Bang Sue district, Bangkok, Thailand (Lawson and Siripanich, 2022). The results were based on a sample of 195 households drawn using UPWOR with Midzuno's (1952) scheme out of 1,181 households, with 30% nonresponse in monthly income. Monthly expenditure, age and hours worked per week were used as the auxiliary variables to assist in estimating the total income and its variance. A logistic regression model using the age variable was used to estimate the unknown response probabilities.

Their results showed that the suggested GREG estimator gave an estimated total income for all households of 36,068,543 baht, with smaller variance estimates than the Särndal and Lundström (2005) estimator.
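The response-probability step in this example, a logistic regression of the response indicator on age, can be sketched as follows. This is a minimal illustration that assumes a single covariate and fits the model by Newton-Raphson using only the Python standard library. The ages and response indicators are made-up values, not the Bang Sue data, and the function name is hypothetical.

```python
import math


def fit_logistic(x, r, iterations=25):
    """Fit logit P(r_i = 1) = a + b * x_i by Newton-Raphson (one covariate)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        prob = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(ri - pi for ri, pi in zip(r, prob))
        g1 = sum((ri - pi) * xi for ri, pi, xi in zip(r, prob, x))
        # Information matrix entries (negative Hessian).
        w = [pi * (1.0 - pi) for pi in prob]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system with the explicit inverse.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Hypothetical ages and response indicators (illustrative values only).
ages = [22, 31, 38, 45, 52, 60, 67, 74]
r = [1, 1, 0, 1, 1, 0, 1, 0]
mean_age = sum(ages) / len(ages)
# Centre and rescale age to keep the iterations numerically stable.
x = [(age - mean_age) / 10.0 for age in ages]
a, b = fit_logistic(x, r)
p_hat = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
print([round(v, 3) for v in p_hat])
```

The fitted values play the role of ${\hat{p}}_{i}$ in the estimators of Section 4. In practice a statistical package's logistic regression routine would normally be used instead of a hand-written solver.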

Data on total monthly household income are key to understanding a core part of a country’s economy. Information on the financial status of citizens helps in understanding money flow in the economy and provides invaluable insights for devising policies to overcome economic inequalities. Estimation of these statistics allows policymakers to identify income disparities within the nation and to introduce measures that promote equality and stabilize the economy, leading to improvements in many aspects of quality of life.

Another example comes from Thailand’s agriculture, which is one of the sources of income that support Thailand’s economy (Ponkaew and Lawson, 2023). Data on maize in Thailand in 2019 from the Office of Agricultural Economics were studied, based on a sample of 25 provinces selected out of 63 provinces using the UPWOR method of Midzuno (1952). The data contained a 30% nonresponse rate. The total yield of maize for all provinces in Thailand in 2019 was estimated using their suggested GREG estimator, with the cultivated area and the harvested area in 2019 as the auxiliary variables and the cultivated area in 2018 as the size variable. The estimated total yield of maize for all provinces in Thailand was 525,124, with a smaller variance than the existing estimator.

Statistical estimation of agricultural yield is imperative for agricultural countries such as Thailand and much of Asia. Agriculture has long been part of these nations’ histories because their geography and climate favor the growing of crops. Today, exports are one of the major sources of income, and a large amount of land is used for farming. Farmers are often short on resources and must go to great lengths to save time and money so that their yields bring profit rather than losses. Predicting crop yields can help policymakers working with farmers to anticipate food shortages, losses and the potential risks of farming strategies. As many countries are dependent on agriculture, accurate estimation of yields is an essential component of their economies.

6. Conclusions and discussions

We can see that the GREG estimators can be useful for estimating financial and economic data in Thailand and in other countries. Most of these data contain nonresponse, which commonly occurs during the collection process and needs to be taken care of to gain more accuracy. The reviewed works on GREG estimators with missing data under the reverse framework can benefit the estimation process when applied to real data, e.g. household income, business revenue, and inflation and unemployment rates.

The GREG estimators are studied under the MCAR and MAR nonresponse mechanisms, both when the sampling fraction is small and can be neglected and when it is large and cannot be omitted. These GREG estimators are also almost unbiased, with reduced variance relative to the existing estimators. The variance estimators of the GREG estimators are useful for constructing lower and upper bounds for the variable of interest based on survey sampling. A smaller variance from the GREG estimators yields narrower, more precise confidence intervals for financial and economic data.
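The link between the variance estimate and the bounds mentioned above can be illustrated with a normal-approximation confidence interval. The sketch below is an assumption-laden example: the point estimate is the total income reported in Section 5, but the two variance values are purely hypothetical placeholders, since the reviewed variance estimates are not reproduced here.

```python
import math
from statistics import NormalDist


def confidence_interval(estimate, variance, level=0.95):
    """Normal-approximation interval: estimate +/- z * sqrt(variance)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


total_hat = 36_068_543.0       # estimated total income in baht (Section 5)
variance_small = 1.0e12        # hypothetical variance estimate
variance_large = 4.0e12        # hypothetical, four times larger

lo_s, hi_s = confidence_interval(total_hat, variance_small)
lo_l, hi_l = confidence_interval(total_hat, variance_large)
print(lo_s, hi_s)
print(lo_l, hi_l)
```

Because the half-width scales with the square root of the variance, an estimator with one quarter of the variance gives an interval half as wide around the same point estimate.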

The GREG estimators can assist in estimating these data, and knowing these data can be helpful in planning and defining national policies to increase the value of business and finance in the future. Economic stability can only be achieved when policies and decisions are supported by accurate statistical estimation of financial and economic data. Flexible statistical methods can be used to monitor and predict economic trends, employment figures and inflation rates, which benefits policymakers, economists and investors. Most crucially, they support the introduction of suitable policies to tackle the nation’s financial issues and fill economic niches, for the well-being of the population through sustainable economic growth.

The GREG estimators can be applied in further studies to survey designs other than UPWOR, for instance stratified cluster sampling and cluster sampling, where nonresponse occurs in the study variable, and can assist in applications to real data.

Many thanks to Prof. Sa-Aat Niwitpong and Prof. Hung Nguyen for recommending the Asian Journal of Economics and Banking.

References

Berger, Y.G., Tirari, E.H.M. and Tillé, Y. (2003), “Towards optimal regression estimation in sample surveys”, Australian and New Zealand Journal of Statistics, Vol. 45 No. 3, pp. 319-329, doi: https://doi.org/10.1111/1467-842x.00286.

Bethlehem, J.G. and Keller, W.J. (1987), “Linear weighting of sample survey data”, Journal of Official Statistics, Vol. 3 No. 2, pp. 141-153.

Brewer, K.R.W. (2002), Combined Survey Sampling Inference: Weighing Basu's Elephants, Arnold, London.

Brewer, K.R.W. and Donadio, M.E. (2003), “The high entropy variance of the Horvitz-Thompson estimator”, Survey Methodology, Vol. 29 No. 2, pp. 189-196.

Deville, J.C. and Särndal, C.E. (1994), “Variance estimation for the regression imputed Horvitz Thompson estimator”, Journal of Official Statistics, Vol. 10 No. 4, pp. 381-394.

Estevao, V.M. and Särndal, C.E. (2003), “A new perspective on calibration estimators”, JSM - Section on Survey Research Methods, pp. 1346-1356.

Fay, R.E. (1991), “A design-based perspective on missing data variance”, Proceedings of the 1991 Annual Research Conference, US Bureau of the Census, pp. 429-440.

Hajek, J. (1964), “Asymptotic theory of rejective sampling with varying probabilities from a finite population”, Annals of Mathematical Statistics, Vol. 35 No. 4, pp. 1491-1523, doi: https://doi.org/10.1214/aoms/1177700375.

Hajek, J. (1981), Sampling from Finite Population, Marcel Dekker, New York.

Hansen, M.H. and Hurwitz, W.N. (1946), “The problem of nonresponse in sample surveys”, Journal of the American Statistical Association, Vol. 41 No. 236, pp. 517-529, doi: https://doi.org/10.1080/01621459.1946.10501894.

Hartley, H.O. and Rao, J.N.K. (1962), “Sampling with unequal probability and without replacement”, The Annals of Mathematical Statistics, Vol. 33 No. 2, pp. 350-374, doi: https://doi.org/10.1214/aoms/1177704564.

Haziza, D. (2010), “Resampling methods for variance estimation in the presence of missing survey data”, Proceedings of the Annual Conference of the Italian Statistical Society.

Haziza, D. and Rao, J.N.K. (2006), “A nonresponse model approach to inference under imputation for missing survey data”, Survey Methodology, Vol. 32 No. 1, pp. 53-64.

Horvitz, D.F. and Thompson, D.J. (1952), “A generalization of sampling without replacement from a finite universe”, Journal of the American Statistical Association, Vol. 47 No. 260, pp. 663-685, doi: https://doi.org/10.1080/01621459.1952.10483446.

Lawson, N. (2017), “Variance estimation in the presence of nonresponse under probability proportional to size sampling”, Proceedings of the 6th Annual International Conference on Computational Mathematics, Computational Geometry and Statistics (CMCGS 2017), Singapore, doi: https://doi.org/10.5176/2251-1911_cmcgs17.32.

Lawson, N. and Ponkaew, C. (2019), “New generalized regression estimator in the presence of nonresponse under unequal probability sampling”, Communications in Statistics - Theory and Methods, Vol. 48 No. 10, pp. 2483-2498, doi: https://doi.org/10.1080/03610926.2018.1465091.

Lawson, N. and Siripanich, P. (2022), “A new generalized regression estimator and variance estimation for unequal probability sampling without replacement for missing data”, Communications in Statistics - Theory and Methods, Vol. 51 No. 18, pp. 6296-6318, doi: https://doi.org/10.1080/03610926.2020.1860224.

Midzuno, H. (1952), “On the sampling system with probability proportional to sum of sizes”, Annals of the Institute of Statistical Mathematics, Vol. 55 No. 3, pp. 99-107.

Montanari, G. (1987), “Post sampling efficient qr-prediction in large sample survey”, International Statistical Review, Vol. 55 No. 2, pp. 191-202, doi: https://doi.org/10.2307/1403195.

Ponkaew, C. and Lawson, N. (2023), “New generalized regression estimators using a ratio method and its variance estimation for unequal probability sampling without replacement in the presence of nonresponse”, Current Applied Science and Technology, Vol. 23 No. 2, doi: https://doi.org/10.55003/cast.2022.02.23.007.

Rao, J.N.K. (1990), “Variance estimation under imputation for missing data”, Technical report, Statistics Canada, Ottawa, pp. 599-608.

Särndal, C.E. (1992), “Method for estimating the precision of survey estimates when imputation has been used”, Survey Methodology, Vol. 18, pp. 241-252.

Särndal, C.E. (2007), “The calibration approach in survey theory and practice”, Survey Methodology, Vol. 33 No. 2, pp. 99-119.

Särndal, C.E. and Lundström, S. (2005), Estimation in Surveys with Nonresponse, John Wiley & Sons, New York.

Särndal, C.E., Swensson, B. and Wretman, J. (1992), Model Assisted Survey Sampling, Springer-Verlag, New York.

Sen, A.R. (1953), “On the estimate of the variance in sampling with varying probabilities”, Journal of the Indian Society of Agricultural Statistics, Vol. 5, pp. 119-127.

Shao, J. and Steel, P. (1999), “Variance estimation for survey data with composite imputation and nonnegligible sampling fractions”, Journal of the American Statistical Association, Vol. 94 No. 445, pp. 254-265, doi: https://doi.org/10.2307/2669700.

Yates, F. and Grundy, P.M. (1953), “Selection without replacement from within strata with probability proportional to size”, Journal of the Royal Statistical Society: Series B, Vol. 15 No. 2, pp. 235-261, doi: https://doi.org/10.1111/j.2517-6161.1953.tb00140.x.

2024

Nuanpan Lawson

Published in Asian Journal of Economics and Banking. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode
