
We consider the kernel-based classifier proposed by Younso (2017). This nonparametric classifier allows for the classification of missing spatially dependent data. The weak consistency of the classifier has been studied by Younso (2017). The purpose of this paper is to establish strong consistency of this classifier under mild conditions. The classifier is discussed in a multi-class case. The results are illustrated with simulation studies and real applications.

In many applications one needs to classify spatial data that have been collected incompletely. The incomplete-data classification problem, in which certain features are missing from particular feature vectors, arises in a wide range of fields, including image labeling and computer vision. For example, in remote sensing, internal malfunctions of satellite sensors and poor atmospheric conditions such as thick cloud often leave the acquired images with missing information at certain pixels, and one wants to classify these pixels using the information in the nearest identified pixels. Many existing classification algorithms assume either certain parametric distributions for the data or certain forms of separating curves or surfaces. These parametric classifiers are suboptimal and of limited use in practical applications where little information about the underlying distributions is available a priori. In comparison, nonparametric classifiers are usually more flexible in accommodating different data structures, and are hence more desirable. Younso [21] proposed a nonparametric approach that includes contextual features for classifying missing spatial data and investigated the consistency of the classifier under mild conditions. In nonparametric spatial estimation, existing work mainly concerns the estimation of probability densities and regression functions; see the key references [2–4,14,15]. More recently, [5] proposed a kernel spatial density estimator allowing for the analysis of spatial clustering. In this work, we establish the strong consistency of the classifier proposed by [21] and then check its performance with simulation studies and applications.

We consider a strictly stationary random field $\{(X_i,Y_i)\}_{i\in\mathbb{Z}^N}$ defined on some probability space $(\Omega,\mathcal{F},\mathbb{P})$ and taking values in $\mathbb{R}^d\times\{0,\dots,M\}$, for some integer $M\ge 1$.
In the classification problem, for each $i\in\mathbb{Z}^N$, $X_i$ is a vector of features and $Y_i$ is the label (class) of $X_i$. A point $i=(i_1,\dots,i_N)\in\mathbb{Z}^N$ will be referred to as a site. For $n=(n_1,\dots,n_N)\in(\mathbb{N}^*)^N$, we define the rectangular region $I_n$ by $I_n=\{i\in\mathbb{Z}^N: 1\le i_k\le n_k,\ k=1,\dots,N\}$. We will write $n\to\infty$ if $\min_{k=1,\dots,N}n_k\to\infty$. Define $\hat n=n_1\times\cdots\times n_N=\mathrm{card}(I_n)$ and assume that the random field is observed on a subset $S_n\subset I_n$ such that $I_n\setminus S_n$ is a bounded set for $\hat n$ large enough. When processing a particular site, its own features are not used at all; only the features of its neighbors are considered. In other words, we wish to predict the label $Y_j$ of a new site $j$ based only on observations in a vicinity, say $\nu_j\subset S_n$, where the set $\nu_j$ does not contain $j$. Let $\nu_j=j+\nu$, where $\nu\subset\mathbb{Z}^N$ is a fixed bounded set of sites not containing $0$ with $\mathrm{card}(\nu)=l$ ($l$ is also the cardinality of each $\nu_j$). We assume that $X_{(j)}=\{X_i: i\in\nu_j\}$ is a random vector taking values in $\mathbb{R}^{\tilde d}$ with $\tilde d=ld$, and that the components of $X_{(j)}$ are ordered according to an arbitrary order on indices, for example the lexicographic order. The pair $(X_{(j)},Y_j)$ may be completely described by $\mu$, the probability measure for $X_{(j)}$, and $\eta(x)$, the regression of $Y_j$ on $X_{(j)}=x$. Assume that for each $i\in\mathbb{Z}^N$, $(X_{(i)},Y_i)$ has the same distribution as the pair $(X_{(1)},Y_1)$. We will create a classifier $g:\mathbb{R}^{\tilde d}\to\{0,\dots,M\}$ mapping $X_{(j)}$ into the predicted label of $X_j$. The error rate, or risk, of a rule $g$ is $L(g)=\mathbb{P}\{g(X_{(j)})\ne Y_j\}$. This is minimized by the rule

$$g^*(x)=\arg\max_{0\le k\le M}\mathbb{P}\{Y_j=k\mid X_{(j)}=x\},\tag{1.1}$$

whose error rate $L^*=L(g^*)$ is called the Bayes-optimal risk; $g^*$ is called the Bayes rule. Clearly, $g^*(x)$ predicts the label $Y_j$ of the site $j$ using only $x$, the value of $X_{(j)}$, while the feature vector $X_j$ does not affect the classification procedure at all. This means that $g^*$ will work even if $X_j$ is completely missing. Unfortunately, we cannot use (1.1) directly because it depends on the distribution of $(X_{(j)},Y_j)$, which is generally unknown. So, we take $J_n=\{i\in S_n:\nu_i\subset S_n\}$ and we use the training data $D_n=\{(X_i,Y_i): i\in J_n\}$ to construct a classifier $g_n(x)$. We consider the classifier $g_n(x)$ obtained by extending the classifier of [21] to the multi-class case as follows:

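$$g_n(x)=\arg\max_{0\le k\le M}\sum_{i\in J_n}1_{\{Y_i=k\}}K\left(\frac{x-X_{(i)}}{b_n}\right),\tag{1.2}$$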

where $1_A$ denotes the indicator of the set $A$, the kernel $K:\mathbb{R}^{\tilde d}\to\mathbb{R}_+$ is a density function on $\mathbb{R}^{\tilde d}$, and $b_n$ is a sequence of bandwidths tending to zero as $n$ tends to infinity. On the one hand, the sum in (1.2) is taken over $J_n$ instead of $S_n$ to ensure that $X_{(i)}$ always exists and that the sums make sense. On the other hand, for each new site $j\notin S_n$, the classifier $g_n(x)$ predicts the missing label $Y_j$ independently of its feature vector $X_j$, which belongs neither to the training sample $D_n$ nor to the components of $X_{(j)}$. Consequently, $g_n(x)$ can classify $j$ even if its own feature vector $X_j$ is completely missing, and this is what lets our method perform well in comparison with the classical spatial Markovian model. [6] proposes a nonparametric approach that extends the result of [2] to the non-Markovian case by using two kernels in the estimator, in order to control both the distance between observations and that between spatial locations without using a specific vicinity for the non-observed site. This latter approach may be developed to classify spatial data, but it does not work when one wants to classify sites with missing or incomplete features. Let $L_n=\mathbb{P}\{g_n(X_{(j)})\ne Y_j\mid D_n\}$ be the error probability of $g_n(x)$. Generally, we cannot hope to design a classifier that achieves the Bayes error probability $L^*$, but it is possible that the limit behavior of $L_n$ compares favorably to $L^*$. This idea is encapsulated in the notion of consistency.

Definition 1.1.

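The classifier $g_n(x)$ is called weakly consistent if

$$\mathbb{E}L_n\to L^*\quad\text{as } n\to\infty,$$

and strongly consistent if

$$L_n\to L^*\quad\text{with probability one as } n\to\infty.$$

The classifier is called universally (weakly or strongly) consistent if it is (weakly or strongly) consistent for all distributions of $(X_1,Y_1)$.
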
Remark 1.1.

Since $L_n$ is bounded, the weak consistency of $g_n$ is equivalent to the convergence of $L_n$ towards $L^*$ in probability, which means that strong consistency implies weak consistency.

In this paper, we investigate the strong consistency of gn(x) under some mild mixing conditions.
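
To make the rule (1.2) concrete, the following is a minimal Python sketch for $N=2$ with scalar features ($d=1$). It is an illustration under stated assumptions, not the authors' implementation: the 8-neighbour vicinity `NU`, the helper names and the toy two-class field in `demo` are all hypothetical, and the kernel is taken to be a product of standard Gaussian densities.

```python
import math
import random

# Assumed vicinity: the 8 neighbours of a site, never the site itself.
NU = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def context_vector(X, site, offsets=NU):
    """X_(j): features of the neighbours of `site`, in a fixed order."""
    i, j = site
    return [X[i + di][j + dj] for di, dj in offsets]

def gaussian_product_kernel(u):
    """Product of standard Gaussian densities evaluated at the vector u."""
    return math.exp(-0.5 * sum(t * t for t in u)) / (2 * math.pi) ** (len(u) / 2)

def kernel_rule(x, contexts, labels, bandwidth, n_classes):
    """argmax_k of sum_i 1{Y_i = k} K((x - X_(i)) / b), as in (1.2)."""
    votes = [0.0] * n_classes
    for xi, yi in zip(contexts, labels):
        u = [(a - b) / bandwidth for a, b in zip(x, xi)]
        votes[yi] += gaussian_product_kernel(u)
    return max(range(n_classes), key=votes.__getitem__)

def training_set(X, Y, missing):
    """Pairs (X_(i), Y_i) over sites i whose whole vicinity is observed."""
    n, m = len(X), len(X[0])
    contexts, labels = [], []
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            if (i, j) in missing:
                continue
            if any((i + di, j + dj) in missing for di, dj in NU):
                continue
            contexts.append(context_vector(X, (i, j)))
            labels.append(Y[i][j])
    return contexts, labels

def demo():
    """Toy field: class 0 on the left half, class 1 on the right half."""
    rng = random.Random(0)
    n = 20
    Y = [[0 if j < n // 2 else 1 for j in range(n)] for _ in range(n)]
    X = [[10.0 * Y[i][j] + rng.gauss(0.0, 1.0) for j in range(n)] for i in range(n)]
    missing = {(5, 4), (12, 15)}  # sites whose own features are never read
    contexts, labels = training_set(X, Y, missing)
    return [kernel_rule(context_vector(X, s), contexts, labels, 1.0, 2)
            for s in sorted(missing)]
```

Calling `demo()` classifies the two unobserved sites from their neighbours only; the feature values stored at those sites are never accessed, which mirrors the point made about $X_j$ above.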

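Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space and let $\mathcal{A}$ and $\mathcal{B}$ be two sub-σ-fields of $\mathcal{F}$. The α-mixing coefficient between $\mathcal{A}$ and $\mathcal{B}$ is defined by

$$\alpha(\mathcal{A},\mathcal{B})=\sup\{|\mathbb{P}(A\cap B)-\mathbb{P}(A)\mathbb{P}(B)|: A\in\mathcal{A},\ B\in\mathcal{B}\}$$

and the β-mixing coefficient is defined by

$$\beta(\mathcal{A},\mathcal{B})=\frac{1}{2}\sup\sum_{i}\sum_{j}|\mathbb{P}(A_i\cap B_j)-\mathbb{P}(A_i)\mathbb{P}(B_j)|,$$

where the supremum is taken over all finite partitions $(A_i)$ and $(B_j)$ of $\Omega$ with $A_i\in\mathcal{A}$ and $B_j\in\mathcal{B}$.

Let $(Z_i)_{i\in\mathbb{Z}^N}$ be a random field on $(\Omega,\mathcal{F},\mathbb{P})$ taking values in some space $(\Omega',\mathcal{F}')$.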

Definition 2.1.

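The random field $(Z_i)_{i\in\mathbb{Z}^N}$ is called strongly mixing if there exists $\chi:\mathbb{R}_+\to\mathbb{R}_+$ with $\chi(t)\downarrow 0$ as $t\to\infty$ such that, for any $E,E'\subset\mathbb{Z}^N$ with finite cardinals,

$$\alpha(\mathcal{B}(E),\mathcal{B}(E'))\le\chi(\mathrm{dist}(E,E')),$$

where $\mathcal{B}(E)$ denotes the σ-field generated by $(Z_i)_{i\in E}$ and $\mathrm{dist}(E,E')$ denotes the Euclidean distance between $E$ and $E'$.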

The α-mixing condition is one of the most popular mixing conditions. This condition is satisfied by many spatial models. Examples can be found in [17,19] and [11].

Definition 2.2.

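The random field $(Z_i)_{i\in\mathbb{Z}^N}$ is called β-mixing if there exists $\phi:\mathbb{R}_+\to\mathbb{R}_+$ with $\phi(t)\downarrow 0$ as $t\to\infty$ such that, for any $E,E'\subset\mathbb{Z}^N$ with finite cardinals,

$$\beta(\mathcal{B}(E),\mathcal{B}(E'))\le\phi(\mathrm{dist}(E,E')).$$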

Linear processes or, more generally, Markov chains may be β-mixing (see [9]). A similar mixing coefficient is used by [2] to establish some asymptotic properties of the kernel regression estimator in the spatial case. The two mixing coefficients α and β are related by the inequality $2\alpha\le\beta$ (see [18]), so any β-mixing random field is also strongly mixing.

Now, we need some regularity assumptions.

Assumption 1.

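$K$ is a regular kernel, that is, there exist $\delta>0$ and $c>0$ such that $c\,1_{B(0,\delta)}(x)\le K(x)$ for all $x\in\mathbb{R}^{\tilde d}$ and $\int_{\mathbb{R}^{\tilde d}}\sup_{u\in v+B(0,\delta)}K(u)\,dv<\infty$, where $B(x,\delta)$ is the closed ball of radius $\delta>0$ centered at $x$.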

Assumption 2.

For each $i$, $X_{(i)}$ has a density $f$ with respect to Lebesgue measure, and for each $i\ne j$ with $\nu_i\cap\nu_j=\emptyset$, $(X_{(i)},X_{(j)})$ has a density $f_{i,j}$ such that $\sup_{u,v\in\mathbb{R}^{\tilde d}}|f_{i,j}(u,v)-f(u)f(v)|\le C$, for some $C>0$.

Assumption 3.

The random field $\{(X_i,Y_i)\}_{i\in\mathbb{Z}^N}$ is β-mixing and there exists $\theta>0$ such that $\phi(t)=O(t^{-\theta})$ for all $t\in\mathbb{R}_+$.

Assumption 1 is used by [8] and [7] in the i.i.d. case. It may be satisfied if $K(x)=\xi(\|x\|)$, where $\xi$ is a non-negative and decreasing function on $[0,+\infty)$ and $\|\cdot\|$ is the Euclidean norm. Hence, the Gaussian kernel is regular. Assumption 2, used by [21] to prove the weak consistency, is similar to that used by [3]. It is satisfied, for example, if $f$ and $f_{i,j}$ are uniformly bounded. Assumption 3 means that the random field is arithmetically β-mixing, which implies that it is also strongly mixing with $\alpha(\mathcal{B}(E),\mathcal{B}(E'))\le\phi(\mathrm{dist}(E,E'))$ since $2\alpha\le\beta$.
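
The inequality $2\alpha\le\beta$ can be checked numerically when the two σ-fields are generated by finite-valued random variables. The sketch below is only an illustration: it assumes the classical forms of the coefficients (α as the supremum of $|\mathbb{P}(A\cap B)-\mathbb{P}(A)\mathbb{P}(B)|$ over events, β as half the summed absolute deviation over the atoms), and the function names are hypothetical.

```python
from itertools import chain, combinations

def nonempty_subsets(n):
    """All non-empty subsets of {0, ..., n-1}, as index tuples."""
    return chain.from_iterable(combinations(range(n), r) for r in range(1, n + 1))

def dependence_matrix(joint):
    """Entries p_ij - p_i. * p_.j for a joint pmf given as a list of rows."""
    rows = [sum(r) for r in joint]
    cols = [sum(c) for c in zip(*joint)]
    return [[joint[i][j] - rows[i] * cols[j] for j in range(len(cols))]
            for i in range(len(rows))]

def alpha_coefficient(joint):
    """sup over events A, B of |P(A and B) - P(A) P(B)| (brute force)."""
    D = dependence_matrix(joint)
    return max(abs(sum(D[i][j] for i in A for j in B))
               for A in nonempty_subsets(len(D))
               for B in nonempty_subsets(len(D[0])))

def beta_coefficient(joint):
    """Half the sum of |p_ij - p_i. p_.j| over the atoms."""
    return 0.5 * sum(abs(v) for row in dependence_matrix(joint) for v in row)
```

For two identical fair coin flips, with joint pmf `[[0.5, 0.0], [0.0, 0.5]]`, the sketch gives $\alpha=0.25$ and $\beta=0.5$, so the inequality holds with equality in that case.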

This section is a collection of technical lemmas which will be used to prove the strong consistency result stated in Theorem 4.1. Let $\|\cdot\|_r$ denote the $L^r$-norm for any real $r\ge 1$. The following lemma is a direct consequence of the covariance inequality of Ibragimov [12] and the inequality $2\alpha\le\beta$.

Lemma 3.1.

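If $r$, $s$ and $t$ are strictly positive reals such that $r^{-1}+s^{-1}+t^{-1}=1$ and $Z_1$ and $Z_2$ are two $\mathbb{R}$-valued random variables such that $\|Z_1\|_s<\infty$ and $\|Z_2\|_t<\infty$, then

where $\sigma(Z_i)$ is the σ-field generated by $Z_i$ for $i=1,2$.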

For any sub-σ-fields $\mathcal{A}$ and $\mathcal{B}$ of $\mathcal{F}$, we denote by $\mathcal{A}\vee\mathcal{B}$ the σ-field generated by $\mathcal{A}\cup\mathcal{B}$. The following coupling lemma of Berbee [1] will be needed to establish the asymptotic results.

Lemma 3.2.

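Let $Z$ be a random variable on $(\Omega,\mathcal{F},\mathbb{P})$ with values in some Polish space $\Omega'$ and $\mathcal{M}$ a sub-σ-field of $\mathcal{F}$. Assume that there exists a random variable $U$ uniformly distributed over $[0,1]$, independent of $\sigma(Z)\vee\mathcal{M}$. Then, there exists a random variable $\tilde Z$ measurable with respect to $\sigma(U)\vee\sigma(Z)\vee\mathcal{M}$, distributed as $Z$ and independent of $\mathcal{M}$, such that

$$\mathbb{P}(Z\ne\tilde Z)=\beta(\sigma(Z),\mathcal{M}).$$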

Remark 3.1.

We recall that a Polish space $\Omega'$ is a topological space which is separable and completely metrizable (see [13]) and that most of the familiar objects of study in analysis involve Polish spaces. For example, $\mathbb{R}^d$, for each integer $d\ge 1$, is Polish with the usual topology and $\{0,1,\dots,n\}$, for all $n\in\mathbb{N}$, is Polish with the discrete topology. We also recall that a countable product of Polish spaces is Polish.

The following covering lemma can be found in [8].

Lemma 3.3.

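Let $K$ be a regular kernel on $\mathbb{R}^{\tilde d}$ and $b_n$ be a sequence of bandwidths. Denote $K_n(x)=b_n^{-\tilde d}K(x/b_n)$. Then, for any probability measure $\mu$,

$$\sup_{y\in\mathbb{R}^{\tilde d}}\int\frac{K_n(x-y)}{\int K_n(x-z)\,\mu(dz)}\,\mu(dx)\le\rho$$

for some $\rho>0$ dependent only on $K$.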

The proof of the following lemma is in [4] (see also [21]).

Lemma 3.4.

Let ζ=Nϵ+(1γ)Na1 for some $0<a<1/2$, with $\gamma$ and $\epsilon$ being small positive numbers such that a1(N+ϵ)(1γ)1N1>1. If Assumption 3 holds for some $\theta>2N$, then for any $\delta>0$,

The proof of the following lemma follows from the reverse triangle inequality.
Lemma 3.5.

For each $i,j\in J_n$, $\mathrm{dist}(\nu_i,\nu_j)\ge\max\{\|i-j\|-\tilde r,0\}$, where $\tilde r=\max\{\|i-j\|: i,j\in\nu\}$ is the diameter of $\nu\subset\mathbb{Z}^N$.
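
Lemma 3.5 can be sanity-checked by brute force on a small grid. The sketch below is a numerical illustration, not a proof; it assumes $N=2$ and an 8-neighbour vicinity for $\nu$, and the helper names are hypothetical.

```python
import itertools
import math

# Assumed vicinity offsets (nu): the 8 neighbours of the origin.
NU = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def shifted(nu, site):
    """nu_i = i + nu."""
    return [tuple(s + o for s, o in zip(site, off)) for off in nu]

def dist_sets(A, B):
    """Euclidean distance between two finite sets of sites."""
    return min(math.dist(a, b) for a in A for b in B)

def diameter(nu):
    """r~ = max ||a - b|| over a, b in nu."""
    return max(math.dist(a, b) for a in nu for b in nu)

def lemma_holds(nu, sites, tol=1e-12):
    """Check dist(nu_i, nu_j) >= max(||i - j|| - r~, 0) for all pairs."""
    r = diameter(nu)
    for i, j in itertools.product(sites, repeat=2):
        lower = max(math.dist(i, j) - r, 0.0)
        if dist_sets(shifted(nu, i), shifted(nu, j)) < lower - tol:
            return False
    return True
```

For the 8-neighbour vicinity the diameter is $2\sqrt{2}$, so two vicinities are guaranteed to be separated as soon as their centres are more than $2\sqrt{2}$ apart.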

The weak consistency of the classifier (1.2) has been established by [21]. In this section we study the strong consistency of (1.2). The following theorem states the strong consistency under mild conditions.

Theorem 4.1.

Assume that Assumptions 1–3 hold for some $\theta>2N$. If $\hat n\,b_n^{\tilde d}\to\infty$ as $n\to\infty$, then

$$L_n\to L^*\quad\text{with probability one}.$$

Remark 4.1.

Note that the assumption on the bandwidth, used by [21] to prove the weak consistency, is similar to the classical assumption used by [7] and [8] in the independent case. In addition, the condition on $b_n$ is minimal compared to that used by [4] and [3], since they studied the rate of uniform convergence for the estimators. However, the restrictive constraints on the bandwidth in [4] and [3] are related to $\theta$, and one has to let $\theta\to\infty$ in order to attain the classical assumption.

Our aim in this section is to examine how the classifier (1.2) behaves on simulated samples by comparing it with the classical kernel rule. We use the R statistical programming environment to run a simulation study for $N=2$. Let $\{(X_{(i,j)},Y_{(i,j)})\}$ be the field of interest and suppose that the simulated data are observed on the area $I_{(n,n)}=\{(i,j)\in\mathbb{Z}^2: 1\le i,j\le n\}$. Let

$$S_{(n,n)}=I_{(n,n)}\setminus M,$$

where $M=\{(2k,2l): 1\le k,l\le 10\}$ is the set of non-observed sites which need to be classified. In this particular case, the vicinity $\nu_{(i,j)}$ of any missing site $(i,j)$ may be taken as in Figure 1.

Figure 1

The missing site (i,j) and its vicinity ν(i,j) with boundary in green dashed lines.


It is important to note that the vicinity $\nu_{(i,j)}$ may be designed depending on the location of the missing site (see some typical examples in Figure 2) and that larger samples give more freedom to design vicinities.

Figure 2

Three typical vicinities corresponding to three missing sites (i,j) in different locations.


Figure 2 shows some examples of vicinities that can be used when the missing sites are not completely surrounded by already labeled sites (located at the edges of $S_n$, for example).

We suppose that the simulated fields have the covariance function

We use the classifier (1.2) with $K(x)=\prod_{i=1}^{8}K_i(x_i)$ for $x=(x_1,\dots,x_8)\in\mathbb{R}^8$, where $K_i$ is the standard Gaussian density (Gaussian kernel). We suppose that $\{X_{(i,j)}, 1\le i,j\le n\}$ are observations of a Gaussian mixture model:

with $\mu_0<\mu_1<\mu_2$ and $\pi_0+\pi_1+\pi_2=1$. In order to illustrate the fact that our method works in the multi-class case, the data set $\{X_{(i,j)}, 1\le i,j\le n\}$ is partitioned into three clusters as follows:

For each $n=50,75,100$, we generate 100 samples on the region $I_{(n,n)}$ with $\mu_0=5$, $\mu_1=15$, $\mu_2=25$, $\pi_0=\pi_1=\pi_2=1/3$ and $\sigma_0^2=\sigma_1^2=\sigma_2^2=4$. In each replication, we use the classifier (1.2), constructed on the basis of the training data observed on $J_{(n,n)}$, to re-predict the labels of sites in the test set $M$. Figure 3 displays one replication for $n=50$.

Figure 3

The training sites are colored in red (0), green (1) or blue (2) and the sites to classify are blank.


The optimal bandwidth $\hat b_{\mathrm{opt}}$ is obtained by minimizing the cross-validation criterion on a training sample, and the misclassification error rate (ER) is evaluated on the associated test sample. The average error rate (AER) is obtained by averaging the error rates associated with the corresponding 100 test samples.
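
The bandwidth selection step can be sketched as follows. The exact cross-validation criterion is not spelled out here, so this sketch assumes a leave-one-out misclassification rate minimized over a finite grid of candidate bandwidths; the function names, the grid and the toy two-class sample are hypothetical. The Gaussian normalizing constant is dropped and each row of weights is rescaled, since neither changes the arg max.

```python
import math
import random

def loo_error(contexts, labels, bandwidth, n_classes):
    """Leave-one-out misclassification rate of the Gaussian kernel rule."""
    errors = 0
    for i, (x, y) in enumerate(zip(contexts, labels)):
        sq = [sum((a - b) ** 2 for a, b in zip(x, z)) / bandwidth ** 2
              for k, z in enumerate(contexts) if k != i]
        others = [l for k, l in enumerate(labels) if k != i]
        shift = min(sq)  # rescaling to avoid underflow; arg max unchanged
        votes = [0.0] * n_classes
        for s, l in zip(sq, others):
            votes[l] += math.exp(-0.5 * (s - shift))
        pred = max(range(n_classes), key=votes.__getitem__)
        errors += pred != y
    return errors / len(labels)

def select_bandwidth(contexts, labels, grid, n_classes):
    """First bandwidth in `grid` with the smallest leave-one-out error."""
    return min(grid, key=lambda b: loo_error(contexts, labels, b, n_classes))

def toy_sample(seed=0, per_class=30, dim=8):
    """Two well-separated classes of context vectors in R^8 (toy data)."""
    rng = random.Random(seed)
    contexts, labels = [], []
    for label, mean in enumerate((0.0, 3.0)):
        for _ in range(per_class):
            contexts.append([rng.gauss(mean, 1.0) for _ in range(dim)])
            labels.append(label)
    return contexts, labels
```

A typical call is `select_bandwidth(contexts, labels, [0.25, 0.5, 1.0, 2.0, 4.0], 2)`; in practice the grid would be refined around the selected value.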

Table 1 shows that the estimated optimal bandwidth and the average error rate decrease when the training sample size increases. This means that the practical results in the simulation study are in line with the theoretical results. Now, let us compare the average error rate (AER) resulting from application of the proposed classifier with that resulting from application of the classical kernel rule.

Table 1

Estimated optimal bandwidths and average error rates corresponding to the classifier (1.2) with samples of different sizes.

n                             50       75       100
$\hat b_{\mathrm{opt}}$       2.04     1.93     1.77
AER                           28.1%    21.2%    14.8%

The classical kernel rule is given, for any unlabeled site $j$ with $X_j=x$, by

$$\tilde g_n(x)=\arg\max_{k}\sum_{i\in I_n}1_{\{Y_i=k\}}\tilde K\left(\frac{x-X_i}{h_n}\right),$$

where $\tilde K:\mathbb{R}^d\to\mathbb{R}_+$ is a kernel on $\mathbb{R}^d$ (the Gaussian kernel is considered here), and $h_n$ is a sequence of bandwidths. In order for the classical kernel classifier to be usable in our case, we have to adjust it slightly by taking the sum over $I_n\setminus M$ instead of $I_n$, i.e., for each $j\in M$ with $X_j=x$,

$$\tilde g_n(x)=\arg\max_{k}\sum_{i\in I_n\setminus M}1_{\{Y_i=k\}}\tilde K\left(\frac{x-X_i}{h_n}\right).$$

From the theoretical point of view, this is justified by the fact that $\tilde g_n$ has the same asymptotic behavior on $I_n$ as on $I_n\setminus M$, since $M$ is bounded. In this classical kernel method, we assume that the feature vector $X_j$ of each element $j$ of $M$ is known and we use $x$, the value of $X_j$, to predict its class, whereas the classifier (1.2) needs only observations at nearby sites to predict the label of $j$. We apply the classical kernel classifier to re-classify the elements of $M$ using the same training samples generated above and taking into account all the replications for each size $n=50,75,100$. As in the application of (1.2), the optimal bandwidth $\hat h_{\mathrm{opt}}$ is chosen by minimizing the cross-validation criterion on a training sample, and the misclassification error rate (ER) is evaluated on the associated test sample. Table 2 reports the average error rate (AER), obtained by averaging the error rates associated with the corresponding 100 test samples.

Table 2

Estimated optimal bandwidths and average error rates corresponding to the classical kernel classifier with samples of different sizes.

n                             25       50       80
$\hat h_{\mathrm{opt}}$       1.85     1.72     1.69
AER                           23.4%    18.7%    13.2%

By comparing Tables 1 and 2, we observe that the corresponding error values in the two tables begin to be close as n increases. This supports the possibility of using the classifier (1.2) as an alternative to the classical kernel classifier when we have to classify sites with missing features.

A digital image is nothing more than an array of numbers indicating the variation of red, green and blue (RGB) at each location on a grid of pixels. An RGB color value is specified with rgb(red,green,blue). Each parameter (red, green, blue) defines the intensity of the color as an integer between 0 and 255. For example, rgb(0,0,255) is rendered as blue, because the blue parameter is set to its highest value 255 and the others are set to 0. One can divide RGB color values by 255 in order to obtain values in the interval [0,1]. Consider an image of the Eiffel Tower with 100 missing pixels, as in Figure 4.

We use the R package jpeg to convert a jpg image into a 3-d array of numbers. The package jpeg offers the readJPEG() function, which can read raster graphics (consisting of “pixel matrices”) in jpg format into R. It returns either a single matrix with gray values in [0,1] or a 3-d array with the RGB values in [0,1], say E. In our example of Figure 4, the dimensions of E are 306×165×3. Thus, the elements of E[, , j] represent the intensities of the color j, for j = red, green or blue, at all pixels of the grid $I_{(306,165)}$. For example, the matrix E[55:60,1:6,1] displays the intensities of red in each pixel of the region:

Let $X_{(i,j)}=(X_{(i,j)}^{(1)},X_{(i,j)}^{(2)},X_{(i,j)}^{(3)})$, where $X_{(i,j)}^{(k)}$ is the intensity of the color $k$ at the pixel $(i,j)$. Since our purpose is to classify new sites with completely missing features, we set an arbitrary threshold of 0.4 and we define labels as follows:

The set of 100 missing pixels is taken as a test set, say $M$. We use the classifier (1.2) (see (7.1) for the binary version) to classify each element of $M$ based on its eight neighbors. The optimal bandwidth is obtained by minimizing the cross-validation criterion on the known sites, which gives $\hat b_{\mathrm{opt}}\approx 0.72$. The misclassification error rate (ER) is evaluated on $M$, where we obtain ER=0.04, which indicates that there are only four misclassified cases out of 100 classified cases (see Figure 4).

Figure 4

Digital image of Eiffel tower with 100 missing pixels (blank pixels).


Now let us use the support vector machine (SVM) classifier to re-classify the elements of $M$. In this case we have to suppose that the RGB value is known for each element of $M$. To implement the support vector machine in R, we use the package e1071. With this classifier, we get a misclassification error of ER=0.11, which suggests that our kernel classifier performs well in this example compared to the SVM procedure.

Without loss of generality, we prove the theorem in the binary case where $Y_j$ takes values in $\{0,1\}$, since no additional argument is required to prove it in the multi-class case. In the binary case, the Bayes classifier (1.1) is given by

and the classifier (1.2) is given by

(7.1)

Define

Consequently, the classifier (7.1) can be written as

By Theorem 2.3 in [7], the consistency will be proved if we show that

(7.2)

But

Hence, in order to prove (7.2), it suffices to show that

(7.3)

and

(7.4)

The proof of (7.3) is the same as in the i.i.d. case (see [7], pp. 156–157). So, it suffices to prove (7.4). To do that, we will employ the blocking technique used in [4]. Let $p=p_n=[\hat n^{\gamma}]$ for some $1/\theta<\gamma<1/(2N)$ (where $[\cdot]$ stands for the integer part). Without loss of generality, we suppose that there exists a positive integer $q_k$ such that $n_k=2pq_k$ for each $k=1,\dots,N$. Let

We define blocks as follows: for each $j\in J_q$,

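As a consequence, we have $I_n=\bigcup_{k=1}^{2^N}\bigcup_{j\in J_q}S_j^{(k)}$, and for each $k=1,\dots,2^N$, $\mathrm{card}(S_j^{(k)})=p^N$ and $\mathrm{dist}(S_j^{(k)},S_{j'}^{(k)})\ge p$ for any $j\ne j'$. Let $\Gamma_j^{(k)}=\{i\in S_j^{(k)}:\nu_i\subset S_n\}$, for each $k=1,\dots,2^N$ and $j\in J_q$. Hence, for a fixed $k$, we have $\mathrm{dist}(\Gamma_j^{(k)},\Gamma_{j'}^{(k)})\ge p$ for any $j\ne j'$, $\mathrm{card}(\Gamma_j^{(k)})\le\mathrm{card}(S_j^{(k)})=p^N$ and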

(7.5)
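
The block decomposition can be illustrated with a small sketch. Since the displayed definition of the blocks is not reproduced here, the code assumes the usual construction: the blocks of type $e\in\{0,1\}^N$ are the cubes of side $p$ with coordinates $(2j_k+e_k)p+1\le i_k\le(2j_k+e_k+1)p$, indexed by $j_k\in\{0,\dots,q_k-1\}$. The function names and the indexing of $J_q$ from zero are hypothetical choices.

```python
import itertools
import math

def blocks(n, p):
    """Split {1..n_1} x ... x {1..n_N} into 2^N families of cubes of side p.

    Assumed construction: family e in {0,1}^N, block j, holds the sites i
    with (2 j_k + e_k) p + 1 <= i_k <= (2 j_k + e_k + 1) p for every k.
    """
    N = len(n)
    q = [nk // (2 * p) for nk in n]
    if any(nk != 2 * p * qk for nk, qk in zip(n, q)):
        raise ValueError("each n_k must equal 2 * p * q_k")
    families = {}
    for e in itertools.product((0, 1), repeat=N):
        families[e] = {}
        for j in itertools.product(*(range(qk) for qk in q)):
            sides = [range((2 * jk + ek) * p + 1, (2 * jk + ek + 1) * p + 1)
                     for jk, ek in zip(j, e)]
            families[e][j] = list(itertools.product(*sides))
    return families

def min_gap(family):
    """Smallest Euclidean distance between two distinct blocks of a family."""
    return min(math.dist(a, b)
               for (_, s1), (_, s2) in itertools.combinations(family.items(), 2)
               for a in s1 for b in s2)
```

With `n = (12, 8)` and `p = 2`, this yields four families of six blocks, each block containing $p^N=4$ sites, and two blocks of the same family are at distance $p+1$ or more, in line with the properties listed above.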

Let {(X(i),Yi)}iInJn be a set of independent and identically distributed random vectors such that they are independent of {(X(i),Yi)}iJn and (X(i),Yi) is identically distributed with (X(1),Y1). In order to make sense to the blocking technique, we define random vectors as follow: for each iIn,

It is clear that {(X(i),Yi),iJn}={(X(i),Yi),iJn} and {(X(i),Yi),iΓj(k)}={(X(i),Yi),iΓj(k)}. Now, for a fixed k and each jJq, let Wj(k)={(X(i),Yi),iSj(k)} be a vector whose components are ordered according to a given order on indices. Applying Lemma 3.2 together with the blocks decomposition introduced by [10] (see also [20]) on the family of vectors {Wj(k),jJq}, we can generate independent copies {W˜j(k),jJq} such that: they are mutually independent, and for each jJq, W˜j(k)={(X˜(i),Y˜i),iSj(k)} has the same distribution as Wj(k)={(X˜(i),Y˜i),iSj(k)}. Furthermore, by Lemma 3.5, we have P(Wj(k)W˜j(k))ϕ(pr˜) since pr˜ for n^ large enough. Thus, the two vectors (X˜(i),Y˜(i)) and (X˜(i),Y˜i) are independent for each iSj(k) and iSj(k) with jj. Now, for each iJn, there exists jJq such that {(X(i),Yi)(X˜(i),Y˜i)}(Wj(k)W˜j(k)). Since (X˜(i),Y˜i)=(X˜(i),Y˜i) for each iJn, denote (X˜(i),Y˜i)=(X˜(i),Y˜i), for each iJn (or iΓj(k)). As a consequence

(7.6)

By (7.5), we can write

If we denote

(7.7)

then

(7.8)

Using Markov’s inequality and Lemma 3.3 together with (7.7), we have for any ϵ>0,

where $\rho>0$ is the constant defined in Lemma 3.3. Since $\tilde r$ is bounded and $p\to\infty$ as $n\to\infty$, we have $p-\tilde r\ge p/2$ for $\hat n$ large enough. Therefore, we get

for some generic positive constant $C>0$. Since $\gamma\theta>1$, by the Borel–Cantelli lemma, we have

(7.9)

with probability one. Now, we will show that

(7.10)

By (7.7) and (7.8), we have

(7.11)

Consequently, in order to establish (7.10), it is sufficient to show that

(7.12)

for each 1k2N. Without loss of generality, we show (7.12) for k=1. If the elements of Jq are enumerated in an arbitrary manner, we can write Jq={1,,m} with m=card(Jq)=k=1Nqk. Denote Z˜j={(X˜(i),Y˜i),iSj(1)}, for each j=1,,m, where the components of Z˜j are ordered according to an arbitrary order on indices. Recall that (X˜(i),Y˜i)=(X˜(i),Y˜i) for iΓj(1) and suppose that (X˜(i),Yi) is replaced by (0d˜,0) if iΓj(1) where 0d˜=(0,,0)d˜. Hence, by the blocks decomposition, the random vectors Z˜1,,Z˜m are independent. Let F:((d˜×{0,1})pN)m be a real function defined as follows

For z˜jz˜j where z˜j={(x˜(i),y˜i),iSj(1)},z˜j={(x˜(i),y˜i),iSj(1)}(d˜×{0,1})pN and (x˜(i),y˜i)=(x˜(i),y˜i)=(0d˜,0) for each iΓj(1), using Lemma 3.3, we have

Hence, since $\hat n=2^Np^Nm$ with $m=\prod_{k=1}^{N}q_k$, by McDiarmid’s inequality [16], we have for every $\epsilon>0$,

Since $p=[\hat n^{\gamma}]$ with $1/\theta<\gamma<1/(2N)$, we have $\hat n^{1-\gamma N}/\log(\hat n)\to\infty$, and the Borel–Cantelli lemma yields

As a consequence

(7.13)

with probability one. In order to complete the proof of (7.12) for k=1, it remains to show that

(7.14)

The proof of (7.14) follows from the same arguments as in [21, Section 5], together with Lemmas 3.1, 3.4 and 3.5. Combining (7.9), (7.10), (7.12)–(7.14), we get (7.4). Finally, (7.3) and (7.4) yield (7.2) and the proof is completed. □

The author would like to thank the anonymous referees whose valuable comments led to an improved version of the paper.

The publisher wishes to inform readers that the article “Strong consistency of a kernel-based rule for spatially dependent data” was originally published by the previous publisher of the Arab Journal of Mathematical Sciences and the pagination of this article has been subsequently changed. There has been no change to the content of the article. This change was necessary for the journal to transition from the previous publisher to the new one. The publisher sincerely apologises for any inconvenience caused. To access and cite this article, please use “Younso, A., Kanaya, Z., Azhari, N. (2019), “Strong consistency of a kernel-based rule for spatially dependent data”, Arab Journal of Mathematical Sciences, Vol. 26 No. 1/2, pp. 211-225”. The original publication date for this paper was 13/11/2019.

[1] H.C.P. Berbee, Random Walks with Stationary Increments and Renewal Theory, Math. Cent. Tract., Amsterdam, 1979.
[2] G. Biau, B. Cadre, Nonparametric spatial prediction, Stat. Inference Stoch. Process. 7 (2004) 327–349.
[3] M. Carbon, C. Francq, L.T. Tran, Kernel regression estimation for random fields, J. Statist. Plann. Inference 137 (2007) 778–798.
[4] M. Carbon, L. Tran, B. Wu, Kernel density estimation for random fields, Statist. Probab. Lett. 36 (1997) 115–125.
[5] S. Dabo-Niang, L. Hamdad, C. Ternynck, A kernel spatial density estimation allowing for the analysis of spatial clustering. Application to Monsoon Asia Drought Atlas data, Stoch. Environ. Res. Risk Assess. 28 (2014) 2075.
[6] S. Dabo-Niang, C. Ternynck, A.-F. Yao, Nonparametric prediction of spatial multivariate data, J. Nonparametr. Stat. 28 (2016) 428–458.
[7] L. Devroye, L. Györfi, G. Lugosi, A Probabilistic Theory of Pattern Recognition, Springer-Verlag, New York, 1996.
[8] L. Devroye, A. Krzyżak, An equivalence theorem for L1 convergence of the kernel regression estimate, J. Statist. Plann. Inference 23 (1989) 71–82.
[9] P. Doukhan, P. Massart, E. Rio, The functional central limit theorem for strongly mixing processes, Ann. Inst. H. Poincaré Probab. Statist. 30 (1) (1994) 63–82.
[10] P. Doukhan, P. Massart, E. Rio, Invariance principles for absolutely regular empirical processes, Ann. Inst. H. Poincaré Probab. Statist. 31 (2) (1995) 393–427.
[11] X. Guyon, Estimation d’un champ par pseudo-vraisemblance conditionnelle: Etude asymptotique et application au cas markovien, in: Proc. 6th Franco-Belgian Meeting of Statisticians, 1987.
[12] I.A. Ibragimov, Some limit theorems for stationary processes, Theory Probab. Appl. 7 (1962) 349–382.
[13] A.S. Kechris, Classical Descriptive Set Theory, Springer-Verlag, New York, 1995.
[14] M.E. Machkouri, Asymptotic normality of the Parzen–Rosenblatt density estimator for strongly mixing random fields, Stat. Inference Stoch. Process. 14 (2011) 73–84.
[15] M.E. Machkouri, R. Stoica, Asymptotic normality of kernel estimates in a regression model for random fields, J. Nonparametr. Stat. 22 (2010) 366–377.
[16] C. McDiarmid, On the method of bounded differences, in: Surveys in Combinatorics 1989, Cambridge University Press, Cambridge, 1989, pp. 148–188.
[17] C.C. Neaderhouser, Convergence of block spins defined by a random field, J. Stat. Phys. 22 (1980) 673–684.
[18] E. Rio, Théorie Asymptotique des Processus Aléatoires Faiblement Dépendants, Mathématiques et Applications, Springer, Berlin, 2000.
[19] M. Rosenblatt, Stationary Sequences and Random Fields, Birkhäuser, Boston, 1985.
[20] G. Viennet, Inequalities for absolutely regular sequences: application to density estimation, Probab. Theory Related Fields 107 (4) (1997) 467–492.
[21] A. Younso, On the consistency of a new kernel rule for spatially dependent data, Statist. Probab. Lett. 131 (2017) 64–71.

Published in the Arab Journal of Mathematical Sciences. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) license. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this license may be seen at http://creativecommons.org/licences/by/4.0/legalcode
