Sparse Network Asymptotics for Logistic Regression under Possible Misspecification
Bryan Graham
No 27962, NBER Working Papers from National Bureau of Economic Research, Inc
Abstract:
Consider a bipartite network where N consumers choose to buy or not to buy M different products. This paper considers the properties of the logit fit of the N × M array of “i-buys-j” purchase decisions, Y = [Y_ij] for 1 ≤ i ≤ N, 1 ≤ j ≤ M, onto a vector of known functions of consumer and product attributes under asymptotic sequences where (i) both N and M grow large, (ii) the average number of products purchased per consumer is finite in the limit, (iii) there exists dependence across elements in the same row or same column of Y (i.e., dyadic dependence) and (iv) the true conditional probability of making a purchase may, or may not, take the assumed logit form. Condition (ii) implies that the limiting network of purchases is sparse: only a vanishing fraction of all possible purchases are actually made. Under sparse network asymptotics, I show that the parameter indexing the logit approximation solves a particular Kullback–Leibler Information Criterion (KLIC) minimization problem (defined with respect to a certain Poisson population). This finding provides a simple characterization of the logit pseudo-true parameter under general misspecification. With respect to sampling theory, sparseness implies that the first and last terms in an extended Hoeffding-type variance decomposition of the score of the logit pseudo composite log-likelihood are of equal order. In contrast, under dense network asymptotics, the last term is asymptotically negligible. Asymptotic normality of the logistic regression coefficients is shown using a martingale central limit theorem (CLT) for triangular arrays. Unlike in the dense case, the normality result derived here also holds under degeneracy of the network graphon. Relatedly, when there “happens to be” no dyadic dependence in the dataset in hand, it specializes to recently derived results on the behavior of logistic regression with rare events and iid data.
Simulation results suggest that sparse network asymptotics better approximate the finite network distribution of the logit estimator.
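As an illustration of the setting the abstract describes, the following is a minimal sketch in Python (NumPy only) of a sparse bipartite purchase array with row and column dependence, followed by a pooled logit fit of Y on a single dyad-level covariate. It is not the paper's simulation design: the function names `simulate_purchases` and `fit_logit`, the normal consumer and product effects, the covariate, and all parameter values are assumptions made for this example. Sparsity is imposed by letting the intercept drift like log(c / M), so that the expected number of purchases per consumer stays bounded as M grows.

```python
import numpy as np


def simulate_purchases(n_consumers, n_products, avg_purchases=2.0, beta=0.5, seed=0):
    """Simulate a sparse N x M purchase array with dyadic dependence (hypothetical design)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n_consumers)                 # consumer effects: dependence within rows
    b = rng.normal(size=n_products)                  # product effects: dependence within columns
    x = rng.normal(size=(n_consumers, n_products))   # dyad-level covariate
    # Intercept shrinks with M so purchases per consumer stay bounded (sparse sequence).
    alpha = np.log(avg_purchases / n_products)
    index = alpha + beta * x + 0.3 * (a[:, None] + b[None, :])
    prob = 1.0 / (1.0 + np.exp(-index))
    y = (rng.random((n_consumers, n_products)) < prob).astype(float)
    return y, x


def fit_logit(y, x, n_iter=50, tol=1e-10):
    """Pooled logit of vec(Y) on a constant and vec(X), fitted by Newton-Raphson."""
    yv = y.ravel()
    design = np.column_stack([np.ones(yv.size), x.ravel()])
    theta = np.zeros(design.shape[1])
    ybar = yv.mean()
    theta[0] = np.log(ybar / (1.0 - ybar))           # start at the sample log-odds
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ theta))
        score = design.T @ (yv - p)                  # gradient of the composite log-likelihood
        hessian = (design * (p * (1.0 - p))[:, None]).T @ design
        step = np.linalg.solve(hessian, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta


y, x = simulate_purchases(400, 300)
theta = fit_logit(y, x)
print("share of possible purchases made:", y.mean())
print("average purchases per consumer:", y.sum(axis=1).mean())
print("logit intercept and slope:", theta)
```

The sketch reports point estimates only. Standard errors from an ordinary iid logit routine would ignore the row and column dependence that the abstract emphasizes, so they are deliberately left out here.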
JEL-codes: C01 C31 C33 C55
Date: 2020-10
New Economics Papers: this item is included in nep-ecm, nep-net and nep-ore
Note: DEV IO ITI LS TWP
Downloads: (external link)
http://www.nber.org/papers/w27962.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:nbr:nberwo:27962
Ordering information: This working paper can be ordered from
http://www.nber.org/papers/w27962