Document Type
Article
Publication Date
2013
Publication Title
IEEE Transactions on Knowledge and Data Engineering
Volume
In Press
Keywords
Big data, online social networks, small sample, bias, size estimation
Abstract
This paper discusses the bias problem in estimating the population size of big data, such as online social networks (OSNs), using simple random walk. In traditional estimation problems, the sample size is not very small relative to the data size. In big data, however, a sample that is small relative to the data size is already very large and costly to obtain. When such small samples are used, the bias is no longer negligible. This paper shows analytically that the relative bias can be approximated by the reciprocal of the number of collisions, and on that basis introduces a bias-corrected estimator. The result is further supported by simulation studies and by a real Twitter network containing 41.7 million nodes.
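The collision idea in the abstract can be illustrated with a minimal sketch. It makes several assumptions that are not from the record. First, it uses uniform sampling with replacement, which is the simplest case. A simple random walk samples nodes in proportion to their degree and would need degree reweighting, which is omitted here. Second, the corrected estimator divides the naive estimate n(n-1)/(2C) by (1 + 1/C). This is one illustrative reading of "relative bias is approximately 1/C" and is not necessarily the exact estimator in the paper. The function names are hypothetical.

```python
import random
from collections import Counter


def count_collisions(sample):
    """Number of unordered pairs (i, j), i < j, with sample[i] == sample[j]."""
    return sum(k * (k - 1) // 2 for k in Counter(sample).values())


def naive_estimate(sample):
    """Collision-based size estimate n(n-1)/(2C) under uniform sampling.

    It follows from E[C] = n(n-1)/(2N); it overestimates N when C is small.
    """
    n = len(sample)
    c = count_collisions(sample)
    if c == 0:
        raise ValueError("no collisions: sample too small to estimate size")
    return n * (n - 1) / (2 * c)


def corrected_estimate(sample):
    """Hypothetical bias-corrected estimate.

    Assumption: relative bias ~ 1/C, so the naive estimate is divided by
    (1 + 1/C), which simplifies to n(n-1) / (2(C+1)).
    """
    n = len(sample)
    c = count_collisions(sample)
    return n * (n - 1) / (2 * (c + 1))


if __name__ == "__main__":
    # Small-sample simulation: about n^2/(2N) = 5 expected collisions per trial.
    rng = random.Random(0)
    N, n, trials = 100_000, 1_000, 500
    naive, corrected = [], []
    for _ in range(trials):
        s = [rng.randrange(N) for _ in range(n)]
        if count_collisions(s) > 0:  # the naive estimate is undefined when C = 0
            naive.append(naive_estimate(s))
            corrected.append(corrected_estimate(s))
    print("true N:", N)
    print("mean naive estimate:", sum(naive) / len(naive))
    print("mean corrected estimate:", sum(corrected) / len(corrected))
```

With only a handful of expected collisions per trial, the naive mean typically lands noticeably above N, because 1/C is convex. The corrected variant shrinks each estimate by a factor of C/(C+1).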
DOI
10.1109/TKDE.2012.220
Recommended Citation
Lu, Jianguo and Li, Dingding. (2013). Bias Correction in Small Sample from Big Data. IEEE Transactions on Knowledge and Data Engineering, In Press.
https://scholar.uwindsor.ca/computersciencepub/1
Comments
(c) 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.