A Sampling Method Using Search API and Wikipedia for Social Media Analysis
Shohei OHSAWA Yutaka MATSUO
D - Abstracts of IEICE TRANSACTIONS on Information and Systems (Japanese Edition)
Publication Date: 2017/10/01
Online ISSN: 1881-0225
Type of Manuscript: PAPER
Keywords: dictionary-based sampling, Facebook, Wikipedia, estimated Jaccard coefficient
Full Text(in Japanese): PDF(679.9KB)
In social media analysis, researchers often sample from the APIs (application programming interfaces) provided by social media services such as Facebook and Twitter to collect attribute information about the entities to be analyzed. Few studies have reported sampling methods for search APIs, so it is not obvious how to sample from such an API efficiently. This paper presents a method that improves sampling efficiency by using the Wikipedia ontology. Our method generates multiple dictionaries from a given ontology and adaptively switches the dictionary in use according to the target topic. We also propose the estimated Jaccard coefficient as an evaluation criterion for a dictionary. In our experiments, the method sampled 18 million entities, 25.8% of all entities on Facebook, and the method using the estimated Jaccard coefficient outperformed existing methods.
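The abstract does not define the "estimated Jaccard coefficient"; that definition is only in the full text. As a point of reference, the sketch below shows the ordinary Jaccard coefficient, J(A, B) = |A ∩ B| / |A ∪ B|, used to compare the set of entity IDs a dictionary retrieves against a reference set of on-topic entities, and to pick the best-scoring dictionary. The helper `best_dictionary`, the dictionary names, and the toy entity IDs are hypothetical and for illustration only; this is not the authors' estimator or selection procedure.

```python
# Sketch: plain Jaccard coefficient J(A, B) = |A ∩ B| / |A ∪ B|.
# Assumption: this is NOT the paper's "estimated Jaccard coefficient",
# whose exact definition is given only in the full text.


def jaccard(a: set, b: set) -> float:
    """Return |a ∩ b| / |a ∪ b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def best_dictionary(results: dict[str, set], reference: set) -> str:
    """Hypothetical helper: return the name of the dictionary whose retrieved
    entity IDs overlap most (by Jaccard) with a reference set of entities."""
    return max(results, key=lambda name: jaccard(results[name], reference))


if __name__ == "__main__":
    # Toy entity IDs, invented purely for illustration.
    reference = {"e1", "e2", "e3", "e4"}
    results = {
        "dict_people": {"e1", "e2", "e9"},          # J = 2/5 = 0.4
        "dict_places": {"e3", "e7", "e8", "e10"},   # J = 1/7 ≈ 0.143
    }
    for name, ids in results.items():
        print(name, round(jaccard(ids, reference), 3))
    print("chosen:", best_dictionary(results, reference))
```

A set-overlap score rewards dictionaries that retrieve many on-topic entities while penalizing off-topic retrievals, which is one plausible reason a Jaccard-style measure suits dictionary evaluation; how the paper estimates it without knowing the full target set is left to the full text.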