In the smartphone era, many people can easily publish posts through social network services (SNS). If the residential location of Twitter users can be inferred, it becomes possible to support sentiment analysis, studies of population movement, disease tracking, conversation-related research on the discourse around political and social issues, online issue monitoring, and consumer risk management. Because of privacy concerns, however, only a small share of Twitter users disclose location information, which limits such work. In this research, we use Firehose-level Twitter data, spatial indicators, and several clustering algorithms to overcome the sparsity of location information in tweets. We also infer residential location at the district level to improve accuracy. We selected Seoul, South Korea, as the study area because it has both a large population and a large number of Twitter users. We applied various clustering algorithms and compared their inference accuracy by distance range. The results indicate that using the spatial indicators of groups A (point-type geotag), B (SNS), C (geocode), and D (polygon-type geotag) together, rather than groups A and C alone, and applying the convex hull with onion peeling clustering algorithm, yields a higher probability of correctly inferring residential location. We hope that this research contributes to algorithm research on inferring Twitter user location, overcoming sparsity, and automating residential location inference.
- Firehose API
- Residential location inference
- Spatial indicator
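
As an illustration of the convex hull with onion peeling idea named above, the following is a minimal sketch and not the implementation used in this research. It assumes that a user's location points are given as planar `(x, y)` pairs (for a city-scale area, longitude and latitude are treated as planar here), and it assumes that the centroid of the innermost hull layer is taken as the residential estimate. The function names `convex_hull`, `onion_peel`, and `estimate_home` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def onion_peel(points: List[Point]) -> List[List[Point]]:
    """Repeatedly strip the convex hull; returns layers from outermost to innermost."""
    remaining = list(set(points))
    layers: List[List[Point]] = []
    while remaining:
        hull = convex_hull(remaining)
        layers.append(hull)
        hull_set = set(hull)
        remaining = [p for p in remaining if p not in hull_set]
    return layers


def estimate_home(points: List[Point]) -> Point:
    """Assumption: the centroid of the innermost layer serves as the estimate."""
    if not points:
        raise ValueError("at least one point is required")
    core = onion_peel(points)[-1]
    return (sum(p[0] for p in core) / len(core),
            sum(p[1] for p in core) / len(core))
```

Peeling the outer layers first discards isolated, far-away points (for example, tweets posted while travelling) before the estimate is computed, which is one plausible reason such a method would be robust to outliers.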