Researchers need efficient and valid approaches to mine and analyze the large volume of textual information now available. Automated coding approaches offer promise, but a major concern is how accurately such codes capture the meaning and intent of the original texts. We compare the recall (number of codes identified) and precision (accuracy of the codes) for bodies of text coded (1) manually by humans based on the Outline of Cultural Materials (OCM) code book, (2) semi-automatically by computers that used a human-generated content dictionary containing Rapid Ethnographic Retrieval (RER) codes, and (3) automatically by computers that used an automated version of the OCM content dictionary (AOCM). We applied network visualization and statistics to quantify the relative importance of codes. The semi-automatic coding approach achieved the best balance of recall and precision. Network visualization and metrics identified relationships among concepts and framed codes within a context. Semi-automated approaches can code much more data in less time than humans can, and researchers can more easily refine content dictionaries and analyses to address errors. These advantages make semi-automated coding a promising method for analyzing the ever-expanding amount of textual information available today.
Keywords: Content analysis; Accuracy; Network analysis; Data mining
Citation: Van Holt, T., Johnson, J.C., Brinkley, J., Carley, K., and J. Diesner. 2013. Rapid Ethnographic Assessment for Cultural Mapping. Poetics 41(4):366-383.
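
The comparison described in the abstract can be illustrated with a minimal sketch of dictionary-based coding scored against human codes. Everything in it is an assumption made for illustration: the content dictionary, the code names, the sample sentence, and the function names `code_text` and `precision_recall` are hypothetical and are not taken from the OCM, RER, or AOCM dictionaries. The sketch also uses the standard set-based definitions (precision as the share of machine-assigned codes that humans also assigned, recall as the share of human-assigned codes that the machine found), which may differ in detail from the measures used in the study.

```python
import re

# Hypothetical content dictionary mapping each code to trigger terms.
# These entries are illustrative only, not actual OCM or RER codes.
CONTENT_DICTIONARY = {
    "fishing": ["fish", "net", "boat"],
    "kinship": ["mother", "brother", "uncle"],
    "trade": ["market", "sell", "price"],
}


def code_text(text, dictionary):
    """Return the set of codes whose trigger terms appear in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {code for code, terms in dictionary.items() if tokens & set(terms)}


def precision_recall(machine_codes, human_codes):
    """Score machine-assigned codes against human-assigned codes."""
    true_positives = len(machine_codes & human_codes)
    precision = true_positives / len(machine_codes) if machine_codes else 0.0
    recall = true_positives / len(human_codes) if human_codes else 0.0
    return precision, recall


# Made-up sentence and made-up human coding, used as the reference standard.
text = "My uncle took the boat to the market to sell the catch."
human = {"fishing", "trade"}

machine = code_text(text, CONTENT_DICTIONARY)
precision, recall = precision_recall(machine, human)
print(sorted(machine), round(precision, 2), round(recall, 2))
```

In this toy case the dictionary recovers both human codes but also assigns "kinship" because "uncle" appears in the sentence, so recall is high while precision drops. This is the kind of error a researcher could address by refining the dictionary entry.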