Start Date

2019

Description

Our research extends prior work by CSIRO (the Commonwealth Scientific and Industrial Research Organisation), Australia’s national research laboratory. We use Twitter data (Tweets) as a dataset for measuring the Social License to Operate (SLO) of various mining, gas, and oil companies. SLO is defined as the acceptability of a company’s business operations to its employees, stakeholders, and the general public. The primary purpose of the summer 2019 research project is to find a methodology that effectively models the topics of the Tweets in our dataset. Topic modeling is a way of identifying the abstract “topics” that are prevalent in a corpus of text documents. It is statistical in nature and is essentially a form of unsupervised machine learning: we attempt to cluster the Twitter data to find similarities and patterns among groups of words. To that end, we first used standard data science techniques to investigate the nature of our Twitter dataset, working with the Python programming language, the Pandas data analysis library, the Matplotlib data visualization library, and other processing and visualization software. Our discoveries and results are recorded in Jupyter Notebooks, an interactive web-based application that allows researchers to easily share code, equations, visualizations, and text. We also use Scikit-Learn, a machine learning library, and Gensim, a topic modeling library, along with various third-party libraries, to implement baseline topic models from which we can begin to investigate how best to extract relevant topics from the Tweet texts.

Jan 1st, 12:00 AM

Monitoring Social Media using Machine Learning
