The future is already here — it's just not very evenly distributed. ---- William Gibson
What does the US Federal Government say in its “boring” media releases? How do Republicans and Democrats differ in what they say? What emotions are attached to each topic? This collaborative project explores federal government media releases using structural topic modeling and sentiment analysis. The results are presented as interactive visualizations showcased on our project website.
The increasing popularity of social network services provides a great opportunity to study what we care about and how we interact with others. However, because network relationship data is complex, research and applications are limited by the query efficiency of the relational data model and SQL-like query languages. This collaborative project uses an emerging graph database, Neo4j, to store, extract and analyze group network data from Meetup.com. We also built an interactive dashboard that lets users query Meetup topics of interest and view the results as interactive visuals.
You are what you say. People’s words reveal important information about their identity, emotions and relationships with others, which offers a new way to evaluate teen crisis intervention. Using text mining, LIWC-based psycholinguistic analysis, and analysis of variance (ANOVA), my research reveals significant correlations between language use during a counseling session and the effectiveness of the treatment. For example, teens who felt better after treatment generally used more prepositions, more conjunctions, and more words representing cognitive processes. Based on the detected language-use patterns, predictive models achieve above 75% accuracy in detecting the “better” interventions and above 90% accuracy in identifying the “worse” ones.
Among the many things that haunt a graduate student’s mind, finding a job is probably the most important and the most daunting. While an ideal job lies at the intersection of three sets (what one loves, what one is good at, and what society values), the third set alone goes a long way toward securing good pay. This collaborative project aims to predict job salaries from the text that describes each job. We experiment with topic modeling as a dimension-reduction method that transforms unstructured text into quantitative dimensions representing latent topics. These topic features are then fed into statistical models that predict salary levels with up to 84% accuracy.
Crisis Text Line (CTL) is a data-driven NGO start-up providing free, 24/7 crisis intervention to teens across the United States. During my internship there, I was in charge of a Google AdWords campaign to raise funds and recruit volunteer counselors. With support from my supervisor and the team, I learned key concepts and strategies of PPC (pay-per-click) marketing, and even redesigned the organization's landing page to target potential donors. Below are some designs and AdWords tutorials I developed for the project.
The 2002 New York City property assessment scandal has been called the greatest case of municipal fraud in U.S. history. Mayor Michael Bloomberg referred to it as the “largest and most damaging corruption scheme ever conducted within city” of New York. A tax-assessor-turned-consultant masterminded a ploy in which property owners paid him substantial fees. In exchange, the high-powered consultant secured questionable reductions to their property taxes by bribing government officials. The scheme, which ran for three decades, cost New York City an estimated $40 million in vital revenue a year. This research project examines the scandal closely by studying government reports, news articles, and legal documents; interviewing reporters and prosecutors; and performing quantitative analysis of public indictment data. A volume-length paper is in the final stage of preparation for publication.