2013–2014 M.A. in Quantitative Methods,
Columbia University,
New York, NY, USA GPA 3.9/4.0
Relevant Coursework: Multivariate Statistical Analysis, Applied Data Mining, Applied Data Science, Big Data Analytics, Data Visualization, Advanced Spatial Analysis & GIS, Writing
Thesis: Language Use in Teenage Crisis Intervention and the Immediate Outcome: A Machine Automated Analysis of Large Scale Text Data
2009–2013 B.A. in Economics & English,
Zhejiang University, Chu Kochen Honors College,
Hangzhou, China GPA 3.9/4.0
Relevant Coursework: Probability Theory and Statistical Inference, Linear Algebra, Java Programming, Econometrics, Advanced Microeconomics, Advanced Macroeconomics, Dynamic Optimization, Game Theory
Theses: The Analysis of Economic Classics Translation from the Perspective of Functionalist Translation Theory: A Case Study on Smith's The Theory of Moral Sentiments; Rules, Games, and the Coproduction of Public Service: Citizen's Role in Urban Solid Waste Sorting
Skills
• Well versed in Python, R and SQL. Day-to-day experience with Bash and git. Hands-on with d3.js and HTML
• Extensive working experience with RDBMS (Postgres, Transact); strong knowledge of distributed computing frameworks (Spark/Hadoop), NoSQL databases
(Cosmos, Cassandra), web crawlers (Nutch) and search engines (Solr/Lucene)
• Immersed in AWS, Azure, Docker and Continuous Integration (Bamboo)
Work Experience
Apr. 2017–Present Receptiv, Ad Tech, New York, NY Data Scientist
• Designed and built an advertising pacing system to dynamically allocate campaign budget to supply sources, via mobile device
ad playlists. The solution uses PID control theory and is written in Python with robust unit testing, logging and alerts. New
features are rolled out with continuous integration
• Built a device gender prediction web service using Azure Machine Learning that allows version control of models and easy
A/B testing. Designed the ETL process in Azure Data Factory
• Designed a bid decisioning system using win rate prediction and Bayesian bandit algorithms
Apr. 2015–Apr. 2017 IgnitionOne, Ad Tech, New York, NY Data Scientist
• Designed and prototyped an NLP system that classifies URLs into contextual topics and discovers user interests based on
their browsing history, powering display ad campaigns with a 35% boost in ROI
• Built a pipeline in Python from automated data collection to display ad conversion prediction that uses GBM for feature
engineering and transfer learning to bring in more signals, achieving an AUC of up to 0.78 at the advertiser level
• Designed experiments to test effectiveness of ad products, analyzed results in R, and delivered insights with intuitive
visuals
• Identified the bottleneck of the company’s existing mobile product and wrote Spark apps in Scala to build a sample mobile
device profile store in Cassandra
Aug. 2013–Apr. 2015 Columbia University, Department of International and Public Affairs,
New York, NY Quantitative Research Assistant
• Designed and implemented a randomized controlled trial that tests the behavioral impact of anti-corruption audits in NYC
• Identified causal effects by collecting, assessing and analyzing data from 6 years of NYC tax rolls and various other sources
• Published a book chapter that quantitatively investigates the key figures involved in the 2002 NYC Assessment Scandal
Jun. 2014–Jan. 2015 Crisis Text Line,
New York, NY Data Science Intern
Project I: Counselor Quality Evaluation using Text Mining
• Automated the retrieval of mental health counseling data from a large, real-time database using SQL and Python
• Constructed variables from text data using regular expressions, topic models and other NLP methods
• Identified key predictive factors of counselor quality using correlation tests and ordered logistic regression
• Provided insights into counselor training by communicating results to board members with intuitive visualizations
Project II: PPC Marketing using Google Adwords & Analytics
• Achieved a 15.5% increase in website traffic by analyzing audiences, setting up targeted ad groups and A/B testing ad content
• Redesigned and prototyped the organization's landing page to target potential donors and volunteer counselors
Jul. 2013–Aug. 2013 PING AN Insurance, Unsecured Personal Loan Department,
Shanghai, China Client Management Intern
• Facilitated bad debt prevention by identifying and managing high-risk clients based on their past performance data
• Proposed and helped implement an automated message reminder system based on clients’ credit records to reduce labor costs
Project Experience
Fall 2014 Columbia University, New York, NY
Powering Social Network Analysis with Graph Data Model: An Example of the Meetup Visual Analytics
• Performed social network analysis on web-scraped Meetup data using Python, and visualized the results with Gephi
• Built a Neo4j-powered visual analytics dashboard that lets users query Meetup topics of interest and view the most popular groups, most active groups, event locations and group network structures as interactive visuals
Spring 2014 Columbia University, New York, NY
Political Sentiment Visualization: Data Analysis and Visualization Using Voxgov US Federal Government Media Releases
• Processed, cleaned and manipulated 200,000+ federal media release text files in JSON format
• Performed lexicon-based sentiment analysis and Structural Topic Modeling to reveal latent topics and the attached sentiment
• Designed and implemented both static and interactive D3 visualizations to communicate the results
Predicting Salary using Job Description: Topic Modeling and Supervised Learning on NYC Job Posting Data
• Predicted salary level with 87.5% accuracy using topic modeling results combined with machine learning algorithms including linear regression, K-Nearest Neighbors, Random Forest and Support Vector Machine
Jun. 2012–Mar. 2013 Zhejiang Provincial Bureau of Statistics, China
Rival or Partner: Grassroots NGOs and Local Government in Regional Governance
• Led a team of 6 through 5 months of fieldwork, collecting data with 1,200+ questionnaires and multiple interviews
• Won the 1st prize in the provincial statistical survey competition with a comprehensive report and a team presentation
2011–2013 Zhejiang University, Hangzhou, China
Development of Logistics Finance in Zhejiang Province: A Case Study
• Collected logistics finance data from Zhejiang Yongjin Storage Co., Ltd. through access to the company database, questionnaires and interviews
• Analyzed the constraints and potential of logistics finance in Zhejiang based on the collected data
Collaborative Management and Regional Development: the Hangzhou Experience
• Analyzed the pricing strategy and organizational design in the successful transformation of the West Lake Scenic Region
The Theory of Moral Sentiments Translation Workshop
• Hosted the weekly workshop for a year to discuss important concepts, arguments, and different versions of translation
• Co-published a translation of The Theory of Moral Sentiments by Adam Smith
Honors & Awards
2012
• Zheng Zhigang Scholarship
Awarded annually to 6 undergraduates university-wide
2012
• Specialized Scholarship of Social Practice
Awarded annually to 6% of undergraduates university-wide
AY 2010–2012
• Academic Excellence Scholarship
Awarded annually to 15% of undergraduates university-wide
Nov. 2012
• First Place in Zhejiang Statistical Survey Contest for College Students
Held by Zhejiang Provincial Bureau of Statistics, 1 out of 300 contestants
Jun. 2011
• Silver Medalist in Model Podium Teaching Contest
Held by Zhejiang University, Department of Education, 2 out of 53 contestants
Jun. 2010
• Silver Medalist in World Scholar's Cup
Held by the World Scholar's Cup Organization, 2 out of 56 teams
2010
• National Scholarship for Outstanding Students
Awarded annually to the top 1% of undergraduates nationwide by the Ministry of Education of the People's Republic of China