How's Your Life?

Understanding how people's emotions change across the life span, based on their life stories.

Data Mining / Data Analysis / Python / R

Course Project / April 2018

Research Background

I constantly ask myself: what’s my life going to be like in ten years?


On Zhihu (a Q&A forum), people post questions like “For those in their 30s, how is your life?” and many others answer by sharing their life stories. Taken together, these answers form a large sample of people’s lives at different life stages.


Specifically, many people write about how they felt during particular life events (a divorce, dropping out, a promotion, etc.). This sparked my interest in understanding how emotions change across the life span.

Research Question

General question: What are people’s lives going to be like in ten years?


Research question: Do four emotions (depressed, anxious, happy, and lost) differ across three life stages (ages 18 to 22, 24 to 26, and 28 to 32)?


Specific research question: Do the word frequencies of "depressed", "anxious", "happy", and "lost" differ among the answers to three questions on the Zhihu platform?


*The three questions:

For those aged between 18 and 22, what is your life like?

For those aged between 24 and 26, what is your life like?

For those aged between 28 and 32, what is your life like?


Database: three Zhihu questions and their answers.


  1. Use Python to scrape the first 1,000 answers to each question.

  2. Use R to count the frequencies of the four words (happy, depressed, anxious, and lost) in the answers to each of the three questions.

  3. Use ggplot2 in R to plot the differences among the words and among the age groups.

  4. Use a chi-square test of independence to examine whether the differences are statistically significant.
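
The counting step can be illustrated with a minimal Python sketch. It is not the project's code (the analysis itself was done in R with a word segmentation function). It assumes plain substring counting instead of word segmentation, and the Chinese terms, the sample answers, and the name `count_emotion_words` are all hypothetical, chosen only for illustration.

```python
# Hypothetical Chinese renderings of the four emotion words; the write-up
# does not state which exact terms were counted.
EMOTION_WORDS = {
    "happy": "快乐",
    "depressed": "抑郁",
    "anxious": "焦虑",
    "lost": "迷茫",
}


def count_emotion_words(answers):
    """Count occurrences of each emotion word across a list of answer texts.

    Uses simple substring counting, which is an assumption made here in
    place of proper word segmentation.
    """
    counts = {label: 0 for label in EMOTION_WORDS}
    for text in answers:
        for label, word in EMOTION_WORDS.items():
            counts[label] += text.count(word)
    return counts


# Made-up answers for illustration only; these are not scraped data.
sample_answers = {
    "18-22": ["很迷茫,但是也很快乐。", "迷茫,焦虑。"],
    "28-32": ["每天都很焦虑。"],
}

# One row of word counts per age group.
frequency_table = {
    group: count_emotion_words(texts) for group, texts in sample_answers.items()
}
```

The resulting `frequency_table` has one row per age group and one count per word, which is the shape needed for both the plots and the chi-square test.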

Partial Python Code for Data Mining

Word Segment Function in R


Overall, happy and lost are the most frequently used of the four words, while depressed is the least used.

For the youngest group (aged 18 to 22), lost is the most frequently used of the four words. For the second group (aged 24 to 26), it is happy. For the oldest group (aged 28 to 32), it is anxious.

There is also an interaction effect between age group and emotion. People in the youngest group (aged 18 to 22) use the word happy more than the other groups do, yet they also use the word lost the most. People in the oldest group (aged 28 to 32) use the word anxious more than the other groups do.

A chi-square test of independence further confirmed that the differences among the groups are statistically significant.
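
For readers unfamiliar with the test, here is a minimal sketch of how a chi-square test of independence works on a table of age group by word counts. The counts below are placeholders and not the project's results, and the function name `chi_square_independence` is hypothetical. The statistic is compared against 12.592, the standard critical value for 6 degrees of freedom at the 0.05 level.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts.

    Each expected count is row_total * column_total / grand_total, i.e. what
    we would see if word choice were independent of age group.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed_count - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Placeholder counts for illustration only, NOT the project's data.
# Rows: ages 18-22, 24-26, 28-32. Columns: happy, depressed, anxious, lost.
observed = [
    [30, 10, 20, 40],
    [35, 10, 25, 30],
    [25, 15, 40, 20],
]

statistic, dof = chi_square_independence(observed)

# Critical value of the chi-square distribution for dof = 6 at alpha = 0.05.
CRITICAL_VALUE_DOF6_ALPHA_05 = 12.592
significant = statistic > CRITICAL_VALUE_DOF6_ALPHA_05
```

A statistic above the critical value means the distribution of the four words is unlikely to be the same in every age group. It does not say which group or word drives the difference.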


It seems that the youngest cohort is both the happiest and the most lost. I personally find the results reassuring: they seem to "guarantee" that I am not the only one feeling lost in my life, and that thousands, if not millions, of young people feel the same.


Meanwhile, it seems that people around 30 are the most anxious of all. An ancient Chinese saying goes, "At thirty, I stood firm". However, this piece of wisdom does not seem to apply to today's society. People around 30 are under pressure from family obligations, their children's education, and high housing prices. Few people can "stand firm" under such pressure, which is probably why they are anxious.

Finally, I shall enjoy my life!

©2018 by Yu Zhao.