@superpigy 2018-06-08T06:33:27.000000Z 字数 50865 阅读 87

Individual Stressor Identification based on Latent Stressor Discovery Model

LDA stressor behavior

Introduction

A.Motivation

With the increasingly rapid pace of life today, people are becoming more and more vulnerable to stress. Numerous evidence and findings show that chronic stress may lead to huge physiological problems such as headaches, high blood pressure, heart issues and skin conditions. Due to the latency of stress, it is hard for people to realize that they are suffering from stress. For stress detection, however, traditional methods like stress-assessment questionaries and physiological characters monitoring are very passiveness and inconvenient. Instead, in recent years, many researchers put their eyes on methods that can identify individual stress state in real time. Most of the existing studies of stress detection can be categorized into two different perspectives, the physiological signal analysis, and the social media analysis. For physiological point of view, wearable sensors or Iot devices are employed for monitoring abnormal signals reflected from the body[4][5][6]. For daily use, however, such sensor sensing technologies cannot be widely used due to the extra expense and ease of use.
Benefiting from widely used portable smartphones,people are willing to share their latest status, moods and funny stories to their friends on SNS. And UGC on SNS can be acquired easily by using public APIs or Html crawling and parsing. Apart from ordinary user generate contents, some social media platforms trying to provide extra services such as tagging and profile dashboard so that more accurate personal portraits can be obtained. Regarding these psychological theories and findings, many empirical studies show that there are ways to leverage SNS data to identify the latent expressions of stress. Early studies[7][8] proved that by digging out the lexical features like words, punctuations and grammar use may help for stress evaluation. Nevertheless, these lexical methods cannot fully apply on social media like Twitter, Pinterest and Sina Microblog because of the short text issue. Recent studies use texts as well as user interactions through the large-scale SNS data to model users' behavior and text patterns in order to identify stress states.[2] applied the supervised CNN method together with FactorGraph to learn features from labeled data and predict unlabeled data; while [1] proposed a prediction model based on the correlation between stressor and stress to forecast pressure states of teenagers.
Table 1. Examples of topics and their corresponding stressors.(Excerpted from our crawled data)

stressor	tweet	topic
life	My mother was very ill, so I had to take care of her. And I was worried about her operation.	illness
life	I have little money, but I can squeeze by till the end of the month.	money shortage
study	math is very difficult, we cannot go to the same college!	math,college

Compare with stress state detection, in many cases, people are more concerned about reasons that cause psychological pressures. By the reports from Pew Research Center, an awareness of stressors offers a great help to people cope with stress itself. According to these psychology theories and empirical findings, stressors may be reflected in behavior patterns and result in negative thoughts and emotions. On the basis of the definition, a stressor can be interpreted as a kind of daily event that leads to overwhelming anxiety or depression. For instance, given a sentence--'I lost my key' and it refers to a 'lost' event. Thus,for stressor identification on social media,traditional event detection methods can be applied and existing studies such as [3], which found out a way to extract stressors from microblog by applying the ordinary event detection approach(eg., trigger words detection, NLP text analysis) in one's stressful periods and [9] used a labeled deep learning technology for stressor finding. However, these models neglect considering the individual difference on the stressor distribution. A student is more probable to be suffering from a study-related stressor than a work-related stressor, whereas a white-collar worker is quite the opposite. According to our empirical findings on large-scale social media data, each stressor can be represented as a set of topics.[Table 1] shows the one-to-many relationship between stressors and topics, the first two tweets share the same stressor of life but their embedded topics are totally different and thus it has a possible way to identify individual stressor distribution by a topic allocation method. Unlike traditional topic model such as the Latent Dirichlet Allocation(LDA)[10] which tries to uncover latent topics from words in documents, our purpose is to discover latent stressors over a series of short messages(e.g., posted tweets) by using a novel allocation approach.
Different from the startup of social networking, it is now insufficient for stressor detection to simply rely on text-level attributes, due to people nowadays prefer to choose pictures, music, and other multimedia formats to share their lives and attitudes and it also results in extra difficulty of lexical analysis due to text sparsity issue. Inspired by [2], which mentioned that SNS behaviors(interactions) play an important role for expressing the stressful status of online users, we introduce behaviors as an equally significant feature and integrate to our model.

B. Our work

    In this paper we propose a novel probabilistic approach for individual stressor distribution analysis taking consideration of lexical features,social behaviors and user access timings. From Bayesian theorem,in LSDM, individual data flow is treated as a series of stochastic generation processes in which lexical features,behaviors and access timings are all randomly generated by specific triggers,say,stressors. Our model describes a way of generating web content from a probabilistic point of view. To generalize the stressor detection problem,we define the minimum processing unit called 'term' to represent the user-generated content at a certain time,marked as, $\{[behaviors],[texts],time\}$ , where the square brackets means a list set and components of the 'term' are independent and elements in each component are i.i.d.Take the classical urn problem in statistics, for example, suppose there are urn A and urn B with behavior dices and word dices that are labeled in different colors respectively, given a user and a random trigger(e.g., a stressor), we notice that the user has different color preferences under different stressors. Then, the user chooses dice B and dice W from urn A and urn B respectively and independently by following the color preference.Afterwards,he or she keeps rolling the chosen dices to sample behaviors and words and at the end the user stops rolling at time $\tau$ ,therefore a new term is generated. We represent the above procedure with a generative model named Latent Stressor Discovery Model(LSDM).
    Motivated by [3], the SNS access rate in stressful periods is higher than the rate in non-stressed periods,the proportion of stressors which are shown on social media should be varies in different periods. Thus, we define a dynamic sampling period called 'footprint' in our model and we assume each footprint has a specific stressor distribution. The core principle that LSDM can obtain the stressor distribution in a footprint is to compute a Bayesian posterior distribution of stressors over the given evidence in the testing footprint. However, such posterior distribution calculation is not a trivial task because several fundamental questions need to be addressed:(1) how to determine the sampling length of a footprint;(2) what are the latent relevancy between stressors and observable terms;(3) how to inference the proportion of stressors in a testing footprint by the given evidence. Beyond individual stressor distribution ,in this paper, we are also interested in the criteria for the stressor related topics allocating. Due to the diversity of individuals such as interests, the degree of education and living environments reflected in texts, each user may give different words, phrases, emoticons or even punctuations to express the same stressor and thus such characteristics can accelerate our model in prediction accuracy and speed.
    We investigated over 45 million terms posted by 63,254 online users from Sina Microblog and found that
    Overall, the contributions for this work are as follows:

     (1) We proposed a novel mixture allocation model LSDM which leverages lexical,behavioral and time features individually to determine personalized stressor distribution in stressful periods.
     (2) We create a new perspective on probability to present the procedure that online users generate their contents on social media.
    The rest of the paper is organized as follows. A briefly overview of closely related works of event extraction,topic modeling,stressor detection and the state-of-the-art stress detection approaches will be presented in Section II. Following that, we will introduce the principles of LSDM in detail and present a Gibbs sampling inference for latent parameters and analyze the algorithm complexity of the proposed model in Section III. Afterward, an experimental setup is placed in Section IV and after that, we plan to bat around the results and further analysis in Section V.In the end, we conclude the whole paperwork in Section VI.

The proposed model(LSDM) for individual stressor distribution computing is based on stress detection, sentiment analysis, topic modeling on short texts and SNS based event extraction.

In recent years, social media has become a powerful “lens” for mental health care. A rich body of such research work has been conducted to discover, track and even forecast mental health of individuals or populations. [51, 52, 53, 54, 55] regarded social media as a new channel for depression detection. De Choudhury et al. [51] were the first to capture depression predictors through social media. They conducted a classifier based on social media data to predict whether an individual was vulnerable to depression. And the experiment results showed that the indicators such as lowered social activity, greater negative emotion, high self-attentional focus, increased relational and medicinal concerns play key roles in predicting depression. By analyzing six network features indicated social integration in a social networking site named TrevorSpace, M Homan et al. [52] compared users who were identified as depressed and those who were identified as non-depressed, and discovered that non- depressed users are more deeply integrated into the social fabric than depressed users. Andrew Schwartz et al. [53] developed a regression model to predict individuals’ continuous-valued depression scores through Facebook and found that individuals’ degree of depression most often increases from summer to winter. Based on Japanese- speaking users’ Twitter, Tsugawa et al. [54] verified the effectiveness of using features related social media activities for estimating users’ degree of depression. Besides, there are also a plenty of studies on other types of mental health care via social media. For example, De Choudhury et al. [56, 57] utilized Twitter posts and Facebook data for evaluating the risk of new mothers’ postpartum depression. Mitchell et al. [58] analyzed the language used by schizophrenia sufferers on Twitter by employing several natural language processing methods. They found that schizophrenia sufferers have significant differences from normal people in several linguistic indicators, such as words that mark an irrealis mood and a lack of emoticons. The repository of results of such research works reveals that social media data could provide solid and confidential information for mental health care.

B.Topic modeling over short texts

Conventional well-known topic models such as Probabilistic Latent Semantic Analysis(PLSA)[50] and Latent Dirichlet Allocation(LDA)[10] are widely used for document analysis in past several years. Due to flaws of these model on short texts -- the sparsity of the document-level word co-occurrences, it can't be directly applied to the social media topic allocation process. Early researchers concentrated on exploiting external sources to enlarge lexical features of short texts, for instance, Phan et al.[64] and Jin et al.[65] used a pre-learned topic classifier from large-scale dataset to infer the topic distribution over short texts. But the limitation of these models is that they won't get good results if the auxiliary data are less related to the original data and the highly correlated auxiliary data can't be easily found sometimes. Extra metadata such as timestamps, named entities, and hashtags have been used for short text aggregation[66] whereas, for some domains, it is not always smooth to find such favorable metadata, say, for example, news snippets and contents that only contain emoticons. [61]-[63] used improved word-embeddings techniques to detect latent topics over short texts and made good results. While another strategy[49] for short texts topic allocating is to merge short texts into long pseudo-documents to refine the topic inference process of conventional topic models.Lately, many achievements in enriching the word co-occurrence info from the set of short texts are being modeled, Yan et al.[67] proposed a novel BTM method based on the assumption that the two words are more likely to be part of the same topic if they more frequently co-occurred.

Sentiment analysis is widely used in all aspects, such as digital monitoring, stock prediction, brand management and a recommender.As stressors result in negative emotions, for example, stressed people tend to express some moods about 'boredom' or 'world-weariness' and thus sentiment analysis is a significant work for enhancing the effectiveness of stressors detection process. Some of the tweet-level sentiment analysis [68]-[70] are based on hashtags, emoticons and other abstract features and the other ones [71][72] are mainly focus on target-depended features. However, parsing emotions on Chinese text is also a challenging work. There are two branches of relevant research -- the monolingual and the bilingual[73]-[87]. The former branch mainly focuses on proceeding ordinary sentiment procedures based on Chinese grammar while the latter one directly applies English translate techniques on Chinese texts. Along with the development of machine learning and neural net techniques[88]-[94], it is getting easier to build an emotional classifier by using emotionally labeled data, such as customer reviews in e-commerce platforms and votes data, notably by SnowNLP, an opensource python library for NLP processing.

Referring to a research that was published on Holmes and Rahe Stress Scale(HRSS), it took the scale of 2,500 US military members to summarize top ten stressful live events which are more likely to tend to stressful illness. It proved that the personal life events detection is a meaningful work for our model to enhance the performance of LSDM learning procedures. Utilizing the resources of social media, plenty of works on personal life event detection from Twitter had been done. The work[95] builds a classifier of 11 major life events by using activity and attention features and [97] distinguishes whether the event that people are involved in or not and senses the time that the event happened.[96] turns tweet sentences into syntactic and semantic features to identify life events such as 'marriage','having a baby' and 'death'.The work [99] used machine learning approach to build several binary classifiers for some so-called 'life breaking point' events such as 'marriage','graduation','new job','newborn' and 'surgery' by leveraging extracted lexical features in tweets. In this paper, we will take advantage of individual life events detections tech to identify stressor related events in footprints.

Methodology

In this section, we will expound details of the generative process of Latent Stressor Discovery Model.

A. Problem Definition

Before presenting the generative process of LSDM, some declarations need to be made, we assume that the users' activities on the social network are a series of stochastic processes under stimuli，#need to say more#.Let $\vec{s} = \{s_0,...,s_N\}$ be the set of stimuli, $\vec{fp}$ represents the fragments of UGCs,named 'footprints', at different times and a window size, $ws$ for a constant cutting factor of timing,so each 'footprint' can be extracted from UGCs in time interval $[ws_{i-1},ws_i],i\ge1$ .For the UGC level, in our proposed model, three aspects are introduced, lexical features, behaviors and the corresponding SNS access time. In this paper, We use 'term' to describe the UGC unit(one action happens at one moment.e.g., tweet posting, liking, forwarding,etc), indicating that one term refers to a combination of three aspects. Let $t,\vec{\iota},\vec{b},\tau$ be the notations of a term, the set of lexical features in one UGC unit, the set of behaviors in one UGC unit and the time that the corresponding unit occurring, so the unit descriptor can be presented as $t = <\vec{\iota},\vec{b},\tau>$ .

Definition 1(Footprint-stressor distribution)

We define a stimuli set of N+1 items, $S=\{s_0,...,s_N\}$ , where $s_0$ represents non-stress related events, whereas others stand for stress-related events. Obviously, in real life, people are affected by different stressors in a different period of time, so we suppose that each footprint has its specific stressor-distribution.Let $\zeta_i$ be the stressor distribution of $i^{th}$ footprint and apparently $\zeta \sim Multinomial$ .Considering the conjugation of $Mult$ , we choose a Dirichlet distribution with a fixed pseudo count $\gamma$ as a generator of $\zeta$ , say, $Dir(\gamma)$ .

Definition 2(Stressor-topic-word distributions)

Each stressor is associated with one or more topics and different stressors have different mixtures of topics, for example, a study-related stressor may refer to topics like 'math', 'examination' and 'grade', a fortune-related stressor, however, relates to topics like 'money', poor' and 'life', thus each stressor can be described as a mixture of small number of topics, viz, a topic distribution. Let $\vartheta_j\in\Theta$ be the topic proportion with K-dimensions in $j^{th}$ stressor and definitely $\vartheta \sim Multinomial$ with $Dir(\vec{\alpha})$ as a conjugate prior, where $\vec{\alpha}$ are hyperparameters that we will set to a constants pseudo count in our paper. Besides, for the topic-word part, let $\varphi\in\Phi, \varphi \sim Multinomial$ be the distribution of words in the corresponding topic.

Definition 3(Lexi-box)

We introduce an extended 'bi-term' method to tackle the issue of sparse words on the social network like Twitter and Sina Weibo to enrich lexical features. The key idea is that the ratio of co-occurring words is in direct proportion to the possibility that they belong to the same topic.We introduce a neighbor-size, $ns$ and for every incoming short message, the method can extract unordered word terms in a fully-connect way.For example,given a message with word sequence: $(w_1,w_2,w_3,w_4)$ ,the output will be:

$\begin{equation} \begin{aligned} & if \ ns = 2 \quad \{(w_1,w_2),(w_2,w_3),(w_3,w_4),(w_2,w_3),(w_2,w_4),(w_3,w_4)\} \\ & if \ ns = 3 \quad \{(w_1,w_2,w_3),(w_1,w_3,w_4),(w_2,w_3,w_4)\} \end{aligned} \end{equation}$

And, more remarkable, the bigger $ns$ value lower the computing-time assumption but lessen the predictive accuracy.After the processing, each document in footprint is represented as a Lexi-box set and every element in a bag is generated independently from the corresponding topic-word distribution $\varphi$ .We denote $\iota \in L$ as a Lexi-box, where $L$ is the whole bag list of the corpus. Additionally, it is worth to be mentioning that it could be null Lexi-box in some terms because sometimes online users only prefer to share multimedia or others' tweets without saying a word.

Definition 4(Stressor-Behavior distribution)

The behavior we defined in this paper represents the minimum access unit of online users, one tweet may contain multiple behaviors(See Fig.1). Let $b_{i,j}\in B$ be the $i^{th}$ behavior that assigns to $j^{th}$ stressor, where $B$ represents the behavior pool that contains all possible behaviors the online users can proceed on SNS(the behavior list in this study will be shown in Table 1) and each term $t$ contains at least one behavior. On the basis of numerous empirical psychological research, different stressor may lead to different coping behaviors, that is to say, we can identify different behavior distribution over different stressors. The stressor-behavior distribution are indicated by $\eta_j$ for the $j^{th}$ stressor.

Definition 5(Timing)

Users post tweets at different times. In this study, we assume that term times are also under influence of different stressors. For simplicity of the algorithm, 24 hours will be partition into 4 intervals with 6 hours timespan for each, that is,[0,6),[6,12),[12,18),[18,0/24). We define $\lambda_{j}$ to declare the timing distribution over the $j^{th}$ stressor.

Table 1.Summary of notations

Notation	Description
$K$	number of topics
$M$	number of footprints
$N$	number of stresss-related events
$L$	lexi-box list
$B$	behavior list
$T_m$	number of terms in the footprint m
$C$	the set of 4 timeintervals
$\vec{t}$	term collection
$\alpha,\beta,\gamma,\delta,\epsilon$	hyperparameters of dirichlet priors for stressor-topic,topic-'lexi-box',footprint-stressor,stressor-timing,stressor-behavior respectively.
$\zeta_i$	stressor distribution of $i^{th}$ footprint
$s_{m,j}$	the $j^{th}$ stressor of footprint $m$
$z_{m,j}$	the topic id assign to stressor $s_{m,j}$
$\varphi$	multinomial distribution of lexi-box to topics
$\vartheta$	topic distribution over stressor
$\iota$	lexi-box item
$\lambda$	timing distribution over stressor
$\eta$	behavior distribution over stressor
$\tau$	timing of term generate
$t$	user generate term

Problem 1(Latent Distribution Discovery)

Given a series of individual footprints,one of our goal is to find out the probability distributions of latent stress-related parameters,such as $\varphi,\vartheta,\lambda,\eta$ and $\zeta$ , and the posterior distributions of stressors, $p(s|z),p(s|\vec{\iota}),p(s|\lambda)$ .

We also interested in the degree of correlation between topics that online users discuss on SNS and stressors that they are suffering in the meantime. The LSDM uses optimized LDA method to allocate topics and find out the convergent probability of topics that are assigned to stressors. The original LDA method is based on the relationship between documents and topics, LSDM, however, put more focus on the assumption that online users are more likely to post stressor-related topics. $\vartheta_j\in \Theta$ is a K-dimensional vector and the purpose is to learn the mapping $f(\Theta|L)$ .

Problem 3(Stressor detection)

By knowing the latent variables and distributions.Utilizing the conjugacy of the Dirichlet-multinomial,LSDM can easily predict the stress state and stressor distributions over new incoming footprints by calculating posterior probability $P(s_{new}|fp_{new},\Phi,\Theta,\eta,\lambda)$ .

Remark 1. In this paper, topics of a term only regard to the topics that are assigned to Lexi-boxes inside, because it makes no sense that behaviors and the timing can provide any cues.

B. Attributes Definition and Data Extraction

To address the issues of latent stressor parameters estimation, we need to define attributes of the three aspects of terms in order to enrich features that can distinguish characteristics of a stressor that is different from others.For the lexical level, we use 'biterm' method to expand short texts to a set of lexi-boxes and we setup a four-dimentional space $timing = \{\tau_0,\tau_1,\tau_2,\tau_3\}$ with 6 hours timespan for the post timing aspect.The behavior part, however, is more complex. We build a three-layer measurement to assess individual behaviors in one term, they are sentiment level, interaction level, and multimedia level.

B.1 Behavior Attributes - Sentiment Level

Sentiment level attributes describe the emotion status of posted terms. We focus on two-term types, normal tweets, and comments. For sentiment analysis, we first adopt SnowNLP, an open source python lib for natural language processing and the result is bad due to the embedded sentiment classifier is based on the e-commerce shopping comments. Eventually, we use the Chinese psychological dictionary, LIWC2015, to classify words into positive or negative emotions and additionally, emoticons, degree adverbs, and first personal pronouns are also considered. The use of degree adverbs is to determine the sentiment polarity of each term and first personal pronouns are to filter terms whose contents are not related to the poster. The emoticons in Sina Weibo are encoded into base64 icons, so we build a decoder and integrate it into our crawler, then crawled emoticons are auto decoded into square brackets format(e.g.,[EMOTICON: smile]) that we can easily extract emoticons by using the regular expression. The sentiment approach is in a very common way, tokenize, lemmatize and then build maps between words and LIWC2015.

B.2 Behavior Attributes - Interaction Level

Interaction level attributes, including mentioning, liking, retweeting, commenting, etc reflect the frequencies and weights of communication among users, thus we can dig out the degree of familiarity for each user-friend pair simply by number counting. Because users may tend to vent bad moods to friends that they are more familiar with. Moreover, we also consider the situation that users interact with public media(PM) due to some stressed users may visit and retweet contents that are generated by stressor-related PM. We propose a method that matches tags and description posted on PMs' public profile with categories of the LIWC dictionary to categorize PM into different stressors. So, we define two groups for PM interactions,stressor-related and normal.

B.3 Behavior Attributes - Multimedia Level

Multimedia level attributes contain resources of photos, music,videos, external links, etc. As for multimedia behaviors collecting, according to users' trigger actions, such behaviors will be classified into three dimensions, 'original published','retweet' and 'like'(e.g., publish with photos, retweet with photos, etc).

Remark 2. Considering finely sorted behaviors of three aspects, such enlarged features help LSDM to better learn the distinctions of different stressors and make stressor prediction on new incoming footprints more accurate. Note that behaviors can overlap on one term. Take an example, a term, I am so happy to be here!!@joe,@barton', contains three behaviors - 'publish positive emotion' behavior and two 'mention' behaviors. Table 2 shows the summary of behavior list.

Table 2.Behavior list used in LSDM

Category	Type	Object
Sentiment Level	publish	positive emotion tweet
		neutral emotion tweet
		negative emotion tweet
	forward	positive emotion tweet
		neutral emotion tweet
		negative emotion tweet
	like	positive emotion tweet
		neutral emotion tweet
		negative emotion tweet
	comment	positive emotion comment
		neutral emotion comment
		negative emotion comment
	reply	positive emotion comment
		neutral emotion comment
		negative emotion comment
Interaction level	share	friend's tweet
		stressor related org tweet
		normal org tweet
	comment	friend's tweet
		stressor related org tweet
		normal org tweet
	like	friend's tweet
		stressor related org tweet
		normal org tweet
	mention	fans or follows
	reply	comments

C. Model Description

     This section will present the framework of our model and detail the generative process.Fig.2 illustrates the probabilistic graphics model of LSDM which presents a stochastic process of term generation under stressors(stimuli). For instance, an individual suffers from a random stressor, he or she will randomly pick a stressor-related topic, have a series of behaviors, and generate a term onto SNS at a random time. Notice that components of a term data $t=<\vec{\iota},\vec{b},\tau>$ are observation evidences, whereas stressors and topics are latent variables to be estimated.
    Each Lexi-box $\iota \in \vec{\iota}$ is generated by a topic-word distribution $\varphi_z$ ,the topic $z$ of which is drawn by a stressor-topic distribution $\vartheta_{s=j}$ .For notation simplicity,we use $\vartheta_j$ instead, $\vartheta_{s=j} = \vartheta_j$ .As for each behavior, $b \in \vec{b}$ , which is drawn by a stressor-behavior distribution $\eta_j$ whose conjugate prior is a Dirichlet distribution with the hyperparameter $\epsilon$ .Similarly, we choose Dirichlet distribution as a priori for the generator of the time variable, $\lambda$ .
    We note that the joint probability of stressors and terms $P(S,\vec{t};\alpha,\beta,\delta,\epsilon,\gamma)$ is the most
critical quantity which needs to be measured in the process.So we'll follow LSDM generative process to estimate the joint probability.Recall the aforementioned example, in LSDM, a term is generated by a series of the stochastic process based on Bayesian rules and all components are drawn from distributions that are related to stressors, so the probability of terms generation can be written as:

$P(\vec{b},\vec{\iota},\tau;\alpha,\beta,\delta,\epsilon)=P(\vec{b};\alpha,\beta,\delta,\epsilon)\cdot P(\vec{\iota};\alpha,\beta,\delta,\epsilon)\cdot P(\tau;\alpha,\beta,\delta,\epsilon)$

The equation shows that the term creation probability is made up of three generation processes, the behavior generation process, the Lexi-box generation process and the time generation process. To facilitate the integration of such complex formula, we assume that term topics only depends on the lexical part, so the equation will be simplified to:

$P(t|\alpha,\beta,\delta,\epsilon)=P(\vec{b};\epsilon)\cdot P(\vec{\iota};\alpha,\beta)\cdot P(\tau;\delta) \tag{1}$

    Before mathmatical derivation, we will formally summarize the generative process as follows:

    From Bayesian probability theory, we get:

$\begin{equation} \begin{aligned} P(S,\vec{t};\alpha,\beta,\delta,\epsilon,\gamma) &= P(\vec{t}|S;\alpha,\beta,\delta,\epsilon)P(S;\gamma) \end{aligned} \tag{2} \end{equation}$

Substitude Eq(1) to Eq(2):

$\begin{equation} \begin{aligned} P(S,\vec{t};\alpha,\beta,\delta,\epsilon,\gamma) &= P(S;\gamma)\cdot P(\vec{t}|S;\alpha,\beta,\delta,\epsilon)\\ &=P(S|\vec{\zeta})P(\vec{\zeta}|\gamma)\times \\ &\quad P(B|S,\epsilon)P(C|S,\delta)P(L|S,\alpha,\beta)\\ &=P(S|\vec{\zeta})P(\vec{\zeta}|\gamma)\times\\ &\quad P(B|S,\vec{\eta})P(\vec{\eta}|\epsilon)\cdot P(C|S,\vec{\lambda})P(\vec{\lambda}|\delta)\times \\ &\quad P(Z|S,\vec{\theta})P(\vec{\theta}|\alpha)P(L|Z,\vec{\phi})P(\vec{\phi}|\beta) \end{aligned} \tag{3} \end{equation}$

Where $S=\{s_{m,j}\},\vec{\zeta}=\{\zeta_m\},\vec{\eta} = \{\eta_{m,j}\},\vec{\lambda} = \{\lambda_{m,j}\}, \vec{\theta} = \{\vartheta_{m,j}\},\vec{\phi}=\{\varphi_{j,k}\}$ .All latent variables are i.i.d, so we can derive the posterior distribution by integrating latent variables, $\zeta,\eta,\lambda,\theta,\phi$ of Eq(3):

$\begin{equation} \begin{aligned} P(S,\vec{t};\alpha,\beta,\delta,\epsilon,\gamma) &= \int_{\zeta}P(S|\zeta)P(\zeta|\gamma)d\zeta\times \\ &\quad \int_{\eta}P(B|S,\eta)P(\eta|\epsilon)d\eta \times \\ &\quad \int_{\lambda}P(C|S,\lambda)P(\lambda|\delta)d\lambda \times \\ &\quad \int_{\phi}\int_{\theta}P(Z|S,\theta)P(\theta|\alpha)P(L|Z,\phi)P(\phi|\beta)d\phi d\theta \end{aligned} \tag{4} \end{equation}$

Eq(4) shows that the joint distribution is the product of 4 independent generation processes. By applying the conjugacy of Dirichlet and Multinomial, the result of Eq(4) can be easily derived. For lack of space, we here only detail one term of it. Let's pick the behavior generation part as an example.Let $G$ be the notation that represents the generation process, i.e. $G_b \equiv \int_{\eta}P(B|S,\eta)P(\eta|\epsilon)d\eta$ , thus we have:

$\begin{equation} \begin{aligned} G_b &= \int_{\eta}Mult(B|S,\eta)\cdot Dir(\eta|\epsilon)d\eta \\ &=\int_\eta\prod^B_{b=1}\prod^N_{j=0}\eta^{n(b,j)}_{b,j}\cdot \prod^N_{j=0}Dir(\eta_{b,j}|\epsilon)d\eta\\ &=\int_\eta\prod^B_{b=1}\prod^N_{j=0}\eta^{n(b,j)}_{b,j}\cdot \prod^N_{j=0}\frac{\Gamma(\sum^N_{j=0}\epsilon_j)}{\prod^N_{j=0}\Gamma(\epsilon_j)}\prod^N_{j=0}\eta^{\epsilon_v-1}_{b,j}d\eta_{b,j} \\ &=\int_\eta\prod^N_{j=0}\frac{\Gamma(\sum^B_{b=1}\epsilon_b)}{\prod^B_{b=1}\Gamma(\epsilon_b)}\cdot \prod^B_{b=0}\eta^{n(b,j) + \epsilon_b - 1}_{b,j}d\eta_{b,j} \end{aligned} \tag{5} \end{equation}$

For equation simplicity,we define a delta function: $\Delta(\epsilon)=\frac{\prod^B_{b=1}\Gamma(\epsilon_b)}{\Gamma(\sum^B_{b=1}\epsilon_b)}$ , and the property that $\int Dir(\eta_{b,j}|n(b,j)+\beta_b)d\eta = \int_\eta \frac{1}{\Delta(n(b,j) + \beta_b)}\prod^B_{b=1}\eta^{n(b,j) + \beta_b - 1}_{b,j}d\eta_{b,j}=1$

$\begin{equation} \begin{aligned} Eq(5)&=\int_\eta\prod^N_{j=0}\frac{1}{\Delta(\epsilon)}\cdot (\frac{\Delta(n(b,j) + \epsilon_b)}{\Delta(n(b,j) + \epsilon_b)})\cdot \prod^B_{b=1}\eta^{n(b,j) + \epsilon_b - 1}_{b,j}d\eta_{b,j} \\ &=\prod^N_{j=0}\frac{\Delta(\vec{n}^{(b)}_j+\epsilon)}{\Delta(\epsilon)}\cdot \int Dir(\eta_{b,j}|n(b,j)+\epsilon_b)d\eta \\ &= \prod^N_{j=0}\frac{\Delta(\vec{n}^{(b)}_j+\epsilon)}{\Delta(\epsilon)} \end{aligned} \tag{6} \end{equation}$

Where $\vec{n}^{(b)}_j = \{n^{(b)}_j\}^{B}_{b=1}$ denotes a B-dimensional vector that the number of behaviors that are distributed in the stressor $j$ .Similarly, use the same approach to derive other generation processes and finally, we will get the joint distribution as follows:

$\begin{equation} \begin{aligned} P(S,\vec{t};\alpha,\beta,\delta,\epsilon,\gamma) &= G_b\cdot G_\iota\cdot G_z\cdot G_\tau\cdot G_s \\ &= \prod^N_{j=0}\frac{\Delta(\vec{n}^{(b)}_j+\epsilon)}{\Delta(\epsilon)}\cdot \prod^K_{k=1}\frac{\Delta(\vec{n}^{(l)}_k + \beta)}{\Delta(\beta)}\cdot \\ &\quad \prod^{N}_{j=0}\frac{\Delta(\vec{n}^{(z)}_j + \alpha)}{\Delta(\alpha)}\cdot \prod^{N}_{j=0}\frac{\Delta(\vec{n}^{(\tau)}_j+\delta)}{\Delta(\delta)}\cdot \\ &\quad \prod^M_{m=1}\frac{\Delta(\vec{n}^{(s)}_m+\gamma)}{\Delta(\gamma)} \\ &= \prod^N_{j=0}[\frac{\Delta(\vec{n}^{(b)}_j+\epsilon)}{\Delta(\epsilon)} \frac{\Delta(\vec{n}^{(\tau)}_j+\delta)}{\Delta(\delta)} \frac{\Delta(\vec{n}^{(z)}_j + \alpha)}{\Delta(\alpha)}]\times \\ &\quad \prod^N_{j=0}\prod^K_{k=1}\frac{\Delta(\vec{n}^{(l)}_{j,k} + \beta)}{\Delta(\beta)}\cdot \prod^M_{m=1}\frac{\Delta(\vec{n}^{(s)}_m+\gamma)}{\Delta(\gamma)} \end{aligned} \tag{7} \end{equation}$

Note that $\vec{n}^{(l)}_{j,k}$ represents a L-dimention vector that number of lexi-boxes are assigned to topic $k$ in stressor $j$ .After getting the joint distribution of all latent variables, either Gibbs sampling method or expectation maximization method can be applied to estimate the latent variables, $\phi,\theta,\zeta,\lambda,\eta$ ,in other words, the estimators of latent parameters satisfy

$\hat{var} = \mathop{argmax} \limits_{var} P(var|evidence)$
In this paper,we plan to use Gibbs sampling to generate samples for stressor-related latent variables,

$s,z,b,\tau$ .

D.Gibbs Sampling Inference

As we discussed above, in LSDM, a stressor can lead to three generation processes,so if we want to detect whether a stressor is occurring in the period of a footprint,the best way is to compute the maximum probability of the stressor on given evidences,say,terms, $P(s_j|\vec{t}) = P(s_j|\vec{z},\vec{\iota};\vec{b};\vec{\tau})$ .With rules of Bayes theorem, $P(s_j|\vec{t})$ can be written as a summation of conditional probabilites of variables, $\vec{z},\vec{b},\vec{\tau}$ :

$P(s_j|\vec{t}) = P(s_j|\vec{z},\vec{\iota}) + P(s_j|\vec{b})+P(s_j|\tau) \tag{8}$

We first focus on the conditional probability of $s_i = j$ base on the remaining latent variables collection $\vec{s}_{-j}$ and topic-lexi generation variables $Z,L$ .

$\begin{equation} \begin{aligned} P(s_i=j|S_{-i},Z,L;\alpha,\beta,\gamma) &= \frac{P(s_i = j,S_{-i},Z,L;\alpha,\beta,\gamma)}{P(S_{-i},Z,L;\alpha,\beta,\gamma)} \\ &\propto P(Z|s_i=j,S_{-i},L;\alpha,\beta,\gamma)\times \\ &\quad P(s_i,S_{-i},L;\alpha,\beta,\gamma) \\ \end{aligned} \tag{9} \end{equation}$

where the symbol $S_{-i}$ means the remaining stressor set excluding $s_i$ from $S$ .For the first term of (9),

$\begin{equation} \begin{aligned} P(Z|s_i=j,S_{-i},L;\alpha,\beta,\gamma) &= P(Z|s_i,S_{-i};\alpha) \\ &\propto \frac{\prod^K_{k=1}\Gamma(\alpha_k+n^k_j)}{\Gamma(\sum^K_{k=1}(n^k_j + \alpha_k))}\cdot \\ &\frac{\Gamma(\sum^K_{k=1}(n^k_{n=j,-i} + \alpha_k))}{\sum^K_{k=1}\Gamma(\alpha_k + n^k_{n=j,-i})} \end{aligned} \tag{10} \end{equation}$

With the fact that the property of gamma distribution : $\Gamma(\alpha + 1) = \alpha\Gamma(\alpha)$ and $n^k_n = n^k_{n=j,-i} + 1$ .Thus the result of $(10)$ :

$Eq(10) = \frac{\alpha+n^k_{j,-i}}{K\alpha + \sum^K_{k=1}n^k_{j,-i}}$

The second part of $(9)$ :

$\begin{equation} \begin{aligned} P(s_i,S_{-i},L;\alpha,\beta,\gamma) &= P(s_i,S_{-i};\gamma) \\ &= \frac{\gamma + n^j_{m=m_i,-i}}{(N+1)\gamma + \sum^N_{j=0}n^j_{m=m_i,-i}} \end{aligned} \tag{11} \end{equation}$

Substitude $(10),(11)$ into $(9)$

$\begin{equation} \begin{aligned} P(s_i = j|S_{-i},Z,L;\alpha,\beta,\gamma) &= \frac{\gamma + n^j_{m=m_i,-i}}{(N+1)\gamma + \sum^N_{j=0}n^j_{m=m_i,-i}}\cdot \\ & \frac{\alpha+n^k_{j,-i}}{K\alpha + \sum^K_{k=1}n^k_{j,-i}} \tag{12} \end{aligned} \end{equation}$

Considering the sub-generation process of $(9)$ , the probability distribution of topic conditioned to Lexi-boxes as follows:

$\begin{equation} \begin{aligned} P(z_i=k,\iota_i=l|Z_{-i},L_{-i};\alpha,\beta) &= \int_\phi\int_\theta P(z_i=k,\iota_i=l,\phi,\theta|Z_{-i},L_{-i};\alpha,\beta) d\phi d\theta \\ &= \int_\theta \vartheta_{j,k}Dir(\theta|Z_{-i},L_{-i};\alpha)d\theta\times \\ &\quad \int_\phi \varphi_{k,l}Dir(\phi|Z_{-i},L_{-i};\beta)d\phi \\ &= E(\vartheta_{j,k})E(\varphi_{k,l}) \\ &= \frac{\alpha + n^k_{j,-i}}{K\alpha+\sum^K_{k=1}n^k_{j,-i}}\cdot \frac{\beta + n^l_{k,-i}}{L\beta + \sum^L_{l=1}n^l_{k,-i}} \end{aligned} \tag{13} \end{equation}$

Applying similar approach to behavior part and timing part,

$\begin{equation} \begin{aligned} P(s_i=j|S_{-i},B;\epsilon,\gamma) &= \int_\zeta\int_\eta P(s_i=s,b_i=b,\zeta,\eta|S_{-i},b_{-i},\epsilon,\gamma)d\zeta d\eta \\ &= E(\zeta_{m,j})E(\eta_{j,b}) \\ &= \frac{\gamma + n^j_{m=m_i,-i}}{(N+1)\gamma + \sum^N_{j=0}n^j_{m=m_i,-i}}\cdot \frac{\epsilon + n^b_{j,-i}}{B\epsilon+\sum^B_{b=1}n^b_{j,-i}} \end{aligned} \tag{14} \end{equation}$

$\begin{equation} \begin{aligned} P(s_i=j|S_{-i},C;\epsilon,\gamma) &= \int_\zeta\int_\eta P(s_i=s,\tau_i=b,\zeta,\lambda|S_{-i},\tau_{-i},\delta,\gamma)d\zeta d\lambda \\ &= E(\zeta_{m,j})E(\lambda_{j,c}) \\ &= \frac{\gamma + n^j_{m=m_i,-i}}{(N+1)\gamma + \sum^N_{j=0}n^j_{m=m_i,-i}}\cdot \frac{\delta + n^c_{j,-i}}{4\delta+\sum^4_{c=1}n^c_{j,-i}} \end{aligned} \tag{15} \end{equation}$

Rewrite Eq(12）by integrating the topic-lexibox generation process equation, say, Eq(13), so we can compute the conditional probability of stressor on given Lexi-boxes:

$\begin{equation} \begin{aligned} P(s_i=j,\iota_i=l|S_{-i},L_{-i};\alpha,\beta) &\propto \frac{\gamma + n^j_{m=m_i,-i}}{(N+1)\gamma + \sum^N_{j=0}n^j_{m=m_i,-i}}\cdot \\ & \frac{\alpha+n^k_{j,-i}}{K\alpha + \sum^K_{k=1}n^k_{j,-i}}\cdot \frac{\beta + n^l_{k,-i}}{L\beta + \sum^L_{l=1}n^l_{k,-i}} \end{aligned} \tag{16} \end{equation}$

By iteratively using MCMC algorithm to do sampling with $(14),（15)$ and $(16)$ , it will eventually get the Markov steady state to achieve the stationary distribution. Afterward,given the sampled stressors, topics, behaviors and times the final estimators of latent variables can be measured with the expectation value of the corresponding Dirichlet distribution:

$\hat{\varphi}_{k,l} = E(\varphi_{k,l}) = \frac{\beta + n^l_{k,-i}}{L\beta + \sum^L_{l=1}n^l_{k,-i}}\\ \hat{\vartheta}_{j,k} = E(\varphi_{j,k}) = \frac{\alpha + n^k_{j,-i}}{K\alpha+\sum^K_{k=1}n^k_{j,-i}}\\ \hat{\lambda}_{j,c} = E(\lambda_{j,c}) = \frac{\delta + n^c_{j,-i}}{4\delta+\sum^4_{c=1}n^c_{j,-i}}\\ \hat{\eta}_{j,b} = E(\eta_{j,c}) = \frac{\epsilon + n^b_{j,-i}}{B\epsilon+\sum^B_{b=1}n^b_{j,-i}}\\ \hat{\zeta}_{m,j} = E(\zeta_{m,j}) = \frac{\gamma + n^j_{m=m_i,-i}}{(N+1)\gamma + \sum^N_{j=0}n^j_{m=m_i,-i}} \tag{17}$
With derived stressor-topic distribution

$\phi$ ,stressor-behavior distribution

$\eta$ and stressor-timing distribution

$\lambda$ , referring Eq(8), the stressor probability distribution for each new incoming footprint can be evaluated by summing all posterior distributions of topic,behavior, and timing:

$P(s =j|fp) \\= \frac{P(s_j)\cdot(\prod_{\iota\in L_{fp}}P(\iota|s_j) + \prod_{b \in B_{fp}}P(b|s_j) + \prod_{\tau \in C_{fp}}P(\tau|s_j))}{\sum_sP(s)\cdot(\prod_{\iota\in L_{fp}}P(\iota|s) + \prod_{b \in B_{fp}}P(b|s) + \prod_{\tau \in C_{fp}}P(\tau|s))} \tag{18}$
where

$P(s_j)$ means the prior probability of the stressor

$j$ in the whole training set and

$(\cdot)_{fp}$ is the data set in the testing evidence. Thus, by applying

$Eq(18)$ we can benchmark the detection performance of the LSDM approach.

LSDM Algorithm

Input : footprint cutting factor, $ws$ ,a series of preprocessed SNS data,topic counts,iteration $I$ ;
Output : latent parameter sets, $\vec{\theta},\vec{\phi},\vec{\zeta},\vec{\lambda},\vec{\eta}$ ;full stressor distribution vectors $\vec{s}$ over footprints;
Initialize zero matrixes, $n_{m,j},n_{j,k},n_{k,l},n_{j,l},n_{j,b},n_{j,\tau}$
Initialize zero vectors, $n_m,n_k,n_l,n_j,n_\tau,n_b$
Extract footprints $\vec{fp} = \{fp_1,..,fp_M\}$ by using cutting factor $ws$ .
Initialization
for each footprints $m \in M$ do
    for each term $t \in T_m$ do
        sample stressor index $s^\iota_{m,t} = j \sim Mult(1/(N+1))$
        for each lexi-box item $l$ do
            sample topic index $z_{j,l} = k \sim Mult(1/K)$
            increment stressor-topic matrix $n_{j,k} + 1$
            increment stressor-lexibox matrix $n_{j,l} + 1$
            increment topic-lexibox matrix $n_{k,l} + 1$
            increment stressor-topic sum $n_j + 1$
            increment topic-lexibox sum $n_k + 1$
            increment stressor-lexibox sum $n_l + 1$
        end for
        increment stressor-behavior matrix $n_{j,b} + 1$
        increment stressor-timing matrix $n_{j,\tau} + 1$
        increment footprint-stressor sum $n_m + 1$
        increment stressor-behavior sum $n_b + 1$
        increment stressor-timing sum $n_\tau + 1$
    end for
end for
Gibbs burn-in period and sampling period
while not finished do
    for each footprint $m$ do
        for each term $t$ do
            for each lexi-box $l$ do
                for the current assignment of topic k and stressor j and for lexi-box $\iota_{m,l}$ :
                decrement sums and counts: $n_{j,k}-1,n_{j,l}-1,n_{k,l}-1,n_j-1,n_k-1,n_l-1$
                multinomial topic sampling process acc.to Eq(12) and stressor sampling process acc.to Eq(12):
                sample topic index $z^{new}_{j,l}=k^{new} \sim p(z_i,\iota_i|Z_{-i},L_{-i})$
                sample stressor index $s^{new} \sim p(s_i,z_i = z^{new}|S_{-i},Z_{-i})$
                reassign new generated topic and stressor:
                increment sums and counts: $n_{s^{new},k^{new}}+1,n_{k^{new},l}+1,n_{s^{new}}+1,n_{k^{new}}+1$
            end for
            for each behavior $b$ do:
                decrement sums and counts: $n_{j,b}-1,n_b-1$
                multinomial behavior sampling process acc.to Eq(14):
                sample stressor index $s^{new} \sim p(s_i|S_{-i},B_{-i})$
                decrement sums and counts: $n_{s^{new},b}+1,n_b+1$
            end for
            decrement sums and counts: $n_{j,\tau} - 1, n_\tau - 1$
            multinomial timing sampling process acc.to Eq(15)
            sample stressor index $s^{new} \sim p(s_i|S_{-i},C_{-i})$
            increment sums and counts: $n_{s^{new},\tau} + 1, n_\tau + 1$
        end for
    until achieved markov steady state or iteration finished
    end for

E.Algorithm Complexity Analysis And Optimizatioin

As in LSDM, we will proceed sampling process on topic-lexibox,stressor-topic,stressor-behavior and stressor-timing.Set the number of iteration is $I$ ,the number of footprints is $M$ ,the average number of terms in footprints is $\bar{T}$ ,the average number of unduplicated words in each term is $\bar{w}$ ,so the average numer of lexi-boxes can be calculated by $N_w =\sum^{(\bar{w}-(ns - 1))}_{w=1}w$ and the average number of behaviors in each term is $\bar{B}$ .Note that we will ignore the stressor-timing sampling because it only contributes one calculation in each term iteration, thus the computational complexity of LSDM is $O(IM\bar{T}\times max(N_w,\bar{B}))$ .Moreover,the gamma function $\Gamma(x)$ in each sampling may lead to the heavy computing load when x is a big value.For space complexity,during the entire procedure, the program needs to keep the counters $n_z,n_s,n_{\iota|z},n_b,n_\tau,n_{s|z}$ ,topic index vectors $\vec{z}$ for lexi-boxes over all footprints and the stressor assignment $s_z,s_b,s_\tau$ for topics,behaviors and timing respectively.In total of $((N + 1) + C + B + (N+1)C + (N+1)B + (N+1)K + K + |L|)$ variables in memory.Let $V$ be the vocabulary size of the whole corpus,the lexi-box volume of LSDM can be computed as follows:

$\begin{equation} L = \left\{ \begin{array}{lr} V \qquad \qquad \qquad ns = 1 & \\ \sum^{(V-(ns - 1))}_{v=1}v \quad V> ns > 1 & \\ 1 \qquad \qquad \qquad ns = V \end{array} \right. \end{equation}$

It is worth noting that the lexi-box volume is proportional to the vocabulary size of the footprint and it brings on high memory load and a larg-scale sparse lexi-box matrix when the vocabulary size is a large value.
Gamma function computational complexity optimization. To tackle the computational complexity of Gamma computing $\Gamma(x)$ for a large $x$ ,we introduce an approximative approach,say, Stirling's formula to simplified such Gamma computing.In mathematics, Stirling's formula is used to approximate the value of factorial.While Gamma function is an extention of factorial and it is defined on $\mathcal{R}$ ,nevertheless,Stirling's formula may still be applied for Gamma computing approximation, $\Gamma(x) = \sqrt{\frac{2\pi}{x}}\cdot (\frac{x}{e})^x(1 + O(\frac{1}{x}))$ .By replacing all gamma computings in the LSDM iterations by Stirling's approximation equation for gamma function and it significantly saves in computational time.
An approximator for the posterior probability of footprint-lexiboxes. In LSDM, LDA algorithm is applied to extract topics from given footprints and the latent footprint-topic distribution can be computed via $P(z|fp) = \int P(z|\iota)P(\iota|fp)$ ,in which, $P(\iota|fp)$ is a very time-consuming part to obtain.Here,for computing simplicity,we directly take the frequency ratio as the estimator for lexi-boxes in the footprint:

$P(\iota|fp) = \frac{n^{(l)}_{fp}}{\sum_ln^{(l)}_{fp}}$
According to our empirical finding in short texts,the distribution of the lexi-box over the footprint nearly equals to an uniform distribution,thus we simply take this estimation as a substitution for

$P(\iota|fp)$ in order to reduce the computational complexity.
Large-scale sparse lexi-box matrixes compression. The LSDM enriches lexical features over short texts by employing an extended 'biterm' method, compare to the ordinary LDA whose lexical matrix capacity equals the word count

$V$ , the volume of the LSDM lexical space is extremely magnified serveral times.Due to scarce of tweet-level lexical features,the lexi-box matrix for each footprint is sparse and impact high memory load during temporary computing.

Experimental

In this section, we will validate the performance and effectiveness of LSDM on our crawled data.

A. DataSet

We crawled over 95 million tweets, 75 million comments and 15 million tagged photos posted by over 65,000 online-users from Sina Weibo. The computing platform is built with Intel i5 processor

References

    [1] Qi Li, Liang Zhao, Yuanyuan Xue, Li Jin, Mostafa Alli, Ling Feng, Correlating Stressor Events for Social Network Based Adolescent Stress Prediction. DASFAA (1) 2017: 642-658
    [2] Huijie Lin, Jia Jia, Jiezhong Qiu, Yongfeng Zhang, Guangyao Shen, Lexing Xie, Jie Tang, Ling Feng, TSC, Detecting Stress Based on Social Interactions in Social Networks. IEEE Trans. Knowl. Data Eng. 29(9): 1820-1833 (2017)
    [3] Q. Li, Y. Xue, L. Zhao, J. Jia, L. Feng, “Analyzing and identifying teens’ stressful periods and stressor events from a microblog,” IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 5, pp. 1434–1448, 2017.
    [4]: Boon-Giin Lee, Wan-Young Chung: 
Wearable Glove-Type Driver Stress Detection Using a Motion Sensor. IEEE Trans. Intelligent Transportation Systems 18(7): 1835-1844 (2017)
    [5]: Lucio Ciabattoni, Francesco Ferracuti, Sauro Longhi, Lucia Pepa, Luca Romeo, Federica Verdini, Real-time mental stress detection based on smartwatch. ICCE 2017: 110-111
    [6]: Óscar Martínez Mozos, Virginia Sandulescu, Sally Andrews, David Ellis, Nicola Bellotto, Radu Dobrescu, José Manuel Ferrández, Stress Detection Using Wearable Physiological and Sociometric Sensors. Int. J. Neural Syst. 27(2): 1-16 (2017)
    [7] Yuanyuan Xue, Qi Li, Li Jin, Ling Feng, David A. Clifton, Gari D. Clifford, Detecting Adolescent Psychological Pressures from Micro-Blog. HIS 2014: 83-94
    [8] Mike Thelwall, TensiStrength: Stress and relaxation magnitude detection for social media texts. Inf. Process. Manage. 53(1): 106-121 (2017)
    [9] Huijie Lin, Jia Jia, Liqiang Nie, Guangyao Shen, Tat-Seng Chua, What Does Social Media Say about Your Stress?. IJCAI 2016: 3775-3781
    [10] Blei, D.M., Ng, A.Y., Jordan, M.I.,Latent Dirichlet Allocation. J. Mach. Learn. Res. 3, 993-1022(2003)

    [49] J. Weng, E.-P. Lim, J. Jiang, and Q. He. Twitterrank, finding
topic-sensitive influential twitterers. In WSDM, 2010.
    [50] Thomas Hofmann, Probabilistic Latent Semantic Analysis. CoRR abs/1301.6705 (2013)
    [51] M. De Choudhury, M. Gamon, S. Counts, and E. Horvitz, “Predicting depression via social media,” In AAAI Conference on Weblogs and Social Media. pp. 128-137, 2013.
    [52] C. M Homan, N. Lu, X. Tu, M. C Lytle, and V. Silenzio, “Social structure and depression in TrevorSpace,” In Proc. of the 17th ACM conference on Computer supported cooperative work & social computing. pp. 615–625, 2014.
    [53] H. Andrew Schwartz, J. Eichstaedt, M. L Kern, G. Park, M. Sap, D. Stillwell, M. Kosinski, and L. Ungar, “Towards assessing changes in degree of depression through facebook,” In Proc. of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. pp. 118–125, 2014.
    [54] S. Tsugawa, Y. Kikuchi, F. Kishino, K. Nakajima, Y. Itoh, and H. Ohsaki, “Recognizing Depression from Twitter Activity,” In Proc. of the 33rd Annual ACM Conference on Human Factors in Computing Systems. pp. 3187–3196, 2015.
    [55] M. Park, D. W McDonald, and M. Cha, “Perception Differences between the Depressed and Non-depressed Users in Twitter,” In Proc. of ICWSM. pp. 476–485, 2013.
    [56] M. De Choudhury, S. Counts, and E. Horvitz, “Predicting postpartum changes in emotion and behavior via social media,” In Proc. of the 2013 ACM annual conference on Human factors in computing systems. pp. 3267–3276, 2013.
    [57] M. De Choudhury, S. Counts, E. Horvitz, and A. Hoff, “Characterizing and Predicting Postpartum Depression from Facebook Data,” In Proc. of the ACM Conference on Computer Supported Cooperative Work and Social Computing. pp. 626–638, 2014.
    [58] M. Mitchell, K. Hollingshead, and G. Coppersmith, “Quantifying the Language of Schizophrenia in Social Media,” In Proc. of the NAACL Workshop on Computational Linguistics and Clinical Psychology. pp. 11–20, 2015.
    [59] Vivek Kumar Rangarajan Sridhar, Unsupervised Topic Modeling for Short Texts Using Distributed Representations of Words. VS@HLT-NAACL 2015: 192-200
    [60] Jianhui Pang, Xiangsheng Li, Haoran Xie, Yanghui Rao, SBTM: Topic Modeling over Short Texts. DASFAA Workshops 2016: 43-56
    [61] Chenliang Li, Haoran Wang, Zhiqian Zhang, Aixin Sun, Zongyang Ma, Topic Modeling for Short Texts with Auxiliary Word Embeddings. SIGIR 2016: 165-174
    [62] Jipeng Qiang, Ping Chen, Tong Wang, Xindong Wu,Topic Modeling over Short Texts by Incorporating Word Embeddings. CoRR abs/1609.08496 (2016)
    [63] Chenliang Li, Yu Duan, Haoran Wang, Zhiqian Zhang, Aixin Sun, Zongyang Ma, Enhancing Topic Modeling for Short Texts with Auxiliary Word Embeddings. ACM Trans. Inf. Syst. 36(2): 11:1-11:30 (2017)
    [64] X. Phan, L. Nguyen, and S. Horiguchi, “Learning to classifyshort and sparse text & web with hidden topics from large-scale data collections,” in WWW. ACM, 2008, pp. 91–100.
    [65] R. Mehrotra, S. Sanner, W. Buntine, and L. Xie. Improving lda topic models for microblogs via tweet pooling and automatic labeling. In SIGIR, 2013.
    [66] O. Jin, N. Liu, K. Zhao, Y. Yu, and Q. Yang, “Transferringtopical knowledge from auxiliary long texts for short textclustering,” in CIKM. ACM, 2011, pp. 775–784.
    [67] X. Yan, J. Guo, Y. Lan, and X. Chen, A biterm topic model for short texts. In WWW, 2013.
    [68] Davidiv, D., Tsur, O., Rappoport, A., Enhanced Sentiment Learning Using Twitter Hash-tags and Smileys. In: Proceedings of the 23rd International Conference on Computational in Linguistics, pp. 241–249. Coling 2010 Organizing Committee,Beijing, China (2010)
    [69] Barbosa, L., Feng, J.L.: Robust Sentiment Detection on Twitter from Biased and Noisy Data. In, Proceedings of the 23rd International Conference on Computational in Linguistics, pp. 36–44. Coling 2010 Organizing Committee, Beijing, China (2010)
    [70] Go, A., Huang, L., Bhayani, R., Twitter Sentiment Classification using Distant Supervision. Project Report, CS224N (2009)
    [71] Jiang, L., Yu, M., Zhou, M., Liu, X., Zhao, T.: Target-dependent Twitter Sentiment Classification. In, Proceeding of 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 151–160 (2011)
    [72] Parikh, R., Movassate, M.,Sentiment Analysis of User-Generated Twitter Updates using Various Classification Techniques. Final Report, CS224N (2009)
    [73] Quan C, Ren F. Construction of a blog emotion corpus for chinese emotional expression analysis. In: Proceedings of the 2009 conference on empirical methods in natural language processing: volume 3-volume 3. Association for Computational Linguistics; p. 1446–1454. 2009.
    [74] Zhao Y., Qin B., Liu T. Creating a fine-grained corpus for chinese sentiment analysis. IEEE Intell Syst. 2014;30(5):36–43.
    [75] Hui-Hsin W, Tsai AC-R, Tsai RT-H, Hsu JY-J. Building a graded chinese sentiment dictionary based on commonsense knowledge for sentiment analysis of song lyrics. J Inf Sci Eng. 2013;29(4):647–62.
    [76] Yan S, Li S. Constructing chinese sentiment lexicon using bilingual information. In: Chinese lexical semantics. Springer; p. 322–331. 2013.
    [77] Liu L, Lei M, Wang H. Combining domain-specific sentiment lexicon with hownet for chinese sentiment analysis. J Comput. 2013;8(4):878–83.
    [78] Hongzhi X, Zhao K, Qiu L, Changjian H. Expanding chinese sentiment dictionaries from large scale unlabeled corpus. In: PACLIC. p. 301–310. 2010.
    [79] Ge X, Meng X, Wang H. Build chinese emotion lexicons using a graph based algorithm and multiple resources. In: Proceedings of the 23rd international conference on computational linguistics. Association for Computational Linguistics; p. 1209–1217. 2010.
    [80] Wang B, Huang Y, Xian W, Li X. A fuzzy computing model for identifying polarity of chinese sentiment words. Comput Intell Neurosci. 2015;2015.
    [81] Tan S, Zhang J. An empirical study of sentiment analysis for chinese documents. Expert Syst Appl. 2008;34(4):2622–9.
    [82] Zhai Z, Hua X, Kang B, Jia P. Exploiting effective features for chinese sentiment classification. Expert Syst Appl. 2011;38(8):9139–46.
    [83] Zengcai S, Hua X, Zhang D, Yunfeng X. Chinese sentiment classification using a neural network tool - word2vec. In: 2014 International conference on multisensor fusion and information integration for intelligent systems (MFI). IEEE; p. 1–6. 2014.
    [84] Xiang L. Ideogram based chinese sentiment word orientation computation. arXiv preprint arXiv:1110.4248. 2011.
    [85] Wei X, Liu Z, Wang T, Liu S. Sentiment recognition of online chinese micro movie reviews using multiple probabilistic reasoning model. J Comput. 2013;8(8):1906–11.
    [86] Cao Y, Chen Z, Ruifeng X, Chen T, Gui L. A joint model for chinese microblog sentiment analysis. ACL-IJCNLP. 2015;2015:61.
    [87] LiL, LuoD, LiuM, ZhongJ, YeW, SunL.A self-adaptive hidden markov model for emotion classification in chinese microblogs. In: Mathematical problems in engineering. 2015.
    [88]Y.Kim,‘‘Convolutional neural networks for sentence classification,’’presented at the Conf. Empirical Methods Natural Lang. Process. (EMNLP), Doha, Qatar, Oct. 2014, pp. 1746–1751.
    [89] R. Johnson and T. Zhang. (2014). ‘‘Effective use of word order for text categorization with convolutional neural networks.’’ [Online]. Available: https://arxiv.org/abs/1412.1058
    [90] R.JohnsonandT.Zhang,‘‘Semi supervised convolutional neural networks for text categorization via region embedding,’’ in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 919–927.
    [91] D. Tang, Q. B. FuruWei, T. Liu, and M. Zhou Coooolll:, ‘‘A deep learning system for twitter sentiment classification,’’ in Proc. 8th Int. Workshop Semantic Eval. (SemEval), Dublin, Ireland, Aug. 2014, pp. 208–212.
    [92] D.Tang,Y.N.FuruWei,M.Zhou,T.Liu,andB.Qin,‘‘Learningsentiment- specific word embedding for twitter sentiment classification,’’ in Proc. 52nd Annu. Meeting Assoc. Comput. Linguistics, Baltimore, MD, USA, Jun. 2014, pp. 1555–1565,2014.
    [93] R. Socher et al., ‘‘Recursive deep models for semantic compositionality over a sentiment treebank,’’ in Proc. Conf. Empirical Methods Natural Lang. Process., Seattle, DC, USA, Oct. 2013, pp. 1631–1642.
    [94] J.Li,M.-T.Luong,D.Jurafsky,andE.Hovy.(2015).‘‘When are tree structures necessary for deep learning of representations?’’ [Online]. Available: https://arxiv.org/abs/1503.00185
    [95] S. Choudhury and H. Alani. 2015. Detecting Presence of Personal Events in Twitter Streams. In Proceedings of International Conference on Social Informatics, 157–166.
    [96] T. Dickinson, M. Fernandez, L. A. Thomas, P. Mulholland, P. Briggs, and H. Alani. 2016. Identifying Important Life Events from Twitter Using Semantic and Syntactic Patterns. In Proceedings of the 15th International Conference WWW/Internet, 143–150.
    [97] K. C. Sanagavarapu, A. Vempala, and
E. Blanco. 2017. Determining Whether and When People Participate in the Events They Tweet About. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 641–646.
    [98] K. C. Sanagavarapu, A. Vempala, and E. Blanco. 2017. Determining Whether and When People Participate in the Events They Tweet About. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 641–646.
    [99] Smitashree Choudhury, Harith Alani:
Personal Life Event Detection from Social Media. HT (Doctoral Consortium / Late-breaking Results / Workshops) 2014