$\begingroup$

I'm writing a study that analyses how the topic of debate changes depending on who is speaking next.

I have 8,000 debates of various lengths. I have a continuous measure of how different each pair of consecutive speeches is in terms of topic focus. For example, if a debate has 4 speeches, I compare speech 1 to speech 2, speech 2 to speech 3, and speech 3 to speech 4, giving my dependent variable 3 values for that debate.

My regressor of interest is the two speakers' ideological difference (also continuous). I have a number of control variables.

My current regression clusters standard errors at the debate level. In Stata:

reg topic_diff ideological_diff x1 x2 x3, vce(cluster debate_number)

Where x1...x3 are my control variables. Clustering the standard errors allows the errors for speech pairs within the same debate to be correlated with each other, while treating different debates as independent.
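
In equation form, I believe the regression above amounts to the following. The indices are my own notation, not something from the data: $d$ indexes debates, $t$ indexes consecutive speech pairs within a debate, $y$ stands for topic_diff and $z$ for ideological_diff.

$$
y_{d,t} = \beta_0 + \beta_1 z_{d,t} + \gamma_1 x_{1,d,t} + \gamma_2 x_{2,d,t} + \gamma_3 x_{3,d,t} + \varepsilon_{d,t}
$$

The clustering lets $\varepsilon_{d,t}$ and $\varepsilon_{d,s}$ be correlated within the same debate $d$, but it does not add any dynamics to the model for $y_{d,t}$ itself.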

However, debates tend to follow paths, which would suggest that the topic differences within a debate follow some kind of time series. Am I correct that I need to account for this, or have I done enough?

Sorry for my ineptitude. Please ask me as many questions as you need!

James

Edit: just to add - I don't know how I would deal with having 8,000 different time series in a single regression, if I need to do so. I would ask faculty, but it's Christmas!
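
To make the worry concrete, here is a rough sketch of what I imagine a panel set-up might look like. It assumes a hypothetical variable pair_index (not currently in my data) that numbers the consecutive speech pairs within each debate, and it assumes that a lagged dependent variable is a sensible way to capture the "path" of a debate, which I am not at all sure about.

```stata
* Sketch only, under the assumptions stated above.
* pair_index is a hypothetical variable: 1, 2, 3, ... within each debate.
xtset debate_number pair_index

* Same regression as before, adding the previous pair's topic difference.
* L. is Stata's lag operator; it is available once the data are xtset.
reg topic_diff L.topic_diff ideological_diff x1 x2 x3, vce(cluster debate_number)
```

As far as I understand, this treats the data as one unbalanced panel rather than 8,000 separate time series, but I don't know whether that is the appropriate way to handle it.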