Argumentation Mining in Parliamentary Discourse

We examine whether frame choices in forum statements can help identify framing strategies in parliamentary discourse. We show how features based on embedding representations can improve the discovery of frames in argumentative political speech. Given the complex nature of parliamentary discourse, the initial results presented here are promising. We further present a manually annotated corpus for frame recognition in parliamentary discourse.


Introduction
In parliamentary discourse, politicians expound their beliefs and ideas through argumentation. To persuade the audience, they highlight certain aspects of an issue, a practice commonly known as framing [4]. Consider the following example:
Example 1. There was no need to change the definition of marriage in order for gays and lesbians to establish meaningful, long term relationships that are recognized in law.
The speaker frames the argument to promote the idea that same-sex marriage is unnecessary because long-term relationships are already recognized in law.
While a deep understanding and analysis of beliefs and ideas remains a challenge, the relatively simpler task of argument tagging based on pre-existing frames has recently been proposed [1,5].
In this paper, we introduce a supervised model, trained on user-generated web content, that classifies parliamentary discourse by its use of various frames. We use vector representations of words and sentences to capture their semantic information, and compute semantic similarity metrics across argumentative discourse pairs. We further present our corpus of argumentative parliamentary discourse, in which argumentative sentences are annotated with known frames.

Related Work
In recent years, several studies have addressed argument analysis of user postings. Cabrio and Villata [3] used a textual entailment approach to find pro and con arguments in a set of debates selected from Debatepedia. Boltužić and Šnajder [1] proposed the categorization task of tagging user postings with a pre-existing set of frames. Their supervised classification model made use of entailment and semantic similarity features. To generalize their earlier work to various topics, Boltužić and Šnajder [2] presented an unsupervised model that recognizes frames by means of textual similarity. In a similar task, Hasan and Ng [5] employed a probabilistic approach for stance and reason classification of user postings. Misra et al. [8] took a supervised approach to classifying dialog postings by "argument facets" using lexical and semantic similarity features. These approaches focused on user-generated content in online forums. In contrast, we explore framing strategies in parliamentary discourse.

Corpus and Annotation
For our frame prediction task, we use user postings manually annotated with known frames (the ComArg corpus) as a training set and argumentative parliamentary speeches as a test set. The corpora used in our study are described in the following sections.

ComArg Corpus
ComArg, provided by Boltužić and Šnajder [1], is a corpus of user statements manually annotated with users' positions towards a specific topic (Pro/Con stance) and with a set of pre-existing "arguments". These arguments are, in effect, frames in the sense introduced above, as each highlights certain aspects of the issue. The authors drew their data from two sources: the user statements are compiled from the ProCon.org website, and the frames are taken from the Idebate.org website. The corpus covers two topics, Gay Marriage (GM) and Under God in Pledge (UGIP). Since UGIP (regarding the Pledge of Allegiance) is an issue specific to the United States, we focused on the GM part of the corpus, which contains 198 statements and seven predefined frames (shown in Table 1). In this corpus, each statement-frame pair is annotated as explicit attack, implicit attack, no mention, explicit support, or implicit support. In this work, we used only the explicit (176 instances) and implicit (98 instances) support statements.

Argumentative Parliamentary Statements
For our test set, we focused on debates regarding same-sex marriage in the Canadian Parliament. In 2005, Bill C-38, An act respecting certain aspects of legal capacity for marriage for civil purposes, which concerns legalizing same-sex marriage in Canada, was introduced in Parliament. Later that year, the bill was passed, and the legal definition of marriage was expanded under a Liberal government to include conjugal couples of the same sex. After the Conservative Party of Canada gained power, the debate on same-sex marriage was re-opened in Parliament in 2006.
We selected all speeches regarding same-sex marriage made by Liberal and Conservative members of the Canadian Parliament during 2006. We segmented the texts into sentences, which resulted in a total of 136 sentences. The statements were first examined with respect to the speaker's position towards same-sex marriage and assigned Pro, Con, or No stance. We then examined which of the pre-defined frames (described in Section 3.1) support each statement, and manually annotated it with "none" or one of the frames. The annotation was done by three annotators. To measure inter-annotator agreement, weighted kappa was computed for both stance (0.54) and frames (0.46).
For almost 90% of the statements, at least two annotators were in agreement; these statements were kept as the final dataset. Some statements cannot be judged without context, and for these the annotators did not agree on the stance or the frame. After discarding the statements on which the annotators were not in agreement, the final set has 121 statements. Of these, 87 are supported by one of the ComArg pre-defined frames.
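The agreement-based filtering described above can be illustrated with a minimal Python sketch. It is not the procedure actually used to build the corpus; the function names `majority_label` and `filter_by_agreement` and the toy annotations are hypothetical, and the sketch assumes exactly one label per annotator per statement.

```python
from collections import Counter

def majority_label(labels):
    """Return the label chosen by at least two annotators, or None if no two agree."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None

def filter_by_agreement(annotated):
    """Keep only statements with a majority label.

    `annotated` maps a statement to the list of labels given by the annotators
    (hypothetical data layout, assumed for this example).
    """
    kept = {}
    for statement, labels in annotated.items():
        label = majority_label(labels)
        if label is not None:
            kept[statement] = label
    return kept

# Toy annotations from three annotators (invented for illustration).
toy = {
    "statement A": ["Pro", "Pro", "Con"],
    "statement B": ["Pro", "Con", "No stance"],
}
final_set = filter_by_agreement(toy)  # only "statement A" is retained
```

The same function can be applied separately to the stance labels and to the frame labels of each statement.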

Methods
Distributed word representations are used effectively in various NLP tasks, including sentiment analysis [9]. More recently, embedding models such as those of Mikolov et al. [7], Wang et al. [11], and Kiros et al. [6] have provided an effective and easy way to employ word and sentence representations. These distributed representations are real-valued vectors that capture the semantic and syntactic content of words and sentences. Here, we use word and sentence vector representations to measure the semantic textual similarity (STS) between the statements and the frames. We used word2vec embeddings [7] (300-dimensional vectors) trained on Google News articles and syntactic embeddings [11] (300-dimensional vectors) trained on the Annotated English Gigaword to compute sentence vectors, and we compare them with skip-thought sentence vectors (4800-dimensional vectors) [6]. We compute two similarity scores between statements and frames: (1) the cosine similarity of the sentence vectors, and (2) the similarity score represented by a concatenation of the component-wise product of the two vectors and their absolute difference (P&D) [10]. We also studied the impact of adding the stance feature (Pro/Con) to the similarity scores, as suggested by Boltužić and Šnajder [1]. Our supervised model then takes these features as input and learns to identify the frames. For supervised learning, we use SVM multiclass by Joachims.
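As an illustration of the two similarity scores, the following minimal Python sketch computes the cosine similarity and the P&D feature vector for a statement-frame pair. It is not the implementation used in this study; the function names `cosine_similarity` and `product_and_difference`, the plain-list vector representation, and the choice to return 0.0 for zero-norm vectors are assumptions made for the example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat an all-zero vector as dissimilar
    return dot / (norm_u * norm_v)

def product_and_difference(u, v):
    """P&D features: component-wise product concatenated with absolute difference."""
    product = [a * b for a, b in zip(u, v)]
    abs_diff = [abs(a - b) for a, b in zip(u, v)]
    return product + abs_diff

# Toy 2-dimensional vectors standing in for statement and frame embeddings.
statement_vec = [1.0, 2.0]
frame_vec = [3.0, -1.0]
pd_features = product_and_difference(statement_vec, frame_vec)
cos_feature = cosine_similarity(statement_vec, frame_vec)
```

For two 300-dimensional sentence vectors, the P&D representation has 600 components, whereas the cosine similarity is a single scalar; a stance feature, when used, could be appended to either.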

Experiment and Results
We use the statements from ComArg as a training set and the Canadian parliamentary statements on GM as a test set for our classification task. To compute the sentence vectors, we first remove the stopwords and then sum the vector representations of the remaining words in each sentence. For syntactic embeddings, we used only the noun, adjective, and verb embeddings.
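The sentence-vector computation can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the experiments: the function name `sentence_vector`, the toy embedding table, the lowercased whitespace tokenization, and the decision to skip words without an embedding are all hypothetical choices.

```python
def sentence_vector(sentence, embeddings, stopwords, dim):
    """Sum the embeddings of the non-stopword tokens of a sentence.

    Assumptions: lowercased whitespace tokenization; tokens without an
    embedding are skipped.
    """
    total = [0.0] * dim
    for token in sentence.lower().split():
        if token in stopwords or token not in embeddings:
            continue
        total = [t + x for t, x in zip(total, embeddings[token])]
    return total

# Toy 2-dimensional embeddings and stopword list (invented for illustration).
toy_embeddings = {"marriage": [1.0, 0.0], "law": [0.0, 2.0]}
toy_stopwords = {"the", "in"}
vec = sentence_vector("Marriage in the law", toy_embeddings, toy_stopwords, dim=2)
```

In the setting described above, the embedding table would hold 300-dimensional word2vec or syntactic vectors, and for the syntactic embeddings only nouns, adjectives, and verbs would contribute to the sum.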
After representing the statements and frames using word2vec, the syntactic-based embedding model, and the skip-thought model, we computed the semantic similarity of each pair with the similarity measures described in Section 4.
Our baselines are majority-class and bag-of-words (TF-IDF vector) classifiers. Table 2 summarizes our results. We observe that almost all models that use STS features outperform the baselines. We also observe that the P&D similarity score provides a better measure for capturing the meaning of the statement-frame pairs. Furthermore, adding the stance feature to the cosine similarity scores improves the accuracy of the classifiers; however, adding it to P&D has no impact on accuracy. Although the training set of explicit statements is smaller than the training set of explicit and implicit statements, the best results are mostly achieved by training the classifier on explicit instances. The best accuracy was obtained using the stance feature, an increase of about 20-40 percentage points over the baseline. Without the stance feature, the best score was obtained by training the classifier on explicit and implicit instances with the P&D similarity score of word2vec vectors.
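For reference, a majority-class baseline of the kind mentioned above can be sketched in a few lines of Python. The function name `majority_class_baseline` and the toy frame labels are hypothetical, and the sketch assumes that the majority class is determined on the training labels and then evaluated on the test labels.

```python
from collections import Counter

def majority_class_baseline(train_labels, test_labels):
    """Predict the most frequent training label for every test item.

    Returns the predicted label and the resulting accuracy on the test labels.
    """
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for label in test_labels if label == majority)
    return majority, correct / len(test_labels)

# Toy frame labels (invented for illustration).
train = ["frame_1", "frame_1", "frame_2"]
test = ["frame_1", "frame_2", "frame_2", "frame_1"]
predicted, accuracy = majority_class_baseline(train, test)
```

Any classifier that uses the STS features is expected to exceed this accuracy to be considered informative.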

Conclusion and Future Work
In this preliminary study, we examined the recognition of frames in political argumentative discourse. Many directions have yet to be explored, including (1) discovering frames for various issues, (2) considering larger spans of argumentative discourse, and (3) exploring semi-supervised or unsupervised approaches, given the scarcity of human-annotated data for supervised approaches.

Table 1. ComArg pre-defined frames on Gay Marriage

Table 2. Frame prediction results