Machine-translation inspired reordering as preprocessing for cross-lingual sentiment analysis

Ramírez Atrio, Alejandro

dc.contributor	Badia, Antoni
dc.creator	Ramírez Atrio, Alejandro
dc.date	2018-11-15T14:20:11Z
dc.date	2018-08
dc.date.accessioned	2024-12-16T10:27:04Z
dc.date.available	2024-12-16T10:27:04Z
dc.identifier	http://hdl.handle.net/2445/126142
dc.identifier.uri	http://fima-docencia.ub.edu:8080/xmlui/handle/123456789/21812
dc.description	Master's Final Projects in Cognitive Science and Language, Faculty of Philosophy, Universitat de Barcelona, Academic year: 2017-2018, Supervisor: Toni Badia
dc.description	In this thesis we study the effect of word reordering as a preprocessing step for cross-lingual sentiment analysis. We apply different reorderings to two target languages (Spanish and Catalan) so that their word order more closely resembles that of our source language (English). Our original expectation was that a Long Short-Term Memory classifier trained on English data with bilingual word embeddings would internalize English word order, resulting in poor performance when tested on a target language with a different word order. We hypothesized that the more closely the word order of a target language resembles that of our source language, the better our sentiment classifier would perform on that target language. We tested five sets of transformation rules for our part-of-speech reorderings of Spanish and Catalan, drawn mainly from two sources: two papers by Crego and Mariño (2006a and 2006b) and our own empirical analysis of two corpora, CoStEP and Tatoeba. The results suggest that the bilingual word embeddings with which we train our Long Short-Term Memory model do not promote any learning of English word order by the model when it is used cross-lingually. There is no improvement when the Spanish and Catalan texts are reordered so that their word order more closely resembles English, and no significant drop in scores even when a random reordering makes the texts almost unintelligible, whether classifying between two options (positive, negative) or four (strongly positive, positive, negative, strongly negative). We also replicated the experiments with two other classifiers: a Convolutional Neural Network and a Support Vector Machine. The Convolutional Neural Network should primarily learn only short-range word order, while the Long Short-Term Memory network would be expected to learn longer-range orderings as well. The Support Vector Machine does not take word order into account.
Subsequently, we analyzed the prediction biases of these models to see how they affect the reordering results. Based on this analysis, we conclude that the lack of improvement of the Long Short-Term Memory classifier when fed reordered text is not due to prediction bias. In training our models, we use two bilingual lexicons (English-Spanish and English-Catalan) (Hu and Liu 2004), containing words that are typically key to the sentiment of a sentence, to project our bilingual word embeddings between each language pair. Given the results of the reordering experiments, we conjectured that what determines how our models classify the sentiment of target-language sentences is whether these lexicon words appear in the input sentence. We therefore tested different alterations of the target-language corpora to see whether they support this conjecture. The results appear to support it. Our main conclusion, therefore, is that part-of-speech-based word reordering of a target language to make its word order more similar to that of a source language does not improve the sentiment classification results of our Long Short-Term Memory classifier trained on source-language data with our bilingual word embeddings, regardless of the granularity of the sentiment.
dc.format	29 p.
dc.format	application/pdf
dc.language	eng
dc.rights	cc-by-nc-nd (c) Ramírez Atrio, Alejandro, 2018
dc.rights	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.rights	info:eu-repo/semantics/openAccess
dc.source	Màster Oficial - Ciència Cognitiva i Llenguatge (CCiL)
dc.subject	Ciència cognitiva
dc.subject	Traducció automàtica
dc.subject	Multilingüisme
dc.subject	Treballs de fi de màster
dc.subject	Cognitive science
dc.subject	Machine translating
dc.subject	Multilingualism
dc.subject	Master's theses
dc.title	Machine-translation inspired reordering as preprocessing for cross-lingual sentiment analysis
dc.type	info:eu-repo/semantics/masterThesis

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following collection(s)

Col·lecció Carla Rubio
Test collection for the SGDI 1 course.