Integració de diferents fonts de dades òmiques i visualització de les variables originals mitjançant tècniques de Machine Learning

Riba Archilla, Laura

dc.contributor	Vegas Lozano, Esteban
dc.creator	Riba Archilla, Laura
dc.date	2016-07-18T09:08:46Z
dc.date	2014-09
dc.date.accessioned	2024-12-16T10:22:47Z
dc.date.available	2024-12-16T10:22:47Z
dc.identifier	http://hdl.handle.net/2445/100571
dc.identifier.uri	http://fima-docencia.ub.edu:8080/xmlui/handle/123456789/14893
dc.description	Bachelor's theses in Statistics UB-UPC, Faculty of Economics and Business (UB) and Faculty of Mathematics and Statistics (UPC), Academic year: 2013-2014, Supervisor: Esteban Vegas Lozano
dc.description	Over the last decade, new high-throughput technologies have been developed that generate such a large volume of biological data that they have motivated the creation of new algorithms in the field of bioinformatics to analyze it. These advances have revolutionized molecular biology and have led to a new mindset in which a global view of biological systems is developed. In this context, there are currently two major lines of research: the integration of omics data and the visualization of the original variables. Analyzing more than one type of omics data simultaneously, combined with visualizing the relationships among the thousands of biological variables, can lead to a better understanding of biological processes. This project studies the Kernel PCA technique together with procedures for representing the original variables, applies it to two omics datasets, and presents it in an accessible way through interactive web applications.
dc.description	The development over the last decade of high-throughput technologies, new techniques for measuring biological data, has dramatically changed our view of molecular biology. Whereas a few years ago each gene or protein was studied as a single entity, new technologies allow large numbers of genes or proteins to be analyzed simultaneously. As a result, biological processes are studied as complex systems of functionally interacting macromolecules. This new mindset has led to the rise of new disciplines, such as genomics, proteomics and transcriptomics, in the so-called “omics era”. What they have in common is that they are based on the analysis of a large volume of heterogeneous biological data. These datasets encourage researchers to develop new algorithms in the field of bioinformatics for their interpretation. Within this context, there are currently two major research challenges: omics data integration and visualization of the input variables. Analyzing integrated omics data while visualizing the relationships between the thousands of biological variables generated may lead to a better understanding of the global functioning of biological systems. Although individual analysis of each of these omics data types undoubtedly yields interesting findings, it is only by integrating them that one can gain a global insight into cellular behavior. A systems approach is thus predicated on the integration of multiple independent datasets. Visualization is a key aspect of both the analysis and the understanding of omics data. The challenge is to create clear and meaningful visualizations that give biological insight, despite the complexity of the data. In this project, we first present the main types of omics data, the associated high-throughput technologies and the challenges that their analysis presents, including the integration of omics data.
After this, we give an overview of the discipline of machine learning, which provides algorithms and techniques for analyzing omics data. Special attention is paid to kernel methods, which are among the most powerful methods for integrating heterogeneous data types. In the present work, we analyze the integration of data from several sources of information using the Kernel PCA technique together with a set of procedures for representing the input variables. We then apply them to two different omics datasets. Finally, we make this technique accessible through interactive web applications.
dc.format	142 p.
dc.format	application/pdf
dc.language	cat
dc.rights	cc-by-nc-nd (c) Riba Archilla, 2016
dc.rights	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.rights	info:eu-repo/semantics/openAccess
dc.source	Treballs Finals de Grau (TFG) - Estadística UB-UPC
dc.subject	Estadística
dc.subject	Bioinformàtica
dc.subject	Mètodes estadístics
dc.subject	Treballs de fi de grau
dc.subject	Statistics
dc.subject	Bioinformatics
dc.subject	Statistical methods
dc.subject	Bachelor's theses
dc.title	Integració de diferents fonts de dades òmiques i visualització de les variables originals mitjançant tècniques de Machine Learning
dc.type	info:eu-repo/semantics/bachelorThesis

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following collection(s)

Col·lecció Carla Rubio
Test collection for the SGDI 1 course.
