DI - GigaBD - Journal Articles

Recent Submissions

  • Publication
    Tendencias en el perfil tecnológico del profesional de la información
    (EPI, 2016-04) Morato Lara, Jorge Luis; Sánchez Cuadrado, Sonia; Fernandez Bajon, Maria Teresa
    Technologies force information professionals to keep their competencies up to date. This study analyzes these changes, aiming to provide guidelines for curricular improvement. In this work we analyzed the terms used in 20 curricula of information and documentation professionals in order to identify their technological competencies. These technological skills and terms were then used to examine 735 job offers published on general-purpose portals and a further 170 specialized offers. From this analysis we identified the knowledge and competencies that should favor the employability of these professionals, mainly related to marketing, management, and web publishing software. The results confirm the positive trend in labor-market demand for the technological competencies attributable to documentation professionals. Nevertheless, recognition of the profession still needs to be claimed, since many of these offers are classified under other specialties. We also provide a list of key technological knowledge areas for information and documentation professionals. Computing and the Web undoubtedly offer these professionals new opportunities.
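The abstract does not detail the authors' term-matching procedure. A minimal sketch of the underlying idea, counting how many job offers mention each technology term drawn from the curricula, using hypothetical terms and ads:

```python
from collections import Counter

def skill_demand(skill_terms, job_ads):
    """Count, for each curriculum skill term, how many job ads mention it.

    Each ad contributes at most once per term (presence, not frequency)."""
    counts = Counter()
    for ad in job_ads:
        text = ad.lower()
        for term in skill_terms:
            if term.lower() in text:
                counts[term] += 1
    return counts
```

A real study would also need lemmatization and synonym handling; this sketch only shows the presence-counting step.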
  • Publication
    Automatic Text Summarization for Hindi Using Real Coded Genetic Algorithm
    (MDPI, 2022-06-01) Jain, Arti; Arora, Anuja; Morato Lara, Jorge Luis; Yadav, Divakar; Kumar, Kumar Vimal
    In the present scenario, Automatic Text Summarization (ATS) is in great demand to address the ever-growing volume of text data available online and to discover relevant information faster. In this research, an ATS methodology is proposed for the Hindi language using a Real Coded Genetic Algorithm (RCGA) over a health corpus available in the Kaggle dataset. The methodology comprises five phases: preprocessing, feature extraction, processing, sentence ranking, and summary generation. Rigorous experimentation on varied feature sets is performed, where distinguishing features, namely sentence similarity and named entity features, are combined with others for computing the evaluation metrics. The top 14 feature combinations are evaluated through the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measure. RCGA computes appropriate feature weights through strings of features, chromosome selection, and the reproduction operators Simulated Binary Crossover and Polynomial Mutation. To extract the highest-scored sentences as the corpus summary, different compression rates are tested. In comparison with existing summarization tools, the ATS extractive method gives a summary reduction of 65%.
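The paper's feature weights and corpus are not reproduced here. As a minimal illustration of the ROUGE-1 recall measure used to evaluate the feature combinations (a simplified, whitespace-tokenized version, not the full ROUGE toolkit):

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """ROUGE-1 recall: fraction of reference unigrams covered by the candidate.

    Tokenization here is plain lowercased whitespace splitting."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)
```

The real measure is computed against human reference summaries; here any pair of strings can be compared.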
  • Publication
    Feature Based Automatic Text Summarization Methods: A Comprehensive State-of-the-Art Survey
    (IEEE, 2022-12-20) Yadav, Divakar; Katna, Rishbah; Yadav, Arun Kumar; Morato Lara, Jorge Luis
    With the advent of the World Wide Web, numerous online platforms generate huge amounts of textual material, including social networks, online blogs, magazines, etc. This textual content contains useful information that can be used to advance humanity. Text summarization has been a significant area of research in natural language processing (NLP). With the expansion of the internet, the amount of data in the world has exploded, and large volumes of data make locating the required and best information time-consuming. It is impractical to manually summarize petabytes of data; hence, computerized text summarization is rising in popularity. This study presents a comprehensive overview of the current status of text summarization approaches, techniques, standard datasets, assessment criteria, and future research directions. The summarization approaches are assessed based on several characteristics, including approach-based, document-number-based, summarization-domain-based, document-language-based, output summary nature, etc. This study concludes with a discussion of obstacles and research opportunities linked to text summarization research that may be relevant for future researchers in this field.
  • Publication
    Readability of Non-text Images on the World Wide Web (WWW)
    (IEEE, 2022-11-01) Elahi, Ehsan; Iglesias Maqueda, Ana María; Morato Lara, Jorge Luis; European Commission; Ministerio de Economía y Competitividad (España)
    The World Wide Web has connected the world in a way that was previously unimaginable, making it far easier for users to access, share, and communicate information. However, irrelevant non-text images on web pages worsen readability, distracting readers from the focus of the text. The main goal of this paper is to evaluate the impact of irrelevant or low-quality non-text images on the readability of a webpage. An automatic methodology has been proposed to compute the relevancy of non-text images. This methodology merges different approaches to extract information from non-text images and read text from websites in order to measure the relevancy between them. The technique was applied to fifty different educational websites to automatically determine the relevancy of their non-text images. A user study has been carried out to evaluate the proposed methodology with different types of questions. The results support the finding that relevant non-text images enhance the readability of a web page. This research will help web designers improve readability by considering only the relevant content of a web page, without relying on expert judgment.
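The abstract does not specify how relevancy between image-derived text and page text is computed; one common, minimal approach is bag-of-words cosine similarity. This sketch assumes the image text (e.g. OCR output or alt text) has already been extracted:

```python
import math
from collections import Counter

def cosine_relevancy(image_text, page_text):
    """Cosine similarity between bag-of-words count vectors of two texts.

    Returns a score in [0, 1]; 0.0 when either text is empty or disjoint."""
    a = Counter(image_text.lower().split())
    b = Counter(page_text.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

A relevancy threshold on this score could then flag images as relevant or irrelevant; the threshold value would have to be tuned, as in the user study the paper describes.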
  • Publication
    Text Summarization Technique for Punjabi Language Using Neural Networks
    (IAJIT, 2021-11-01) Jain, Arti; Arora, Anuja; Yadav, Divakar; Morato Lara, Jorge Luis; Kaur, Amanpreet; Ministerio de Economía y Competitividad (España)
    In the contemporary world, consumption of digital content has risen exponentially. For example, newspaper and web articles, status updates, advertisements, etc. have become an integral part of our daily routine. Thus, there is a need to build an automated system to summarize such large text documents in order to save time and effort. Although summarizers for languages such as English have matured since work began in the 1950s, several languages still need special attention, such as Punjabi. The Punjabi language has a much richer morphological structure than English and other foreign languages. In this work, we provide a three-phase extractive summarization methodology using neural networks that induces a concise summary of a single Punjabi text document. The methodology incorporates a pre-processing phase that cleans the text; a processing phase that extracts statistical and linguistic features; and a classification phase. The classification-based neural network applies a sigmoid activation function and weighted error reduction via gradient-descent optimization to generate the resultant output summary. The proposed summarization system is applied to a monolingual Punjabi text corpus from the Indian Languages Corpora Initiative Phase-II. Precision, recall, and F-measure of 90.0%, 89.28%, and 89.65% respectively are achieved, which is reasonably good compared with the performance of other existing Indian-language summarizers.
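As a minimal sketch of the classification step described above, a weighted feature sum passed through the sigmoid activation; the trained weights themselves are not given in the abstract, so any values supplied here are placeholders:

```python
import math

def sigmoid(z):
    """Logistic activation, mapping any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def score_sentence(features, weights, bias=0.0):
    """Score one sentence: weighted sum of its feature values, then sigmoid.

    Sentences scoring above a chosen threshold would enter the summary."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return sigmoid(z)
```

The gradient-descent training loop that fits the weights is omitted; this only shows the forward scoring pass.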
  • Publication
    Automated Readability Assessment for Spanish e-Government Information
    (IADITI, 2021-01-01) Morato Lara, Jorge Luis; Iglesias Maqueda, Ana María; Campillo Santiago, Adrián; Sánchez Cuadrado, Sonia; Ministerio de Economía, Industria y Competitividad (España)
    This paper automatically evaluates the readability of Spanish e-government websites. Specifically, the websites collected explain e-government administrative procedures. The evaluation is carried out through the analysis of different linguistic characteristics that are presumably associated with a better understanding of these resources. To this end, texts from websites outside the government domain have been collected; these texts clarify the procedures published on the Spanish Government's websites and constitute the part of the corpus considered the set of easy documents. The rest of the corpus has been completed with counterpart documents from government websites. The text of the documents has been processed, and difficulty is evaluated through different classic readability metrics. At a later stage, machine-learning methods are used to train algorithms that predict the difficulty of the text. The results of the study show that government web pages exhibit high values of comprehension difficulty. This work contributes a new Spanish-language corpus of official e-government websites. In addition, a large number of combined linguistic attributes are applied, which improve the identification of the comprehensibility level of a text with respect to classic metrics.
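The specific linguistic attributes are not enumerated in the abstract. A minimal sketch of two classic surface features that readability metrics typically build on, average words per sentence and average characters per word:

```python
import re

def surface_features(text):
    """Extract two classic surface cues of reading difficulty:
    average words per sentence and average characters per word."""
    sentences = [s for s in re.split(r'[.!?]+', text) if s.strip()]
    words = re.findall(r'\w+', text)
    words_per_sentence = len(words) / len(sentences)
    chars_per_word = sum(len(w) for w in words) / len(words)
    return words_per_sentence, chars_per_word
```

Features like these, plus syllable counts and richer linguistic attributes, would feed the machine-learning difficulty predictor the paper describes.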
  • Publication
    Datos enlazados para el análisis de la literatura grecolatina
    (CSIC, 2022-01-31) Linares Sánchez, Jorge Juan; Sánchez Cuadrado, Sonia; Morato Lara, Jorge Luis
    We describe the development of a domain ontology for representing Greco-Latin literature as linked data. The principles of the Semantic Web and of semantic content dissemination are analyzed as applied to classical Greco-Latin literature. The Methontology methodology for ontology construction has been adapted, and a resource has been implemented in a formalized language. The result of this research is a pilot linked-data project based on Linked Open Data (LOD) principles and technologies in the field of comparative literature, developing the Litcomp ontology to improve the study of the influence and survival of Greco-Latin literature.
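The Litcomp ontology itself is not reproduced here. A minimal sketch of the linked-data idea it builds on, subject-predicate-object triples queried over a tiny graph, using hypothetical URIs and predicates that are not taken from the actual ontology:

```python
# Hypothetical namespace; the real Litcomp vocabulary and URIs
# are not shown in the abstract.
EX = "http://example.org/litcomp/"

# A tiny graph: each entry is one (subject, predicate, object) triple.
triples = {
    (EX + "Aeneid", EX + "writtenBy", EX + "Virgil"),
    (EX + "Aeneid", EX + "influencedBy", EX + "Odyssey"),
    (EX + "Odyssey", EX + "writtenBy", EX + "Homer"),
}

def objects(subject, predicate):
    """Return all objects linked to (subject, predicate) in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}
```

A production system would use an RDF store and SPARQL rather than an in-memory set, but the triple structure is the same.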