The Artificial Intelligence Group (GIA) at the University of Havana is currently directed by Dr. Yudivián Almeida. It belongs to the Department of Artificial Intelligence and Computational Systems of the Faculty of Mathematics and Computing of the University of Havana.
This research group has approximately 10 researchers, including senior, post-doctoral, and pre-doctoral researchers. It is a dynamic and proactive group that organizes internal seminars, collaborates with international centers such as the Paris 1 Panthéon-Sorbonne University (France), the University of Alicante (Spain), and the University of Montreal (Canada), and currently has several ongoing research projects on Artificial Intelligence applied to the treatment of heterogeneous data.
The objective of the research group is to investigate topics related to Artificial Intelligence. Within this field, four core areas are explored:
- intelligent generation, processing, and analysis of natural language;
- machine learning and its applications;
- solutions based on metaheuristics for complex problems; and
- robotics.
This work has produced results on the linguistic and computational issues involved in areas such as:
- natural language generation, textual entailment recognition, speech and dialogue modeling, pragmatics, and multilinguality, together with the development of computational tools for knowledge inference;
- the interaction between representation and inference in computational semantics for natural language;
- the design of metaheuristics for solving large problems;
- the design of optimal flows or ensembles of machine learning algorithms; and
- the design of robotic instruments.
Cecilia: The Cuban Language Model
Cecilia is a family of language models specifically pretrained on Cuban written text, designed to capture the linguistic, cultural, and social nuances of Cuban Spanish. Developed by the Artificial Intelligence Research Group (GIA-UH) at the University of Havana in collaboration with the University of Alicante, Cecilia supports a variety of natural language processing tasks tailored to Cuban Spanish, including text generation, sentiment analysis, named entity recognition, and machine translation. The model is trained on a rich corpus comprising Cuban newspapers, literature, laws, encyclopedias, and song lyrics, enabling it to reflect Cuban language varieties and cultural context authentically[1][3].
AutoGOAL: A Framework for Program Synthesis with a Focus on Auto Machine Learning
AutoGOAL is a versatile framework designed to automate the process of program synthesis, with a particular emphasis on automated machine learning (AutoML). It facilitates the automatic generation and optimization of machine learning pipelines, enabling researchers and practitioners to efficiently explore model configurations and data transformations without extensive manual intervention. AutoGOAL aims to accelerate the development of robust machine learning solutions by leveraging program synthesis techniques.
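The idea of searching over pipeline configurations can be illustrated with a toy sketch. This is not AutoGOAL's API: the dataset, the `TRANSFORMS` and `THRESHOLDS` search space, and the `accuracy` and `search` helpers are all hypothetical, and a brute-force enumeration stands in for the smarter search strategies an AutoML framework would use.

```python
from itertools import product

# Toy dataset (hypothetical): the label is 1 when |x| >= 2.
XS = [-3, -2, -1, 1, 2, 3]
YS = [1, 1, 0, 0, 1, 1]

# Candidate data transformations (first pipeline stage).
TRANSFORMS = {
    "identity": lambda x: x,
    "abs": abs,
    "square": lambda x: x * x,
}

# Candidate hyperparameters for a threshold classifier (second stage).
THRESHOLDS = [0.5, 1.5, 2.5]


def accuracy(transform, threshold):
    """Fraction of points classified correctly by `transform(x) > threshold`."""
    hits = sum(int(transform(x) > threshold) == y for x, y in zip(XS, YS))
    return hits / len(XS)


def search():
    """Enumerate every (transform, threshold) pipeline and return the best one."""
    candidates = product(TRANSFORMS.items(), THRESHOLDS)
    scored = [(accuracy(fn, thr), name, thr) for (name, fn), thr in candidates]
    # max() keeps the first pipeline among those tied for the top score.
    best_score, best_name, best_thr = max(scored, key=lambda item: item[0])
    return best_name, best_thr, best_score


best_name, best_thr, best_score = search()
print(best_name, best_thr, best_score)
```

The point of the sketch is that the transformation and the model parameter are chosen jointly by the search rather than by hand; on real problems the space is far too large to enumerate, which is where guided search becomes necessary.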
LETO: A Platform for Knowledge Discovery and Data Analysis Using Large Language Models
LETO (Learning Engine Through Ontologies) is an innovative platform that harnesses the power of large language models (LLMs) for knowledge discovery and data analysis. By integrating advanced LLM capabilities, LETO enables users to extract insights, identify patterns, and generate meaningful interpretations from complex datasets. This platform supports exploratory data analysis and facilitates informed decision-making through natural language interactions and automated reasoning.
LINGO: A minimal, async-native, and unopinionated toolkit for modern LLM applications
LINGO is a library that provides a three-layered API for building, testing, and deploying complex LLM workflows with precision and clarity. It is built on the idea that developers need different levels of abstraction for different tasks:
- the High-Level Lingo API, for purely declarative, ready-to-use LLM assistants, which is the fastest way to get a chatbot running;
- the Mid-Level Flow API, for declarative, reusable context engineering workflows, which lets developers define complex, composable logic with branching, tool use, and subroutines; and
- the Low-Level (LLM, Engine, Context) API, for direct, explicit context engineering, which gives full, imperative control over the message history and LLM interactions.
PUMPKING: A Library to Streamline Document Chunking, Parsing, and Representation
PUMPKING is an open-source Python library designed to streamline the complex process of document chunking, parsing, and representation. It provides a flexible pipeline to transform unstructured documents into structured, queryable knowledge. Whether you're building a RAG (Retrieval-Augmented Generation) system, a document analysis tool, or a knowledge extraction service, PUMPKING provides the foundational blocks to get you there faster.
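To make the notion of document chunking concrete, here is a minimal sketch of one common strategy: fixed-size word windows with overlap. It does not use the library described above, and the function name `chunk_words` and its parameters are assumptions chosen for illustration only.

```python
def chunk_words(text: str, size: int = 5, overlap: int = 2) -> list[str]:
    """Split `text` into chunks of up to `size` words.

    Consecutive chunks share `overlap` words, so content that straddles a
    chunk boundary still appears whole in at least one chunk.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")

    words = text.split()
    step = size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once the current window has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g h", size=5, overlap=2):
        print(chunk)
```

In a RAG setting, each chunk would then be embedded and indexed for retrieval. Real pipelines usually split on structural boundaries (sections, paragraphs, sentences) rather than on raw word counts.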
ARGO: A Library for Building LLM-Based Agentic Workflows
ARGO (Agent-based Reasoning, Governance, and Orchestration) is a comprehensive library designed to construct agentic workflows powered by large language models. It provides tools and abstractions to develop intelligent agents capable of performing complex tasks autonomously by orchestrating LLMs in workflows. ARGO supports the creation of dynamic, context-aware agents that can interact with various data sources and services, enabling sophisticated automation and problem-solving applications.