consulta_academica is a project aimed at establishing the foundational infrastructure for data analysis and data-driven features within UENF's Academic System ("Acadêmico").
Acadêmico manages the academic activities of Universidade Estadual do Norte Fluminense (UENF), encompassing courses, subjects, enrollments, grades, and more. However, the production database of Acadêmico contains sensitive and irrelevant data that could compromise privacy and system performance. To address this, consulta_academica was developed.
Our primary objective is to create an anonymized and abstracted database, enabling the development of data routines and features that contribute to a data-driven culture within our university.
This project consists of a docker compose orchestration to be run alongside the "Acadêmico". This system keeps the academico_db
, an internal copy of the appropriate cohort of the "Acadêmico" production database. From academico_db
the system builds the consulta
schema with tables that refine and simplify the tables in "Academico" production database. The tables in consulta
can be easily accessed through the api.
consulta_academica is intented to run internally alongside the "Acadêmico", but you can run it locally following this instructions:
- Clone the repository:
git clone https://github.com/danibritods/consulta_academica
- Build the compose:
docker compose build
- Running the services
docker compose up
- You can stop the services with:
docker compose down
Access the consulta_academica API at port 80. You'll be greeted with a json presenting the instructions, an example and the current schema of the consulta_academica database:
{"Saudações!":"API Consulta Acadêmica",
"Instruções":"consulte o banco de dados enviando a sua query por meio de um POST para \"/execute_query\"",
"exemplo de query:":"SELECT disciplina_id, contagem_alunos FROM demanda.contagem_aluno_por_disciplina;",
"tabelas no formato {schema/: [tabelas]}":
{"consulta":[],
"demanda":[]
}
}
For example, you can perform a query like "SELECT * FROM consulta.aluno;"
:
curl -X 'POST' \
'http://localhost/execute_query/?query=SELECT%20%2A%20FROM%20consulta.aluno%3B' \
-H 'accept: application/json' \
-d ''
The response will be in JSON format, providing the queried data:
{"columns":["id","curso_id","matriz_id"],"data":[[1,1,1],[2,1,1],[3,1,1],[4,2,2],[5,2,2]]}
You can access the auto-generated API documentation at /docs
.
consulta_academica has a RabbitMQ and python services to consume the queues at port 5672.
Currently data_to_db
is a queue to receive tuples from the "Acadêmico" database.
The system expects tuples in the following format:
[{"table_name":'table',"id":1,"col1":1,"col2":2,"col3":3},
{"table_name":'table',"id":2,"col1":1,"col2":2,"col3":3}
{"table_name":'table',"id":2,"col1":1,"col2":2,"col3":3}]
Internally only the non-personal data is processed and copied to the internal academico_db
database. This process syncs the academico_db
to the production database, in such a way as to have a separate database, without sensitive data, that can be used without impacting the performance of the "Acadêmico".
The manage
queue currently only supports the message "run_sql_scripts" which executes the SQL scripts.
This primary script builds the consulta
schema from the tables in academico_db
. The consulta
schema contains an anonymized and abstracted copy of the relevant data from Acadêmico's database. It serves as a foundation for analytics and data features.
- It provides a separate source of truth, enabling queries without impacting production performance.
- Pre-processed for analytical purposes, e.g., consolidating multiple tables into one.
- Enhances data governability and privacy by restricting API access to the
consulta
schema. - The schema is publicly accessible for creating analytics routines.
This project's services maintain consulta
and enable querying its tables.
demanda
is the schema of our first data feature developed from consulta
. It builds a table containing each subject offered at UENF and its corresponding demand (in number of students) for the next period (demanda.contagem_aluno_por_disciplina
). It also maintains all the intermediary tables necessary for the calculation because each one of them can be useful for other analysis. This schema is also accessible through the API.
-
consulta tables
Table name Description aluno disciplina_matriz disciplina inscricao turma plano participacao atividade aproveitamento_de_atividade equivalencia disciplina_isencao disciplina_equivalencia_a_pedido disciplina_aproveitamento_interno disciplina_inscricao disciplina_participacao disciplina_cursada_ou_aproveitada disciplina_cursada_aproveitada_ou_equivalente disciplina_cursada pre_requisito co_requisito
-
consulta columns
-
aluno
- id
- curso_id
- matriz_id
-
aproveitamento_de_atividade
- id
- aluno_id
- participacao_id
-
atividade
- id
- ano_semestre
- descricao
- disciplina_id
-
co_resuisito
- co_requisitante_id
- co_requisito_id
-
disciplina
- id
- ano_semestre_fim
- ano_semestre_inicio
- creditos
- horas_extra_classe
- horas_pratica
- horas_teorica
- laboratorio_id
- nome
- sigla
- tipo_aprovacao
-
disciplina_aproveitamento_interno
- id
- aluno_id
- disciplina_id
- ano_semestre
- faltas
- nota
-
disciplina_cursada
- aluno_id
- ano_semestre
- disciplina_id
- faltas
- nota
- origem
- situacao
-
disciplina_cursada_aproveitada_ou_equivalente
- aluno_id
- ano_semestre
- disciplina_id
- faltas
- nota
- origem
- situacao
-
disciplina_cursada_ou_aproveitada
- aluno_id
- ano_semestre
- disciplina_id
- faltas
- nota
- origem
- situacao
-
disciplina_equivalencia_a_pedido
- aluno_id
- ano_semestre
- disciplina_id
- faltas
- id
- nota
-
disciplina_inscricao
- aluno_id
- ano_semestre
- disciplina_id
- faltas
- nota
- nota_ef
- situacao
-
disciplina_isencao
- aluno_id
- ano_semestre
- disciplina_id
- id
-
disciplina_matriz
- area_de_concentracao_id
- disciplina_id
- id
- matriz_id
- periodo_referencia
-
disciplina_participacao
- aluno_id
- ano_semestre
- descricao
- disciplina_id
- faltas
- nota
- situacao
-
equivalencia
- equivalente_id
- equivalida_id
-
inscricao
- faltas
- id
- nota
- nota_ef
- plano_id
- situacao
- turma_id
-
participacao
- aluno_id
- atividade_id
- faltas
- id
- insatisfatoria_em
- nota
- satisfatoria_em
-
plano
- aluno_id
- ano_semestre
- id
-
pre_requisito
- pre_requisitante_id
- pre_requisito_id
-
turma
- ano_semestre
- codigo
- disciplina_id
- id
-
-
demanda tables
Table name Description contagem_aluno_por_disciplina contagem_alunos contagem_aluno_por_disciplina disciplina_id disciplina_aprovada aluno_id disciplina_aprovada disciplina_id disciplina_demandada aluno_id disciplina_demandada disciplina_id disciplina_remanescente aluno_id disciplina_remanescente disciplina_id
-
demanda columns
-
contagem_aluno_por_disciplina
- contagem_alunos
- disciplina_id
-
disciplina_aprovada
- aluno_id
- disciplina_id
-
disciplina_demandada
- aluno_id
- disciplina_id
-
disciplina_remanescente
- aluno_id
- disciplina_id
-
This project consists of the following services orchestrated by Docker Compose:
db
: A PostgreSQL container housing theacademico_db
database with filtered tables from "Acadêmico" and theconsulta
, anddemanda
schemas.ruby
: A Ruby service to load the DB schema from "Acadêmico" inacademico_db
setup.rabbitMQ
: A messaging and queuing service for data synchronization and command execution.python
: A service consuming RabbitMQ queues for syncingacademico_db
and running SQL scripts.api
: A FastAPI providing queries exclusively to thedemanda
andconsulta
schemas. From this API future data products can be built such as dashboards and facilities to course coordinators.
.
├── api
│ ├── app
│ │ ├── __init__.py
│ │ └── main.py
│ ├── requirements.txt
│ └── Dockerfile
│
├── db
│ ├── data
│ ├── scripts
│ └── Dockerfile
│
├── docs
│ └── ...
│
├── off_the_rails
│ ├── app
│ │ ├── db
│ │ │ ├── config.yml
│ │ ... └── schema.rb
│ └── Dockerfile
│
├── python
│ ├── app
│ │ ├── __init__.py
│ │ │
│ │ ├── data_ingestion
│ │ │ ├── __init__.py
│ │ │ ├── consumer.py
│ │ │ ├── database.py
│ │ │ ├── manager.py
│ │ │ ├── run_sql_scripts.py
│ │ │ └── sync_academico_db.py
│ │ │
│ │ ├── sql_scripts
│ │ │ ├── consulta_from_academico.sql
│ │ │ └── demanda_from_consulta.sql
│ │ │
│ │ ├── tests
│ │ │ ├── __init__.py
│ │ │ ├── insert_mock_data.sql
│ │ │ ├── test_manager.py
│ │ │ ├── test_run_sql_scripts.sql
│ │ │ ├── test_scripts.sql
│ │ │ └── test_sync_academico_db.py
│ │ └── requirements.txt
│ └── Dockerfile
│
├── docker-compose.yml
└── README.md
├── notebook.ipynb