Skip to content

Latest commit

 

History

History
485 lines (369 loc) · 13.7 KB

README.md

File metadata and controls

485 lines (369 loc) · 13.7 KB

consulta_academica

Unit Tests

About

consulta_academica is a project aimed at establishing the foundational infrastructure for data analysis and data-driven features within UENF's Academic System ("Acadêmico").

Acadêmico manages the academic activities of Universidade Estadual do Norte Fluminense (UENF), encompassing courses, subjects, enrollments, grades, and more. However, the production database of Acadêmico contains sensitive and irrelevant data that could compromise privacy and system performance. To address this, consulta_academica was developed.

Our primary objective is to create an anonymized and abstracted database, enabling the development of data routines and features that contribute to a data-driven culture within our university.

This project consists of a docker compose orchestration to be run alongside the "Acadêmico". This system keeps the academico_db, an internal copy of the appropriate cohort of the "Acadêmico" production database. From academico_db the system builds the consulta schema with tables that refine and simplify the tables in "Academico" production database. The tables in consulta can be easily accessed through the api.

Setup

consulta_academica is intented to run internally alongside the "Acadêmico", but you can run it locally following this instructions:

  1. Clone the repository:
    git clone https://github.com/danibritods/consulta_academica
  2. Build the compose:
    docker compose build
  3. Running the services
    docker compose up
  • You can stop the services with:
    docker compose down

Use

API

Access the consulta_academica API at port 80. You'll be greeted with a json presenting the instructions, an example and the current schema of the consulta_academica database:

{"Saudações!":"API Consulta Acadêmica",

"Instruções":"consulte o banco de dados enviando a sua query por meio de um POST para \"/execute_query\"",

"exemplo de query:":"SELECT disciplina_id, contagem_alunos FROM demanda.contagem_aluno_por_disciplina;",

"tabelas no formato {schema/: [tabelas]}":

    {"consulta":[],

    "demanda":[]
    }
}

example query:

For example, you can perform a query like "SELECT * FROM consulta.aluno;":

curl -X 'POST' \
  'http://localhost/execute_query/?query=SELECT%20%2A%20FROM%20consulta.aluno%3B' \
  -H 'accept: application/json' \
  -d ''

The response will be in JSON format, providing the queried data:

{"columns":["id","curso_id","matriz_id"],"data":[[1,1,1],[2,1,1],[3,1,1],[4,2,2],[5,2,2]]}

API documentation

You can access the auto-generated API documentation at /docs.

api docs

Message broker

consulta_academica has a RabbitMQ and python services to consume the queues at port 5672. Currently data_to_db is a queue to receive tuples from the "Acadêmico" database. The system expects tuples in the following format:

[{"table_name":'table',"id":1,"col1":1,"col2":2,"col3":3},
                            {"table_name":'table',"id":2,"col1":1,"col2":2,"col3":3}
                            {"table_name":'table',"id":2,"col1":1,"col2":2,"col3":3}]

Internally only the non-personal data is processed and copied to the internal academico_db database. This process syncs the academico_db to the production database, in such a way as to have a separate database, without sensitive data, that can be used without impacting the performance of the "Acadêmico".

The manage queue currently only supports the message "run_sql_scripts" which executes the SQL scripts.

SQL scripts

consulta_from_academico.sql

This primary script builds the consulta schema from the tables in academico_db. The consulta schema contains an anonymized and abstracted copy of the relevant data from Acadêmico's database. It serves as a foundation for analytics and data features.

  • It provides a separate source of truth, enabling queries without impacting production performance.
  • Pre-processed for analytical purposes, e.g., consolidating multiple tables into one.
  • Enhances data governability and privacy by restricting API access to the consulta schema.
  • The schema is publicly accessible for creating analytics routines.

This project's services maintain consulta and enable querying its tables.

demanda_from_consulta.sql

demanda is the schema of our first data feature developed from consulta. It builds a table containing each subject offered at UENF and its corresponding demand (in number of students) for the next period (demanda.contagem_aluno_por_disciplina). It also maintains all the intermediary tables necessary for the calculation because each one of them can be useful for other analysis. This schema is also accessible through the API.

Schemas

consulta

  • consulta tables
    Table name Description
    aluno
    disciplina_matriz
    disciplina
    inscricao
    turma
    plano
    participacao
    atividade
    aproveitamento_de_atividade
    equivalencia
    disciplina_isencao
    disciplina_equivalencia_a_pedido
    disciplina_aproveitamento_interno
    disciplina_inscricao
    disciplina_participacao
    disciplina_cursada_ou_aproveitada
    disciplina_cursada_aproveitada_ou_equivalente
    disciplina_cursada
    pre_requisito
    co_requisito
  • consulta columns
    • aluno
      • id
      • curso_id
      • matriz_id
    • aproveitamento_de_atividade
      • id
      • aluno_id
      • participacao_id
    • atividade
      • id
      • ano_semestre
      • descricao
      • disciplina_id
    • co_resuisito
      • co_requisitante_id
      • co_requisito_id
    • disciplina
      • id
      • ano_semestre_fim
      • ano_semestre_inicio
      • creditos
      • horas_extra_classe
      • horas_pratica
      • horas_teorica
      • laboratorio_id
      • nome
      • sigla
      • tipo_aprovacao
    • disciplina_aproveitamento_interno
      • id
      • aluno_id
      • disciplina_id
      • ano_semestre
      • faltas
      • nota
    • disciplina_cursada
      • aluno_id
      • ano_semestre
      • disciplina_id
      • faltas
      • nota
      • origem
      • situacao
    • disciplina_cursada_aproveitada_ou_equivalente
      • aluno_id
      • ano_semestre
      • disciplina_id
      • faltas
      • nota
      • origem
      • situacao
    • disciplina_cursada_ou_aproveitada
      • aluno_id
      • ano_semestre
      • disciplina_id
      • faltas
      • nota
      • origem
      • situacao
    • disciplina_equivalencia_a_pedido
      • aluno_id
      • ano_semestre
      • disciplina_id
      • faltas
      • id
      • nota
    • disciplina_inscricao
      • aluno_id
      • ano_semestre
      • disciplina_id
      • faltas
      • nota
      • nota_ef
      • situacao
    • disciplina_isencao
      • aluno_id
      • ano_semestre
      • disciplina_id
      • id
    • disciplina_matriz
      • area_de_concentracao_id
      • disciplina_id
      • id
      • matriz_id
      • periodo_referencia
    • disciplina_participacao
      • aluno_id
      • ano_semestre
      • descricao
      • disciplina_id
      • faltas
      • nota
      • situacao
    • equivalencia
      • equivalente_id
      • equivalida_id
    • inscricao
      • faltas
      • id
      • nota
      • nota_ef
      • plano_id
      • situacao
      • turma_id
    • participacao
      • aluno_id
      • atividade_id
      • faltas
      • id
      • insatisfatoria_em
      • nota
      • satisfatoria_em
    • plano
      • aluno_id
      • ano_semestre
      • id
    • pre_requisito
      • pre_requisitante_id
      • pre_requisito_id
    • turma
      • ano_semestre
      • codigo
      • disciplina_id
      • id

demanda

  • demanda tables
    Table name Description
    contagem_aluno_por_disciplina contagem_alunos
    contagem_aluno_por_disciplina disciplina_id
    disciplina_aprovada aluno_id
    disciplina_aprovada disciplina_id
    disciplina_demandada aluno_id
    disciplina_demandada disciplina_id
    disciplina_remanescente aluno_id
    disciplina_remanescente disciplina_id
  • demanda columns
    • contagem_aluno_por_disciplina
      • contagem_alunos
      • disciplina_id
    • disciplina_aprovada
      • aluno_id
      • disciplina_id
    • disciplina_demandada
      • aluno_id
      • disciplina_id
    • disciplina_remanescente
      • aluno_id
      • disciplina_id

Structure

This project consists of the following services orchestrated by Docker Compose:

  • db: A PostgreSQL container housing the academico_db database with filtered tables from "Acadêmico" and the consulta, and demanda schemas.
  • ruby: A Ruby service to load the DB schema from "Acadêmico" in academico_db setup.
  • rabbitMQ: A messaging and queuing service for data synchronization and command execution.
  • python: A service consuming RabbitMQ queues for syncing academico_db and running SQL scripts.
  • api: A FastAPI providing queries exclusively to the demanda and consulta schemas. From this API future data products can be built such as dashboards and facilities to course coordinators.

Folder Structure

.
├── api
│   ├── app
│   │   ├── __init__.py
│   │   └── main.py
│   ├── requirements.txt
│   └── Dockerfile
│
├── db
│   ├── data
│   ├── scripts
│   └── Dockerfile
│
├── docs
│   └── ...
│
├── off_the_rails
│   ├── app
│   │   ├── db
│   │   │   ├── config.yml
│   │  ...  └── schema.rb
│   └── Dockerfile
│
├── python
│   ├── app
│   │   ├── __init__.py
│   │   │   
│   │   ├── data_ingestion
│   │   │   ├── __init__.py
│   │   │   ├── consumer.py
│   │   │   ├── database.py
│   │   │   ├── manager.py
│   │   │   ├── run_sql_scripts.py
│   │   │   └── sync_academico_db.py
│   │   │   
│   │   ├── sql_scripts
│   │   │   ├── consulta_from_academico.sql
│   │   │   └── demanda_from_consulta.sql
│   │   │   
│   │   ├── tests
│   │   │   ├── __init__.py
│   │   │   ├── insert_mock_data.sql
│   │   │   ├── test_manager.py
│   │   │   ├── test_run_sql_scripts.sql
│   │   │   ├── test_scripts.sql
│   │   │   └── test_sync_academico_db.py
│   │   └── requirements.txt
│   └── Dockerfile
│
├── docker-compose.yml
└── README.md
├── notebook.ipynb