This application allows you to save, get, modify and delete documents with their related text piece entities in a PostgreSQL database, and to index and search the saved documents' text pieces in Elasticsearch.
- Make sure that you have installed the latest versions of `python` and `pip` on your computer. You also have to install Docker and Docker Compose.
- By default, this project uses poetry for dependency and virtual environment management. Make sure to install it too.
- Make sure to provide all required environment variables (via a `.env` file, the `export` command, secrets, etc.) before running the application. A hypothetical example is shown after this list.
- For managing pre-commit hooks this project uses pre-commit.
- For import sorting this project uses isort.
- For code format checking this project uses black.
- For type checking this project uses mypy.
- For creating commits and linting commit messages this project uses commitizen. Run `make commit` to use commitizen during commits.
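As an illustration, a `.env` file for a local run might look like the sketch below. The variable names are hypothetical placeholders, not the project's actual configuration keys; check the repository's settings module or `docker-compose.yml` for the real names.

```
# Hypothetical variable names, for illustration only
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=admin
POSTGRES_PASSWORD=secret
POSTGRES_DB=app_db
ES_HOST=localhost
ES_PORT=9200
```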
This project uses GitHub Actions to run all checks and unit tests on push to the remote repository.
There are lots of useful commands in the Makefile included in this project's repo. Use the `make <some_command>` syntax to run each of them.
If your system doesn't support `make` commands, you may copy commands from the Makefile directly into the terminal.
For managing migrations this project uses alembic.

- The Dockerfile already includes the `alembic upgrade head` command to run all revision migrations required by the current version of the application.
- Run `make upgrade` to manually upgrade the database tables' state. You can also manually upgrade to a specific revision (a `.py` script from `alembic/versions/`) by running: `alembic upgrade <revision id number>`
- You can also downgrade one revision down with the `make downgrade` command, downgrade to a specific revision by running `alembic downgrade <revision id number>`, or do a full downgrade to the initial database state with `make downgrade_full`
- To install all required dependencies and set up a virtual environment, run in the cloned repository directory: `poetry install`. You can also install project dependencies using `pip install -r requirements.txt`.
- To configure pre-commit hooks for code linting, code format checking and commit message linting, run in the cloned directory: `poetry run pre-commit install`
- Build the app image using `make build`
- Run Docker containers using `make up`. Note: `docker-compose.yml` specifies container creation, taking `healthcheck`s into account, in the following order: `elasticsearch -> postgresql -> web`.
- Stop and remove Docker containers using `make down`. If you also want to remove the log volume, use `make down_volume`
- By default, the web application will be accessible at http://localhost:8080, the database will be available on localhost:5432, and Elasticsearch at http://localhost:9200.
- You can try all endpoints via the Swagger documentation at http://localhost:8080/docs
- Use resources with the `/documents` prefix to create, read, update and delete data in the `documents` database table.
- To create a document you should provide values for `document_name` (must be unique) and `author`. The document entity also has `document_id` - the primary key of the database table, which is returned as part of a successful response. An Elasticsearch index will also be created (if it does not already exist) whose `index_name` equals the created `document_id`. A hypothetical request example is shown below.
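For example, a document could be created with Python's `requests` library as in this minimal sketch (the host, port, HTTP method and payload values are assumptions based on the defaults above):

```python
import requests

# Hypothetical example: create a new document via the /documents resource.
response = requests.post(
    "http://localhost:8080/documents",
    json={
        "document_name": "annual_report_2023",  # must be unique
        "author": "Jane Doe",
    },
)
print(response.json())  # a successful response includes the generated "document_id"
```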

- Use resources with the `/text_pieces` prefix to create, read, update and delete data in the `text_pieces` database table.
- Each request to create a new text piece should provide the following data in the request body:
  - `text` - required field with text data;
  - `type` - required, either `title` or `paragraph`;
  - `page` - required, integer number of the page in the document to which the text piece belongs;
  - `document_name` - required, a link to the document to which the text piece belongs; a non-nullable foreign key to a document saved in the `documents` table;
  - `meta_data` - optional field with a JSON object as value, containing some metadata about the text piece.

  In a successful response you will also get:
  - `piece_id` - primary key of the new text piece in the database table;
  - `indexed` - boolean value that shows whether the text piece has already been indexed;
  - `size` - calculated length of the text piece's `text` field;
  - `created_at` - timestamp of the text piece entity's creation in the database (a `datetime.datetime` object).

  A hypothetical request example is shown below.
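As an illustration, a new text piece could be created like this (a sketch with `requests`; the route and method are assumptions based on the `/text_pieces` prefix described above):

```python
import requests

# Hypothetical example: create a text piece linked to an existing document.
payload = {
    "text": "Quarterly revenue grew by 12%.",  # required
    "type": "paragraph",                       # required: "title" or "paragraph"
    "page": 3,                                 # required: page number in the document
    "document_name": "annual_report_2023",     # required: must exist in the documents table
    "meta_data": {"section": "finance"},       # optional JSON metadata
}

response = requests.post("http://localhost:8080/text_pieces", json=payload)
print(response.json())  # should include "piece_id", "indexed", "size", "created_at"
```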
- Use resources with the `/index` prefix to index and search for text pieces in Elasticsearch indices.
- A request to the `/index/{index_name}/index` resource will check that an index with that name (`document_id`) exists in Elasticsearch. If it exists, all text pieces already saved in the index will be removed, and then all text pieces from the PostgreSQL table associated with that `document_id` will be indexed. For every text piece that gets indexed, the `indexed` field's value in the database table will be updated and set to `true`. A hypothetical request example is shown below.
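For instance, indexing could be triggered like this (a sketch; the HTTP method is an assumption, and the index name `1` stands in for a real `document_id` returned when the document was created):

```python
import requests

# Hypothetical example: (re)index all text pieces of one document.
# Replace 1 with a real document_id returned by the /documents resource.
response = requests.post("http://localhost:8080/index/1/index")
print(response.status_code)
```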
- A request to the `/index/{index_name}/search` resource will search for text pieces in the Elasticsearch index with that name (`document_id`), if it exists. Pagination is supported (`page_num` and `page_size`); if no pagination parameters are specified, the first 15 results are returned. In the `filters` field you should specify a list of filters, each consisting of `field`, `operator` and `values`. Available text fields for search:
  - `text` - supports the `match` operator, which calculates a relevance score, and `eq`, which finds an exact match of the requested string;
  - `document_name` - supports the `match` operator, which calculates a relevance score, and `eq`, which finds an exact match of the requested string;
  - `meta_data` - supports `eq` search for values;
  - `type` - has the `eq` operator and accepts only existing text piece types (`title` or `paragraph`);
  - `indexed` - has `eq` and accepts only `true` or `false` values.

  Available countable fields for search are `page`, `size` and `created_at`. These fields are compatible with the operators `eq`, `in` (an array of possible values; a result is returned if at least one value matches) and the comparisons `gt` (greater than), `gte` (greater than or equals), `lt` (lower than), `lte` (lower than or equals). If no filters are provided, all documents in the index `index_name` are returned.

  Note: results are ordered by descending `score` value (if `match` is used) and then by `created_at` timestamp in ascending order.

  The response body contains pagination parameters (including `page_num`, `page_size` and `total` - the total number of text pieces matching the query) and a `data` field with the list of matching text pieces. A hypothetical search request example is shown below.
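A search request could then look like the following sketch (the exact request body schema is inferred from the field descriptions above and may differ from the real one):

```python
import requests

# Hypothetical example: search one document's index with filters and pagination.
body = {
    "page_num": 1,
    "page_size": 10,
    "filters": [
        {"field": "text", "operator": "match", "values": ["revenue growth"]},
        {"field": "type", "operator": "eq", "values": ["paragraph"]},
        {"field": "page", "operator": "lte", "values": [10]},
    ],
}

# Replace 1 with a real document_id; the POST method is an assumption.
response = requests.post("http://localhost:8080/index/1/search", json=body)
result = response.json()
print(result["total"])  # assumed layout: total number of matching text pieces
print(result["data"])   # list of matching text pieces
```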
- Use `make test` to locally run pytest checks during development. After all tests, a coverage report will also be shown.
- Staged changes will be checked during commits via the pre-commit hook.
- All checks and tests will run on push to the remote repository as part of GitHub Actions.