Skip to content

πŸ“š AzeLexicon: Comprehensive Azerbaijani word list, hyphenated forms, and academic/scientific terminology for research and academic work

License

Notifications You must be signed in to change notification settings

abdanar/AzeLexicon

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

169 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

AzeLexicon – Azerbaijani Word List & Terminology

GitHub Repo Size GitHub Issues License

AzeLexicon is a structured, open-source repository of Azerbaijani words and terminology. Its main purpose is to serve as the primary source for the Azerbaijani word list, its hyphenated version, and academic/scientific terminology, supporting anyone producing academic or scientific work in Azerbaijani.

πŸ“š Project Scope

The repository includes:

  • General word list
    A plain list of Azerbaijani words (txt format). It contains only words, without translations. Some cleanup and refinement are still needed.

  • Hyphenated words
    A hyphenated version of the general word list. This list is not yet complete, as the hyphenation algorithm is under active development.

  • Academic and scientific terminology
    Organized by subject (mathematics, physics, computer science, etc.). Each subject has a main file (terms.json) containing the translations of English terms into Azerbaijani. Subfields (e.g., linear algebra, probability) are used for initial collection of terms, which are then consolidated into the main terms.json.

This project aims to be the authoritative reference for Azerbaijani in scientific and academic contexts.

Repository Structure

AzeLexicon/
β”œβ”€β”€ data/
β”‚ β”œβ”€β”€ general/
β”‚ β”‚ β”œβ”€β”€ words.txt # Plain list of Azerbaijani words (no translations)
β”‚ β”‚ └── words-hyphenated.txt # Hyphenated version of the general word list
β”‚ β”‚
β”‚ β”œβ”€β”€ scripts/
β”‚ β”‚ β”œβ”€β”€ generate_markdown.py # Generates Markdown glossaries from terms.json
β”‚ β”‚ └── hyphenation.py # Experimental hyphenation algorithm
β”‚ β”‚
β”‚ └── subjects/
β”‚ β”œβ”€β”€ math/
β”‚ β”‚ β”œβ”€β”€ terms.json # Glossary of math terms (EN ↔ AZ)
β”‚ β”‚ β”œβ”€β”€ math_terms.txt # Consolidated list of all English math terms
β”‚ β”‚ β”œβ”€β”€ categories/ # Subfield-specific English terms
β”‚ β”‚ β”‚ β”œβ”€β”€ linalg.txt # Linear Algebra terms
β”‚ β”‚ β”‚ β”œβ”€β”€ prob.txt # Probability terms
β”‚ β”‚ β”‚ └── ... # More subfields can be added here
β”‚ β”‚ └── scripts/
β”‚ β”‚ └── process_subject.py # Script to process, validate, and sort terms
β”‚ β”‚
β”‚ └── ... # Other subjects (physics, chemistry, biology, etc.)
β”‚
β”œβ”€β”€ glossary/
β”‚ β”œβ”€β”€ math.md # Generated Markdown glossary from terms.json
β”‚ └── ... # Glossaries for other subjects
β”‚
β”œβ”€β”€ .github/workflows/
β”‚ └── sort-validate.yml # Automated Term Standardization workflow
β”‚
β”œβ”€β”€ LICENSE
β”œβ”€β”€ README.md
β”œβ”€β”€ CONTRIBUTING.md
β”œβ”€β”€ CODE_OF_CONDUCT.md
└── CONTRIBUTORS.md # List of contributors and maintainers

βš™οΈ Automation

To maintain consistency and quality across the repository, all terms.json files are automatically sorted alphabetically and checked for duplicates by the πŸ›  Automated Term Standardization workflow (sort-validate.yml). Contributors are expected to update the status for each term they are contributing in the JSON file. The workflow automatically flags missing translations with ❌ Missing, so there is no need to add this manually.

Valid statuses are:

  • ❌ Missing – automatically flagged by the workflow for missing translations.
  • ⚠️ Revision – use this if the translation needs review or you are unsure.
  • βœ… Complete – use this if the translation is verified and fully correct.

The workflow can be triggered manually in the GitHub Actions tab.

Caution

Before submitting a pull request (PR), ensure that the πŸ›  Automated Term Standardization workflow completes successfully. For guidance on when to run the workflow, please refer to the wiki section describing the appropriate cases. Additionally, make sure that the status of each contributed term is updated correctly.

About

πŸ“š AzeLexicon: Comprehensive Azerbaijani word list, hyphenated forms, and academic/scientific terminology for research and academic work

Topics

Resources

License

Code of conduct

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages