This is a simple Python notebook that extracts sentence pairs from a .tmx (Translation Memory eXchange) file and saves them as a clean .csv.
It’s useful if you work with translation memories and want a quick way to turn them into bilingual data for analysis, training, or anything else.
- Reads a `.tmx` file
- Extracts source and target segments
- Cleans up whitespace
- Saves everything into a CSV file (`bilingual_corpus.csv`)
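The steps above can be sketched roughly like this. This is a minimal stdlib version (the notebook itself uses pandas), and the names `extract_pairs` and `save_csv` are illustrative, not the notebook's actual function names:

```python
import csv
import xml.etree.ElementTree as ET

def extract_pairs(tmx_path, src_lang="en", tgt_lang="es"):
    """Pull (source, target) segment pairs out of a TMX file."""
    tree = ET.parse(tmx_path)
    pairs = []
    for tu in tree.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX stores the language in the xml:lang attribute
            lang = tuv.get("{http://www.w3.org/XML/1998/namespace}lang", "")
            seg = tuv.find("seg")
            if seg is not None and seg.text:
                # Collapse stray whitespace and normalize "EN-US" -> "en"
                segs[lang.lower().split("-")[0]] = " ".join(seg.text.split())
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs

def save_csv(pairs, out_path="bilingual_corpus.csv"):
    """Write the pairs as a two-column CSV with a header row."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "target"])
        writer.writerows(pairs)
```

Swap in your own language codes (e.g. `src_lang="en"`, `tgt_lang="fr"`) to match the `<tuv xml:lang="…">` values in your file.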
- Open the `translation_memory_project.ipynb` notebook
- Replace the filename (`yourfile.tmx`) with your own
- Adjust the language codes if needed (like `'en'` and `'es'`)
- Run all the cells

That’s it! Your clean bilingual file will be ready as `bilingual_corpus.csv`.
- Python 3.x
- pandas

Install with:

```bash
pip install pandas
```
📝 Example output

```csv
source,target
"Hello, world!","¡Hola, mundo!"
"How are you?","¿Cómo estás?"
```
💾 SQL + Python Project Ideas (more advanced but manageable)
1. Translation Memory Database Manager
💡 Idea: Instead of reading .tmx into a CSV, store the data in a SQL database.
✴️ Use sqlite3 to save source–target pairs in a table
✴️ Add fields like language_pair, domain, file_source
✴️ Include sample SQL queries like:

```sql
-- assumes the pairs are stored in a table named `segments`
SELECT source, target
FROM segments
WHERE source LIKE '%hello%' AND language_pair = 'en-es';
```
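A minimal `sqlite3` sketch of that idea; the `segments` table layout and the helper names are one possible design, not a fixed schema:

```python
import sqlite3

def build_tm_db(rows, db_path=":memory:"):
    """Create a translation-memory table and load source-target rows.

    `rows` is an iterable of (source, target, language_pair, domain,
    file_source) tuples, matching the extra fields suggested above.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS segments (
            id INTEGER PRIMARY KEY,
            source TEXT NOT NULL,
            target TEXT NOT NULL,
            language_pair TEXT,
            domain TEXT,
            file_source TEXT
        )
    """)
    conn.executemany(
        "INSERT INTO segments (source, target, language_pair, domain, file_source) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn

def search(conn, keyword, language_pair):
    """Parameterized version of the sample LIKE query."""
    cur = conn.execute(
        "SELECT source, target FROM segments "
        "WHERE source LIKE ? AND language_pair = ?",
        (f"%{keyword}%", language_pair),
    )
    return cur.fetchall()
```

Using `?` placeholders instead of string formatting keeps the queries safe against quoting problems in the segment text.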
2. Idiom Database with Emotional Valence Tags
💡 Build a small idiom database with columns: `idiom`, `lang`, `valence_score`, `transparency`, `familiarity`
- Use SQL for:
  - AVG valence per language
  - Idioms common to multiple languages
  - Complex filters (e.g., “neutral Turkish idioms that are familiar but low transparency”)
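The "AVG valence per language" query could look like this; the table layout follows the columns listed above, and the helper name is hypothetical:

```python
import sqlite3

def avg_valence_per_language(rows):
    """Load idiom rows and compute mean valence per language in plain SQL.

    `rows`: (idiom, lang, valence_score, transparency, familiarity) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE idioms (
            idiom TEXT, lang TEXT, valence_score REAL,
            transparency REAL, familiarity REAL
        )
    """)
    conn.executemany("INSERT INTO idioms VALUES (?, ?, ?, ?, ?)", rows)
    # GROUP BY collapses all idioms of one language into a single average
    cur = conn.execute(
        "SELECT lang, AVG(valence_score) FROM idioms "
        "GROUP BY lang ORDER BY 2 DESC"
    )
    return dict(cur.fetchall())
```

The complex filters from the last bullet are just extra `WHERE` conditions on `valence_score`, `familiarity`, and `transparency` against whatever thresholds you define.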
3. Corpus Search Tool
💡 Idea: Build a searchable bilingual corpus using SQL
- Create a corpus table with `id`, `source`, `target`, `domain`, `lang_pair`
- Let users search by keyword, domain, or language pair
- Bonus: Write a Python UI for it with Streamlit
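The search side could be sketched like this, building the `WHERE` clause from whichever filters the user supplies (function and table names are illustrative):

```python
import sqlite3

def make_corpus(rows):
    """Build the corpus table from (id, source, target, domain, lang_pair) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE corpus (
            id INTEGER PRIMARY KEY,
            source TEXT, target TEXT, domain TEXT, lang_pair TEXT
        )
    """)
    conn.executemany("INSERT INTO corpus VALUES (?, ?, ?, ?, ?)", rows)
    return conn

def search_corpus(conn, keyword=None, domain=None, lang_pair=None):
    """Combine only the filters that were actually given into one query."""
    clauses, params = [], []
    if keyword:
        clauses.append("(source LIKE ? OR target LIKE ?)")
        params += [f"%{keyword}%"] * 2
    if domain:
        clauses.append("domain = ?")
        params.append(domain)
    if lang_pair:
        clauses.append("lang_pair = ?")
        params.append(lang_pair)
    sql = "SELECT source, target FROM corpus"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql, params).fetchall()
```

For the Streamlit bonus, something like `st.text_input` for the keyword and `st.selectbox` for the language pair could feed straight into `search_corpus`, with the results shown via `st.dataframe`.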
Created by @avocadoyoon,
because bilingual data deserves to be ✨clean, sorted, and a lil assertive✨.