A Lakota language dictionary in SFM/MDF & FLEx format for the Lakota people of the Sioux tribes.
Thank you to u/Even-Morons-Dream for the opportunity to help by reclaiming the data for them. I am honoured to be able to lend my skills.
For usage of the data, see the files within the \Lexicon folder. It contains the following files for use:
- An SFM/MDF dictionary file (dict.sfm)
- A FieldWorks Language Explorer (FLEx) project backup (Lakota Test 2025-06-28 1209 Lakota.fwbackup)
- A FLEx import map (dict-import-settings.map)
- An XHTML dictionary listing page (Lexicon.xhtml)
Important
The .fwbackup file can be restored from backup in FLEx, then used, edited, added to and exported from there.
The .xhtml file can be browsed, though it is rudimentary at best.
The .sfm file itself can be imported into FLEx / Shoebox / Toolbox or any SFM/MDF-compatible language tool for building a dictionary.
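For anyone who wants to read the .sfm file without a dedicated tool, here is a minimal Python sketch of how backslash-marked SFM records can be split into entries. It assumes each entry starts with the standard MDF \lx marker and that lines without a marker continue the previous field. The function name parse_sfm and the sample text are illustrative only and are not taken from dict.sfm.

```python
from typing import List, Tuple

# One dictionary entry: ordered (marker, value) pairs, e.g. ("lx", "headword").
Record = List[Tuple[str, str]]


def parse_sfm(text: str, record_marker: str = "lx") -> List[Record]:
    """Split SFM text into records that each begin at `record_marker`.

    Assumption: lines starting with a backslash open a new field, and any
    other non-empty line continues the previous field's value.
    """
    records: List[Record] = []
    current: Record = []
    for raw_line in text.splitlines():
        line = raw_line.rstrip()
        if not line:
            continue
        if line.startswith("\\"):
            marker, _, value = line[1:].partition(" ")
            if marker == record_marker and current:
                records.append(current)
                current = []
            current.append((marker, value.strip()))
        elif current:
            # Continuation line: fold it into the previous field.
            marker, value = current[-1]
            current[-1] = (marker, (value + " " + line.strip()).strip())
    if current:
        records.append(current)
    # Drop any leading header block that does not start with the record marker.
    return [r for r in records if r[0][0] == record_marker]


# Placeholder sample, not real dictionary data.
sample = (
    "\\lx headword-one\n\\ps n\n\\ge first gloss\n\n"
    "\\lx headword-two\n\\ps v\n\\ge second\n  gloss\n"
)
entries = parse_sfm(sample)
```

Each entry keeps its fields in file order, so repeated markers (for example several glosses) are preserved rather than overwritten.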
Note
The data was retrieved from a Unity binary built for Android, packaged as an .apk.
Analysis steps were as follows:
- Unpack .apk.
- Decompile .dex and check code for Android side keys or other info.
- Identify Unity files within assets folder.
- Identify libraries within UnityServicesProjectConfiguration.json.
- Identify libraries within RuntimeInitializeOnLoads.json.
- Identify libraries within ScriptingAssemblies.json and identify use of SqlCipher4Unity3D (SQLCipher).
- Identify binaries as mono and not IL2CPP.
- Identify an obfuscated database outside of the asset packs. A header check (0->16:32 SQLCipher print) and disassembly of the MonoBehaviour scripts indicate it is likely encrypted with SQLCipher.
- Heuristic analysis suggests the database likely contains additional text records and audio files. No key was found.
- Merge sharedassets0.assets.split[n] into complete sharedassets0.assets file.
- Run custom header lookup scripts in hex editor for Unity disassembly/unpacking and identify asset chunks (shaders, images, fonts, text, etc.).
- Dump each data section to file.
- Identify that two of the sections are SFM/MDF databases/dictionaries and that together they form a single dictionary, split in reverse due to size.
- Merge SFM/MDF data and trim header + start/end padding.
- Dump to ASCII string.
- Confirm output with native speaker.
- Confirm data meets technical documentation / SIL International specs.
- Import into FLEx.
- Export FLEx project backup and xhtml.
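Two of the steps above, merging the split asset parts and checking a database header, can be sketched in Python. This is a rough illustration and not the custom scripts that were actually used. It assumes the parts are named with a numeric .split suffix and should be joined in numeric order. The function names are hypothetical. The header check only tests for the plain SQLite magic string: a SQLCipher database normally has no readable header because its first 16 bytes hold a random salt, so a missing magic string is a hint and not proof of encryption.

```python
import re
from pathlib import Path

# First 16 bytes of any unencrypted SQLite 3 database file.
SQLITE_MAGIC = b"SQLite format 3\x00"


def split_index(path: Path) -> int:
    """Return the numeric suffix of a name like 'x.assets.split12'."""
    match = re.search(r"\.split(\d+)$", path.name)
    if match is None:
        raise ValueError(f"not a split part: {path.name}")
    return int(match.group(1))


def merge_split_parts(folder: Path, base_name: str, out_path: Path) -> int:
    """Concatenate base_name.split0, .split1, ... into out_path.

    Parts are ordered numerically so that split10 comes after split2.
    Returns the number of bytes written.
    """
    parts = sorted(folder.glob(base_name + ".split*"), key=split_index)
    total = 0
    with out_path.open("wb") as out:
        for part in parts:
            data = part.read_bytes()
            out.write(data)
            total += len(data)
    return total


def is_plain_sqlite(header: bytes) -> bool:
    """True if the bytes start with the unencrypted SQLite magic string."""
    return header[:16] == SQLITE_MAGIC
```

A file that fails is_plain_sqlite still needs other evidence, such as the SqlCipher4Unity3D reference found in ScriptingAssemblies.json, before it can be treated as a SQLCipher database.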
The root of the repo contains more 'raw' extracted files:
- Dumped SFM/MDF data block 1 (raw_data_A_to_I.bin)
- Dumped SFM/MDF data block 2 (raw_data_I_to_Z.bin)
- Extracted ASCII dictionary merged from fragments (dict.txt)
- ASCII dictionary fragments (dict_A_to_I.txt + dict_I_to_Z.txt)
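To give an idea of how the raw blocks relate to the merged text, here is a small Python sketch of the trim-and-merge step. It rests on assumptions that are not confirmed by the repo: that the padding consists of NUL bytes, that the dictionary proper starts at the first \lx marker, and that the caller passes the blocks in the correct reading order. The function names are hypothetical.

```python
from typing import Iterable


def strip_padding(block: bytes, pad: bytes = b"\x00") -> bytes:
    """Remove leading and trailing padding bytes (assumed to be NUL)."""
    return block.strip(pad)


def merge_blocks(blocks: Iterable[bytes], start_marker: bytes = b"\\lx ") -> str:
    """Join raw blocks in the given order and return ASCII text.

    Anything before the first `start_marker` is treated as a header and
    dropped. Bytes outside ASCII are replaced rather than raising an error.
    """
    merged = b"".join(strip_padding(b) for b in blocks)
    start = merged.find(start_marker)
    if start > 0:
        merged = merged[start:]
    return merged.decode("ascii", errors="replace")


# Placeholder bytes, not real dictionary data. The second block starts
# mid-entry to show that the join happens before any marker-based trimming.
demo_blocks = [
    b"\x00\x00HDR!\\lx one\n\\ge first\n\\lx tw",
    b"o\n\\ge second\n\x00\x00\x00",
]
demo_text = merge_blocks(demo_blocks)
```

Padding is stripped per block but the header is cut only after joining, so an entry that straddles the two blocks survives intact.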
If time permits, I would like to branch the build script, table/library code and frontend from STL Bitz Box & ACNH Pattern Dump Index to create a static webpage dictionary for the data. It would have a row entry per word, with searchable/filterable columns for each piece of information tied to that word (including audio support). The page could be updated, managed and hosted by the community, and would be completely open source.