
Copilot AI commented Oct 12, 2025

Problem

The app was using an incorrect XML format that did not match the actual QWOM data source from https://github.com/rulingAnts/QWOM_Data. This prevented the app from successfully importing real wordlist data files.

Issues Found

XML Structure:

  • ❌ Used <Wordlist> as root element instead of <phon_data>
  • ❌ Used <Entry> for row elements instead of <data_form>
  • ❌ Used <Picture> for image field instead of <Image_File>

File Encoding:

  • ❌ Sample test data was ASCII with LF line endings
  • ❌ Should have been UTF-16 LE with CRLF line endings to match the actual QWOM format
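To make the encoding and line-ending mismatch concrete, here is a minimal Python sketch that reports whether a file's bytes look like UTF-16 LE and whether its lines end in CRLF or LF. Python is used only for illustration; the app itself is Dart. The function name `describe_encoding` is hypothetical, and the sketch assumes a UTF-16 file starts with a byte-order mark (BOM). The description above does not say whether QWOM files carry one.

```python
import codecs


def describe_encoding(data: bytes) -> dict:
    """Report the likely encoding and line-ending style of a wordlist file.

    Assumption: UTF-16 files begin with a BOM. Anything without a UTF-16 BOM
    is decoded as UTF-8, which also covers plain ASCII.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        encoding, text = "utf-16-le", data[2:].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF16_BE):
        encoding, text = "utf-16-be", data[2:].decode("utf-16-be")
    else:
        encoding, text = "utf-8", data.decode("utf-8")

    # Count CRLF pairs, then LFs that are not part of a CRLF pair.
    crlf = text.count("\r\n")
    bare_lf = text.count("\n") - crlf
    if crlf and not bare_lf:
        line_endings = "CRLF"
    elif bare_lf and not crlf:
        line_endings = "LF"
    elif crlf and bare_lf:
        line_endings = "mixed"
    else:
        line_endings = "none"
    return {"encoding": encoding, "line_endings": line_endings}
```

Run against the old sample data, this would report `utf-8` / `LF`; a file in the QWOM format would report `utf-16-le` / `CRLF`.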

Solution

Updated the XML import/export logic and test data to match the actual Dekereke/QWOM format used by the data source (verified against QWOM2025-08.xml with 987 entries).

Changes Made

Import Parser (lib/services/xml_service.dart):

  • Changed element lookup from Entry to data_form
  • Added support for Image_File field with fallback to Picture for backward compatibility
  • Updated comments to document the actual format
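The import-side change can be sketched in Python as follows. This is not the Dart code in lib/services/xml_service.dart. It shows the same idea under stated assumptions: the file has already been decoded to text, each `<data_form>` row's child elements become dictionary keys, and a legacy `<Picture>` value is used when `<Image_File>` is absent. `parse_wordlist` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def parse_wordlist(xml_text: str) -> list[dict]:
    """Return one dict per <data_form> row, keyed by child element name.

    Falls back to a legacy <Picture> value when <Image_File> is missing.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for form in root.iter("data_form"):
        row = {child.tag: (child.text or "").strip() for child in form}
        # Backward compatibility: accept <Picture> as the image field.
        if "Image_File" not in row and "Picture" in row:
            row["Image_File"] = row.pop("Picture")
        rows.append(row)
    return rows
```

Passing a `str` (rather than bytes) to `ET.fromstring` lets the parser ignore the `encoding="utf-16"` declaration, because decoding has already happened.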

Export Generator (lib/services/xml_service.dart):

  • Changed root element from <Wordlist> to <phon_data>
  • Changed row elements from <Entry> to <data_form>
  • Changed image field from <Picture> to <Image_File>
  • Added CRLF line ending conversion for Windows compatibility
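A rough Python equivalent of the export changes is shown below. The name `export_wordlist` is hypothetical, and the sketch writes a UTF-16 LE BOM first on the assumption that QWOM files start with one. It is not the Dart implementation, only an illustration of the output shape: `<phon_data>` root, `<data_form>` rows, CRLF line endings, UTF-16 LE bytes.

```python
import codecs
from xml.sax.saxutils import escape


def export_wordlist(rows: list[dict]) -> bytes:
    """Serialize rows as <phon_data>/<data_form> XML with CRLF, UTF-16 LE.

    Field names are used directly as element names (e.g. Reference, Gloss,
    Image_File), so they must be valid XML names. Values are XML-escaped.
    """
    lines = ['<?xml version="1.0" encoding="utf-16"?>', "<phon_data>"]
    for row in rows:
        lines.append("  <data_form>")
        for field, value in row.items():
            lines.append(f"    <{field}>{escape(str(value))}</{field}>")
        lines.append("  </data_form>")
    lines.append("</phon_data>")
    # Join with CRLF so every line, including the last, ends in \r\n.
    text = "\r\n".join(lines) + "\r\n"
    # Assumed BOM, then explicit little-endian encoding (not platform order).
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
```

Encoding with `"utf-16-le"` plus an explicit BOM, rather than plain `"utf-16"`, keeps the byte order little-endian regardless of the machine the export runs on.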

Test Data (test_data/sample_wordlist.xml):

  • Converted to UTF-16 LE encoding with CRLF line endings
  • Updated structure to use correct element names
  • Added format documentation in XML comments
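Converting an ASCII/LF file such as the old sample data can be sketched in Python like this. `to_qwom_bytes` is a hypothetical helper; it prepends a UTF-16 LE BOM on the assumption that target files carry one, and it does not rewrite the XML declaration, so `encoding="utf-16"` still has to be set in the file itself.

```python
import codecs


def to_qwom_bytes(data: bytes, source_encoding: str = "utf-8") -> bytes:
    """Re-encode text as UTF-16 LE with CRLF line endings.

    Existing line endings (LF, CRLF, or lone CR) are normalized to LF first,
    so a line that already ends in CRLF does not become CR CR LF.
    """
    text = data.decode(source_encoding)
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r\n")
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
```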

Documentation (FLUTTER_README.md):

  • Updated XML format examples to show correct structure
  • Added format notes explaining encoding requirements

Tests (test/xml_service_test.dart):

  • Added comprehensive unit tests for XML export
  • Validates correct element names and structure
  • Verifies CRLF line endings in output
  • Ensures old format elements are not present
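The checks listed above can be expressed as one validation function. The following Python sketch assumes exported bytes start with a UTF-16 LE BOM; `check_export` is a hypothetical name, and the real tests in test/xml_service_test.dart are written in Dart and may check things differently.

```python
import xml.etree.ElementTree as ET

LEGACY_TAGS = {"Wordlist", "Entry", "Picture"}


def check_export(data: bytes) -> list[str]:
    """Return a list of problems in exported bytes; an empty list means OK.

    Raises ET.ParseError if the decoded text is not well-formed XML.
    """
    problems = []
    if not data.startswith(b"\xff\xfe"):
        problems.append("missing UTF-16 LE byte-order mark")
        return problems  # Cannot decode reliably without the assumed BOM.
    text = data[2:].decode("utf-16-le")
    if "\n" in text.replace("\r\n", ""):
        problems.append("found LF without preceding CR")
    root = ET.fromstring(text)
    if root.tag != "phon_data":
        problems.append(f"root is <{root.tag}>, expected <phon_data>")
    if not any(child.tag == "data_form" for child in root):
        problems.append("no <data_form> rows")
    legacy = sorted({el.tag for el in root.iter()} & LEGACY_TAGS)
    if legacy:
        problems.append("legacy elements present: " + ", ".join(legacy))
    return problems
```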

Format Comparison

Before (Incorrect)

<Wordlist>
  <Entry>
    <Reference>0001</Reference>
    <Gloss>body</Gloss>
    <Picture>body.jpg</Picture>
  </Entry>
</Wordlist>

After (Correct - matches QWOM)

<?xml version="1.0" encoding="utf-16"?>
<phon_data>
  <data_form>
    <Reference>0001</Reference>
    <Gloss>body</Gloss>
    <Image_File>0001body.png</Image_File>
  </data_form>
</phon_data>
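The element mapping between the two formats is a straight rename, which a short Python sketch can show. `migrate_legacy` is a hypothetical helper, not part of the PR; it only renames tags, leaving text content (such as image file names) untouched, and it does not handle encoding or CRLF conversion.

```python
import xml.etree.ElementTree as ET

# Old element name -> QWOM/Dekereke element name.
RENAMES = {"Wordlist": "phon_data", "Entry": "data_form", "Picture": "Image_File"}


def migrate_legacy(xml_text: str) -> str:
    """Rename legacy elements and return the serialized document."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        # Changing .tag does not alter tree structure, so iterating is safe.
        el.tag = RENAMES.get(el.tag, el.tag)
    return ET.tostring(root, encoding="unicode")
```

For example, the "Before" entry would become a `<data_form>` inside `<phon_data>` with `<Image_File>body.jpg</Image_File>`; renaming the image file itself is outside what this mapping does.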

Verification

  • ✅ Validated against actual QWOM data (QWOM2025-08.xml, 987 entries)
  • ✅ Confirmed UTF-16 LE encoding with CRLF line endings
  • ✅ Verified correct element structure
  • ✅ Backward compatible with legacy <Picture> field in imports

The app can now successfully import real QWOM wordlist data files and export data in the correct format for use with Dekereke and other linguistic tools.

Original prompt

Check this repository for problems. In particular, notice that I made changes to the XML format (the character encoding, CRLF line endings, and the names of the root and row elements). The changes I made reflect the actual source data file format, and our app should use that format.



Copilot AI and others added 3 commits October 12, 2025 22:42
Co-authored-by: rulingAnts <152982096+rulingAnts@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Check XML format changes for source data compliance" to "Fix XML format to match actual QWOM data source (phon_data, data_form, UTF-16, CRLF)" Oct 12, 2025
Copilot AI requested a review from rulingAnts October 12, 2025 22:46
Copilot finished work on behalf of rulingAnts October 12, 2025 22:46
