msgfmt.py: Handling of header inconsistent with GNU msgfmt #131852

Closed
@StanFromIreland

Bug report

Bug description:

Running the current tests against a GNU-generated general.mo produces a failure:

SubTest failure: Traceback (most recent call last):
  File "/home/stan/PycharmProjects/cpython/Lib/unittest/case.py", line 58, in testPartExecutor
    yield
  File "/home/stan/PycharmProjects/cpython/Lib/unittest/case.py", line 556, in subTest
    yield
  File "/home/stan/PycharmProjects/cpython/Lib/test/test_tools/test_msgfmt.py", line 55, in test_compilation
    self.assertDictEqual(actual._catalog, expected._catalog)
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: {'': [35 chars]N\nPOT-Creation-Date: 2024-10-26 18:06+0200\nP[563 chars]bar'} != {'': [35 chars]N\nPO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\nLa[521 chars]bar'}
  {'': 'Project-Id-Version: PACKAGE VERSION\n'
-      'POT-Creation-Date: 2024-10-26 18:06+0200\n'
       'PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n'
       'Last-Translator: FULL NAME <EMAIL@ADDRESS>\n'
       'Language-Team: LANGUAGE <LL@li.org>\n'
       'MIME-Version: 1.0\n'
       'Content-Type: text/plain; charset=UTF-8\n'
       'Content-Transfer-Encoding: 8bit\n',
   '\n newlines \n': '\n translated \n',
   '"escapes"': '"translated"',
   'Multilinestring': 'Multilinetranslation',
   'abc\x04foo': 'bar',
   'bar': 'baz',
   'xyz\x04foo': 'bar',
   ('One email sent.', 0): 'One email sent.',
   ('One email sent.', 1): '%d emails sent.',
   ('abc\x04One email sent.', 0): 'One email sent.',
   ('abc\x04One email sent.', 1): '%d emails sent.'}



One or more subtests failed
Failed subtests list: (po_file=PosixPath('/home/stan/PycharmProjects/cpython/Lib/test/test_tools/msgfmt_data/general_po'))


Ran 13 tests in 0.375s

FAILED (failures=1)

The failure is caused by a difference in which fields of the general.po header are compiled into the .mo file:

"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2024-10-26 18:06+0200\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

msgfmt.py includes "POT-Creation-Date: 2024-10-26 18:06+0200\n" in the binary .mo file, whereas msgfmt.c does not.
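
To make the mismatch explicit, the two header strings from the assertion output can be split into fields and compared. This is only a sketch: `header_fields` is a hypothetical helper written for this illustration, and the header text is copied from the traceback above rather than read from the actual .mo files.

```python
def header_fields(header: str) -> dict[str, str]:
    """Split a PO/MO header string into a {field: value} mapping."""
    fields = {}
    for line in header.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # ignore anything that is not "Name: value"
            fields[name.strip()] = value.strip()
    return fields


# Header as compiled by msgfmt.py (copied from the assertion output).
PY_HEADER = (
    "Project-Id-Version: PACKAGE VERSION\n"
    "POT-Creation-Date: 2024-10-26 18:06+0200\n"
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
    "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
    "Language-Team: LANGUAGE <LL@li.org>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
)

# Header as compiled by GNU msgfmt: the same text minus the one line
# that the assertion diff reports as missing.
GNU_HEADER = PY_HEADER.replace("POT-Creation-Date: 2024-10-26 18:06+0200\n", "")

only_in_py = header_fields(PY_HEADER).keys() - header_fields(GNU_HEADER).keys()
print(sorted(only_in_py))  # ['POT-Creation-Date']
```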

Binary file diff
$ colordiff -y <(xxd messages.mo) <(xxd general.mo)
00000000: de12 0495 0000 0000 0900 0000 1c00 0000  ..........   00000000: de12 0495 0000 0000 0900 0000 1c00 0000  ..........
00000010: 6400 0000 0d00 0000 ac00 0000 0000 0000  d......... | 00000010: 6400 0000 0000 0000 0000 0000 0000 0000  d.........
00000020: e000 0000 0c00 0000 e100 0000 0900 0000  .......... | 00000020: ac00 0000 0c00 0000 ad00 0000 0900 0000  ..........
00000030: ee00 0000 0f00 0000 f800 0000 1f00 0000  .......... | 00000030: ba00 0000 0f00 0000 c400 0000 1f00 0000  ..........
00000040: 0801 0000 2300 0000 2801 0000 0700 0000  ....#...(. | 00000040: d400 0000 2300 0000 f400 0000 0700 0000  ....#.....
00000050: 4c01 0000 0300 0000 5401 0000 0700 0000  L.......T. | 00000050: 1801 0000 0300 0000 2001 0000 0700 0000  ........ .
00000060: 5801 0000 f500 0000 6001 0000 0e00 0000  X.......`. | 00000060: 2401 0000 1e01 0000 2c01 0000 0e00 0000  $.......,.
00000070: 5602 0000 0c00 0000 6502 0000 1400 0000  V.......e. | 00000070: 4b02 0000 0c00 0000 5a02 0000 1400 0000  K.......Z.
00000080: 7202 0000 1f00 0000 8702 0000 1f00 0000  r......... | 00000080: 6702 0000 1f00 0000 7c02 0000 1f00 0000  g.......|.
00000090: a702 0000 0300 0000 c702 0000 0300 0000  .......... | 00000090: 9c02 0000 0300 0000 bc02 0000 0300 0000  ..........
000000a0: cb02 0000 0300 0000 cf02 0000 0100 0000  .......... | 000000a0: c002 0000 0300 0000 c402 0000 000a 206e  ..........
000000b0: 0300 0000 0000 0000 0800 0000 0900 0000  .......... | 000000b0: 6577 6c69 6e65 7320 0a00 2265 7363 6170  ewlines ..
000000c0: 0700 0000 0200 0000 0000 0000 0400 0000  .......... | 000000c0: 6573 2200 4d75 6c74 696c 696e 6573 7472  es".Multil
000000d0: 0500 0000 0000 0000 0600 0000 0000 0000  .......... | 000000d0: 696e 6700 4f6e 6520 656d 6169 6c20 7365  ing.One em
000000e0: 000a 206e 6577 6c69 6e65 7320 0a00 2265  .. newline | 000000e0: 6e74 2e00 2564 2065 6d61 696c 7320 7365  nt..%d ema
000000f0: 7363 6170 6573 2200 4d75 6c74 696c 696e  scapes".Mu | 000000f0: 6e74 2e00 6162 6304 4f6e 6520 656d 6169  nt..abc.On
00000100: 6573 7472 696e 6700 4f6e 6520 656d 6169  estring.On | 00000100: 6c20 7365 6e74 2e00 2564 2065 6d61 696c  l sent..%d
00000110: 6c20 7365 6e74 2e00 2564 2065 6d61 696c  l sent..%d | 00000110: 7320 7365 6e74 2e00 6162 6304 666f 6f00  s sent..ab
00000120: 7320 7365 6e74 2e00 6162 6304 4f6e 6520  s sent..ab | 00000120: 6261 7200 7879 7a04 666f 6f00 5072 6f6a  bar.xyz.fo
00000130: 656d 6169 6c20 7365 6e74 2e00 2564 2065  email sent | 00000130: 6563 742d 4964 2d56 6572 7369 6f6e 3a20  ect-Id-Ver
00000140: 6d61 696c 7320 7365 6e74 2e00 6162 6304  mails sent | 00000140: 5041 434b 4147 4520 5645 5253 494f 4e0a  PACKAGE VE
00000150: 666f 6f00 6261 7200 7879 7a04 666f 6f00  foo.bar.xy | 00000150: 504f 542d 4372 6561 7469 6f6e 2d44 6174  POT-Creati
00000160: 5072 6f6a 6563 742d 4964 2d56 6572 7369  Project-Id | 00000160: 653a 2032 3032 342d 3130 2d32 3620 3138  e: 2024-10
00000170: 6f6e 3a20 5041 434b 4147 4520 5645 5253  on: PACKAG | 00000170: 3a30 362b 3032 3030 0a50 4f2d 5265 7669  :06+0200.P
00000180: 494f 4e0a 504f 2d52 6576 6973 696f 6e2d  ION.PO-Rev | 00000180: 7369 6f6e 2d44 6174 653a 2059 4541 522d  sion-Date:
00000190: 4461 7465 3a20 5945 4152 2d4d 4f2d 4441  Date: YEAR | 00000190: 4d4f 2d44 4120 484f 3a4d 492b 5a4f 4e45  MO-DA HO:M
000001a0: 2048 4f3a 4d49 2b5a 4f4e 450a 4c61 7374   HO:MI+ZON | 000001a0: 0a4c 6173 742d 5472 616e 736c 6174 6f72  .Last-Tran
000001b0: 2d54 7261 6e73 6c61 746f 723a 2046 554c  -Translato | 000001b0: 3a20 4655 4c4c 204e 414d 4520 3c45 4d41  : FULL NAM
000001c0: 4c20 4e41 4d45 203c 454d 4149 4c40 4144  L NAME <EM | 000001c0: 494c 4041 4444 5245 5353 3e0a 4c61 6e67  IL@ADDRESS
000001d0: 4452 4553 533e 0a4c 616e 6775 6167 652d  DRESS>.Lan | 000001d0: 7561 6765 2d54 6561 6d3a 204c 414e 4755  uage-Team:
000001e0: 5465 616d 3a20 4c41 4e47 5541 4745 203c  Team: LANG | 000001e0: 4147 4520 3c4c 4c40 6c69 2e6f 7267 3e0a  AGE <LL@li
000001f0: 4c4c 406c 692e 6f72 673e 0a4d 494d 452d  LL@li.org> | 000001f0: 4d49 4d45 2d56 6572 7369 6f6e 3a20 312e  MIME-Versi
00000200: 5665 7273 696f 6e3a 2031 2e30 0a43 6f6e  Version: 1 | 00000200: 300a 436f 6e74 656e 742d 5479 7065 3a20  0.Content-
00000210: 7465 6e74 2d54 7970 653a 2074 6578 742f  tent-Type: | 00000210: 7465 7874 2f70 6c61 696e 3b20 6368 6172  text/plain
00000220: 706c 6169 6e3b 2063 6861 7273 6574 3d55  plain; cha | 00000220: 7365 743d 5554 462d 380a 436f 6e74 656e  set=UTF-8.
00000230: 5446 2d38 0a43 6f6e 7465 6e74 2d54 7261  TF-8.Conte | 00000230: 742d 5472 616e 7366 6572 2d45 6e63 6f64  t-Transfer
00000240: 6e73 6665 722d 456e 636f 6469 6e67 3a20  nsfer-Enco | 00000240: 696e 673a 2038 6269 740a 000a 2074 7261  ing: 8bit.
00000250: 3862 6974 0a00 0a20 7472 616e 736c 6174  8bit... tr | 00000250: 6e73 6c61 7465 6420 0a00 2274 7261 6e73  nslated ..
00000260: 6564 200a 0022 7472 616e 736c 6174 6564  ed .."tran | 00000260: 6c61 7465 6422 004d 756c 7469 6c69 6e65  lated".Mul
00000270: 2200 4d75 6c74 696c 696e 6574 7261 6e73  ".Multilin | 00000270: 7472 616e 736c 6174 696f 6e00 4f6e 6520  translatio
00000280: 6c61 7469 6f6e 004f 6e65 2065 6d61 696c  lation.One | 00000280: 656d 6169 6c20 7365 6e74 2e00 2564 2065  email sent
00000290: 2073 656e 742e 0025 6420 656d 6169 6c73   sent..%d  | 00000290: 6d61 696c 7320 7365 6e74 2e00 4f6e 6520  mails sent
000002a0: 2073 656e 742e 004f 6e65 2065 6d61 696c   sent..One | 000002a0: 656d 6169 6c20 7365 6e74 2e00 2564 2065  email sent
000002b0: 2073 656e 742e 0025 6420 656d 6169 6c73   sent..%d  | 000002b0: 6d61 696c 7320 7365 6e74 2e00 6261 7200  mails sent
000002c0: 2073 656e 742e 0062 6172 0062 617a 0062   sent..bar | 000002c0: 6261 7a00 6261 7200                      baz.bar.
000002d0: 6172 00                                  ar.        <
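
Besides the header text, the first two rows of the dump also differ. A small sketch that decodes the fixed 28-byte .mo file header (seven little-endian 32-bit integers, per the GNU MO file format) helps read them. The function name `read_mo_header` and the field names are choices made for this illustration, and the bytes are copied from the xxd output above rather than read from disk.

```python
import struct

MO_MAGIC = 0x950412DE  # magic number of a little-endian .mo file


def read_mo_header(data: bytes) -> dict[str, int]:
    """Decode the seven 32-bit fields at the start of a little-endian .mo file."""
    names = (
        "magic", "revision", "nstrings", "orig_table_offset",
        "trans_table_offset", "hash_table_size", "hash_table_offset",
    )
    values = struct.unpack("<7I", data[:28])
    if values[0] != MO_MAGIC:
        raise ValueError("not a little-endian .mo file")
    return dict(zip(names, values))


# First 28 bytes of each file, copied from the left and right columns.
left = read_mo_header(bytes.fromhex(
    "de120495 00000000 09000000 1c000000 64000000 0d000000 ac000000"))
right = read_mo_header(bytes.fromhex(
    "de120495 00000000 09000000 1c000000 64000000 00000000 00000000"))

for name in left:
    if left[name] != right[name]:
        print(name, left[name], right[name])
# hash_table_size 13 0
# hash_table_offset 172 0
```

Both files hold 9 strings, but only the left one carries a hash table (13 entries at offset 0xac), which is why every later offset in the two columns is shifted.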

This is an inconsistency, and judging by the tests, I presume we want to match the files generated by GNU msgfmt.

I discovered this while working on #131725: if you remove the problematic line from the header and generate the .mo with my patch, the output is 100% consistent with the .mo generated by msgfmt.c.
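
The experiment described here (removing the line from the header before compiling) could be sketched roughly as follows. `drop_header_field` is a hypothetical helper for illustration only; it is not part of msgfmt.py and is not the patch from #131725.

```python
def drop_header_field(header: str, field: str = "POT-Creation-Date") -> str:
    """Return `header` with every line belonging to `field` removed."""
    prefix = field.lower() + ":"
    kept = [
        line
        for line in header.splitlines(keepends=True)
        if not line.lower().startswith(prefix)  # field names compared case-insensitively
    ]
    return "".join(kept)


header = (
    "Project-Id-Version: PACKAGE VERSION\n"
    "POT-Creation-Date: 2024-10-26 18:06+0200\n"
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
)
print(drop_header_field(header), end="")
# Project-Id-Version: PACKAGE VERSION
# PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE
```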

No diff
$ colordiff -y <(xxd messages.mo) <(xxd general.mo)
00000000: de12 0495 0000 0000 0900 0000 1c00 0000  ..........	00000000: de12 0495 0000 0000 0900 0000 1c00 0000  ..........
00000010: 6400 0000 0d00 0000 ac00 0000 0000 0000  d.........	00000010: 6400 0000 0d00 0000 ac00 0000 0000 0000  d.........
00000020: e000 0000 0c00 0000 e100 0000 0900 0000  ..........	00000020: e000 0000 0c00 0000 e100 0000 0900 0000  ..........
00000030: ee00 0000 0f00 0000 f800 0000 1f00 0000  ..........	00000030: ee00 0000 0f00 0000 f800 0000 1f00 0000  ..........
00000040: 0801 0000 2300 0000 2801 0000 0700 0000  ....#...(.	00000040: 0801 0000 2300 0000 2801 0000 0700 0000  ....#...(.
00000050: 4c01 0000 0300 0000 5401 0000 0700 0000  L.......T.	00000050: 4c01 0000 0300 0000 5401 0000 0700 0000  L.......T.
00000060: 5801 0000 f500 0000 6001 0000 0e00 0000  X.......`.	00000060: 5801 0000 f500 0000 6001 0000 0e00 0000  X.......`.
00000070: 5602 0000 0c00 0000 6502 0000 1400 0000  V.......e.	00000070: 5602 0000 0c00 0000 6502 0000 1400 0000  V.......e.
00000080: 7202 0000 1f00 0000 8702 0000 1f00 0000  r.........	00000080: 7202 0000 1f00 0000 8702 0000 1f00 0000  r.........
00000090: a702 0000 0300 0000 c702 0000 0300 0000  ..........	00000090: a702 0000 0300 0000 c702 0000 0300 0000  ..........
000000a0: cb02 0000 0300 0000 cf02 0000 0100 0000  ..........	000000a0: cb02 0000 0300 0000 cf02 0000 0100 0000  ..........
000000b0: 0300 0000 0000 0000 0800 0000 0900 0000  ..........	000000b0: 0300 0000 0000 0000 0800 0000 0900 0000  ..........
000000c0: 0700 0000 0200 0000 0000 0000 0400 0000  ..........	000000c0: 0700 0000 0200 0000 0000 0000 0400 0000  ..........
000000d0: 0500 0000 0000 0000 0600 0000 0000 0000  ..........	000000d0: 0500 0000 0000 0000 0600 0000 0000 0000  ..........
000000e0: 000a 206e 6577 6c69 6e65 7320 0a00 2265  .. newline	000000e0: 000a 206e 6577 6c69 6e65 7320 0a00 2265  .. newline
000000f0: 7363 6170 6573 2200 4d75 6c74 696c 696e  scapes".Mu	000000f0: 7363 6170 6573 2200 4d75 6c74 696c 696e  scapes".Mu
00000100: 6573 7472 696e 6700 4f6e 6520 656d 6169  estring.On	00000100: 6573 7472 696e 6700 4f6e 6520 656d 6169  estring.On
00000110: 6c20 7365 6e74 2e00 2564 2065 6d61 696c  l sent..%d	00000110: 6c20 7365 6e74 2e00 2564 2065 6d61 696c  l sent..%d
00000120: 7320 7365 6e74 2e00 6162 6304 4f6e 6520  s sent..ab	00000120: 7320 7365 6e74 2e00 6162 6304 4f6e 6520  s sent..ab
00000130: 656d 6169 6c20 7365 6e74 2e00 2564 2065  email sent	00000130: 656d 6169 6c20 7365 6e74 2e00 2564 2065  email sent
00000140: 6d61 696c 7320 7365 6e74 2e00 6162 6304  mails sent	00000140: 6d61 696c 7320 7365 6e74 2e00 6162 6304  mails sent
00000150: 666f 6f00 6261 7200 7879 7a04 666f 6f00  foo.bar.xy	00000150: 666f 6f00 6261 7200 7879 7a04 666f 6f00  foo.bar.xy
00000160: 5072 6f6a 6563 742d 4964 2d56 6572 7369  Project-Id	00000160: 5072 6f6a 6563 742d 4964 2d56 6572 7369  Project-Id
00000170: 6f6e 3a20 5041 434b 4147 4520 5645 5253  on: PACKAG	00000170: 6f6e 3a20 5041 434b 4147 4520 5645 5253  on: PACKAG
00000180: 494f 4e0a 504f 2d52 6576 6973 696f 6e2d  ION.PO-Rev	00000180: 494f 4e0a 504f 2d52 6576 6973 696f 6e2d  ION.PO-Rev
00000190: 4461 7465 3a20 5945 4152 2d4d 4f2d 4441  Date: YEAR	00000190: 4461 7465 3a20 5945 4152 2d4d 4f2d 4441  Date: YEAR
000001a0: 2048 4f3a 4d49 2b5a 4f4e 450a 4c61 7374   HO:MI+ZON	000001a0: 2048 4f3a 4d49 2b5a 4f4e 450a 4c61 7374   HO:MI+ZON
000001b0: 2d54 7261 6e73 6c61 746f 723a 2046 554c  -Translato	000001b0: 2d54 7261 6e73 6c61 746f 723a 2046 554c  -Translato
000001c0: 4c20 4e41 4d45 203c 454d 4149 4c40 4144  L NAME <EM	000001c0: 4c20 4e41 4d45 203c 454d 4149 4c40 4144  L NAME <EM
000001d0: 4452 4553 533e 0a4c 616e 6775 6167 652d  DRESS>.Lan	000001d0: 4452 4553 533e 0a4c 616e 6775 6167 652d  DRESS>.Lan
000001e0: 5465 616d 3a20 4c41 4e47 5541 4745 203c  Team: LANG	000001e0: 5465 616d 3a20 4c41 4e47 5541 4745 203c  Team: LANG
000001f0: 4c4c 406c 692e 6f72 673e 0a4d 494d 452d  LL@li.org>	000001f0: 4c4c 406c 692e 6f72 673e 0a4d 494d 452d  LL@li.org>
00000200: 5665 7273 696f 6e3a 2031 2e30 0a43 6f6e  Version: 1	00000200: 5665 7273 696f 6e3a 2031 2e30 0a43 6f6e  Version: 1
00000210: 7465 6e74 2d54 7970 653a 2074 6578 742f  tent-Type:	00000210: 7465 6e74 2d54 7970 653a 2074 6578 742f  tent-Type:
00000220: 706c 6169 6e3b 2063 6861 7273 6574 3d55  plain; cha	00000220: 706c 6169 6e3b 2063 6861 7273 6574 3d55  plain; cha
00000230: 5446 2d38 0a43 6f6e 7465 6e74 2d54 7261  TF-8.Conte	00000230: 5446 2d38 0a43 6f6e 7465 6e74 2d54 7261  TF-8.Conte
00000240: 6e73 6665 722d 456e 636f 6469 6e67 3a20  nsfer-Enco	00000240: 6e73 6665 722d 456e 636f 6469 6e67 3a20  nsfer-Enco
00000250: 3862 6974 0a00 0a20 7472 616e 736c 6174  8bit... tr	00000250: 3862 6974 0a00 0a20 7472 616e 736c 6174  8bit... tr
00000260: 6564 200a 0022 7472 616e 736c 6174 6564  ed .."tran	00000260: 6564 200a 0022 7472 616e 736c 6174 6564  ed .."tran
00000270: 2200 4d75 6c74 696c 696e 6574 7261 6e73  ".Multilin	00000270: 2200 4d75 6c74 696c 696e 6574 7261 6e73  ".Multilin
00000280: 6c61 7469 6f6e 004f 6e65 2065 6d61 696c  lation.One	00000280: 6c61 7469 6f6e 004f 6e65 2065 6d61 696c  lation.One
00000290: 2073 656e 742e 0025 6420 656d 6169 6c73   sent..%d 	00000290: 2073 656e 742e 0025 6420 656d 6169 6c73   sent..%d 
000002a0: 2073 656e 742e 004f 6e65 2065 6d61 696c   sent..One	000002a0: 2073 656e 742e 004f 6e65 2065 6d61 696c   sent..One
000002b0: 2073 656e 742e 0025 6420 656d 6169 6c73   sent..%d 	000002b0: 2073 656e 742e 0025 6420 656d 6169 6c73   sent..%d 
000002c0: 2073 656e 742e 0062 6172 0062 617a 0062   sent..bar	000002c0: 2073 656e 742e 0062 6172 0062 617a 0062   sent..bar
000002d0: 6172 00                                  ar.		000002d0: 6172 00                                  ar.

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Labels: triaged, type-bug
