MiscTable is not handled correctly by audformat.Database.update() #466

@hagenw

When combining two databases that both contain a misc table, the misc tables should be combined as well.

Minimal example:

import audformat
import pandas as pd


db1 = audformat.Database("mydb")
db1.schemes["answer1"] = audformat.Scheme("str")
db1["sessions"] = audformat.MiscTable(pd.Index(["a"], name="session"))
db1["sessions"]["prompt_1"] = audformat.Column(scheme_id="answer1")
db1["sessions"]["prompt_1"].set(["response1"])

db2 = audformat.Database("mydb")
db2.schemes["answer1"] = audformat.Scheme("str")
db2.schemes["answer2"] = audformat.Scheme("str")
db2["sessions"] = audformat.MiscTable(pd.Index(["b"], name="session"))
db2["sessions"]["prompt_1"] = audformat.Column(scheme_id="answer1")
db2["sessions"]["prompt_2"] = audformat.Column(scheme_id="answer2")
db2["sessions"]["prompt_1"].set(["response1"])
db2["sessions"]["prompt_2"].set(["response2"])

db1.update(db2)

Expected output:

>>> db1["sessions"].df
          prompt_1   prompt_2
session                      
a        response1       <NA>
b        response1  response2

But instead we get:

>>> db1["sessions"].df
          prompt_1   prompt_2
session                      
b        response1  response2

This does not happen for a filewise index:

import audformat
import pandas as pd


db1 = audformat.Database("mydb")
db1.schemes["answer1"] = audformat.Scheme("str")
db1["sessions"] = audformat.Table(audformat.filewise_index(["a"]))
db1["sessions"]["prompt_1"] = audformat.Column(scheme_id="answer1")
db1["sessions"]["prompt_1"].set(["response1"])

db2 = audformat.Database("mydb")
db2.schemes["answer1"] = audformat.Scheme("str")
db2.schemes["answer2"] = audformat.Scheme("str")
db2["sessions"] = audformat.Table(audformat.filewise_index(["b"]))
db2["sessions"]["prompt_1"] = audformat.Column(scheme_id="answer1")
db2["sessions"]["prompt_2"] = audformat.Column(scheme_id="answer2")
db2["sessions"]["prompt_1"].set(["response1"])
db2["sessions"]["prompt_2"].set(["response2"])

db1.update(db2)

There we get the expected result:

>>> db1["sessions"].df
       prompt_1   prompt_2
file                      
a     response1       <NA>
b     response1  response2
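
For illustration, the expected result for the misc table is an outer combination of the two tables' data frames: the union of rows and columns, with entries from `db2` taking precedence where both define a value. The following is a minimal pandas-only sketch of that semantics. It does not use audformat and is not meant to describe how `audformat.Database.update()` is implemented. The names `df1`, `df2` and `combined` are placeholders standing in for the `.df` of each `"sessions"` table.

```python
import pandas as pd

# Stand-ins for db1["sessions"].df and db2["sessions"].df (hypothetical names)
df1 = pd.DataFrame(
    {"prompt_1": ["response1"]},
    index=pd.Index(["a"], name="session"),
)
df2 = pd.DataFrame(
    {"prompt_1": ["response1"], "prompt_2": ["response2"]},
    index=pd.Index(["b"], name="session"),
)

# Values from df2 win where both frames have an entry;
# rows and columns that exist only in df1 are kept.
combined = df2.combine_first(df1)
print(combined)
```

The result contains both sessions `a` and `b`, with a missing value for `prompt_2` in row `a`, which matches the expected output shown for the misc table.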

Labels: bug
