Made little adjustment to convert easier #7

Open
leozicai opened this issue Nov 24, 2020 · 0 comments

leozicai commented Nov 24, 2020

Thank you very much for the code you provided. I made a small modification that might make it easier to use.

In practice, annotations are often inaccurate and sometimes need to be cleaned, or the data needs to be moved. As a result, the image path stored in the xml is often wrong, because the images and annotations are no longer where they were when the xml was written. I think it works better in practice to treat an image and an xml file as a pair whenever they share the same file name.
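
As a rough sketch of the pairing idea above (not part of the original script; `pair_by_stem` is a hypothetical helper name), xml files and images can be matched by base name regardless of which directories they currently live in:

```python
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def pair_by_stem(xml_paths: Iterable[str],
                 image_paths: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Match each xml annotation to the image with the same base name.

    Returns (pairs, unmatched): pairs maps xml path -> image path,
    unmatched lists the xml paths that have no image with the same stem.
    """
    # Index images by stem, e.g. "img/000012.jpg" -> "000012"
    images_by_stem = {Path(p).stem: p for p in image_paths}
    pairs: Dict[str, str] = {}
    unmatched: List[str] = []
    for xml_path in xml_paths:
        image_path = images_by_stem.get(Path(xml_path).stem)
        if image_path is None:
            unmatched.append(xml_path)
        else:
            pairs[xml_path] = image_path
    return pairs, unmatched


pairs, unmatched = pair_by_stem(
    ["ann/000012.xml", "ann/000013.xml"],
    ["img/000012.jpg", "img/other.png"],
)
```

Collecting the unmatched xml files separately makes it easy to spot annotations whose image was deleted or renamed during cleaning. If two images share a stem (for example `a.jpg` and `a.png`), this sketch keeps whichever comes last in `image_paths`.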

Specific method: derive the name of the corresponding image from the xml file path and write it into the json file (changes in get_image_info and convert_xmls_to_cocojson):

import os
import re


def get_image_info(annotation_root, imgname, extract_num_from_imgid=True):
    filename = imgname  # First Change: use the name passed in, not the path stored in the xml
    img_name = os.path.basename(filename)
    img_id = os.path.splitext(img_name)[0]
    if extract_num_from_imgid and isinstance(img_id, str):
        # Use the first run of digits in the name as the numeric image id
        img_id = int(re.findall(r'\d+', img_id)[0])
    size = annotation_root.find('size')
    width = int(size.findtext('width'))
    height = int(size.findtext('height'))
    image_info = {
        'file_name': filename,
        'height': height,
        'width': width,
        'id': img_id
    }
    return image_info

import json
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, List

from tqdm import tqdm


def convert_xmls_to_cocojson(annotation_paths: List[str],
                             label2id: Dict[str, int],
                             output_jsonpath: str,
                             extract_num_from_imgid: bool = True):
    output_json_dict = {
        "images": [],
        "type": "instances",
        "annotations": [],
        "categories": []
    }
    bnd_id = 1  # START_BOUNDING_BOX_ID, TODO input as args ?
    print('Start converting !')
    for a_path in tqdm(annotation_paths):
        # Read annotation xml
        ann_tree = ET.parse(a_path)
        ann_root = ann_tree.getroot()
        # Second Change: get the image name from the xml path (assumes .jpg images).
        # Path.stem handles any path separator and dots inside the file name.
        imgn = Path(a_path).stem + '.jpg'
        img_info = get_image_info(annotation_root=ann_root,
                                  extract_num_from_imgid=extract_num_from_imgid,
                                  imgname=imgn)
        img_id = img_info['id']
        output_json_dict['images'].append(img_info)
        for obj in ann_root.findall('object'):
            # get_coco_annotation_from_obj is unchanged from the original script
            ann = get_coco_annotation_from_obj(obj=obj, label2id=label2id)
            ann.update({'image_id': img_id, 'id': bnd_id})
            output_json_dict['annotations'].append(ann)
            bnd_id = bnd_id + 1
    for label, label_id in label2id.items():
        category_info = {'supercategory': 'none', 'id': label_id, 'name': label}
        output_json_dict['categories'].append(category_info)
    with open(output_jsonpath, 'w') as f:
        json.dump(output_json_dict, f)
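
The change above always appends `.jpg` to the xml base name. If a dataset mixes image formats, one possible variation is to look for the image on disk instead. The sketch below is only an illustration under assumptions: `find_image_name`, the `image_dir` argument, and the extension list are hypothetical and not part of the original script.

```python
from pathlib import Path
from typing import Optional, Sequence


def find_image_name(xml_path: str,
                    image_dir: str,
                    extensions: Sequence[str] = ('.jpg', '.jpeg', '.png')) -> Optional[str]:
    """Return the file name of the image that shares the xml's base name, or None."""
    stem = Path(xml_path).stem
    for ext in extensions:
        # Try each candidate extension in order; the first existing file wins
        candidate = Path(image_dir) / (stem + ext)
        if candidate.is_file():
            return candidate.name
    return None
```

The returned name could be passed as `imgname` to get_image_info, and a `None` result could be used to skip or report xml files whose image is missing. Extensions are compared exactly as listed, so an upper-case variant such as `.JPG` would need to be added to `extensions` on case-sensitive file systems.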
