The paper mentions using the gobjaverse dataset with about 170,000 shapes. However, the current gobjaverse index JSON contains 260,000 shapes. Even after removing the more than 40,000 poor-quality shapes, about 220,000 shapes remain. How were the extra ~50,000 shapes screened out?
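For reference, a minimal sketch of the count check behind my question, using hypothetical file names (`gobjaverse_index.json` for the full index, `low_quality_ids.json` for the removal list) and assuming both files are plain JSON lists of shape IDs:

```python
import json

# Hypothetical paths; substitute the actual index and low-quality list.
with open("gobjaverse_index.json") as f:
    index_ids = set(json.load(f))      # ~260,000 entries in the current index

with open("low_quality_ids.json") as f:
    low_quality = set(json.load(f))    # ~40,000 poor-quality shapes to drop

kept = index_ids - low_quality
print(len(index_ids), len(low_quality), len(kept))
# Roughly 260,000 - 40,000 = 220,000 kept, yet the paper reports ~170,000,
# which leaves ~50,000 shapes whose filtering criteria are unclear to me.
```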