BUG: Unexpected behaviour when reading large text files with mixed datatypes

`read_csv` behaves unexpectedly on large files when a column contains both strings and integers. For example:

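``` python
>>> import pandas
>>> from pandas import DataFrame, read_csv
>>> # 499999 integers, two strings, then the same 499999 integers again
>>> df = DataFrame({'colA': list(range(500000-1)) + ['apple', 'pear'] + list(range(500000-1))})
>>> len(set(df.colA))
500001

>>> df.to_csv('testpandas2.txt')
>>> df2 = read_csv('testpandas2.txt')
>>> len(set(df2.colA))
762143

>>> pandas.__version__
'0.11.0'
```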

It seems some of the integers are parsed as integers and others as strings.

``` python
>>> list(set(df2.colA))[-10:]
['282248', '282249', '282240', '282241', '282242', '15679', '282244', '282245', '282246', '282247']
>>> list(set(df2.colA))[:10]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```
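
One way to check this is to count the Python types stored in the column. The sketch below is not part of the original report: `type_counts` is a hypothetical helper, and the small in-memory CSV is only a stand-in for `testpandas2.txt` (a file this small will not reproduce the mixed parsing). It also shows a possible workaround, assuming a pandas version whose `read_csv` accepts the `dtype` argument: forcing the column to `str` so every value is parsed the same way.

```python
import io

import pandas as pd


def type_counts(series):
    """Count how many values of each Python type a Series holds (hypothetical helper)."""
    return series.map(lambda v: type(v).__name__).value_counts().to_dict()


# Small stand-in for the large file: integers with two strings in the middle.
csv_text = "colA\n" + "\n".join(["1", "2", "apple", "pear", "1", "2"]) + "\n"

# Default parsing; on the large file from the report this is where
# both int and str values would show up in the same column.
default_parse = pd.read_csv(io.StringIO(csv_text))
print(type_counts(default_parse.colA))

# Possible workaround (an assumption, not confirmed in the report):
# force one type for the whole column so no value is parsed as an integer.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"colA": str})
print(type_counts(as_text.colA))
print(len(set(as_text.colA)))
```

If `type_counts(df2.colA)` reports both `int` and `str` on the large file, that would confirm the column was parsed inconsistently.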


BUG: Unexpected behaviour when reading large text files with mixed datatypes #3866
