Efficiently handle new data

**What is the feature request? What problem does it solve?**
We want to scrape Jira or Confluence at a periodic interval.
However, we don't want to republish all content to the vector store on every run; instead, we want to publish only the data that has changed since the previous scrape.
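One possible way to detect changed data is to store a content hash per document after each sync, then re-embed and upsert only the documents whose hash differs on the next scrape. The sketch below assumes this approach. The `Document` shape, the `plan_sync`/`record_sync` helpers, and the IDs such as `PROJ-1` are hypothetical, and the hash store is a plain dict standing in for whatever persistence the scraper would actually use. Filtering on the source's own last-modified timestamp is an alternative that would avoid fetching unchanged pages at all.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical shape of a scraped Jira issue or Confluence page."""
    doc_id: str   # e.g. an issue key or page ID (assumed to be stable)
    content: str  # the text that would be embedded into the vector store


def content_hash(content: str) -> str:
    """Deterministic fingerprint of a document's text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def plan_sync(scraped: list[Document], stored_hashes: dict[str, str]) -> list[Document]:
    """Return only documents that are new or whose content changed since the last sync."""
    return [
        doc
        for doc in scraped
        if stored_hashes.get(doc.doc_id) != content_hash(doc.content)
    ]


def record_sync(synced: list[Document], stored_hashes: dict[str, str]) -> None:
    """After a successful upsert, remember the hashes of what was written."""
    for doc in synced:
        stored_hashes[doc.doc_id] = content_hash(doc.content)


if __name__ == "__main__":
    hashes: dict[str, str] = {}

    # First scrape: everything is new, so everything is upserted.
    first = [Document("PROJ-1", "Login fails"), Document("PROJ-2", "Add SSO")]
    batch = plan_sync(first, hashes)
    record_sync(batch, hashes)
    print(len(batch))  # 2

    # Second scrape: only PROJ-2 changed, so only PROJ-2 is upserted.
    second = [Document("PROJ-1", "Login fails"), Document("PROJ-2", "Add SSO via SAML")]
    batch = plan_sync(second, hashes)
    record_sync(batch, hashes)
    print([d.doc_id for d in batch])  # ['PROJ-2']
```

Calling `record_sync` only after the vector-store write succeeds means a failed run is simply retried on the next interval rather than silently skipped.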

**Definition of done**
After the first scrape of a data source, every subsequent scrape should update the minimum number of rows in the vector store, i.e. only rows for content that is new or has changed since the previous scrape.