
Saving ipynb file corrupts the JSON format - Unexpected end of JSON input #181382

Open
Rane90 opened this issue May 3, 2023 · 4 comments
Labels: bug, important, info-needed, notebook-serialization


Rane90 commented May 3, 2023

Hi,

I've been working with VS Code over SSH for quite a few years now, and yesterday I encountered a new bug that makes working with .ipynb files almost impossible.

After the file is saved, its JSON is corrupted and the file can't be opened again, so I have to recreate it from scratch.
This bug has some strange characteristics:

  1. The file is often saved whether or not I save it myself.
  2. Sometimes VS Code explicitly asks whether it's OK to save, saying it needs to overwrite changes.
  3. Looking at the raw JSON in a text editor, the last part of the file seems to simply be cut off. For example:
  ```json
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for epoch in range(num_epochs):\n",
    "    losses = []\n",
    "    for batch_images, batch_labels in tqdm(train_loader):\n",
    "        optimizer.zero_grad()\n",
    "        batch_images = batch_images.to(device).cuda().fl
  ```

Notice that the last string is cut off mid-line and is never closed.

I suspect this is related to the printed output of the code cell (I'm using Python's tqdm library).
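To confirm that a saved notebook is truncated rather than damaged in some other way, parsing it with Python's standard `json` module reports where parsing stops. This is a minimal diagnostic sketch; the function name `check_notebook_text` is hypothetical and not part of VS Code or Jupyter. For a file cut off inside a string, as in the example above, Python reports an unterminated string at the position where that last string begins.

```python
import json


def check_notebook_text(text: str) -> str:
    """Return "ok" if text parses as JSON, otherwise describe where parsing failed."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno/colno give the position the parser reports; for an
        # unterminated string this is where that string starts.
        return f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    return "ok"


# Usage (the path is a placeholder):
# with open("notebook.ipynb", encoding="utf-8") as f:
#     print(check_notebook_text(f.read()))
```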

Does this issue occur when all extensions are disabled?: Yes/No

All extensions are active.

  • VS Code Version: 1.77.3
  • OS Version: Linux:

NAME="CentOS Stream"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Stream 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"

Steps to Reproduce:

  1. This is the cell causing the issue:
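```python
import numpy as np
import torch.nn as nn
from tqdm import tqdm

# model, optimizer, train_loader, device, num_epochs and umap_validation are
# defined elsewhere in the notebook (the full notebook was not posted).
for epoch in range(num_epochs):
    losses = []
    for batch_images, batch_labels in tqdm(train_loader):
        optimizer.zero_grad()
        batch_images = batch_images.to(device).cuda().double().reshape(batch_images.shape[0], 3, 224, 224)
        output = model(batch_images)
        # Reconstruction loss: compare the model's output with its input images.
        loss = nn.MSELoss()(output, batch_images)
        loss.backward()  # backpropagate the reconstruction loss
        losses.append(loss.item())
        optimizer.step()
    if epoch % 1 == 0:
        print("Epoch [{}/{}], MSE Loss = {:.8}".format(epoch, num_epochs, np.mean(losses)))
    if epoch % 1 == 0:
        umap_validation(model, train_loader, device, epoch)
```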

(Let me know if you'd like me to upload the entire notebook.)
2. The dataset I'm using is very large: the COCO dataset (~18 GB).

Rane90 (Author) commented May 3, 2023

Of course, I can add these lines to the raw JSON text to close it, but this is just a workaround:

```
               " # End of text
            ]
        }
    ]
}
```
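The same workaround can be tried programmatically: append the closing characters and check whether the result parses. This is a minimal sketch, assuming the file was cut off inside a cell's `source` string as in the original report; `try_close_truncated_notebook` and `CLOSING_SUFFIX` are hypothetical names. Any text after the cut point is still lost, and the top-level `metadata`, `nbformat` and `nbformat_minor` keys (which nbformat v4 requires, and which typically follow `cells` in sorted-key output) may also be missing and need restoring by hand.

```python
import json

# Closing characters matching the workaround: end the cut-off string, then
# close the "source" array, the cell object, the "cells" array and the
# top-level notebook object.
CLOSING_SUFFIX = '"\n   ]\n  }\n ]\n}\n'


def try_close_truncated_notebook(text: str, suffix: str = CLOSING_SUFFIX):
    """Append suffix to a truncated notebook; return the parsed dict, or None if it still fails."""
    try:
        return json.loads(text + suffix)
    except json.JSONDecodeError:
        return None
```

If the truncation happened anywhere else (inside `outputs`, for instance), the suffix does not match the open brackets and the function returns `None`.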

@aeschli aeschli assigned rebornix and unassigned aeschli May 3, 2023
@rebornix rebornix added the bug, important and notebook-serialization labels May 3, 2023
rebornix (Member) commented May 3, 2023

@Rane90 Thanks for the detailed report, and thank you for sharing it with us. This is a critical bug.

Often the file saving happens regardless of me saving or not

Do you have auto save turned on?

Sometimes vs-code explicitly asks me if it's ok to save stating it needs to overwrite changes.

This usually means that another process is modifying the file. When that happens and you then try to save, the file on disk is newer than the version in the editor, so we ask whether you want to overwrite it.
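The check described here, comparing the file on disk with the version the editor last saw, can be sketched with file modification times. This is an illustrative sketch, not VS Code's actual implementation; `TrackedFile` and its methods are hypothetical, and a real editor may compare other metadata as well.

```python
import os


class TrackedFile:
    """Illustrative save guard: don't silently overwrite a file that changed on disk."""

    def __init__(self, path: str):
        self.path = path
        self.known_mtime_ns = os.stat(path).st_mtime_ns

    def changed_on_disk(self) -> bool:
        # A different modification time suggests something else modified the file.
        return os.stat(self.path).st_mtime_ns != self.known_mtime_ns

    def save(self, text: str, overwrite: bool = False) -> bool:
        """Write text; return False without writing if the file changed and overwrite is False."""
        if self.changed_on_disk() and not overwrite:
            return False  # this is where an editor would ask the user before overwriting
        with open(self.path, "w", encoding="utf-8") as f:
            f.write(text)
        self.known_mtime_ns = os.stat(self.path).st_mtime_ns
        return True
```

A `False` return corresponds to the prompt asking whether to overwrite. Comparing for inequality rather than "newer than" also catches a file replaced by an older copy.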

  2. The dataset I'm using is very large - The COCO dataset (~18GB)

This makes me wonder whether the extension host crashed while we were trying to save. Is the dataset fully loaded into the kernel?
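If the extension host (or the disk) fails partway through a save that writes directly into the existing file, truncating it first, the file is left holding only what was written so far, which would match the truncated JSON in this report. A common safeguard is to write the new contents to a temporary file in the same directory and then rename it over the original. This sketch assumes nothing about how VS Code's notebook serializer actually writes files; `atomic_write_text` is a hypothetical helper showing the general technique.

```python
import os
import tempfile


def atomic_write_text(path: str, text: str) -> None:
    """Write text to path so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".ipynb")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # swap the complete new file into place
    except BaseException:
        # On any failure (e.g. OSError: No space left on device), remove the partial
        # temporary file; the original file at `path` is left untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

`os.replace` is atomic on POSIX when source and destination are on the same filesystem, which is why the temporary file is created next to the target. A production version would also preserve the original file's permissions, since `tempfile.mkstemp` creates files readable only by their owner.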

@rebornix rebornix added the info-needed label May 3, 2023
@rebornix rebornix added this to the May 2023 milestone May 3, 2023
Rane90 (Author) commented May 4, 2023

Thank you for your response.

Autosave is not turned on.
And yes, a large portion of the dataset is loaded into the kernel. I'm working on a remote server and have even been notified that I'm using too much disk space. Although it seems very likely that the two are related, I can't see how to avoid this behavior.

@rebornix rebornix modified the milestones: May 2023, June 2023 May 31, 2023
@rebornix rebornix modified the milestones: June 2023, July 2023 Jun 26, 2023
@rebornix rebornix modified the milestones: July 2023, August 2023 Jul 24, 2023

atapley commented Aug 25, 2023

Commenting to say I also just ran into this issue. I was trying to save a dataset and ran out of disk space in my container. After that, the notebook was corrupted: a portion of the file had been cut off and could not be restored.
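Running out of disk space mid-save is another way to end up with a partially written file. A rough pre-save check of free space can at least flag the situation. This is a hedged sketch: `has_room_to_save` is a hypothetical helper, the 10 MiB margin is an arbitrary assumption, and the check cannot guarantee success, because space can still run out between the check and the write. `shutil.disk_usage` may also not reflect per-user disk quotas, such as those enforced on a shared server.

```python
import os
import shutil


def has_room_to_save(path: str, text: str, margin_bytes: int = 10 * 1024 * 1024) -> bool:
    """Rough check: does the filesystem holding path have room for text plus a margin?"""
    directory = os.path.dirname(os.path.abspath(path))
    needed = len(text.encode("utf-8")) + margin_bytes
    # Free space as reported for the filesystem containing the target directory.
    return shutil.disk_usage(directory).free >= needed
```

For that reason, a check like this complements writing to a temporary file and renaming it, rather than replacing that approach.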

@rebornix rebornix modified the milestones: August 2023, September 2023 Aug 30, 2023
@rebornix rebornix modified the milestones: September 2023, October 2023 Sep 26, 2023
@rebornix rebornix modified the milestones: October 2023, November 2023 Oct 24, 2023
@rebornix rebornix modified the milestones: November 2023, December 2023 Nov 28, 2023
@rebornix rebornix modified the milestones: December / January 2024, February 2024 Jan 23, 2024
@rebornix rebornix modified the milestones: February 2024, March 2024 Feb 21, 2024
@rebornix rebornix modified the milestones: March 2024, April 2024 Mar 26, 2024
@rebornix rebornix modified the milestones: April 2024, May 2024 Apr 23, 2024
@rebornix rebornix modified the milestones: May 2024, June 2024 May 29, 2024
@rebornix rebornix modified the milestones: June 2024, July 2024 Jun 24, 2024
@rebornix rebornix modified the milestones: July 2024, August 2024 Jul 23, 2024
@rebornix rebornix modified the milestones: August 2024, September 2024 Aug 28, 2024