Create Multi Smaller Files From a Big Json File #3121

Closed
washere opened this issue May 15, 2024 · 5 comments

@washere

washere commented May 15, 2024

Suppose a JSON file looks like this (the first 2 notes are shown as an example):


[
   {
      "_id": 1,
      "title": "First Note",
      "note": "Blurb blurb blurb blurb \n blurb blurb blurb \n",
      "category": 6
   },
   {
      "_id": 2,
      "title": "My Second Thought",
      "note": "Blah blah blah \n blah blah \n\n\n",
      "category": 3
   }
]


Can each note be written to a separate file? This would be the first txt file:


      1
      First Note
      Blurb blurb blurb blurb blurb \n\n\n\n
      6

and so on for all notes. Is it possible to do this in jq or am I wasting my time?
Thanks.

@pkoppstein
Contributor

The bad news is that jq alone is not up to the job.

The good news is that jq was designed to, and does, work well with other command-line tools.

For details, see e.g. https://stackoverflow.com/questions/70569726/jq-split-json-in-several-files
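
As a rough illustration of jq working with the shell for this task, here is a minimal sketch that assumes the input is a top-level array shaped like the example in the question. The function name split_notes and the note-<_id>.txt naming are made up for this sketch; only the jq options -c and -r and the .[] filter are standard jq.

```shell
# split_notes FILE [OUTDIR]
# Hypothetical helper: writes each element of the top-level array in FILE
# to OUTDIR/note-<_id>.txt, one field per line.
split_notes() {
    src=$1
    outdir=${2:-.}
    mkdir -p "$outdir"
    # -c prints each array element as one compact JSON object per line,
    # so the shell loop sees exactly one note per read.
    jq -c '.[]' "$src" | while IFS= read -r note; do
        id=$(printf '%s\n' "$note" | jq -r '._id')
        # -r prints raw strings, so "\n" inside .note becomes a real new line.
        printf '%s\n' "$note" | jq -r '._id, .title, .note, .category' \
            > "$outdir/note-$id.txt"
    done
}
```

It could be called as split_notes mynotes.json out. The sketch assumes every _id is unique and safe to use in a file name, which holds for the numeric ids in the example but is not checked.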

@washere
Author

washere commented May 15, 2024

Thanks buddy.
I was thinking of awk & sed.
RegExp groupings (\1, \2, \3, etc.) would be mighty cool in jq if they are on the roadmap.
Thanks again. 👍

@pkoppstein
Contributor

@washere wrote:

> I was thinking of awk & sed.

jq and awk work very well together for the use-case you mention.

> RegExp groupings

Not sure what you're referring to, but please note that jq does support (nested) regex groupings by name, e.g.

jq -n '"January 3rd, 2020" | capture("(?<month>(?<mon>[^ ]{3})[^ ]+) (?<day>[0-9]+)[^, ]*, *(?<year>[0-9]+)")'
{
  "month": "January",
  "mon": "Jan",
  "day": "3",
  "year": "2020"
}
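
If the question is about reusing a captured group in a replacement, the way an editor's replace field uses a numbered back-reference, a named group can play that role inside jq's sub. The following is only a sketch: the function name rename_category and the group name v are made up here, and it treats each input line as plain text (-R) rather than as JSON.

```shell
# rename_category: hypothetical filter that rewrites  "category": X  to  "TYPE": X
# on every line of stdin, leaving other lines untouched.
rename_category() {
    # (?<v>.*) captures the rest of the line under the name v;
    # \(.v) in the replacement string inserts that captured text.
    jq -Rr 'sub("\"category\": (?<v>.*)"; "\"TYPE\": \(.v)")'
}
```

For example, printf '%s\n' '"category": 6' | rename_category should print "TYPE": 6.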

Since these jqlang "issues" pages are mainly for reporting bugs and requesting enhancements, we generally ask that usage questions and the like be posted to https://stackoverflow.com/questions/tagged/jq where you'll likely get timely and useful responses. If you have a specific ER to make, by all means do so; otherwise, please consider closing this "issue".

@washere
Author

washere commented May 15, 2024

Thanks.
I actually already did the task.
I deleted the blank lines and the lines containing only { or } (etc.) in Sublime Text.
The find/replace fields in Sublime can contain line breaks (CTRL+Return).

I deleted the 1st line of each note (the .db primary key) with a regexp group (you need to click the regexp icon next to the find/replace fields):

"_id": (.*),\n

.* matches anything.

I renamed category:

      "category": (.*)

to

      "TYPE": \1


\1 being whatever .* matched.

So I ended up with 3 lines per note, some of the note lines (the 2nd line) being 150,000 characters long!

Then I just used:

split -l 3 mynotes.json Note-

i.e. create a new file named Note-aa, Note-ab, and so on from every 3 lines (-l 3).
It created about 900 big text files in a few seconds.
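
To see what the split step does before running it on real notes, here is a small sketch on scratch data. The temporary directory and the two sample notes are invented for the illustration, and the -a 3 option is an addition that is not in the command above: the default suffix is two letters, which gives 676 names, and while recent GNU split appears to widen the suffix on its own, other implementations may stop there.

```shell
# Hypothetical scratch data: two notes of 3 lines each.
workdir=$(mktemp -d)
printf '%s\n' \
    '"title": "First Note",' \
    '"note": "Blurb blurb",' \
    '"TYPE": 6' \
    '"title": "My Second Thought",' \
    '"note": "Blah blah",' \
    '"TYPE": 3' > "$workdir/mynotes.json"

# -l 3 starts a new output file every 3 lines; -a 3 asks for three-letter
# suffixes (Note-aaa, Note-aab, ...), which leaves room for 26^3 = 17576 files.
( cd "$workdir" && split -l 3 -a 3 mynotes.json Note- )

ls "$workdir"
```

This only works because every note was reduced to exactly 3 lines first; a note with a different line count would shift all the files after it.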

It worked great.
Writing it up here in case it helps someone searching closed issues in the future.

Although this could carry a label (Support or Feature Request), I agree with you, so I am closing this.
Also thanks for the actual support link, would have never found it!
Will keep an eye on jq.
Thanks again 👍

washere closed this as completed May 15, 2024
@washere
Author

washere commented May 15, 2024

In case anyone needs this in the future:

Because each note's content is on a single line (a single field (cell) of the SQLite table, one row per record, before exporting to JSON), there will be lots of:

\n
\r
plus some:
\t

Two problems come up when changing them into actual new lines:

  • Doing the replace operation on all files at once (for a single file, Sublime Text is best)
  • The notoriously common problem of not getting actual new lines inserted with the text (hex is easy)

The trick with the latter is to escape the backslash in the pattern, so that \\n matches a literal backslash followed by n, and to use a plain \n in the replacement, which GNU sed writes as a real new line:


find . -type f -exec sed -i 's/\\n/\n/g' {} \;
find . -type f -exec sed -i 's/\\r/\n/g' {} \;
find . -type f -exec sed -i 's/\\t/\n/g' {} \;


This one gets rid of multiple blank lines by collapsing each run of consecutive blank lines into a single blank line in one pass:

find . -type f -exec sed -i '/^$/N;/\n$/D' {} \;


The commands above operate on ALL FILES in the directory you are in (pwd), including its subfolders.
To apply them to another folder, give find the folder path as its first argument.
But it is easier to be in the directory where the files are and just run the commands there.
Test on backup copies first: sed -i edits the files in place, so to try again, delete the results and copy the files back from the backup.
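
The replacements can also be tried on a single stream before touching any files. This is a sketch only: the function name unescape_notes is made up, and it spells the new line in the replacement as a backslash followed by a real line break, which should work in sed implementations that do not understand \n there. Like the commands above, it turns \n, \r and \t all into new lines, and it would mishandle an escaped backslash followed by n; decoding the JSON with jq -r avoids that case.

```shell
# unescape_notes: hypothetical filter, reads text on stdin and writes to stdout.
unescape_notes() {
    # 1st sed: a literal backslash followed by n, r or t becomes a real new line.
    #          The replacement is a backslash followed by an actual line break.
    # 2nd sed: collapse each run of blank lines into a single blank line.
    sed 's/\\[nrt]/\
/g' | sed '/^$/N;/\n$/D'
}

# Sample line with literal escapes, as they appear in the exported notes.
printf '%s\n' 'Blah blah\nblah\r\n\n\nend' | unescape_notes
```

The sample line should come out as Blah blah, blah, one blank line, then end.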

Then notes look nice, good luck.

P.S.
This code takes the first line of each file (the note title) and names the file after it.

Make sure that, in the terminal, you are in the folder where all the files are; it saves typing the path in the commands.

Just paste the whole chunk into a terminal and press Return to run it:


for file in *
do
   # Avoid renaming directories!
   if [ -f "$file" ]
   then
       # First line is the note title, last line is the category
       a=$(head -n 1 -- "$file")
       b=$(tail -n 1 -- "$file")
       # Replace anything outside A-Za-z0-9._- with an underscore
       newname=$(printf '%s\n' "${a} ${b}.txt" | sed -e 's/[^A-Za-z0-9._-]/_/g')
       if [ -e "$newname" ]
       then
              echo "Cannot rename $file to $newname - file already exists"
       else
              mv -- "$file" "$newname"
       fi
   fi
done

The line that sets b (tail -n 1) reads the last line of each file, the category. I just append it to the filename; you can delete it.

P.P.S.:
The find ... sed commands above, which replace the backslash escapes in the single-line SQLite output (\n, \t, etc.) with actual new lines, can be erratic and often behave differently. So I suggest this app instead for mass grep and replace across multiple files. It can be found in the GNOME Software app store, the Mintinstall app store, the Discover app store, etc., or in the Synaptic package manager, or downloaded directly from SourceForge:

https://regexxer.sourceforge.net/

https://mail.gnome.org/archives/gnome-announce-list/2004-July/msg00022.html
