Revamp of the LOAD CSV tutorial #470

lidiazuin · 2025-05-06T11:10:37Z

Review of the csv-import.adoc page and addition of the csv-file.adoc page for general reference.

AlexicaWright

This is a review of the "Working with CSV files" section. I'll have to review the tutorial separately.

AlexicaWright · 2025-05-06T12:16:20Z

modules/ROOT/pages/data-import/csv-files.adoc

+
+=== Data format
+
+All data from the CSV file is read as a string, so you need to use `toInteger()`, `toFloat()`, `split()`, or similar functions to convert values, when needed.


Suggested change

All data from the CSV file is read as a string, so you need to use `toInteger()`, `toFloat()`, `split()`, or similar functions to convert values, when needed.

Neo4j reads all data from the CSV file as a string. For other data types, you need to use `toInteger()`, `toFloat()`, `toBoolean()`, or similar functions to convert data to the appropriate type.

`split()` doesn't change the data type from string; it splits the string into separate elements, so it feels odd to group it with the functions that change the data type. It's mentioned later though, so maybe it's ok?
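
A minimal sketch of the distinction, using literal strings instead of CSV rows (the values are made up for illustration):

```cypher
// toInteger(), toFloat(), and toBoolean() change the type of the value.
// split() returns a list whose elements are still strings.
RETURN toInteger('42') AS asInteger,
       toFloat('3.14') AS asFloat,
       toBoolean('true') AS asBoolean,
       split('red;green;blue', ';') AS asList
```

So `split()` restructures the value into a list, and each element would still need `toInteger()` or a similar function if another type is wanted.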

AlexicaWright · 2025-05-06T13:09:05Z

modules/ROOT/pages/data-import/csv-files.adoc

+=== Field terminator
+
+Also known as delimiter, a field terminator is a character used to separate each field in a CSV file.
+In this example, a comma (`,`) is used, but other characters, such as a tab (`\`) or a pipe (`|`) also work and they can be blended:


Suggested change

In this example, a comma (`,`) is used, but other characters, such as a tab (`\`) or a pipe (`|`) also work and they can be blended:

In this example, a comma (`,`) is used, but other characters, such as a tab (`\t`) or a pipe (`|`) also work and they can be blended:

Not sure if we need to escape the tab to make it render?
If you use a tab, the format is called TSV.

No, the tab is working normally here, both building locally and in Surge. Regarding the TSV file format, do you think it's better to mention it or to remove the tab option as it would make the file a TSV instead of a CSV, which is the topic of this page?

CSV and TSV are both flat files and there is no other difference AFAIK. About the tab, it's not just a backslash but a backslash followed by a t: `\t`.
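
For reference, a sketch of how a tab-separated file could be loaded, assuming a hypothetical `people.tsv` file with a header row in the import directory:

```cypher
// people.tsv is a hypothetical file name; '\t' sets the tab as the field terminator.
LOAD CSV WITH HEADERS FROM 'file:///people.tsv' AS row FIELDTERMINATOR '\t'
RETURN row
LIMIT 5
```

Apart from the `FIELDTERMINATOR` option, the clause is the same as for a comma-separated file.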

AlexicaWright · 2025-05-08T13:44:06Z

modules/ROOT/pages/data-import/csv-files.adoc

+For best performance, always `MATCH` and `MERGE` on a single label with the indexed primary-key property.
+====
+
+Suppose you use xref:#_converting_data_values[the preceding *companies.csv* file], and now you have a file that contains people and which companies they work for:


Suggested change

Suppose you use xref:#_converting_data_values[the preceding *companies.csv* file], and now you have a file that contains people and which companies they work for:

Suppose that you have another file that contains people and the companies they work for, using a reference to the xref:#_converting_data_values[*companies.csv* file]:

AlexicaWright · 2025-05-08T13:55:21Z

modules/ROOT/pages/data-import/csv-files.adoc

+4,Karen White,1
+----
+
+You should also separate node and relationship creation on a separate processing.


Suggested change

You should also separate node and relationship creation on a separate processing.

To load these two files and create the relationships between the people in the `people.csv` file and the companies they work for in the `companies.csv` file, first create the nodes from both files, and then create the relationships between them.

To make this process more efficient, it is recommended to separate these tasks, i.e. create the nodes with one clause per file, and then use a separate clause to create the relationships.
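
As a sketch of that separation: the labels, the `employeeId` and `companyId` properties, and the `employeeId` and `Company` headers come from the query under review, while the `companyId` and `name` headers used below are assumptions about the two files:

```cypher
// 1. Create the company nodes (headers companyId and name are assumed).
LOAD CSV WITH HEADERS FROM 'file:///companies.csv' AS row
MERGE (c:Company {companyId: row.companyId})
SET c.name = row.name;

// 2. Create the employee nodes (header name is assumed).
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
MERGE (e:Employee {employeeId: row.employeeId})
SET e.name = row.name;

// 3. Create the relationships in a separate pass over people.csv.
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
MATCH (e:Employee {employeeId: row.employeeId})
MATCH (c:Company {companyId: row.Company})
MERGE (e)-[:WORKS_FOR]->(c);
```

Because no conversion is applied, both IDs stay strings on the node and in the lookup, so the `MATCH` clauses in step 3 compare like with like.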

AlexicaWright · 2025-05-08T13:57:06Z

modules/ROOT/pages/data-import/csv-files.adoc

+
+[source,cypher,role=noplay]
+----
+// clear data


Is this necessary?

AlexicaWright · 2025-05-08T13:58:50Z

modules/ROOT/pages/data-import/csv-files.adoc

+MATCH (e:Employee {employeeId: row.employeeId})
+MATCH (c:Company {companyId: row.Company})
+MERGE (e)-[:WORKS_FOR]->(c)
+RETURN *;


What is returned here?

AlexicaWright

Some further comments, but we're getting there! Thank you @lidiazuin !

AlexicaWright · 2025-06-03T11:39:47Z

modules/ROOT/pages/data-import/csv-files.adoc

+--
+
+Here, the movie and person data (including the IDs) is repeated in different rows every time new information about a particular actor's role is featured.
+This sort of duplication compromises the structure of the data, which means you need to xref:#_preparing_the_csv_file[prepare your file] before importing.


Maybe this can be rephrased a little? The duplication doesn't really compromise the structure of the data in general, does it? Only if you want your data in a graph structure.
Also, the link doesn't work.

AlexicaWright · 2025-06-03T11:41:53Z

modules/ROOT/pages/data-import/csv-files.adoc

+Here, the movie and person data (including the IDs) is repeated in different rows every time new information about a particular actor's role is featured.
+This sort of duplication compromises the structure of the data, which means you need to xref:#_preparing_the_csv_file[prepare your file] before importing.
+
+== File location


I still think this section should move to "Using LOAD CSV". This page is all about the actual file, not uploading it.

AlexicaWright · 2025-06-03T13:34:48Z

modules/ROOT/pages/data-import/csv-import.adoc

+* xref:data-import/csv-files.adoc#_cleaning_up[*Cleaning up CSV files*]: see how to use the `LOAD CSV` command to clean up the file while importing.
+* xref:data-import/csv-files.adoc#_optimization[*Optimization*]: improve performance when working with large amounts of data or complex loading.


These three are all on the same page. Wouldn't it suffice to say "See xref:data-import/csv-files.adoc[Working with CSV files] to learn more about the structure of data, how to clean it up, and optimize it."?

AlexicaWright · 2025-06-03T13:37:53Z

modules/ROOT/pages/data-import/index.adoc

+====
+For a more hands-on option, see the available link:https://graphacademy.neo4j.com/categories/?search=import[GraphAcademy courses] on data import.
+====
+
 == Methods comparison

 The following table shows all supported methods for importing data into Neo4j:


The plot thickens... In Desktop2, "Import" is available, but only for CSV. It is built in, so it's not the standalone Data Importer...

You mean the Open folder > Import option?

No, Desktop2 has Importer built in, just like the Aura console, but it only supports csv files.

AlexicaWright

Some more comments ;)

AlexicaWright · 2025-06-11T19:11:56Z

modules/ROOT/pages/data-import/index.adoc

+====
+For a more hands-on option, see the available link:https://graphacademy.neo4j.com/categories/?search=import[GraphAcademy courses] on data import.
+====
+
 == Methods comparison

 The following table shows all supported methods for importing data into Neo4j:


No, Desktop2 has Importer built in, just like the Aura console, but it only supports csv files.

AlexicaWright · 2025-06-13T12:12:59Z

modules/ROOT/pages/data-import/csv-import.adoc

+Before loading the file, you need to first create an link:https://neo4j.com/product/auradb/[Aura instance] or choose a link:{docs-home}/deployment-options[deployment of your choice].
+Then, you can load the file using `LOAD CSV` using the following command:


Suggested change

Before loading the file, you need to first create an link:https://neo4j.com/product/auradb/[Aura instance] or choose a link:{docs-home}/deployment-options[deployment of your choice].

Then, you can load the file using `LOAD CSV` using the following command:

The `LOAD CSV` command can be used to load data into any deployment of Neo4j, whether it is an link:https://neo4j.com/product/auradb/[Aura instance] or a local installation.

See link:{docs-home}/deployment-options[deployment options] for information.

The command looks like this:

AlexicaWright · 2025-06-13T12:39:34Z

modules/ROOT/pages/data-import/csv-import.adoc

+[source,cypher]
+--
+LOAD CSV [WITH HEADERS] FROM url [AS alias] [FIELDTERMINATOR char]
+--


Suggested change

--

--

If you include the optional `WITH HEADERS`, the first line of the CSV file is treated as a header and each row is treated as a map of key-value pairs rather than a list of values.

`FROM` specifies the location of the file, whether it is local or over the internet, and cannot be omitted.

`AS alias` names each row for reference.

The default field terminator in CSV files is the comma, but others are supported and can be specified using the parsing option `FIELDTERMINATOR`.

This is just a suggestion, but since it is a tutorial about this command, I think it's worthwhile to break down the basic command and explain what each part does.
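
To illustrate the difference `WITH HEADERS` makes, a small sketch assuming the `people.csv` file is in the import directory and has an `employeeId` header (as the query reviewed earlier implies):

```cypher
// Without WITH HEADERS, each row is a list and fields are accessed by index.
// The header line, if the file has one, is returned as the first row.
LOAD CSV FROM 'file:///people.csv' AS row
RETURN row[0] AS firstField, row[1] AS secondField;

// With WITH HEADERS, each row is a map keyed by the header names.
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
RETURN row.employeeId AS employeeId;
```

Neither query sets `FIELDTERMINATOR`, so the default comma applies.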

AlexicaWright · 2025-06-13T12:40:51Z

modules/ROOT/pages/data-import/csv-import.adoc

-//Example 2 - file placed in subdirectory within import directory (import/northwind/customers.csv)
-LOAD CSV FROM "file:///northwind/customers.csv"
----
+This is the content of the example `people.csv` file:


Maybe use the result of running that command instead?

AlexicaWright · 2025-06-13T12:41:30Z

modules/ROOT/pages/data-import/csv-import.adoc

-MERGE (c:Company {companyId: row.companyId})
-MERGE (e)-[r:WORKS_FOR]->(c)
----
+Note that the `FIELDTERMINATOR` wasn’t specified in the `LOAD CSV` clause because the default value is a comma. 


If you explain it on first mention, you can omit this. I added a suggestion for that.

AlexicaWright · 2025-06-13T12:49:41Z

modules/ROOT/pages/data-import/csv-import.adoc


-The `neo4j-admin database import` command can be used for the initial graph population only. 
+. Search for typos in the data and in the queries.


This seems like something to do once you know something is inaccurate?

AlexicaWright · 2025-06-13T12:53:45Z

modules/ROOT/pages/data-import/csv-import.adoc

-* Type conversion is possible by suffixing the name with indicators like `:INT`, `:BOOLEAN`, etc.
-
-For more details on this header format and the tool, see the section in the link:https://neo4j.com/docs/operations-manual/current/tools/neo4j-admin/neo4j-admin-import/[Neo4j Operations Manual -> Neo4j Admin import^] and the accompanying link:https://neo4j.com/docs/operations-manual/current/tutorial/neo4j-admin-import/[tutorial^].
+== Model your data


This section is very confusing. I suggest deleting it and linking to the chapter on data modeling instead.

neo4j-docops-agent · 2025-06-13T14:37:17Z

This PR includes documentation updates
View the updated docs at https://neo4j-docs-getting-started-470.surge.sh

New pages:

Working with CSV files

Updated pages:

AlexicaWright

Second part reviewed! Looking great Lidia!!

AlexicaWright · 2025-06-13T12:57:32Z

modules/ROOT/pages/data-import/csv-files.adoc

+
+=== Field terminator
+
+Also known as delimiter, a field terminator is a character used to separate each field in a CSV file.


Just a thought, but how would the LOAD CSV command work with more than one field terminator?

AlexicaWright · 2025-06-13T13:01:12Z

modules/ROOT/pages/data-import/csv-files.adoc

+--
+
+Here, the movie and person data (including the IDs) is repeated in different rows every time new information about a particular actor's role is featured.
+This sort of duplication compromises the structure of the data when in a graph.


Suggested change

This sort of duplication compromises the structure of the data when in a graph.

This sort of duplication compromises the graph data structure.

AlexicaWright · 2025-06-13T14:52:33Z

modules/ROOT/pages/data-import/csv-files.adoc

+
+== File location
+
+When working with a CSV file in Neo4j, you can access it from a link:


Suggested change

When working with a CSV file in Neo4j, you can access it from a link:

When using the `LOAD CSV` command to load your CSV data, the CSV file is accessed via URL, either over the internet:

AlexicaWright · 2025-06-13T14:54:58Z

modules/ROOT/pages/data-import/csv-files.adoc

+RETURN row
+--
+
+Or from a local folder, if you use an on-premise deployment.


So this doesn't work in Aura? Maybe test to see?

AlexicaWright · 2025-06-13T14:57:21Z

modules/ROOT/pages/data-import/csv-files.adoc

+If you want to open your CSV file from another location, you need to change the link:https://neo4j.com/docs/operations-manual/2025.03/configuration/configuration-settings/#config_server.directories.import[`server.directories.import`] settings.
+
+[IMPORTANT]
+====


This is a very long admonition. Could it be shortened or rewritten as a regular paragraph (i.e. not an admonition)?

AlexicaWright · 2025-06-13T14:59:06Z

modules/ROOT/pages/data-import/csv-files.adoc

+To avoid this problem:
+
+* Check if headers match the data in the file.
+* Adjust formatting, columns, etc _before_ you import for a smooth process.


Suggested change

* Adjust formatting, columns, etc _before_ you import for a smooth process.

It is recommended to adjust formatting, columns, etc _before_ you import for a smooth process.

AlexicaWright · 2025-06-13T15:00:28Z

modules/ROOT/pages/data-import/csv-files.adoc

+. *Inconsistent line breaks*
+
+Ensure line breaks are consistent throughout the file.
+The recommendation is to use the Unix style for compatibility, in case you are using Linux.


Suggested change

The recommendation is to use the Unix style for compatibility, in case you are using Linux.

For Linux users, the recommendation is to use the Unix style for compatibility.

AlexicaWright · 2025-06-13T15:03:24Z

modules/ROOT/pages/data-import/csv-files.adoc

+
+* `*toInteger()*`: converts a value to an integer.
+* `*toFloat()*`: converts a value to a float (e.g. for monetary amounts).
+* `*datetime()*`: converts a value to a `DateTime`.


For consistency, we should use code formatting either for all data types or for none. I suggest using it for all of them, so `string`, `float`, etc.

AlexicaWright · 2025-06-13T15:08:19Z

modules/ROOT/pages/data-import/csv-files.adoc

+----
+
+In this case, you should separate node and relationship creation on a separate part of the processing.
+For instance, instead of the following:


Suggested change

For instance, instead of the following:

For example, instead of the following:

Revamp of the LOAD CSV tutorial

53b40ea

lidiazuin added the cherry-pick-this-to-main label May 6, 2025

lidiazuin and others added 3 commits May 6, 2025 13:32

Replacing links to neo4j admin commands to tutorial

9222473

Merge branch 'dev' into csv-import

8893148

Update index.adoc

a66c3d5

AlexicaWright reviewed May 8, 2025

Apply suggestions from code review

b7e1c6b

Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>

AlexicaWright reviewed May 12, 2025

lidiazuin and others added 9 commits May 12, 2025 16:09

Apply suggestions from code review

cc382cf

Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>

removing redundant tutorial for neo4j desktop and finalizing pages

f2594ff

remove unused images

90ab936

reverting changes in apackage-lock

b9fb1ed

adding more info about modeling

d547d6a

Delete package-lock.json

a623bfb

Removing links to the Neo4j Desktop tutorial, adding link to the import subpage and admonition for GraphAcademy

245de5e

Merge branch 'csv-import' of github.com:lidiazuin/docs-getting-started into csv-import

67fc1b1

fixes after review

8628e29

AlexicaWright reviewed Jun 3, 2025

lidiazuin and others added 2 commits June 4, 2025 11:14

Apply suggestions from code review

e240038

Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>

fixes after review

f63d9c4

AlexicaWright reviewed Jun 13, 2025

Apply suggestions from code review

9bf5d8a

Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>

AlexicaWright reviewed Jun 13, 2025
