
SPARKNLP-732 Unify all externally supported file systems and cloud access #13919


5 changes: 4 additions & 1 deletion build.sbt
@@ -151,7 +151,10 @@ lazy val utilDependencies = Seq(
exclude ("com.google.code.findbugs", "annotations")
exclude ("org.slf4j", "slf4j-api"),
gcpStorage,
greex)
greex,
azureIdentity,
azureStorage
)

lazy val typedDependencyParserDependencies = Seq(junit)

6 changes: 3 additions & 3 deletions docs/api/com/johnsnowlabs/client/CredentialParams.html
@@ -3,9 +3,9 @@
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
<title>Spark NLP 5.0.2 ScalaDoc - com.johnsnowlabs.client.CredentialParams</title>
<meta name="description" content="Spark NLP 5.0.2 ScalaDoc - com.johnsnowlabs.client.CredentialParams" />
<meta name="keywords" content="Spark NLP 5.0.2 ScalaDoc com.johnsnowlabs.client.CredentialParams" />
<title>Spark NLP 5.1.0 ScalaDoc - com.johnsnowlabs.client.aws.CredentialParams</title>
<meta name="description" content="Spark NLP 5.1.0 ScalaDoc - com.johnsnowlabs.client.aws.CredentialParams" />
<meta name="keywords" content="Spark NLP 5.1.0 ScalaDoc com.johnsnowlabs.client.aws.CredentialParams" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />


102 changes: 78 additions & 24 deletions examples/python/training/english/dl-ner/mfa_ner_graphs_s3.ipynb
@@ -8,7 +8,18 @@
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/training/english/dl-ner/mfa_ner_graphs_s3.ipynb)\n",
"\n",
"# Configuring MFA for S3 access"
"# Training NER with Graphs in S3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In Spark NLP you can configure the location to store TF Graphs used while training NER models. Starting at Spark NLP 5.1.0, you can set a GCP Storage URI, or Azure Storage URI or DBFS paths like HDFS or Databricks FS.\n",
"\n",
"In this notebook, we are going to see the steps required to use an external S3 URI to store the logs of traning an NER model\n",
"\n",
"To do this, we need to configure the spark session with the required settings for Spark NLP and Spark ML."
]
},
{
@@ -43,6 +54,68 @@
"print(\"Spark NLP version\", sparknlp.version())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To configure MFA we just need to define the requires values in spark properties as show below. Look an example to get temporal credentials [here](https://github.com/JohnSnowLabs/spark-nlp/blob/master/scripts/aws_tmp_credentials.sh) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Enter your AWS Access Key:\")\n",
"MY_ACCESS_KEY = input()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Enter your AWS Secret Key:\")\n",
"MY_SECRET_KEY = input()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Enter your AWS Session Key:\")\n",
"MY_SESSION_KEY = input()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Enter your AWS Region:\")\n",
"MY_AWS_REGION"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#S3 Storage configuration\n",
"s3_params = {\n",
" \"spark.jsl.settings.aws.credentials.access_key_id\": MY_ACCESS_KEY,\n",
" \"spark.jsl.settings.aws.credentials.secret_access_key\": MY_SECRET_KEY,\n",
" \"spark.jsl.settings.aws.credentials.session_token\": MY_SESSION_KEY,\n",
" \"spark.jsl.settings.aws.region\": MY_AWS_REGION\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -83,27 +156,7 @@
}
],
"source": [
"spark = sparknlp.start()\n",
"spark"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To configure MFA we just need to define the requires values in spark properties as show below. Look an example to get temporal credentials [here](https://github.com/JohnSnowLabs/spark-nlp/blob/master/scripts/aws_tmp_credentials.sh) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"spark.conf.set(\"spark.jsl.settings.aws.credentials.access_key_id\", \"MY_ACCESS_KEY_ID\")\n",
"spark.conf.set(\"spark.jsl.settings.aws.credentials.secret_access_key\", \"MY_SECRET_ACCESS_KEY_ID\")\n",
"spark.conf.set(\"spark.jsl.settings.aws.credentials.session_token\", \"MY_SESSION_TOKEN\")\n",
"spark.conf.set(\"spark.jsl.settings.aws.region\", \"MY_REGION\")"
"spark = sparknlp.start(params=s3_params)"
]
},
{
@@ -201,7 +254,7 @@
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -214,7 +267,8 @@
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
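Taken together, the new notebook cells collect temporary MFA credentials, expose them as Spark NLP properties, and start the session with them. Below is a minimal consolidated sketch of that flow in plain Python, assuming the temporary credentials have already been exported as the standard AWS environment variables (for example by the linked scripts/aws_tmp_credentials.sh); the fallback region value is a placeholder.

import os
import sparknlp

# Temporary MFA credentials read from standard AWS environment variables
# instead of interactive input() prompts.
s3_params = {
    "spark.jsl.settings.aws.credentials.access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
    "spark.jsl.settings.aws.credentials.secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
    "spark.jsl.settings.aws.credentials.session_token": os.environ["AWS_SESSION_TOKEN"],
    "spark.jsl.settings.aws.region": os.environ.get("AWS_REGION", "us-east-1"),
}

# Start the Spark session with the Spark NLP cloud settings applied,
# exactly as the notebook does with sparknlp.start(params=s3_params).
spark = sparknlp.start(params=s3_params)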
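With the session configured, the graph location is supplied to the trainer itself. The sketch below shows how an S3 graph folder might be passed to NerDLApproach through setGraphFolder, which is the kind of external path this PR's unified file system access enables; the bucket, prefix, and column names are placeholders, and the upstream pipeline stages (document assembly, tokenization, embeddings) are omitted.

from sparknlp.annotator import NerDLApproach

# Trainer that loads its custom TensorFlow graphs from an S3 prefix;
# requires the credentials configured in the session above (placeholder URI).
ner_approach = (
    NerDLApproach()
    .setInputCols(["sentence", "token", "embeddings"])
    .setLabelColumn("label")
    .setOutputCol("ner")
    .setGraphFolder("s3://my-bucket/ner-graphs")
    .setMaxEpochs(1)
)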