From 9e8c7cb3453bff3a1a6a7cfaeafaaf7c74995d66 Mon Sep 17 00:00:00 2001
From: Keke Yi <40977455+yikeke@users.noreply.github.com>
Date: Fri, 3 Jul 2020 10:12:29 +0800
Subject: [PATCH] Update grafana-monitor-best-practices.md and tispark doc (#3073)

* Update grafana-monitor-best-practices.md
* align https://github.com/pingcap/docs-cn/pull/3257
* Apply suggestions from code review

Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com>
Co-authored-by: TomShawn <41534398+TomShawn@users.noreply.github.com>
Co-authored-by: ti-srebot <66930949+ti-srebot@users.noreply.github.com>
---
 best-practices/grafana-monitor-best-practices.md | 2 +-
 tispark-overview.md                              | 5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/best-practices/grafana-monitor-best-practices.md b/best-practices/grafana-monitor-best-practices.md
index 3107987e24aa6..7388876d96595 100644
--- a/best-practices/grafana-monitor-best-practices.md
+++ b/best-practices/grafana-monitor-best-practices.md
@@ -15,7 +15,7 @@ When you [deploy a TiDB cluster using TiDB Ansible](/online-deployment-using-ans
 
 ![The monitoring architecture in the TiDB cluster](/media/prometheus-in-tidb.png)
 
-For TiDB 2.1.3 or later versions, TiDB monitoring uses the pull method instead of the push method which is used in previous versions. It is a good adjustment with the following benefits:
+For TiDB 2.1.3 or later versions, TiDB monitoring supports the pull method. It is a good adjustment with the following benefits:
 
 - There is no need to restart the entire TiDB cluster if you need to migrate Prometheus. Before adjustment, migrating Prometheus requires restarting the entire cluster because the target address needs to be updated.
 - You can deploy 2 separate sets of Grafana + Prometheus monitoring platforms (not highly available) to prevent a single point of monitoring. To do this, execute the deployment command of TiDB ansible twice with different IP addresses.
diff --git a/tispark-overview.md b/tispark-overview.md
index edaaf9b5d0406..535f3bf95091d 100644
--- a/tispark-overview.md
+++ b/tispark-overview.md
@@ -20,7 +20,8 @@ TiSpark is an OLAP solution that runs Spark SQL directly on TiKV, the distribute
 + TiSpark integrates with Spark Catalyst Engine deeply. It provides precise control of the computing, which allows Spark read data from TiKV efficiently. It also supports index seek, which improves the performance of the point query execution significantly.
 + It utilizes several strategies to push down the computing to reduce the size of dataset handling by Spark SQL, which accelerates the query execution. It also uses the TiDB built-in statistical information for the query plan optimization.
 + From the data integration point of view, TiSpark and TiDB serve as a solution for running both transaction and analysis directly on the same platform without building and maintaining any ETLs. It simplifies the system architecture and reduces the cost of maintenance.
-+ also, you can deploy and utilize tools from the Spark ecosystem for further data processing and manipulation on TiDB. For example, using TiSpark for data analysis and ETL; retrieving data from TiKV as a machine learning data source; generating reports from the scheduling system and so on.
++ You can deploy and utilize tools from the Spark ecosystem for further data processing and manipulation on TiDB. For example, using TiSpark for data analysis and ETL; retrieving data from TiKV as a machine learning data source; generating reports from the scheduling system and so on.
++ Also, TiSpark supports distributed writes to TiKV. Compared to using Spark combined with JDBC to write to TiDB, distributed writes to TiKV can implement transactions (either all data are written successfully or all writes fail), and the writes are faster.
 
 ## Environment setup
 
@@ -90,6 +91,8 @@ spark-shell --jars $TISPARK_FOLDER/tispark-${name_with_version}.jar
 
 If you do not have a Spark cluster, we recommend using the standalone mode. To use the Spark Standalone model, you can simply place a compiled version of Spark on each node of the cluster. If you encounter problems, see its [official website](https://spark.apache.org/docs/latest/spark-standalone.html). And you are welcome to [file an issue](https://github.com/pingcap/tispark/issues/new) on our GitHub.
 
+If you are using TiDB Ansible to deploy a TiDB cluster, you can also use TiDB Ansible to deploy a Spark standalone cluster, and TiSpark is also deployed at the same time.
+
 #### Download and install
 
 You can download [Apache Spark](https://spark.apache.org/downloads.html)