MongoDB Input Plugin unexpected output #9555

Closed
TonyXiaoCui opened this issue Jul 29, 2021 · 5 comments · Fixed by #11314
Labels
area/mongodb bug unexpected problem or unintended behavior

Comments

@TonyXiaoCui

MongoDB v3.2.10
Telegraf 1.19

I am using Telegraf + Prometheus + Grafana for monitoring, and I want to collect the top stats of collections.
When I checked top_stat, I found unexpected output:
1. One collection is reported with node_type="PRI", but its hostname is the IP of a config server (cfgsrv), not the IP of the primary.
2. The cfgsrv replica set has three members, so the output also appears three times.
However, my Telegraf config does not include the cfgsrv IPs.

I understand that the data comes from {top:1}, but how does the plugin decide which hostname to attach to it?

TonyXiaoCui added the bug label on Jul 29, 2021

lukhas commented Jul 30, 2021

I see the same kind of new, unexpected behavior on 1.19.2. Telegraf runs on a delayed secondary, yet I get node_type=PRI, so it is presumably reporting the status of the PRIMARY member of the cluster:

root@late2:~# cat /etc/telegraf/telegraf.d/mongodb.conf
[[inputs.mongodb]]
  servers = ["mongodb://127.0.0.1:27017"]
  gather_perdb_stats = true

Excerpt from rs.config():

		{
			"_id" : 42,
			"host" : "late2:27017",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 0,
			"tags" : {
				
			},
			"slaveDelay" : NumberLong(86400),
			"votes" : 0
		},

And yet, telegraf sends:

> mongodb,host=late2,hostname=127.0.0.1:27017,node_type=PRI,rs_name=lichess-rs0 active_reads=1i,active_writes=0i,aggregate_command_failed=0i,aggregate_command_total=224884094i,assert_msg=0i,assert_regular=0i,assert_rollovers=0i,assert_user=1085317i,assert_warning=0i,available_reads=127i,available_writes=128i,commands=44658597008i,commands_per_sec=2274i,connections_available=51040i,connections_current=160i,connections_total_created=242705i,count_command_failed=0i,count_command_total=8076090577i,cursor_no_timeout=0i,cursor_no_timeout_count=0i,cursor_pinned=0i,cursor_pinned_count=1i,cursor_timed_out=0i,cursor_timed_out_count=393329319i,cursor_total=15i,cursor_total_count=17213i,delete_command_failed=0i,delete_command_total=209781533i,deletes=419552236i,deletes_per_sec=114i,distinct_command_failed=0i,distinct_command_total=4033627632i,document_deleted=126426616i,document_inserted=1253618366i,document_returned=112488843511i,document_updated=12846861806i,find_and_modify_command_failed=255i,find_and_modify_command_total=1269857197i,find_command_failed=5i,find_command_total=56721750838i,flushes=92229i,flushes_per_sec=0i,flushes_total_time_ns=3758998488000000i,get_more_command_failed=0i,get_more_command_total=25083354195i,getmores=25083354194i,getmores_per_sec=833i,insert_command_failed=0i,insert_command_total=1236361911i,inserts=1256014897i,inserts_per_sec=52i,jumbo_chunks=0i,latency_commands=2753967494006i,latency_commands_count=31062811019i,latency_reads=18614661745404i,latency_reads_count=94139676023i,latency_writes=4737412538863i,latency_writes_count=14643728288i,member_status="PRI",net_in_bytes=4152637i,net_in_bytes_count=88472109634107i,net_out_bytes=4275456i,net_out_bytes_count=98014337610828i,open_connections=160i,operation_scan_and_order=1886718042i,operation_write_conflicts=60967361i,page_faults=6842i,percent_cache_dirty=2.1,percent_cache_used=80,queries=56721750836i,queries_per_sec=2575i,queued_reads=0i,queued_writes=0i,repl_apply_batches_num=343i,repl_apply_batches_total_millis=184i,repl_apply_ops=6722i,repl_buffer_count=0i,repl_buffer_size_bytes=0i,repl_commands=0i,repl_commands_per_sec=0i,repl_deletes=5006i,repl_deletes_per_sec=0i,repl_executor_pool_in_progress_count=0i,repl_executor_queues_network_in_progress=0i,repl_executor_queues_sleepers=7i,repl_executor_unsignaled_events=0i,repl_getmores=0i,repl_getmores_per_sec=0i,repl_inserts=1i,repl_inserts_per_sec=0i,repl_lag=0i,repl_network_bytes=428682i,repl_network_getmores_num=756i,repl_network_getmores_total_millis=9363i,repl_network_ops=2708i,repl_oplog_window_sec=129140i,repl_queries=0i,repl_queries_per_sec=0i,repl_state=1i,repl_updates=1714i,repl_updates_per_sec=0i,resident_megabytes=79325i,state="PRIMARY",storage_freelist_search_bucket_exhausted=0i,storage_freelist_search_requests=0i,storage_freelist_search_scanned=0i,tcmalloc_central_cache_free_bytes=2627170240i,tcmalloc_current_allocated_bytes=79543192920i,tcmalloc_current_total_thread_cache_bytes=88278144i,tcmalloc_heap_size=106448125952i,tcmalloc_max_total_thread_cache_bytes=1073741824i,tcmalloc_pageheap_commit_count=3231612432i,tcmalloc_pageheap_committed_bytes=82928218112i,tcmalloc_pageheap_decommit_count=3188095833i,tcmalloc_pageheap_free_bytes=669548544i,tcmalloc_pageheap_reserve_count=17095i,tcmalloc_pageheap_scavenge_count=3188095833i,tcmalloc_pageheap_total_commit_bytes=5007844158550016i,tcmalloc_pageheap_total_decommit_bytes=5007761230331904i,tcmalloc_pageheap_total_reserve_bytes=106448125952i,tcmalloc_pageheap_unmapped_bytes=23519907840i,tcmalloc_spinlock_total_delay_ns=17454082778400i,tcmalloc_thread_cache_free_bytes=88264576i,tcmalloc_total_free_bytes=2715467200i,tcmalloc_transfer_cache_free_bytes=19328i,total_available=0i,total_created=0i,total_docs_scanned=382088105447i,total_in_use=0i,total_keys_scanned=570086922602i,total_refreshing=0i,total_tickets_reads=128i,total_tickets_writes=128i,ttl_deletes=481761547i,ttl_deletes_per_sec=0i,ttl_passes=155038i,ttl_passes_per_sec=0i,update_command_failed=0i,update_command_total=11927759674i,updates=11928279599i,updates_per_sec=561i,uptime_ns=9354176505000000i,version="4.4.5",vsize_megabytes=103944i,wtcache_app_threads_page_read_count=53111120533i,wtcache_app_threads_page_read_time=2410027540889i,wtcache_app_threads_page_write_count=11155918206i,wtcache_bytes_read_into=1123123212170522i,wtcache_bytes_written_from=253320223993913i,wtcache_current_bytes=80137150987i,wtcache_internal_pages_evicted=6390688874i,wtcache_max_bytes_configured=100166270976i,wtcache_modified_pages_evicted=8560300759i,wtcache_pages_evicted_by_app_thread=0i,wtcache_pages_queued_for_eviction=57869783433i,wtcache_pages_read_into=53141080362i,wtcache_pages_requested_from=1618805424364i,wtcache_pages_written_from=14910353666i,wtcache_server_evicting_pages=0i,wtcache_tracked_dirty_bytes=2076387999i,wtcache_unmodified_pages_evicted=48137684146i,wtcache_worker_thread_evictingpages=57795868572i 1627626282000000000

arnnow commented Aug 25, 2021

Hi,

I see the same kind of behaviour with Telegraf 1.19.3.
Rolling back to 1.19.1 fixed it.

Here is the data from InfluxDB for a secondary node around the time of the rollback:

> select connections_current,node_type from mongodb where host = 'mymongohost' and time < '2021-08-25 15:15:00' and time > '2021-08-25 15:03:00'
name: mongodb
time                connections_current node_type
----                ------------------- ---------
1629903810000000000 1200                PRI
1629903840000000000 1204                PRI
1629903870000000000 1208                PRI
1629903900000000000 1208                PRI
1629904410000000000 1135                SEC
1629904440000000000 1138                SEC
1629904470000000000 1139                SEC

MongoDB is v4.2.14.

alx75 commented Aug 25, 2021

We have the same problem. I skimmed through the code, and I wonder whether the cause is the commit that introduced the new mongo driver (https://github.com/influxdata/telegraf/pull/9493/files): we do not set RunCmdOptions.ReadPreference, and by default it is nil, which means the command is read from the primary (see runcmdoptions.go).

I haven't had time yet to put something together and test the theory.
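
To make the theory concrete, here is a toy model in Python of what would happen if it is right. It is only a sketch, not the driver's real server-selection code: the `Member` type, the `select_target` function, and the host names `primary1` and `sec3` are invented for illustration, and the model simply takes the first matching member where a real driver applies more rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    host: str
    state: str  # "PRIMARY" or "SECONDARY"


def select_target(members, seed_host, read_preference="primary", direct=False):
    """Return the host a command is sent to in this simplified model."""
    if direct:
        # Direct connection: no replica set discovery, always use the seed.
        return seed_host
    # Otherwise the seed is only an entry point; the read preference decides.
    wanted = "PRIMARY" if read_preference == "primary" else "SECONDARY"
    for member in members:
        if member.state == wanted:
            return member.host
    raise LookupError(f"no {wanted} member available")


# Hypothetical replica set; only late2 is configured in Telegraf.
replica_set = [
    Member("primary1:27017", "PRIMARY"),
    Member("sec3:27017", "SECONDARY"),
    Member("late2:27017", "SECONDARY"),
]

print(select_target(replica_set, "late2:27017"))
print(select_target(replica_set, "late2:27017", read_preference="secondary"))
print(select_target(replica_set, "late2:27017", direct=True))
```

In this model, the default sends the command to the primary even though the configured server is a secondary, a "secondary" preference may still pick a different secondary, and only the direct connection is guaranteed to stay on the configured host.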

arnnow commented Aug 25, 2021

I just switched the configuration to the following:

servers = ["mongodb://127.0.0.1:27017/?connect=direct"]

as noted here: https://pkg.go.dev/go.mongodb.org/mongo-driver/mongo#Connect

That seems to do the trick.
@alx75, same conclusion, I guess.
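
For anyone with many Telegraf configs to update, the workaround can be scripted. The following is a minimal Python sketch, assuming single-host URIs like the one above. `with_direct_connection` is a hypothetical helper name, not part of Telegraf or the mongo driver, and it leaves a URI alone if it already carries a `connect` option.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_direct_connection(server):
    """Return the server URI with connect=direct added if no connect option is set."""
    parts = urlsplit(server)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == "connect" for key, _ in query):
        query.append(("connect", "direct"))
    # Options follow a "/" separator in the URI, so make sure a path exists.
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(query), parts.fragment)
    )


print(with_direct_connection("mongodb://127.0.0.1:27017"))
```

Note that `urlencode` re-encodes the existing query options, so URIs with unusual characters in their options should be checked by hand after the rewrite.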

alx75 commented Aug 25, 2021

IMO it would be worth putting this info in the mongodb input README.

It is also worth mentioning in the changelog, since this is a breaking change.

powersj added a commit to powersj/telegraf that referenced this issue Jun 16, 2022
By default, the mongo-driver library will attempt to use the primary
node of a cluster. For monitoring with Telegraf, users may specify a
secondary node that they want to monitor. In these cases, the user will
need to specify a direct connection to that node.

The library has a number of options for how it connects, which include
primary, primary preferred, secondary, secondary preferred, and closest.
None of these options ensures that the specified node is used.

fixes: influxdata#11275
fixes: influxdata#9555