Segments cannot be loaded from HDFS deep storage when datasource name has special characters #9788

Open
jon-wei opened this issue Apr 29, 2020 · 0 comments


jon-wei commented Apr 29, 2020

Using non-ASCII or other special characters in datasource names with HDFS deep storage results in segments failing to load:

java.io.FileNotFoundException: File does not exist: /druid/segments/wikipedia_hadoop_index_test_728672c5-affc-4116-9792-d81e2599eaf4%20Россия%20한국%20中国!%3F/20130831T000000.000Z_20130901T000000.000Z/2020-04-29T02_21_33.533Z/0_index.zip
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:72)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:62)
	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:152)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1819)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:692)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:850)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:793)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2489)

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_232]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_232]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_232]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_232]
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:849) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:836) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:825) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:325) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:285) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:270) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1064) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:328) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:325) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:325) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.druid.storage.hdfs.HdfsDataSegmentPuller$2.openInputStream(HdfsDataSegmentPuller.java:124) ~[?:?]
	at org.apache.druid.storage.hdfs.HdfsDataSegmentPuller.getInputStream(HdfsDataSegmentPuller.java:298) ~[?:?]
	at org.apache.druid.storage.hdfs.HdfsDataSegmentPuller$3.openStream(HdfsDataSegmentPuller.java:249) ~[?:?]
	at org.apache.druid.utils.CompressionUtils.lambda$unzip$1(CompressionUtils.java:182) ~[druid-core-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:87) ~[druid-core-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:115) ~[druid-core-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:105) ~[druid-core-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.utils.CompressionUtils.unzip(CompressionUtils.java:181) ~[druid-core-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.storage.hdfs.HdfsDataSegmentPuller.getSegmentFiles(HdfsDataSegmentPuller.java:243) ~[?:?]
	at org.apache.druid.storage.hdfs.HdfsLoadSpec.loadSegment(HdfsLoadSpec.java:57) ~[?:?]
	at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocation(SegmentLoaderLocalCacheManager.java:242) ~[druid-server-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocationWithStartMarker(SegmentLoaderLocalCacheManager.java:230) ~[druid-server-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadSegmentWithRetry(SegmentLoaderLocalCacheManager.java:191) ~[druid-server-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:163) ~[druid-server-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:130) ~[druid-server-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.server.SegmentManager.getAdapter(SegmentManager.java:212) ~[druid-server-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.server.SegmentManager.loadSegment(SegmentManager.java:171) ~[druid-server-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:258) ~[druid-server-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.server.coordination.SegmentLoadDropHandler.addSegment(SegmentLoadDropHandler.java:306) ~[druid-server-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:49) ~[druid-server-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.server.coordination.ZkCoordinator.lambda$childAdded$2(ZkCoordinator.java:147) ~[druid-server-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_232]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_232]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
Caused by: org.apache.hadoop.ipc.RemoteException: File does not exist: /druid/segments/wikipedia_hadoop_index_test_728672c5-affc-4116-9792-d81e2599eaf4%20Россия%20한국%20中国!%3F/20130831T000000.000Z_20130901T000000.000Z/2020-04-29T02_21_33.533Z/0_index.zip
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:72)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:62)
	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:152)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1819)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:692)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:850)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:793)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2489)

	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1489) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.ipc.Client.call(Client.java:1435) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.ipc.Client.call(Client.java:1345) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) ~[hadoop-common-2.8.5.jar:?]
	at com.sun.proxy.$Proxy67.getBlockLocations(Unknown Source) ~[?:?]
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:259) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_232]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_232]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_232]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_232]
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346) ~[hadoop-common-2.8.5.jar:?]
	at com.sun.proxy.$Proxy68.getBlockLocations(Unknown Source) ~[?:?]
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:847) ~[hadoop-hdfs-client-2.8.5.jar:?]
	... 37 more

This is likely similar to the issues fixed in #6761.
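
For illustration, here is a minimal sketch of the mismatch (not Druid code; the class and path names are made up, and it only assumes the segment directory was written under the raw, decoded datasource name while the load path carries the percent-encoded form seen in the trace above):

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: a percent-encoded path and the raw,
// decoded path are two different HDFS names, so requesting one when the
// other was written yields FileNotFoundException.
public class EncodedPathMismatch
{
  public static void main(String[] args) throws Exception
  {
    // Path as it appears in the exception message (percent-encoded).
    String requested =
        "/druid/segments/wikipedia_test%20Россия%20한국%20中国!%3F/0_index.zip";

    // The decoded form, i.e. what the path looks like with the raw characters.
    String decoded = URLDecoder.decode(requested, StandardCharsets.UTF_8.name());
    System.out.println(decoded);
    // /druid/segments/wikipedia_test Россия 한국 中国!?/0_index.zip

    // The two strings do not name the same file, so a lookup of the encoded
    // form fails if the segment was pushed under the decoded form (or vice versa).
    System.out.println(requested.equals(decoded)); // false
  }
}
```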
