This repository has been archived by the owner on Apr 22, 2022. It is now read-only.

Divolte not writing to HDFS #271

Open
tatianafrank opened this issue Jul 17, 2019 · 5 comments

Comments

@tatianafrank

tatianafrank commented Jul 17, 2019

My Kafka sink is working but my HDFS sink is not. I'm using HDFS 2.0, so that might be why. I've got Divolte running in a Docker container and a Hadoop cluster running in the same docker-compose network, which I got from https://github.com/big-data-europe/docker-hadoop

Here are the relevant parts of my divolte-collector.conf (some parts stripped for brevity):

hdfs {
      enabled = true
      enabled = ${?DIVOLTE_HDFS_ENABLED}
      threads = 2
      buffer_size = 1048576

      client {
        fs.DEFAULT_FS = "hdfs://localhost:9870"
      }
    }

mappings {
    my_mapping = {
      schema_file = "/opt/divolte/divolte-collector/conf/DivolteRecord.avsc"
      mapping_script_file = "/opt/divolte/divolte-collector/conf/mapping.groovy"
      sources = [browser]
      sinks = [divolte_kafka_sink, divolte_hdfs_sink]
    }
  }

  sinks {
    divolte_hdfs_sink = {
      type = hdfs
      file_strategy {
        sync_file_after_records = 1000
        sync_file_after_records = ${?DIVOLTE_HDFS_SINK_SYNC_NR_OF_RECORDS}
        sync_file_after_duration = 30 minutes
        sync_file_after_duration = ${?DIVOLTE_HDFS_SINK_SYNC_DURATION}
        working_dir = /tmp/working
        working_dir = ${?DIVOLTE_HDFS_SINK_WORKING_DIR}
        publish_dir = /tmp/processed
        publish_dir = ${?DIVOLTE_HDFS_SINK_PUBLISH_DIR}
      }
    }
  }

For fs.DEFAULT_FS, I've tried hdfs://localhost:9870 and hdfs://namenode:9870 (namenode is the name of the HDFS namenode container running in the same Docker network).
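Two things worth checking here, as a sketch only: the standard Hadoop property name is fs.defaultFS (camel-cased), and if Divolte passes the client block straight through to the Hadoop Configuration, a key spelled fs.DEFAULT_FS would be silently ignored. Also, 9870 is the Hadoop 3 namenode's web UI port; the big-data-europe compose files typically point fs.defaultFS at the RPC port instead (hdfs://namenode:9000 in their hadoop.env). Under those assumptions, the client block might look like:

hdfs {
  enabled = true
  client {
    # Assumption: standard Hadoop key spelling, and the namenode RPC
    # endpoint from big-data-europe's hadoop.env (9000). Port 9870 is
    # the Hadoop 3 namenode web UI, not the filesystem RPC endpoint.
    fs.defaultFS = "hdfs://namenode:9000"
  }
}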

@friso
Collaborator

friso commented Jul 18, 2019

Can you be a bit more specific about "not working"? Do you see any errors?

@tatianafrank
Author

Here is the error:

[main] WARN [NativeCodeLoader]: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
divolte | Exception in thread "main" 2019-07-29 15:20:24.908Z [main] ERROR [HdfsFileManager]: Could not initialize HDFS filesystem or failed to check for existence of publish and / or working directories..
divolte | org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "hdfs"
divolte |     at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3332)

@tatianafrank
Author

Then I added fs.file.impl = "org.apache.hadoop.fs.LocalFileSystem" and fs.hdfs.impl = "org.apache.hadoop.hdfs.DistributedFileSystem" to my hdfs configuration in Divolte, and now I'm getting a different error:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(Ljava/lang/String;)Ljava/net/InetSocketAddress;
divolte |     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:99)
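For reference, the two impl properties mentioned above would presumably sit in the same client block as fs.DEFAULT_FS; a sketch under that assumption:

hdfs {
  client {
    fs.DEFAULT_FS = "hdfs://namenode:9870"
    # Explicit bindings for the "No FileSystem for scheme" error, mapping
    # each URI scheme to its FileSystem implementation class:
    fs.hdfs.impl = "org.apache.hadoop.hdfs.DistributedFileSystem"
    fs.file.impl = "org.apache.hadoop.fs.LocalFileSystem"
  }
}

The NoSuchMethodError itself points elsewhere: DistributedFileSystem from one Hadoop version is calling a NameNode.getAddress(String) signature that another version on the classpath no longer provides, which typically indicates mixed Hadoop jar versions rather than a configuration problem.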

@tatianafrank
Author

According to this thread (https://stackoverflow.com/questions/45460909/accessing-hdfs-in-java-throws-error), there is an issue with dependency versions in Divolte, but I'm not sure how to change that in Divolte...

@JulienSerouart

This pull request may help: #244
