NullPointerException when reading file from sftp #65

Open
sslavian812 opened this issue Jun 25, 2019 · 9 comments

Comments

@sslavian812

I'm trying to read a CSV file from an SFTP server and convert it to a DataFrame.
The file is at /ppreports/outgoing/MY.CSV. I can see it when logging in with a GUI.

val df = spark.read
            .format("com.springml.spark.sftp")
            .option("host", HOST)
            .option("username", USER)
            .option("password", PASSWORD)
            .option("fileType", "csv")
            .option("inferSchema", "false")
            .option("createDF", "false")
            .load("/ppreports/outgoing/MY.CSV")

I get

java.lang.NullPointerException
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:453)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:291)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:277)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:212)

If I try to read a non-existent file:

val df = spark.read
            .format("com.springml.spark.sftp")
            .option("host", HOST)
            .option("username", USER)
            .option("password", PASSWORD)
            .option("fileType", "csv")
            .option("inferSchema", "false")
            .option("createDF", "false")
            .load("/ppreports/outgoing/non-existing.CSV")

Then, predictably, I get a file-not-found error:

2: No such file or directory
	at com.jcraft.jsch.ChannelSftp.throwStatusError(ChannelSftp.java:2833)
	at com.jcraft.jsch.ChannelSftp._stat(ChannelSftp.java:2185)
	at com.jcraft.jsch.ChannelSftp._stat(ChannelSftp.java:2202)
	at com.jcraft.jsch.ChannelSftp.get(ChannelSftp.java:914)
	at com.jcraft.jsch.ChannelSftp.get(ChannelSftp.java:874)
	at com.springml.sftp.client.SFTPClient.copyInternal(SFTPClient.java:168)
	at com.springml.sftp.client.SFTPClient.copy(SFTPClient.java:74)
	at com.springml.spark.sftp.DefaultSource.copy(DefaultSource.scala:212)
	at com.springml.spark.sftp.DefaultSource.createRelation(DefaultSource.scala:80)
	at com.springml.spark.sftp.DefaultSource.createRelation(DefaultSource.scala:41)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:346)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:291)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:277)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:212)

Thus, I conclude that the file is there and spark-sftp finds it, but fails to download it.
What should I do?

@samuel-pt
Contributor

@sslavian812 - What content is present in the file? Can you check whether it is a valid CSV file?

Also, try the latest spark-sftp connector, as we have fixed a similar issue.

@sslavian812
Author

sslavian812 commented Jun 27, 2019

Hi @samuel-pt, thank you for the answer.
I'm still struggling with the NPE while reading the CSV file.

whether it is valid CSV file?

It's a text file, a regular CSV. I can download it with curl and open it on my local machine.

latest spark-sftp

I upgraded from 1.3 to com.springml:spark-sftp_2.11:1.1.5, but it didn't help.

It seems I'll have to implement something custom: say, download the CSV with apache-commons-vfs, upload it to S3, and then read it into a DataFrame using the standard API.
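
The download-first route described above could look roughly like the sketch below. It is an illustration under assumptions, not a tested fix: it is written in Python to match the later snippets in this thread, it swaps apache-commons-vfs for any already-connected SFTP client object that exposes get(remotepath, localpath) (paramiko's SFTPClient has such a method), and the helper name fetch_to_local is made up for this example. The S3 upload step is left out.

```python
import os
import posixpath
import tempfile


def fetch_to_local(sftp, remote_path, local_dir=None):
    """Copy remote_path into local_dir and return the local file path.

    Assumption: `sftp` is an already-connected client exposing
    get(remotepath, localpath), for example paramiko's SFTPClient.
    """
    if local_dir is None:
        # Hypothetical default: a fresh temporary directory per download.
        local_dir = tempfile.mkdtemp(prefix="sftp_download_")
    os.makedirs(local_dir, exist_ok=True)

    # Remote SFTP paths use forward slashes regardless of the local OS.
    local_path = os.path.join(local_dir, posixpath.basename(remote_path))
    sftp.get(remote_path, local_path)

    # Fail early with a readable error if the copy produced nothing.
    if not os.path.isfile(local_path):
        raise FileNotFoundError(
            f"download of {remote_path} produced no file at {local_path}"
        )
    return local_path
```

Once the file sits somewhere the cluster can read (local disk, DBFS, or S3 after an upload), the standard reader such as spark.read.csv(path) can take over and the connector is no longer involved.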

@AJAnujsharma

Yes, this is an issue; I'm facing it too.
java.lang.NullPointerException --> when reading an existing file, and even upgrading to com.springml:spark-sftp_2.11:1.1.5 didn't help.

Let me know if any other option can be implemented.

@vejeta

vejeta commented Sep 24, 2019

Can you provide a sample of the file to be tested?

@AJAnujsharma

You can use any file, either csv or txt. The way it works is that it tries to do two things at the same time:

  1. Copy the file from SFTP to a temp location in DBFS
  2. Read it from DBFS

That's why it fails, which is a bug.

As a workaround, you can use a try/catch block:
try (read the file, but do not create a DataFrame) {
    copy the data to DBFS
} catch (once it is copied, you can load the file into a DataFrame)

This is a temporary solution, but it is a bug.

@yuvapraveen

@AJAnujsharma Can you please provide the code snippet that you used to sftp from databricks. Not sure I get what you are doing in your catch block. Thanks in advance.

@sauerch91

sauerch91 commented Mar 3, 2020

@AJAnujsharma Can you please provide the code snippet that you used to sftp from databricks. Not sure I get what you are doing in your catch block. Thanks in advance.

That works for me! An example SFTP server is used here.
# try/except is the workaround
try:
    # First attempt: tempLocation under /dbfs/tmp/
    df = (spark
          .read
          .format("com.springml.spark.sftp")
          .option("host", "test.rebex.net")
          .option("username", sftp_user)
          .option("password", sftp_password)
          .option("fileType", "txt")
          .option("tempLocation", "/dbfs/tmp/")
          .load("/pub/example/readme.txt"))
except Exception:
    # Second attempt: same read, but with tempLocation set to /tmp/
    df = (spark
          .read
          .format("com.springml.spark.sftp")
          .option("host", "test.rebex.net")
          .option("username", sftp_user)
          .option("password", sftp_password)
          .option("fileType", "txt")
          .option("tempLocation", "/tmp/")
          .load("/pub/example/readme.txt"))
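
The same fallback can be written without repeating the reader chain. The sketch below is only an illustration and not part of the connector: read_with_fallback and sftp_reader are hypothetical names, and the loader is passed in as a callable so the retry logic stays independent of Spark. The connector options are the ones already used in this thread.

```python
def read_with_fallback(load, temp_locations):
    """Call load(temp_location) for each candidate until one succeeds.

    Returns (result, temp_location) for the first candidate that works and
    re-raises the last error if every candidate fails.
    """
    last_error = None
    for location in temp_locations:
        try:
            return load(location), location
        except Exception as err:  # broad on purpose: any failed attempt triggers the next one
            last_error = err
    if last_error is None:
        raise ValueError("temp_locations must not be empty")
    raise last_error


def sftp_reader(spark, host, username, password, remote_path, file_type="txt"):
    """Build a loader for read_with_fallback (hypothetical helper)."""
    def load(temp_location):
        return (spark.read
                .format("com.springml.spark.sftp")
                .option("host", host)
                .option("username", username)
                .option("password", password)
                .option("fileType", file_type)
                .option("tempLocation", temp_location)
                .load(remote_path))
    return load
```

With the values from the snippet above, the call would be read_with_fallback(sftp_reader(spark, "test.rebex.net", sftp_user, sftp_password, "/pub/example/readme.txt"), ["/dbfs/tmp/", "/tmp/"]), which returns the DataFrame together with the tempLocation that worked.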

@yuvapraveen

@sauerch91 were you able to write to an SFTP server? If so, can you give me the snippet, please? It seems like the library cannot read from the temporary DBFS location.

@DataBach-maker

I am with @yuvapraveen. Does someone have a working example? I'm struggling to write to an SFTP server and get an NPE with the newest version, 1.0.3.


7 participants