[Task]: Issue with Running Hop Pipeline on Spark Standalone Cluster #4835
Comments
Hi @Raja10D, please note that you need to submit this on the master of your production cluster, and that you need to specify the master, typically as spark://host:port.
For spark://host:port, do we mention our master's port? If we are running on 8080 (the master port), does that mean spark://host:8080?
You can see the address at the top of the Spark web console.
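As an illustration (a sketch; the hostname master-node and the /path/to/... paths are hypothetical), the master web UI on port 8080 shows the cluster URL near the top, e.g. URL: spark://master-node:7077. That spark:// address, with the cluster port (7077 by default), is what --master expects, not the 8080 web UI port:

# Hypothetical example: use the spark:// URL shown in the master web UI
spark-submit \
  --master spark://master-node:7077 \
  --class org.apache.hop.beam.run.MainBeam \
  /path/to/fat-jar.jar \
  /path/to/pipeline.hpl \
  /path/to/metadata.json \
  beam_runner_spark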
More tips can be found in the documentation of the Hop Spark pipeline engine.
I have attached a screenshot of my Spark master web UI, a screenshot of my beam_runner_spark configuration, and my spark-submit command. I followed the documentation step by step; can you guide me on where I am making a mistake?
What do the relevant entries in your /etc/hosts file look like?
My /etc/hosts:
127.0.0.1 localhost
::1 ip6-localhost ip6-loopback
192.125.1.140 master-node worker-node
My master and worker are on the same machine.
So there is no entry for the hostname itself.
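As a quick check (a sketch; 10d154 is the hostname from the spark:// master URL), resolution can be verified on the machine before editing anything:

hostname              # prints the machine's hostname, e.g. 10d154
getent hosts 10d154   # no output means neither /etc/hosts nor DNS resolves the name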
Adding the hostname entry to the /etc/hosts file resolved the problem, and the Hop pipeline is now running smoothly on our Spark Standalone cluster. Actual content of my /etc/hosts:
127.0.0.1 localhost
::1 ip6-localhost ip6-loopback
192.125.1.140 master-node worker-node
192.125.1.140 10d154
Your guidance was invaluable, and I appreciate your support. Thank you once again for your assistance! Thanks all.
You're welcome @Raja10D, I'm glad you managed to run on Spark. Can you resolve this issue and the next?
I'm glad to report that our next step is to run the pipeline in the production environment (on the server). We're preparing to ensure that everything runs smoothly there as well. Are there any specific configurations or considerations we should be aware of before moving forward with the production deployment? Your insights would be invaluable in helping us achieve a seamless transition.
Just look at the transform-specific limitations detailed here: https://hop.apache.org/manual/latest/pipeline/beam/getting-started-with-beam.html#_universal_transforms
In general, if it runs on your local one-node Spark, it should run on a larger cluster as well.
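For the production submit, a minimal sketch of the same command pointed at the standalone master; the hostname, paths, and resource values are illustrative assumptions, not recommendations:

# Sketch: production submit against the standalone master (values are assumptions)
spark-submit \
  --master spark://master-node:7077 \
  --executor-memory 4g \
  --total-executor-cores 8 \
  --class org.apache.hop.beam.run.MainBeam \
  --driver-java-options '-DPROJECT_HOME=/path/to/project' \
  /path/to/fat-jar.jar \
  /path/to/pipeline.hpl \
  /path/to/metadata.json \
  beam_runner_spark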
What needs to happen?
I am writing to report an issue we are encountering when running a Hop pipeline on a Spark Standalone cluster.
Our Spark-Submit command works perfectly in a local environment with the following configuration:
spark-submit \
  --master local[4] \
  --class org.apache.hop.beam.run.MainBeam \
  --driver-java-options '-DPROJECT_HOME=/home/decoders/hop_test_2_8_0_backup/hop/config/projects/default' \
  /home/decoders/hop_test_2_8_0_backup/hop/fat-jar.jar \
  /home/decoders/hop_test_2_8_0_backup/hop/Creating_5_payoutAmt_source_Dec23.hpl \
  /home/decoders/hop_test_2_8_0_backup/hop/metadata.json \
  beam_runner_spark
However, when we replace local[4] with the Spark Standalone cluster URL (spark://10d154:7077), the pipeline fails to run in our real-time production environment. We have verified that all necessary dependencies are included in the fat-jar.jar and reviewed the configuration settings, but the issue persists.
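For reference, the failing variant differs from the working command only in the --master value:

spark-submit \
  --master spark://10d154:7077 \
  --class org.apache.hop.beam.run.MainBeam \
  --driver-java-options '-DPROJECT_HOME=/home/decoders/hop_test_2_8_0_backup/hop/config/projects/default' \
  /home/decoders/hop_test_2_8_0_backup/hop/fat-jar.jar \
  /home/decoders/hop_test_2_8_0_backup/hop/Creating_5_payoutAmt_source_Dec23.hpl \
  /home/decoders/hop_test_2_8_0_backup/hop/metadata.json \
  beam_runner_spark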
Could you please provide guidance on any additional files or settings that might be required to run the Hop pipeline successfully on a Spark Standalone cluster?
Thank you for your assistance.
Issue Priority
Priority: 1
Issue Component
Component: Other, Component: Hop Run