[Task]: Issue with Running Hop Pipeline on Spark Standalone Cluster #4835
Comments
Hi @Raja10D, please note that you need to submit this on the master of your production cluster, and that you need to specify the master, typically as spark://host:port.
For spark://host:port, do we mention our master's port? If we are running on 8080 (the master port), does that mean spark://host:8080?
You can see the address at the top of the Spark web console.
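As an illustration (a sketch; the hostname master-node and the /path/to/... paths are hypothetical), the master web UI on port 8080 shows the cluster URL near the top, e.g. URL: spark://master-node:7077. That spark:// address, with the cluster port (7077 by default), is what --master expects, not the 8080 web UI port:

# Hypothetical example: use the spark:// URL shown in the master web UI
spark-submit \
  --master spark://master-node:7077 \
  --class org.apache.hop.beam.run.MainBeam \
  /path/to/fat-jar.jar \
  /path/to/pipeline.hpl \
  /path/to/metadata.json \
  beam_runner_spark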
More tips can be found in the documentation of the Hop Spark pipeline engine.
I have attached a screenshot of my Spark master web UI, a screenshot of my beam_runner_spark configuration, and my spark-submit command. I followed the documentation step by step; can you guide me on where I am making a mistake?
What do the relevant entries in your /etc/hosts file look like?
My /etc/hosts:
127.0.0.1 localhost
::1 ip6-localhost ip6-loopback
192.125.1.140 master-node worker-node
My master and worker are on the same machine.
So there is no entry for the hostname itself.
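As a quick check (a sketch; 10d154 is the hostname from the spark:// master URL), resolution can be verified on the machine before editing anything:

hostname              # prints the machine's hostname, e.g. 10d154
getent hosts 10d154   # no output means neither /etc/hosts nor DNS resolves the name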
Adding the hostname entry to the /etc/hosts file resolved the problem, and the Hop pipeline is now running smoothly on our Spark Standalone cluster. Actual content of my /etc/hosts:
127.0.0.1 localhost
::1 ip6-localhost ip6-loopback
192.125.1.140 master-node worker-node
192.125.1.140 10d154
Your guidance was invaluable, and I appreciate your support. Thank you once again for your assistance! Thanks all.
You're welcome @Raja10D, I'm glad you managed to run on Spark. Can you resolve this issue and the next?
I'm glad to report that our next step is to run the pipeline in the production environment (on the server). We're preparing to ensure that everything runs smoothly there as well. Are there any specific configurations or considerations we should be aware of before moving forward with the production deployment? Your insights would be invaluable in helping us achieve a seamless transition.
Just look at the transform-specific limitations detailed here: https://hop.apache.org/manual/latest/pipeline/beam/getting-started-with-beam.html#_universal_transforms
In general, if it runs on your local one-node Spark, it should run on a larger cluster as well.
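For the production submit, a minimal sketch of the same command pointed at the standalone master; the hostname, paths, and resource values are illustrative assumptions, not recommendations:

# Sketch: production submit against the standalone master (values are assumptions)
spark-submit \
  --master spark://master-node:7077 \
  --executor-memory 4g \
  --total-executor-cores 8 \
  --class org.apache.hop.beam.run.MainBeam \
  --driver-java-options '-DPROJECT_HOME=/path/to/project' \
  /path/to/fat-jar.jar \
  /path/to/pipeline.hpl \
  /path/to/metadata.json \
  beam_runner_spark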
What needs to happen?
I am writing to report an issue we are encountering when running a Hop pipeline on a Spark Standalone cluster.
Our Spark-Submit command works perfectly in a local environment with the following configuration:
spark-submit \
  --master local[4] \
  --class org.apache.hop.beam.run.MainBeam \
  --driver-java-options '-DPROJECT_HOME=/home/decoders/hop_test_2_8_0_backup/hop/config/projects/default' \
  /home/decoders/hop_test_2_8_0_backup/hop/fat-jar.jar \
  /home/decoders/hop_test_2_8_0_backup/hop/Creating_5_payoutAmt_source_Dec23.hpl \
  /home/decoders/hop_test_2_8_0_backup/hop/metadata.json \
  beam_runner_spark
However, when we replace local[4] with the Spark Standalone cluster URL (spark://10d154:7077), the pipeline fails to run in our real-time production environment. We have verified that all necessary dependencies are included in the fat-jar.jar and reviewed the configuration settings, but the issue persists.
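For reference, the failing variant differs from the working command only in the --master value:

spark-submit \
  --master spark://10d154:7077 \
  --class org.apache.hop.beam.run.MainBeam \
  --driver-java-options '-DPROJECT_HOME=/home/decoders/hop_test_2_8_0_backup/hop/config/projects/default' \
  /home/decoders/hop_test_2_8_0_backup/hop/fat-jar.jar \
  /home/decoders/hop_test_2_8_0_backup/hop/Creating_5_payoutAmt_source_Dec23.hpl \
  /home/decoders/hop_test_2_8_0_backup/hop/metadata.json \
  beam_runner_spark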
Could you please provide guidance on any additional files or settings that might be required to run the Hop pipeline successfully on a Spark Standalone cluster?
Thank you for your assistance.
Issue Priority
Priority: 1
Issue Component
Component: Other, Component: Hop Run