Question about a TensorFlow error when running with Parallax #27

Open
dostos opened this issue Dec 6, 2018 · 2 comments
dostos commented Dec 6, 2018

Hello,
While working on an assignment, I ran into a problem that occurs only when running with Parallax, so I would like to ask about it.

[two screenshots of the training code; images not preserved]

Training with the code above fails with the error below, but only under Parallax.

```
(128, 784)
Traceback (most recent call last):
  File "/hw2/code/run_parallax.py", line 71, in <module>
    cost = autoencoder.partial_fit(sess, batch)
  File "/hw2/code/autoencoder/autoencoder_models/Autoencoder.py", line 65, in partial_fit
    cost, opt = sess.run((self.cost, self.optimizer), feed_dict={self.x: X})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 671, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1148, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1239, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1224, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1296, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1076, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/parallax/core/python/common/session_context.py", line 40, in _parallax_run
    return self._run_internal(fetches, feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 887, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1086, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (784,) for Tensor u'Placeholder:0', which has shape '(?, 784)'

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.

mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[59384,1],0]
Exit code: 1
--------------------------------------------------------------------------
```

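The shape of the tensor passed to session run seems to be the problem,
but other code that runs the same way works fine, and I have confirmed that the data has the correct shape for the target tensor before it is fed.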

https://github.com/snuspl/parallax/blob/cpu_enable/parallax/parallax/core/python/common/session_context.py#L40

I would like to know whether Parallax transforms the feed data internally after that line.

gyeongin commented Dec 6, 2018

https://github.com/swsnu/bd2018/blob/master/hw2/run_parallax.py#L61
Please refer to the line above. (`{x: [batch[0]], y: [batch[1]], is_training: [True]}`)
This dates from an earlier version of Parallax, which ran multiple workers with a single session.run and therefore needed one feed per worker in the feed dict. It makes the user API unintuitive, so it should be changed. (Parallax assumes each feed dict value is a Python list or another enumerable, slices it, and distributes the slices to the workers.)
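To make the reply above concrete, here is a minimal pure-Python sketch of that behavior. The helper `split_feed_for_workers` is a hypothetical stand-in written for this illustration, not Parallax's actual code; it only assumes what the reply states, namely that each feed dict value is treated as a list with one entry per worker.

```python
# Hypothetical stand-in for the per-worker slicing described in the reply.
# This is NOT Parallax's actual implementation.

def split_feed_for_workers(feed_dict, num_workers):
    """Give worker i the i-th element of every feed value."""
    return [
        {key: value[i] for key, value in feed_dict.items()}
        for i in range(num_workers)
    ]


def shape(nested):
    """Return the shape of a non-empty rectangular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# A dummy batch with the same shape as the one in the report: (128, 784).
batch = [[0.0] * 784 for _ in range(128)]

# Feeding the batch directly: the single worker receives only the first row.
unwrapped = split_feed_for_workers({"x": batch}, num_workers=1)
unwrapped_shape = shape(unwrapped[0]["x"])

# Wrapping the batch in a one-element list: the worker receives the whole batch.
wrapped = split_feed_for_workers({"x": [batch]}, num_workers=1)
wrapped_shape = shape(wrapped[0]["x"])

print(unwrapped_shape)  # (784,)
print(wrapped_shape)    # (128, 784)
```

Under that assumption, the `(784,)` shape in the error message is what you get when a `(128, 784)` batch is indexed once. For the reported code, the fix would presumably be to wrap the value in `partial_fit`, for example `feed_dict={self.x: [X]}`, in the same way as the linked `run_parallax.py` line.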

dostos commented Dec 7, 2018

Thank you for the quick reply.

@dostos dostos closed this as completed Dec 7, 2018
@dostos dostos reopened this Dec 7, 2018