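Hello,
While working on the assignment, I ran into a problem that occurs only when running under Parallax, so I would like to ask about it.
When I run training with the code above, the following error occurs only under Parallax.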
`(128, 784)
Traceback (most recent call last):
File "/hw2/code/run_parallax.py", line 71, in <module>
cost = autoencoder.partial_fit(sess, batch)
File "/hw2/code/autoencoder/autoencoder_models/Autoencoder.py", line 65, in partial_fit
cost, opt = sess.run((self.cost, self.optimizer), feed_dict={self.x: X})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 671, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1148, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1239, in run
raise six.reraise(*original_exc_info)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1224, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1296, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1076, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/parallax/core/python/common/session_context.py", line 40, in _parallax_run
return self._run_internal(fetches, feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 887, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1086, in _run
str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (784,) for Tensor u'Placeholder:0', which has shape '(?, 784)'
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[59384,1],0]
Exit code: 1
--------------------------------------------------------------------------`
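The shape of the tensor passed to session run appears to be the problem,
but the other run scripts work fine with the same approach, and I confirmed that the value's shape before the feed matches the shape of the corresponding tensor.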
https://github.com/swsnu/bd2018/blob/master/hw2/run_parallax.py#L61
Please refer to the line above. ({x: [batch[0]], y: [batch[1]], is_training: [True]})
This exists because earlier versions of Parallax ran multiple workers with a single session.run, and the feed dict had to carry one entry per worker. It needs to change, since it now makes the user API unintuitive. (Parallax assumes each feed dict value is a Python list or enumerable, slices it, and distributes the pieces to the workers.)
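The slicing described above can be illustrated with a minimal sketch. It is not Parallax's actual code: `slice_feed_for_worker` and `shape_of` are hypothetical helpers, nested lists stand in for the real batch array, and the only assumption is that each feed dict value is indexed by worker number before being passed on.

```python
def shape_of(value):
    """Return the shape of a rectangular nested list as a tuple."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)


def slice_feed_for_worker(feed_dict, worker_index):
    """Hypothetical stand-in for the per-worker slicing (an assumption, not Parallax's code)."""
    return {key: value[worker_index] for key, value in feed_dict.items()}


# A batch with the shape printed at the top of the traceback: (128, 784).
batch = [[0.0] * 784 for _ in range(128)]

# Feeding the batch directly: indexing by worker picks out a single row.
unwrapped = slice_feed_for_worker({"x": batch}, worker_index=0)

# Wrapping the batch in a list: the worker receives the whole batch.
wrapped = slice_feed_for_worker({"x": [batch]}, worker_index=0)

print(shape_of(unwrapped["x"]))  # (784,)
print(shape_of(wrapped["x"]))    # (128, 784)
```

Under that assumption, the unwrapped feed yields exactly the `(784,)` shape reported in the `ValueError`, which suggests that the call in `partial_fit` would need `feed_dict={self.x: [X]}`, in the same way as the referenced line in `run_parallax.py`.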
https://github.com/snuspl/parallax/blob/cpu_enable/parallax/parallax/core/python/common/session_context.py#L40
I would like to know whether the feed data is transformed anywhere inside Parallax after that line.