add method and prior figs
ajonnavittula committed Apr 26, 2024
1 parent 9ba22e1 commit 1aae2f8
Showing 4 changed files with 25 additions and 4 deletions.
29 changes: 25 additions & 4 deletions docs/index.html
@@ -242,7 +242,7 @@ <h2 class="title is-2" style="text-align: center;">How can we compare human and
<br><br>
-->



@@ -254,26 +254,47 @@ <h2 class="title is-2" style="text-align: center;">How can we compare human and

<div class="columns is-centered">
<div class="column is-full-width">
<h2 class="title is-2" style="text-align: center;">WHIRL: In-the-Wild Human Imitating Robot Learning</h2>
<h2 class="title is-2" style="text-align: center;">How Does it Work?</h2>
<tr>
<td>
<br>
<div class="columns is-centered has-text-centered">
<img src="./resources/method_v4.png" style="width: 85%;"></img>
<img src="./static/images/method.png" style="width: 85%;"></img>
</div>
</td>
</tr>
<div class="is-vcentered interpolation-panel">
<div class="content has-text-centered">
<p style = "font-size: 18px">
- Our method, WHIRL, provides an efficient way to learn from human videos. We have three core components: we first <b>watch</b> and obtain human priors such as hand movement and object interactions. We <b>repeat</b> these priors by interacting in the real world, by both trying to achieve task success and explore around the prior. We <b>improve</b> our task policy by leveraging our agent-agnostic objective function which aligns human and robot videos.
+ Outline of VIEW, our proposed method for human-to-robot visual imitation learning. (Top Left) VIEW begins with a single video demonstration of a task. (Bottom Left) From this video we extract the object of interest, its trajectory, and the human's hand trajectory. (Middle) We then perform compression to obtain a trajectory prior: a sequence of waypoints the robot arm should interpolate between to complete the task. Unfortunately, this initial trajectory is often imprecise due to the differences between human hands and robot grippers, as well as noise in the extraction process. We therefore refine the prior using a residual network, which is trained on previous tasks to de-noise the current data. (Right) The de-noised trajectory is then segmented into two phases: grasp exploration and task exploration. (Top Right) During grasp exploration, the robot determines how to pick up the object by modifying the pick point in its trajectory. (Bottom Right) Following a successful grasp, the robot proceeds to task exploration, where it simultaneously corrects the remaining waypoints of the trajectory. After completing exploration, the robot synthesizes a complete trajectory. (Middle) This solved trajectory, alongside the prior trajectory, is used to further train the residual network, thus enhancing the performance of our method in future tasks.
</p>
</div>
</div>
</div>
</div>
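To make the residual refinement step concrete, here is a minimal sketch of what a waypoint de-noising network could look like. This is an illustrative reconstruction in PyTorch, not the code from this commit: the fixed waypoint count, layer sizes, and training loop are all assumptions. Each completed task supplies another (prior, solved-trajectory) pair, so the refiner improves as the robot gains experience.

```python
import torch
import torch.nn as nn

class ResidualRefiner(nn.Module):
    """Maps a noisy waypoint prior to per-waypoint corrections."""
    def __init__(self, n_waypoints: int = 10, dim: int = 3, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_waypoints * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_waypoints * dim),
        )

    def forward(self, prior: torch.Tensor) -> torch.Tensor:
        # prior: (batch, n_waypoints, dim) waypoints extracted from the human video
        residual = self.net(prior.flatten(start_dim=1)).view_as(prior)
        return prior + residual  # de-noised trajectory

def train_step(model, optimizer, prior, solved):
    """One supervised update on a (prior, solved-trajectory) pair from a past task."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(prior), solved)
    loss.backward()
    optimizer.step()
    return loss.item()
```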

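The two exploration phases can likewise be sketched as black-box searches around the de-noised prior. In the sketch below, `try_grasp` and `rollout_reward` are hypothetical stand-ins for the robot executing a candidate pick or trajectory and reporting the outcome; the specific sampling schemes (Gaussian perturbation of the pick point, a small cross-entropy-method loop over the remaining waypoints) are our illustration rather than the paper's exact procedure.

```python
import numpy as np

def grasp_exploration(pick_point, try_grasp, n_attempts=20, scale=0.02):
    """Perturb the pick point until the robot reports a successful grasp."""
    rng = np.random.default_rng(0)
    for _ in range(n_attempts):
        candidate = pick_point + rng.normal(0.0, scale, size=pick_point.shape)
        if try_grasp(candidate):
            return candidate
    return None  # no successful grasp within the budget

def task_exploration(waypoints, rollout_reward, n_iters=10,
                     n_samples=32, top_k=5, scale=0.02):
    """Jointly correct the post-grasp waypoints with a CEM-style search."""
    rng = np.random.default_rng(0)
    mean = waypoints.copy()
    std = np.full_like(waypoints, scale)
    for _ in range(n_iters):
        # Sample perturbed trajectories, keep the best, refit the distribution.
        samples = mean + std * rng.standard_normal((n_samples, *waypoints.shape))
        rewards = np.array([rollout_reward(s) for s in samples])
        elites = samples[np.argsort(rewards)[-top_k:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean  # corrected waypoint sequence
```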
<div class="columns is-centered">
<div class="column is-full-width">
<h2 class="title is-2" style="text-align: center;">How Do We Extract Human Priors?</h2>
<tr>
<td>
<br>
<div class="columns is-centered has-text-centered">
<img src="./static/images/prior.png" style="width: 85%;"></img>
</div>
</td>
</tr>
<div class="is-vcentered interpolation-panel">
<div class="content has-text-centered">
<p style = "font-size: 18px">
An overview of our prior extraction method (Bottom Left in \fig{method}). Utilizing the 100 Days of Hands ($100$DOH) detector \cite{shan2020understanding}, we first identify the location of the hand and if it is in contact with any objects present in the frame. We then refine the human's hand trajectory using the MANO model \cite{romero2022embodied} to capture wrist movements. Subsequently, to eliminate redundancy, we apply the SQUISHE algorithm \cite{muckell2014compression}. This produces an initial trajectory with key waypoints that the robot should interpolate between. To pinpoint the object of interest amidst potential clutter, we analyze frames where hand-object contact occurs, creating anchor boxes that --- in conjunction with an object detector --- reveal the object the human interacts with most frequently. This identification enables us to construct an accurate object trajectory from the human's video.
</p>
</div>
</div>
</div>
</div>
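A simplified greedy variant of the SQUISH-E idea makes the compression step concrete: repeatedly drop the interior point whose removal introduces the least synchronized Euclidean distance (SED) error, stopping once any further removal would exceed an error bound. The full algorithm of Muckell et al. uses a priority queue with error propagation; this sketch, with an illustrative `max_err`, only shows the core criterion.

```python
import numpy as np

def sed(points, times, i, a, b):
    """Synchronized Euclidean distance from point i to the segment (a, b)."""
    # times must be strictly increasing; alpha is point i's time fraction on (a, b).
    alpha = (times[i] - times[a]) / (times[b] - times[a])
    synced = points[a] + alpha * (points[b] - points[a])
    return np.linalg.norm(points[i] - synced)

def compress_trajectory(points, times, max_err=0.01):
    """Greedy SQUISH-E-style compression; returns indices of kept waypoints."""
    keep = list(range(len(points)))
    while len(keep) > 2:
        errs = [sed(points, times, keep[j], keep[j - 1], keep[j + 1])
                for j in range(1, len(keep) - 1)]
        j_min = int(np.argmin(errs))
        if errs[j_min] > max_err:
            break                  # removing any point would exceed the bound
        keep.pop(j_min + 1)        # +1: errs[0] corresponds to keep[1]
    return keep
```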

<!--
<div class="columns is-centered">
<div class="column is-full-width">
<h2 class="title is-2" style="text-align: center;">Training Procedure</h2>
Binary file added docs/static/images/method.png
Binary file removed docs/static/images/preview.jpg
Binary file added docs/static/images/prior.png

0 comments on commit 1aae2f8
