add training cost
snyhlxde1 committed Mar 5, 2024
1 parent 7a222c4 commit 914545a
Showing 5 changed files with 59 additions and 0 deletions.
17 changes: 17 additions & 0 deletions content/blogs/cllm/index.md
@@ -160,6 +160,23 @@ Our experiments contain three domain-specific tasks, including Spider (text-to-S
**Open-domain Conversational Challenge (MT-bench):** A CLLM trained from LLaMA2-7B on the ShareGPT dataset achieves roughly the same speedup as Medusa2 when combined with lookahead decoding, with comparable scores on MT-bench. However, the CLLM offers greater adaptability and memory efficiency, as it requires no modifications to the target model's original architecture and no auxiliary components.
{{< /justify >}}

**Training Cost:**
{{< justify >}}
The fine-tuning cost of CLLMs is moderate: training on only around 1M tokens suffices for LLaMA-7B to achieve a $3.4\times$ speedup on the Spider dataset. When the dataset is large, for example CodeSearchNet-Python, only 10% of the dataset is needed to generate the Jacobi trajectories for training CLLMs, yielding around a $2.5\times$ speedup. The total number of tokens can be estimated as:

$N = \text{avg. \# of trajectories per prompt} \times \text{avg. sequence length} \times \text{\# of prompts}$.
{{< /justify >}}
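
{{< justify >}}
As a rough sanity check on the table below, here is a minimal sketch of this estimate. The three per-prompt figures are illustrative assumptions for a Spider-scale run, not measured values from the experiments.
{{< /justify >}}

```python
# Back-of-the-envelope estimate of the total training tokens N.
# All three inputs are illustrative assumptions, not measured values.
avg_trajectories_per_prompt = 30  # assumed trajectories collected per prompt
avg_seq_length = 64               # assumed length of each n-token sequence
num_prompts = 1_000               # assumed number of training prompts

N = avg_trajectories_per_prompt * avg_seq_length * num_prompts
print(f"estimated training tokens: {N:.1e}")  # ~1.9e6, the order of the Spider entry below
```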

{{< center >}}
| Dataset | Estimated training cost (tokens) |
|:---:|:---:|
| Spider | $2 \times 10^6$ |
| CodeSearchNet-Python | $1 \times 10^8$ |
| GSM8K | $1 \times 10^7$ |
| ShareGPT | $2 \times 10^8$ |

{{< /center >}}

### Fast Forwarding and Stationary Tokens

{{< image src="img/trajectory_compare_aligned.png" alt="trajectory_compare" width="120%" title="Figure 7: Comparison of Jacobi trajectories between a target LLM and CLLMs on Spider. Each point along the Jacobi trajectory is a color-coded sequence: blue for correct tokens matching the AR results, and red for inaccurate ones. The CLLM demonstrates enhanced efficiency, converging to the fixed point $2\times$ faster than the target LLM. This increased efficiency can be attributed to the consistency loss, which facilitates learning the structure of each $n$-token sequence given a prefix.">}}
3 changes: 3 additions & 0 deletions layouts/shortcodes/center.html
@@ -0,0 +1,3 @@
<div style="text-align: center;">
{{ .Inner | markdownify }}
</div>
Binary file modified public/.DS_Store
Binary file added public/blogs/cllm/img/clm_objective_legacy.png
39 changes: 39 additions & 0 deletions public/blogs/cllm/index.html
@@ -386,6 +386,45 @@ <h3 id="results">Results<a hidden class="anchor" aria-hidden="true" href="#resul



<p><strong>Training Cost:</strong>
<div style="text-align: justify;">
<p>The fine-tuning cost of CLLMs is moderate: training on only around 1M tokens suffices for LLaMA-7B to achieve a $3.4\times$ speedup on the Spider dataset. When the dataset is large, for example CodeSearchNet-Python, only 10% of the dataset is needed to generate the Jacobi trajectories for training CLLMs, yielding around a $2.5\times$ speedup. The total number of tokens can be estimated as:</p>
<p>$N = \text{avg. \# of trajectories per prompt} \times \text{avg. sequence length} \times \text{\# of prompts}$.</p>

</div>


</p>
<div style="text-align: center;">
<table>
<thead>
<tr>
<th style="text-align:center">Dataset</th>
<th style="text-align:center">Estimated training cost (tokens)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">Spider</td>
<td style="text-align:center">$2\times 10^6$</td>
</tr>
<tr>
<td style="text-align:center">CodeSearchNet-Python</td>
<td style="text-align:center">$1 \times 10^8$</td>
</tr>
<tr>
<td style="text-align:center">GSM8K</td>
<td style="text-align:center">$1 \times 10^7$</td>
</tr>
<tr>
<td style="text-align:center">ShareGPT</td>
<td style="text-align:center">$2 \times 10^8$</td>
</tr>
</tbody>
</table>

</div>

<h3 id="fast-forwarding-and-stationary-tokens">Fast Forwarding and Stationary Tokens<a hidden class="anchor" aria-hidden="true" href="#fast-forwarding-and-stationary-tokens">#</a></h3>

<figure>
