add training cost
snyhlxde1 committed Mar 5, 2024
1 parent 7a222c4 commit 914545a
Showing 5 changed files with 59 additions and 0 deletions.
17 changes: 17 additions & 0 deletions content/blogs/cllm/index.md
@@ -160,6 +160,23 @@ Our experiments contain three domain-specific tasks, including Spider (text-to-S
**Open-domain Conversational Challenge (MT-bench):** A CLLM trained from LLaMA2-7B on the ShareGPT dataset achieves roughly the same speedup as Medusa2 when combined with lookahead decoding, with comparable scores on MT-bench. However, the CLLM offers greater adaptability and memory efficiency, as it requires no modifications to the target model's original architecture and no auxiliary components.
{{< /justify >}}

**Training Cost:**
{{< justify >}}
The fine-tuning cost of CLLMs is moderate: training on only around 1M tokens suffices for LLaMA-7B to achieve a $3.4\times$ speedup on the Spider dataset. When the dataset is large, for example CodeSearchNet-Python, only 10% of the dataset is needed to generate the Jacobi trajectories for training CLLMs, yielding around a $2.5\times$ speedup. The total number of tokens can be estimated as:

$N = \text{avg. \# of trajectories per prompt} \times \text{avg. sequence length} \times \text{\# of prompts}$.
{{< /justify >}}
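
{{< justify >}}
As a rough sanity check on the table below, here is a minimal sketch of this estimate. The three per-prompt figures are illustrative assumptions for a Spider-scale run, not measured values from the experiments.
{{< /justify >}}

```python
# Back-of-the-envelope estimate of the total training tokens N.
# All three inputs are illustrative assumptions, not measured values.
avg_trajectories_per_prompt = 30  # assumed trajectories collected per prompt
avg_seq_length = 64               # assumed length of each n-token sequence
num_prompts = 1_000               # assumed number of training prompts

N = avg_trajectories_per_prompt * avg_seq_length * num_prompts
print(f"estimated training tokens: {N:.1e}")  # ~1.9e6, the order of the Spider entry below
```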

{{< center >}}
| Dataset | Estimated training cost (tokens) |
|:---:|:---:|
| Spider | $2 \times 10^6$ |
| CodeSearchNet-Python | $1 \times 10^8$ |
| GSM8K | $1 \times 10^7$ |
| ShareGPT | $2 \times 10^8$ |

{{< /center >}}

### Fast Forwarding and Stationary Tokens

{{< image src="img/trajectory_compare_aligned.png" alt="trajectory_compare" width="120%" title="Figure 7: Comparison of Jacobi trajectories between a target LLM and CLLMs on Spider. Each point along the Jacobi trajectory is a color-coded sequence: blue for correct tokens matching the AR results, and red for inaccurate ones. The CLLM demonstrates enhanced efficiency, converging to the fixed point $2\times$ faster than the target LLM. This increased efficiency can be attributed to the consistency loss, which facilitates learning the structure of each $n$-token sequence given a prefix.">}}
3 changes: 3 additions & 0 deletions layouts/shortcodes/center.html
@@ -0,0 +1,3 @@
<div style="text-align: center;">
{{ .Inner | markdownify }}
</div>
Binary file modified public/.DS_Store
Binary file added public/blogs/cllm/img/clm_objective_legacy.png
39 changes: 39 additions & 0 deletions public/blogs/cllm/index.html
@@ -386,6 +386,45 @@ <h3 id="results">Results<a hidden class="anchor" aria-hidden="true" href="#resul



<p><strong>Training Cost:</strong>
<div style="text-align: justify;">
<p>The fine-tuning cost of CLLMs is moderate: training on only around 1M tokens suffices for LLaMA-7B to achieve a $3.4\times$ speedup on the Spider dataset. When the dataset is large, for example CodeSearchNet-Python, only 10% of the dataset is needed to generate the Jacobi trajectories for training CLLMs, yielding around a $2.5\times$ speedup. The total number of tokens can be estimated as:</p>
<p>$N = \text{avg. \# of trajectories per prompt} \times \text{avg. sequence length} \times \text{\# of prompts}$.</p>

</div>


</p>
<div style="text-align: center;">
<table>
<thead>
<tr>
<th style="text-align:center">Dataset</th>
<th style="text-align:center">Estimated training cost (tokens)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">Spider</td>
<td style="text-align:center">$2\times 10^6$</td>
</tr>
<tr>
<td style="text-align:center">CodeSearchNet-Python</td>
<td style="text-align:center">$1 \times 10^8$</td>
</tr>
<tr>
<td style="text-align:center">GSM8K</td>
<td style="text-align:center">$1 \times 10^7$</td>
</tr>
<tr>
<td style="text-align:center">ShareGPT</td>
<td style="text-align:center">$2 \times 10^8$</td>
</tr>
</tbody>
</table>

</div>

<h3 id="fast-forwarding-and-stationary-tokens">Fast Forwarding and Stationary Tokens<a hidden class="anchor" aria-hidden="true" href="#fast-forwarding-and-stationary-tokens">#</a></h3>

<figure>
