Commit

Update this notebook from Naereen/Lempel-Ziv_Complexity#1 📝
Naereen committed Jul 1, 2017
1 parent 07839b2 commit 2c32865
Showing 1 changed file with 183 additions and 13 deletions.
196 changes: 183 additions & 13 deletions Short_study_of_the_Lempel-Ziv_complexity.ipynb
@@ -7,7 +7,7 @@
},
"source": [
"# Table of Contents\n",
" <p><div class=\"lev1 toc-item\"><a href=\"#Short-study-of-the-Lempel-Ziv-complexity\" data-toc-modified-id=\"Short-study-of-the-Lempel-Ziv-complexity-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Short study of the Lempel-Ziv complexity</a></div><div class=\"lev2 toc-item\"><a href=\"#Short-definition\" data-toc-modified-id=\"Short-definition-11\"><span class=\"toc-item-num\">1.1&nbsp;&nbsp;</span>Short definition</a></div><div class=\"lev2 toc-item\"><a href=\"#Python-implementation\" data-toc-modified-id=\"Python-implementation-12\"><span class=\"toc-item-num\">1.2&nbsp;&nbsp;</span>Python implementation</a></div><div class=\"lev2 toc-item\"><a href=\"#Tests-(1/2)\" data-toc-modified-id=\"Tests-(1/2)-13\"><span class=\"toc-item-num\">1.3&nbsp;&nbsp;</span>Tests (1/2)</a></div><div class=\"lev2 toc-item\"><a href=\"#Cython-implementation\" data-toc-modified-id=\"Cython-implementation-14\"><span class=\"toc-item-num\">1.4&nbsp;&nbsp;</span>Cython implementation</a></div><div class=\"lev2 toc-item\"><a href=\"#Numba-implementation\" data-toc-modified-id=\"Numba-implementation-15\"><span class=\"toc-item-num\">1.5&nbsp;&nbsp;</span>Numba implementation</a></div><div class=\"lev2 toc-item\"><a href=\"#Tests-(2/2)\" data-toc-modified-id=\"Tests-(2/2)-16\"><span class=\"toc-item-num\">1.6&nbsp;&nbsp;</span>Tests (2/2)</a></div><div class=\"lev2 toc-item\"><a href=\"#Benchmarks\" data-toc-modified-id=\"Benchmarks-17\"><span class=\"toc-item-num\">1.7&nbsp;&nbsp;</span>Benchmarks</a></div><div class=\"lev2 toc-item\"><a href=\"#Complexity-?\" data-toc-modified-id=\"Complexity-?-18\"><span class=\"toc-item-num\">1.8&nbsp;&nbsp;</span>Complexity ?</a></div><div class=\"lev2 toc-item\"><a href=\"#Conclusion\" data-toc-modified-id=\"Conclusion-19\"><span class=\"toc-item-num\">1.9&nbsp;&nbsp;</span>Conclusion</a></div>"
" <p><div class=\"lev1 toc-item\"><a href=\"#Short-study-of-the-Lempel-Ziv-complexity\" data-toc-modified-id=\"Short-study-of-the-Lempel-Ziv-complexity-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Short study of the Lempel-Ziv complexity</a></div><div class=\"lev2 toc-item\"><a href=\"#Short-definition\" data-toc-modified-id=\"Short-definition-11\"><span class=\"toc-item-num\">1.1&nbsp;&nbsp;</span>Short definition</a></div><div class=\"lev2 toc-item\"><a href=\"#Python-implementation\" data-toc-modified-id=\"Python-implementation-12\"><span class=\"toc-item-num\">1.2&nbsp;&nbsp;</span>Python implementation</a></div><div class=\"lev2 toc-item\"><a href=\"#Tests-(1/2)\" data-toc-modified-id=\"Tests-(1/2)-13\"><span class=\"toc-item-num\">1.3&nbsp;&nbsp;</span>Tests (1/2)</a></div><div class=\"lev2 toc-item\"><a href=\"#Cython-implementation\" data-toc-modified-id=\"Cython-implementation-14\"><span class=\"toc-item-num\">1.4&nbsp;&nbsp;</span>Cython implementation</a></div><div class=\"lev2 toc-item\"><a href=\"#Numba-implementation\" data-toc-modified-id=\"Numba-implementation-15\"><span class=\"toc-item-num\">1.5&nbsp;&nbsp;</span>Numba implementation</a></div><div class=\"lev2 toc-item\"><a href=\"#Tests-(2/2)\" data-toc-modified-id=\"Tests-(2/2)-16\"><span class=\"toc-item-num\">1.6&nbsp;&nbsp;</span>Tests (2/2)</a></div><div class=\"lev2 toc-item\"><a href=\"#Benchmarks\" data-toc-modified-id=\"Benchmarks-17\"><span class=\"toc-item-num\">1.7&nbsp;&nbsp;</span>Benchmarks</a></div><div class=\"lev2 toc-item\"><a href=\"#Complexity-?\" data-toc-modified-id=\"Complexity-?-18\"><span class=\"toc-item-num\">1.8&nbsp;&nbsp;</span>Complexity ?</a></div><div class=\"lev2 toc-item\"><a href=\"#Conclusion\" data-toc-modified-id=\"Conclusion-19\"><span class=\"toc-item-num\">1.9&nbsp;&nbsp;</span>Conclusion</a></div><div class=\"lev2 toc-item\"><a href=\"#(Experimental)-Julia-implementation\" data-toc-modified-id=\"(Experimental)-Julia-implementation-110\"><span class=\"toc-item-num\">1.10&nbsp;&nbsp;</span>(Experimental) <a href=\"http://julialang.org\" target=\"_blank\">Julia</a> implementation</a></div><div class=\"lev2 toc-item\"><a href=\"#Ending-notes\" data-toc-modified-id=\"Ending-notes-111\"><span class=\"toc-item-num\">1.11&nbsp;&nbsp;</span>Ending notes</a></div>"
]
},
{
@@ -144,7 +144,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"7.03 µs ± 457 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
"6.1 µs ± 173 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
]
}
],
@@ -242,12 +242,14 @@
"source": [
"----\n",
"## Cython implementation\n",
"As [this blog post](https://jakevdp.github.io/blog/2013/06/15/numba-vs-cython-take-2/) explains, we can easily use [Cython](http://Cython.org/) in a notebook cell."
"As [this blog post](https://jakevdp.github.io/blog/2013/06/15/numba-vs-cython-take-2/) explains, we can easily use [Cython](http://Cython.org/) in a notebook cell.\n",
"\n",
"> See [the Cython documentation](http://docs.cython.org/en/latest/src/quickstart/build.html#using-the-jupyter-notebook) for more information."
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 7,
"metadata": {
"collapsed": true
},
@@ -258,8 +260,10 @@
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%%cython\n",
@@ -269,7 +273,7 @@
"ctypedef unsigned int DTYPE_t\n",
"\n",
"@cython.boundscheck(False) # turn off bounds-checking for entire function, quicker but less safe\n",
"def lempel_ziv_complexity_cython(str binary_sequence):\n",
"def lempel_ziv_complexity_cython(str binary_sequence not None):\n",
" \"\"\"Lempel-Ziv complexity for a binary sequence, in simple Cython code (C extension).\"\"\"\n",
" cdef DTYPE_t u = 0\n",
" cdef DTYPE_t v = 1\n",
@@ -311,7 +315,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 9,
"metadata": {},
"outputs": [
{
@@ -320,7 +324,7 @@
"6"
]
},
"execution_count": 10,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
@@ -332,14 +336,14 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"130 ns ± 8.72 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
"131 ns ± 5.22 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n"
]
}
],
@@ -454,7 +458,9 @@
{
"cell_type": "code",
"execution_count": 94,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"@jit(\"int32(boolean[:])\")\n",
@@ -1179,7 +1185,9 @@
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"x = [10, 20, 40, 80, 160, 320, 640, 1280, 2560, 5120, 10240, 20480]\n",
@@ -1514,6 +1522,168 @@
"metadata": {},
"source": [
"----\n",
"## (Experimental) [Julia](http://julialang.org) implementation\n",
"\n",
"I want to (quickly) try to see if I can use [Julia](http://julialang.org) to write a faster version of this function.\n",
"See [issue #1](https://github.com/Naereen/Lempel-Ziv_Complexity/issues/1).\n",
"\n",
"**Disclaimer:** I am still learning Julia!"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\"Lempel-Ziv complexity for a binary sequence, in simple Julia code.\"\n",
"lempel_ziv_complexity (generic function with 1 method)\n",
"\"1001111011000010\"\n",
"6\n",
"100\n",
"10000\n",
"778\n",
"CPU times: user 0 ns, sys: 4 ms, total: 4 ms\n",
"Wall time: 4.86 s\n"
]
}
],
"source": [
"%%time\n",
"%%script julia\n",
"\n",
"\"\"\"Lempel-Ziv complexity for a binary sequence, in simple Julia code.\"\"\"\n",
"function lempel_ziv_complexity(binary_sequence)\n",
" u, v, w = 0, 1, 1\n",
" v_max = 1\n",
" size = length(binary_sequence)\n",
" complexity = 1\n",
" while true\n",
" if binary_sequence[u + v] == binary_sequence[w + v]\n",
" v += 1\n",
" if w + v >= size\n",
" complexity += 1\n",
" break\n",
" end\n",
" else\n",
" if v > v_max\n",
" v_max = v\n",
" end\n",
" u += 1\n",
" if u == w\n",
" complexity += 1\n",
" w += v_max\n",
" if w > size\n",
" break\n",
" else\n",
" u = 0\n",
" v = 1\n",
" v_max = 1\n",
" end\n",
" else\n",
" v = 1\n",
" end\n",
" end\n",
" end\n",
" return complexity\n",
"end\n",
"\n",
"s = \"1001111011000010\"\n",
"lempel_ziv_complexity(s) # 1 / 0 / 01 / 1110 / 1100 / 0010\n",
"\n",
"M = 100;\n",
"N = 10000;\n",
"for _ in 1:M\n",
" s = join(rand(0:1, N));\n",
" lempel_ziv_complexity(s);\n",
"end\n",
"lempel_ziv_complexity(s) # complexity of the last random string"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To compare fairly, let us run the same naive Python code with [PyPy](http://pypy.org)."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 4 ms, sys: 0 ns, total: 4 ms\n",
"Wall time: 7.89 s\n"
]
}
],
"source": [
"%%time\n",
"%%pypy\n",
"\n",
"def lempel_ziv_complexity(binary_sequence):\n",
" \"\"\"Lempel-Ziv complexity for a binary sequence, in simple Python code.\"\"\"\n",
" u, v, w = 0, 1, 1\n",
" v_max = 1\n",
" length = len(binary_sequence)\n",
" complexity = 1\n",
" while True:\n",
" if binary_sequence[u + v - 1] == binary_sequence[w + v - 1]:\n",
" v += 1\n",
" if w + v >= length:\n",
" complexity += 1\n",
" break\n",
" else:\n",
" if v > v_max:\n",
" v_max = v\n",
" u += 1\n",
" if u == w:\n",
" complexity += 1\n",
" w += v_max\n",
" if w > length:\n",
" break\n",
" else:\n",
" u = 0\n",
" v = 1\n",
" v_max = 1\n",
" else:\n",
" v = 1\n",
" return complexity\n",
"\n",
"s = \"1001111011000010\"\n",
"lempel_ziv_complexity(s) # 1 / 0 / 01 / 1110 / 1100 / 0010\n",
"\n",
"from random import random\n",
"\n",
"M = 100\n",
"N = 10000\n",
"for _ in range(M):\n",
" s = ''.join(str(int(random() < 0.5)) for _ in range(N))\n",
" lempel_ziv_complexity(s)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On these 100 random trials on strings of size 10000, the naive Julia version (4.86 s wall time) is about 1.6 times as fast as the naive Python version run by PyPy (7.89 s wall time).\n",
"\n",
"That's good, but it's not much..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----\n",
"## Ending notes\n",
"> Thanks for reading!\n",
"> My implementation is [now open-source and available on GitHub](https://github.com/Naereen/Lempel-Ziv_Complexity).\n",
"\n",

0 comments on commit 2c32865
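
A note on the Julia and PyPy timings in this commit: each `%%time` figure covers the whole external process, so it includes interpreter start-up, the generation of the random strings and, for Julia, the compilation of the function, not only the calls to `lempel_ziv_complexity`. Below is a minimal sketch of a narrower measurement in plain Python. It reuses the function from the PyPy cell unchanged. The helper name `time_complexity_calls`, its parameters and the small default sizes are assumptions made for illustration; they are not part of the notebook.

```python
import random
import time


def lempel_ziv_complexity(binary_sequence):
    """Lempel-Ziv complexity for a binary sequence, in simple Python code."""
    u, v, w = 0, 1, 1
    v_max = 1
    length = len(binary_sequence)
    complexity = 1
    while True:
        if binary_sequence[u + v - 1] == binary_sequence[w + v - 1]:
            v += 1
            if w + v >= length:
                complexity += 1
                break
        else:
            if v > v_max:
                v_max = v
            u += 1
            if u == w:
                complexity += 1
                w += v_max
                if w > length:
                    break
                else:
                    u = 0
                    v = 1
                    v_max = 1
            else:
                v = 1
    return complexity


def time_complexity_calls(trials=5, size=1000, seed=0):
    """Hypothetical helper: time only the calls, not the string generation."""
    rng = random.Random(seed)
    # Build every random binary string before the clock starts.
    sequences = [
        "".join(rng.choice("01") for _ in range(size)) for _ in range(trials)
    ]
    start = time.perf_counter()
    for sequence in sequences:
        lempel_ziv_complexity(sequence)
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = time_complexity_calls()
    print("5 calls on strings of size 1000: {:.3f} s".format(elapsed))
```

The defaults are deliberately much smaller than the 100 trials of size 10000 used in the notebook, so that the sketch also finishes quickly under CPython. A similar separation on the Julia side would presumably need a warm-up call before timing, so that compilation is not counted.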
