Code of Thrones
2018-01-30T15:19:51+00:00
http://lith.me
Terry
terry.onblog@gmail.com
Nearest Neighbor Search with KDTree
2015-06-08T00:00:00+00:00
http://lith.me/code/2015/06/08/Nearest-Neighbor-Search-with-KDTree
<p>We often ask Google Maps for the nearest restaurant, hotel, or whatever else is nearby. Google Maps takes your GPS position (latitude, longitude) and searches the map for the closest location. This is a multidimensional nearest neighbor search problem, for which a <a href="http://en.wikipedia.org/wiki/K-d_tree">k-d tree</a> can be useful. A k-d tree is a binary tree over k-dimensional data; what makes it interesting is that it splits the left and right children along a different dimension at each depth of the tree. As with other binary trees, a search on a balanced tree takes O(log n) time on average.</p>
<p>Here’s a Java implementation of nearest location search using a k-d tree:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">javax.annotation.Nonnull</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /><span class="kn">import</span> <span class="nn">javax.annotation.Nullable</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /><span class="kn">import</span> <span class="nn">java.util.ArrayList</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /><span class="kn">import</span> <span class="nn">java.util.Collections</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /><span class="kn">import</span> <span class="nn">java.util.Comparator</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /><span class="kn">import</span> <span class="nn">java.util.List</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /><br data-jekyll-commonmark-ghpages="" /><span class="kn">import</span> <span class="nn">static</span> <span class="n">java</span><span class="o">.</span><span class="na">lang</span><span class="o">.</span><span class="na">Math</span><span class="o">.</span><span class="na">cos</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /><span class="kn">import</span> <span class="nn">static</span> <span class="n">java</span><span class="o">.</span><span class="na">lang</span><span class="o">.</span><span class="na">Math</span><span class="o">.</span><span class="na">sin</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /><span class="kn">import</span> <span class="nn">static</span> <span class="n">java</span><span class="o">.</span><span class="na">lang</span><span class="o">.</span><span class="na">Math</span><span class="o">.</span><span class="na">toRadians</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /><br data-jekyll-commonmark-ghpages="" /><span class="kd">class</span> <span class="nc">LocationKDTree</span> <span 
class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">private</span> <span class="kd">static</span> <span class="kd">final</span> <span class="kt">int</span> <span class="n">K</span> <span class="o">=</span> <span class="mi">3</span><span class="o">;</span> <span class="c1">// 3-d tree</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">private</span> <span class="kd">final</span> <span class="n">Node</span> <span class="n">tree</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /><br data-jekyll-commonmark-ghpages="" /> <span class="kd">public</span> <span class="nf">LocationKDTree</span><span class="o">(</span><span class="nd">@Nonnull</span> <span class="kd">final</span> <span class="n">List</span><span class="o"><</span><span class="n">Location</span><span class="o">></span> <span class="n">locations</span><span class="o">)</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">final</span> <span class="n">List</span><span class="o"><</span><span class="n">Node</span><span class="o">></span> <span class="n">nodes</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ArrayList</span><span class="o"><>(</span><span class="n">locations</span><span class="o">.</span><span class="na">size</span><span class="o">());</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">for</span> <span class="o">(</span><span class="kd">final</span> <span class="n">Location</span> <span class="n">location</span> <span class="o">:</span> <span class="n">locations</span><span class="o">)</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="n">nodes</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="k">new</span> <span class="n">Node</span><span class="o">(</span><span class="n">location</span><span class="o">));</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br 
data-jekyll-commonmark-ghpages="" /> <span class="n">tree</span> <span class="o">=</span> <span class="n">buildTree</span><span class="o">(</span><span class="n">nodes</span><span class="o">,</span> <span class="mi">0</span><span class="o">);</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /><br data-jekyll-commonmark-ghpages="" /> <span class="nd">@Nullable</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">public</span> <span class="n">Location</span> <span class="nf">findNearest</span><span class="o">(</span><span class="kd">final</span> <span class="kt">double</span> <span class="n">latitude</span><span class="o">,</span> <span class="kd">final</span> <span class="kt">double</span> <span class="n">longitude</span><span class="o">)</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">if</span> <span class="o">(</span><span class="n">tree</span> <span class="o">==</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">return</span> <span class="kc">null</span><span class="o">;</span> <span class="c1">// empty tree: no locations were supplied</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">final</span> <span class="n">Node</span> <span class="n">node</span> <span class="o">=</span> <span class="n">findNearest</span><span class="o">(</span><span class="n">tree</span><span class="o">,</span> <span class="k">new</span> <span class="n">Node</span><span class="o">(</span><span class="n">latitude</span><span class="o">,</span> <span class="n">longitude</span><span class="o">),</span> <span class="mi">0</span><span class="o">);</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">return</span> <span class="n">node</span> <span class="o">==</span> <span class="kc">null</span> <span class="o">?</span> <span class="kc">null</span> <span class="o">:</span> <span class="n">node</span><span class="o">.</span><span class="na">location</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /><br data-jekyll-commonmark-ghpages="" /> <span class="kd">private</span> <span class="kd">static</span> <span class="n">Node</span> <span class="nf">findNearest</span><span class="o">(</span><span 
class="kd">final</span> <span class="n">Node</span> <span class="n">current</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Node</span> <span class="n">target</span><span class="o">,</span> <span class="kd">final</span> <span class="kt">int</span> <span class="n">depth</span><span class="o">)</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">final</span> <span class="kt">int</span> <span class="n">axis</span> <span class="o">=</span> <span class="n">depth</span> <span class="o">%</span> <span class="n">K</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">final</span> <span class="kt">int</span> <span class="n">direction</span> <span class="o">=</span> <span class="n">getComparator</span><span class="o">(</span><span class="n">axis</span><span class="o">).</span><span class="na">compare</span><span class="o">(</span><span class="n">target</span><span class="o">,</span> <span class="n">current</span><span class="o">);</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">final</span> <span class="n">Node</span> <span class="n">next</span> <span class="o">=</span> <span class="o">(</span><span class="n">direction</span> <span class="o"><</span> <span class="mi">0</span><span class="o">)</span> <span class="o">?</span> <span class="n">current</span><span class="o">.</span><span class="na">left</span> <span class="o">:</span> <span class="n">current</span><span class="o">.</span><span class="na">right</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">final</span> <span class="n">Node</span> <span class="n">other</span> <span class="o">=</span> <span class="o">(</span><span class="n">direction</span> <span class="o"><</span> <span class="mi">0</span><span class="o">)</span> <span class="o">?</span> <span class="n">current</span><span class="o">.</span><span class="na">right</span> <span class="o">:</span> <span 
class="n">current</span><span class="o">.</span><span class="na">left</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="n">Node</span> <span class="n">best</span> <span class="o">=</span> <span class="o">(</span><span class="n">next</span> <span class="o">==</span> <span class="kc">null</span><span class="o">)</span> <span class="o">?</span> <span class="n">current</span> <span class="o">:</span> <span class="n">findNearest</span><span class="o">(</span><span class="n">next</span><span class="o">,</span> <span class="n">target</span><span class="o">,</span> <span class="n">depth</span> <span class="o">+</span> <span class="mi">1</span><span class="o">);</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">if</span> <span class="o">(</span><span class="n">current</span><span class="o">.</span><span class="na">euclideanDistance</span><span class="o">(</span><span class="n">target</span><span class="o">)</span> <span class="o"><</span> <span class="n">best</span><span class="o">.</span><span class="na">euclideanDistance</span><span class="o">(</span><span class="n">target</span><span class="o">))</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="n">best</span> <span class="o">=</span> <span class="n">current</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">if</span> <span class="o">(</span><span class="n">other</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">if</span> <span class="o">(</span><span class="n">current</span><span class="o">.</span><span class="na">verticalDistance</span><span class="o">(</span><span class="n">target</span><span class="o">,</span> <span class="n">axis</span><span class="o">)</span> <span class="o"><</span> <span class="n">best</span><span 
class="o">.</span><span class="na">euclideanDistance</span><span class="o">(</span><span class="n">target</span><span class="o">))</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">final</span> <span class="n">Node</span> <span class="n">possibleBest</span> <span class="o">=</span> <span class="n">findNearest</span><span class="o">(</span><span class="n">other</span><span class="o">,</span> <span class="n">target</span><span class="o">,</span> <span class="n">depth</span> <span class="o">+</span> <span class="mi">1</span><span class="o">);</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">if</span> <span class="o">(</span><span class="n">possibleBest</span><span class="o">.</span><span class="na">euclideanDistance</span><span class="o">(</span><span class="n">target</span><span class="o">)</span> <span class="o"><</span> <span class="n">best</span><span class="o">.</span><span class="na">euclideanDistance</span><span class="o">(</span><span class="n">target</span><span class="o">))</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="n">best</span> <span class="o">=</span> <span class="n">possibleBest</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">return</span> <span class="n">best</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /><br data-jekyll-commonmark-ghpages="" /> <span class="nd">@Nullable</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">private</span> <span class="kd">static</span> <span class="n">Node</span> <span class="nf">buildTree</span><span class="o">(</span><span class="kd">final</span> <span class="n">List</span><span 
class="o"><</span><span class="n">Node</span><span class="o">></span> <span class="n">items</span><span class="o">,</span> <span class="kd">final</span> <span class="kt">int</span> <span class="n">depth</span><span class="o">)</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">if</span> <span class="o">(</span><span class="n">items</span><span class="o">.</span><span class="na">isEmpty</span><span class="o">())</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">return</span> <span class="kc">null</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /><br data-jekyll-commonmark-ghpages="" /> <span class="n">Collections</span><span class="o">.</span><span class="na">sort</span><span class="o">(</span><span class="n">items</span><span class="o">,</span> <span class="n">getComparator</span><span class="o">(</span><span class="n">depth</span> <span class="o">%</span> <span class="n">K</span><span class="o">));</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">final</span> <span class="kt">int</span> <span class="n">index</span> <span class="o">=</span> <span class="n">items</span><span class="o">.</span><span class="na">size</span><span class="o">()</span> <span class="o">/</span> <span class="mi">2</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">final</span> <span class="n">Node</span> <span class="n">root</span> <span class="o">=</span> <span class="n">items</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">index</span><span class="o">);</span><br data-jekyll-commonmark-ghpages="" /> <span class="n">root</span><span class="o">.</span><span class="na">left</span> <span class="o">=</span> <span class="n">buildTree</span><span class="o">(</span><span class="n">items</span><span class="o">.</span><span 
class="na">subList</span><span class="o">(</span><span class="mi">0</span><span class="o">,</span> <span class="n">index</span><span class="o">),</span> <span class="n">depth</span> <span class="o">+</span> <span class="mi">1</span><span class="o">);</span><br data-jekyll-commonmark-ghpages="" /> <span class="n">root</span><span class="o">.</span><span class="na">right</span> <span class="o">=</span> <span class="n">buildTree</span><span class="o">(</span><span class="n">items</span><span class="o">.</span><span class="na">subList</span><span class="o">(</span><span class="n">index</span> <span class="o">+</span> <span class="mi">1</span><span class="o">,</span> <span class="n">items</span><span class="o">.</span><span class="na">size</span><span class="o">()),</span> <span class="n">depth</span> <span class="o">+</span> <span class="mi">1</span><span class="o">);</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">return</span> <span class="n">root</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /><br data-jekyll-commonmark-ghpages="" /> <span class="kd">private</span> <span class="kd">static</span> <span class="kd">class</span> <span class="nc">Node</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="n">Node</span> <span class="n">left</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="n">Node</span> <span class="n">right</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="n">Location</span> <span class="n">location</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">final</span> <span class="kt">double</span><span class="o">[]</span> <span class="n">point</span> <span class="o">=</span> <span class="k">new</span> <span class="kt">double</span><span class="o">[</span><span class="n">K</span><span class="o">];</span><br 
data-jekyll-commonmark-ghpages="" /><br data-jekyll-commonmark-ghpages="" /> <span class="n">Node</span><span class="o">(</span><span class="kd">final</span> <span class="kt">double</span> <span class="n">latitude</span><span class="o">,</span> <span class="kd">final</span> <span class="kt">double</span> <span class="n">longitude</span><span class="o">)</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="n">point</span><span class="o">[</span><span class="mi">0</span><span class="o">]</span> <span class="o">=</span> <span class="o">(</span><span class="kt">double</span><span class="o">)</span> <span class="o">(</span><span class="n">cos</span><span class="o">(</span><span class="n">toRadians</span><span class="o">(</span><span class="n">latitude</span><span class="o">))</span> <span class="o">*</span> <span class="n">cos</span><span class="o">(</span><span class="n">toRadians</span><span class="o">(</span><span class="n">longitude</span><span class="o">)));</span><br data-jekyll-commonmark-ghpages="" /> <span class="n">point</span><span class="o">[</span><span class="mi">1</span><span class="o">]</span> <span class="o">=</span> <span class="o">(</span><span class="kt">double</span><span class="o">)</span> <span class="o">(</span><span class="n">cos</span><span class="o">(</span><span class="n">toRadians</span><span class="o">(</span><span class="n">latitude</span><span class="o">))</span> <span class="o">*</span> <span class="n">sin</span><span class="o">(</span><span class="n">toRadians</span><span class="o">(</span><span class="n">longitude</span><span class="o">)));</span><br data-jekyll-commonmark-ghpages="" /> <span class="n">point</span><span class="o">[</span><span class="mi">2</span><span class="o">]</span> <span class="o">=</span> <span class="o">(</span><span class="kt">double</span><span class="o">)</span> <span class="o">(</span><span class="n">sin</span><span class="o">(</span><span class="n">toRadians</span><span 
class="o">(</span><span class="n">latitude</span><span class="o">)));</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /><br data-jekyll-commonmark-ghpages="" /> <span class="n">Node</span><span class="o">(</span><span class="kd">final</span> <span class="n">Location</span> <span class="n">location</span><span class="o">)</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">this</span><span class="o">(</span><span class="n">location</span><span class="o">.</span><span class="na">latitude</span><span class="o">,</span> <span class="n">location</span><span class="o">.</span><span class="na">longitude</span><span class="o">);</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">this</span><span class="o">.</span><span class="na">location</span> <span class="o">=</span> <span class="n">location</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /><br data-jekyll-commonmark-ghpages="" /> <span class="kt">double</span> <span class="nf">euclideanDistance</span><span class="o">(</span><span class="kd">final</span> <span class="n">Node</span> <span class="n">that</span><span class="o">)</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">final</span> <span class="kt">double</span> <span class="n">x</span> <span class="o">=</span> <span class="k">this</span><span class="o">.</span><span class="na">point</span><span class="o">[</span><span class="mi">0</span><span class="o">]</span> <span class="o">-</span> <span class="n">that</span><span class="o">.</span><span class="na">point</span><span class="o">[</span><span class="mi">0</span><span class="o">];</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">final</span> <span class="kt">double</span> <span class="n">y</span> <span class="o">=</span> <span class="k">this</span><span 
class="o">.</span><span class="na">point</span><span class="o">[</span><span class="mi">1</span><span class="o">]</span> <span class="o">-</span> <span class="n">that</span><span class="o">.</span><span class="na">point</span><span class="o">[</span><span class="mi">1</span><span class="o">];</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">final</span> <span class="kt">double</span> <span class="n">z</span> <span class="o">=</span> <span class="k">this</span><span class="o">.</span><span class="na">point</span><span class="o">[</span><span class="mi">2</span><span class="o">]</span> <span class="o">-</span> <span class="n">that</span><span class="o">.</span><span class="na">point</span><span class="o">[</span><span class="mi">2</span><span class="o">];</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">return</span> <span class="n">x</span> <span class="o">*</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span> <span class="o">*</span> <span class="n">y</span> <span class="o">+</span> <span class="n">z</span> <span class="o">*</span> <span class="n">z</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /><br data-jekyll-commonmark-ghpages="" /> <span class="kt">double</span> <span class="nf">verticalDistance</span><span class="o">(</span><span class="kd">final</span> <span class="n">Node</span> <span class="n">that</span><span class="o">,</span> <span class="kd">final</span> <span class="kt">int</span> <span class="n">axis</span><span class="o">)</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">final</span> <span class="kt">double</span> <span class="n">d</span> <span class="o">=</span> <span class="k">this</span><span class="o">.</span><span class="na">point</span><span class="o">[</span><span class="n">axis</span><span class="o">]</span> <span class="o">-</span> <span 
class="n">that</span><span class="o">.</span><span class="na">point</span><span class="o">[</span><span class="n">axis</span><span class="o">];</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">return</span> <span class="n">d</span> <span class="o">*</span> <span class="n">d</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /><br data-jekyll-commonmark-ghpages="" /> <span class="kd">private</span> <span class="kd">static</span> <span class="n">Comparator</span><span class="o"><</span><span class="n">Node</span><span class="o">></span> <span class="nf">getComparator</span><span class="o">(</span><span class="kd">final</span> <span class="kt">int</span> <span class="n">i</span><span class="o">)</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">return</span> <span class="n">NodeComparator</span><span class="o">.</span><span class="na">values</span><span class="o">()[</span><span class="n">i</span><span class="o">];</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /><br data-jekyll-commonmark-ghpages="" /> <span class="kd">private</span> <span class="kd">static</span> <span class="kd">enum</span> <span class="n">NodeComparator</span> <span class="kd">implements</span> <span class="n">Comparator</span><span class="o"><</span><span class="n">Node</span><span class="o">></span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="n">x</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="nd">@Override</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">public</span> <span class="kt">int</span> <span class="nf">compare</span><span class="o">(</span><span class="kd">final</span> <span class="n">Node</span> <span class="n">a</span><span 
class="o">,</span> <span class="kd">final</span> <span class="n">Node</span> <span class="n">b</span><span class="o">)</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">return</span> <span class="n">Double</span><span class="o">.</span><span class="na">compare</span><span class="o">(</span><span class="n">a</span><span class="o">.</span><span class="na">point</span><span class="o">[</span><span class="mi">0</span><span class="o">],</span> <span class="n">b</span><span class="o">.</span><span class="na">point</span><span class="o">[</span><span class="mi">0</span><span class="o">]);</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">},</span><br data-jekyll-commonmark-ghpages="" /> <span class="n">y</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="nd">@Override</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">public</span> <span class="kt">int</span> <span class="nf">compare</span><span class="o">(</span><span class="kd">final</span> <span class="n">Node</span> <span class="n">a</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Node</span> <span class="n">b</span><span class="o">)</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">return</span> <span class="n">Double</span><span class="o">.</span><span class="na">compare</span><span class="o">(</span><span class="n">a</span><span class="o">.</span><span class="na">point</span><span class="o">[</span><span class="mi">1</span><span class="o">],</span> <span class="n">b</span><span class="o">.</span><span class="na">point</span><span class="o">[</span><span class="mi">1</span><span class="o">]);</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">},</span><br data-jekyll-commonmark-ghpages="" /> <span 
class="n">z</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="nd">@Override</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">public</span> <span class="kt">int</span> <span class="nf">compare</span><span class="o">(</span><span class="kd">final</span> <span class="n">Node</span> <span class="n">a</span><span class="o">,</span> <span class="kd">final</span> <span class="n">Node</span> <span class="n">b</span><span class="o">)</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">return</span> <span class="n">Double</span><span class="o">.</span><span class="na">compare</span><span class="o">(</span><span class="n">a</span><span class="o">.</span><span class="na">point</span><span class="o">[</span><span class="mi">2</span><span class="o">],</span> <span class="n">b</span><span class="o">.</span><span class="na">point</span><span class="o">[</span><span class="mi">2</span><span class="o">]);</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /><span class="o">}</span><br data-jekyll-commonmark-ghpages="" /><br data-jekyll-commonmark-ghpages="" /><span class="kd">class</span> <span class="nc">Location</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">public</span> <span class="kt">double</span> <span class="n">latitude</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">public</span> <span class="kt">double</span> <span class="n">longitude</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">public</span> <span class="n">String</span> <span class="n">name</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /><span class="o">}</span></code></pre></figure>
<p>The code should be quite straightforward to use: first initialize a <code class="highlighter-rouge">LocationKDTree</code> object with a list of locations, then find the location nearest to a given latitude/longitude with the <code class="highlighter-rouge">findNearest</code> method. One thing worth noting is that the tree is built as a 3-d tree, because, you know, the earth is round! The trick is to convert the (latitude, longitude) pair to an (x, y, z) coordinate on the unit sphere by:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">x</span> <span class="o">=</span> <span class="n">cos</span><span class="o">(</span><span class="n">toRadians</span><span class="o">(</span><span class="n">latitude</span><span class="o">))</span> <span class="o">*</span> <span class="n">cos</span><span class="o">(</span><span class="n">toRadians</span><span class="o">(</span><span class="n">longitude</span><span class="o">))</span><br data-jekyll-commonmark-ghpages="" /><span class="n">y</span> <span class="o">=</span> <span class="n">cos</span><span class="o">(</span><span class="n">toRadians</span><span class="o">(</span><span class="n">latitude</span><span class="o">))</span> <span class="o">*</span> <span class="n">sin</span><span class="o">(</span><span class="n">toRadians</span><span class="o">(</span><span class="n">longitude</span><span class="o">))</span><br data-jekyll-commonmark-ghpages="" /><span class="n">z</span> <span class="o">=</span> <span class="n">sin</span><span class="o">(</span><span class="n">toRadians</span><span class="o">(</span><span class="n">latitude</span><span class="o">))</span></code></pre></figure>
<p><code class="highlighter-rouge">toRadians</code> is a Java function that converts an angle measured in degrees to radians.</p>
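<p>To see why the conversion matters, here is a minimal, self-contained sketch that is not part of the implementation above. The class name <code class="highlighter-rouge">SphereDistanceSketch</code> and the helper <code class="highlighter-rouge">nearestIndex</code> are made up for illustration. It applies the same (x, y, z) mapping and does a brute-force search, which can also serve as a cross-check for the tree on small inputs:</p>

```java
class SphereDistanceSketch {
    // Map (latitude, longitude) in degrees to a point on the unit sphere,
    // using the same formulas as the Node constructor in the post.
    static double[] toUnitSphere(final double latitude, final double longitude) {
        final double lat = Math.toRadians(latitude);
        final double lon = Math.toRadians(longitude);
        return new double[] {
            Math.cos(lat) * Math.cos(lon),
            Math.cos(lat) * Math.sin(lon),
            Math.sin(lat)
        };
    }

    // Squared straight-line (chord) distance between two 3-d points.
    // On a sphere it grows with the great-circle distance, so it ranks
    // candidates in the same order.
    static double squaredDistance(final double[] a, final double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            final double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Brute-force search: index of the candidate closest to the target,
    // or -1 when there are no candidates. Each candidate is {latitude, longitude}.
    static int nearestIndex(final double[][] candidates, final double latitude, final double longitude) {
        final double[] target = toUnitSphere(latitude, longitude);
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < candidates.length; i++) {
            final double d = squaredDistance(toUnitSphere(candidates[i][0], candidates[i][1]), target);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(final String[] args) {
        // Two candidates on the equator, on either side of the antimeridian.
        final double[][] candidates = { {0.0, -179.0}, {0.0, 170.0} };
        // The target at longitude 179 is 2 degrees from the first candidate and
        // 9 degrees from the second, although the raw longitude values
        // (179 vs -179) look far apart.
        System.out.println(nearestIndex(candidates, 0.0, 179.0)); // prints 0
    }
}
```

<p>Comparing raw latitude/longitude values would pick the second candidate here, because longitude wraps around at ±180 degrees. The 3-d coordinates do not have that seam.</p>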
Japanese Tokenization with Java and Lucene
2015-02-05T00:00:00+00:00
http://lith.me/code/2015/02/05/Japanese-tokenization-with-Java-and-Lucene
<p>I was trying to write a Japanese analysis program with Java and Lucene 4.4. After trying Lucene’s CJKAnalyzer and Lucene-gosen, I ended up writing my own Tokenizer, Filter and Analyzer.</p>
<h1 id="lucene-cjkanalyzer">Lucene CJKAnalyzer</h1>
<p>Lucene 4.4 comes with a built-in analyzer for Chinese, Japanese and Korean. The demo result for Chinese in <a href="https://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/cjk/package-summary.html">Lucene’s documentation</a> seems quite good, so I gave it a try on Japanese:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">final</span> <span class="n">String</span> <span class="n">s</span> <span class="o">=</span> <span class="s">"バカです。よろしくお願いいたします"</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /><span class="kd">final</span> <span class="n">CJKAnalyzer</span> <span class="n">cjkAnalyzer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">CJKAnalyzer</span><span class="o">(</span><span class="n">Version</span><span class="o">.</span><span class="na">LUCENE_44</span><span class="o">);</span><br data-jekyll-commonmark-ghpages="" /><span class="kd">final</span> <span class="n">TokenStream</span> <span class="n">tokenStream</span> <span class="o">=</span> <span class="n">cjkAnalyzer</span><span class="o">.</span><span class="na">tokenStream</span><span class="o">(</span><span class="s">""</span><span class="o">,</span> <span class="k">new</span> <span class="n">StringReader</span><span class="o">(</span><span class="n">s</span><span class="o">));</span><br data-jekyll-commonmark-ghpages="" /><span class="kd">final</span> <span class="n">CharTermAttribute</span> <span class="n">charTermAttribute</span> <span class="o">=</span> <span class="n">tokenStream</span><span class="o">.</span><span class="na">addAttribute</span><span class="o">(</span><span class="n">CharTermAttribute</span><span class="o">.</span><span class="na">class</span><span class="o">);</span><br data-jekyll-commonmark-ghpages="" /><span class="n">tokenStream</span><span class="o">.</span><span class="na">reset</span><span class="o">();</span><br data-jekyll-commonmark-ghpages="" /><span class="k">while</span> <span class="o">(</span><span class="n">tokenStream</span><span class="o">.</span><span class="na">incrementToken</span><span class="o">())</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="n">System</span><span class="o">.</span><span 
class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">charTermAttribute</span><span class="o">.</span><span class="na">toString</span><span class="o">());</span><br data-jekyll-commonmark-ghpages="" /><span class="o">}</span></code></pre></figure>
<p>And here’s what I got:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>バカ
カで
です
よろ
ろし
しく
くお
お願
願い
いい
いた
たし
しま
ます
</code></pre></div></div>
<p>Boo - it’s just the overlapping character bigrams of the sentence. Most of the bigrams actually make no sense in Japanese. Very <code class="highlighter-rouge">バカ</code> :)</p>
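<p>To see where that output comes from, here is a minimal sketch of overlapping character bigrams in plain Java. It is not Lucene’s implementation: the class name <code class="highlighter-rouge">BigramSketch</code> is made up for illustration, and I am assuming that any non-letter character (such as <code class="highlighter-rouge">。</code>) simply ends the current run, which is consistent with the output above.</p>

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {
    // Emits overlapping two-character tokens for each run of letters.
    // Assumption: any non-letter (e.g. "。") ends the current run.
    // Works on UTF-16 chars, so it only handles BMP characters.
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i <= text.length(); i++) {
            boolean letter = i < text.length() && Character.isLetter(text.charAt(i));
            if (letter) {
                run.append(text.charAt(i));
                continue;
            }
            // End of a run: a lone character is kept as-is,
            // longer runs are turned into overlapping bigrams.
            if (run.length() == 1) {
                out.add(run.toString());
            }
            for (int j = 0; j + 1 < run.length(); j++) {
                out.add(run.substring(j, j + 2));
            }
            run.setLength(0);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : bigrams("バカです。よろしくお願いいたします")) {
            System.out.println(token);
        }
    }
}
```

<p>A run of n characters always yields n - 1 tokens, whether or not any of them is a real word, which is why a dictionary-based tokenizer is a better fit for Japanese.</p>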
<h1 id="lucene-gosen">Lucene-gosen</h1>
<p>I tried another one that works with Lucene called Lucene-gosen, but taking a look at the <a href="https://github.com/lucene-gosen/lucene-gosen/blob/master/src/java/org/apache/lucene/analysis/gosen/GosenAnalyzer.java">source code</a>, it apparently doesn’t work with Lucene 4.4.</p>
<h1 id="sen">Sen</h1>
<p><a href="https://java.net/projects/sen">Sen</a> seems to be the original project that Lucene-gosen is based on, so I guess we can write our own Tokenizer around Sen’s components:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">net.java.sen.StreamTagger</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /><span class="kn">import</span> <span class="nn">net.java.sen.Token</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /><span class="kn">import</span> <span class="nn">org.apache.lucene.analysis.Tokenizer</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /><span class="kn">import</span> <span class="nn">org.apache.lucene.analysis.tokenattributes.CharTermAttribute</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /><span class="kn">import</span> <span class="nn">java.io.IOException</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /><span class="kn">import</span> <span class="nn">java.io.Reader</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <br data-jekyll-commonmark-ghpages="" /><span class="kd">public</span> <span class="kd">final</span> <span class="kd">class</span> <span class="nc">JapaneseTokenizer</span> <span class="kd">extends</span> <span class="n">Tokenizer</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">private</span> <span class="kd">final</span> <span class="n">StreamTagger</span> <span class="n">tagger</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <br data-jekyll-commonmark-ghpages="" /> <span class="kd">private</span> <span class="kd">final</span> <span class="n">CharTermAttribute</span> <span class="n">termAttr</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <br data-jekyll-commonmark-ghpages="" /> <span class="kd">public</span> <span class="nf">JapaneseTokenizer</span><span class="o">(</span><span class="kd">final</span> <span class="n">Reader</span> <span class="n">in</span><span class="o">,</span> <span class="kd">final</span> <span 
class="n">String</span> <span class="n">senConfPath</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">IOException</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">super</span><span class="o">(</span><span class="n">in</span><span class="o">);</span><br data-jekyll-commonmark-ghpages="" /> <span class="n">tagger</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StreamTagger</span><span class="o">(</span><span class="n">in</span><span class="o">,</span> <span class="n">senConfPath</span><span class="o">);</span><br data-jekyll-commonmark-ghpages="" /> <br data-jekyll-commonmark-ghpages="" /> <span class="n">termAttr</span> <span class="o">=</span> <span class="n">addAttribute</span><span class="o">(</span><span class="n">CharTermAttribute</span><span class="o">.</span><span class="na">class</span><span class="o">);</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /> <br data-jekyll-commonmark-ghpages="" /> <span class="nd">@Override</span><br data-jekyll-commonmark-ghpages="" /> <span class="kd">public</span> <span class="kt">boolean</span> <span class="nf">incrementToken</span><span class="o">()</span> <span class="kd">throws</span> <span class="n">IOException</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">if</span> <span class="o">(!</span><span class="n">tagger</span><span class="o">.</span><span class="na">hasNext</span><span class="o">())</span> <span class="o">{</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">return</span> <span class="kc">false</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /> <br data-jekyll-commonmark-ghpages="" /> <span class="kd">final</span> <span class="n">Token</span> <span class="n">token</span> <span class="o">=</span> <span 
class="n">tagger</span><span class="o">.</span><span class="na">next</span><span class="o">();</span><br data-jekyll-commonmark-ghpages="" /> <span class="n">termAttr</span><span class="o">.</span><span class="na">setEmpty</span><span class="o">();</span><br data-jekyll-commonmark-ghpages="" /> <span class="n">termAttr</span><span class="o">.</span><span class="na">append</span><span class="o">(</span><span class="n">token</span><span class="o">.</span><span class="na">getSurface</span><span class="o">(),</span> <span class="mi">0</span><span class="o">,</span> <span class="n">token</span><span class="o">.</span><span class="na">length</span><span class="o">());</span><br data-jekyll-commonmark-ghpages="" /> <span class="k">return</span> <span class="kc">true</span><span class="o">;</span><br data-jekyll-commonmark-ghpages="" /> <span class="o">}</span><br data-jekyll-commonmark-ghpages="" /><span class="o">}</span></code></pre></figure>
<p>Apart from the Tokenizer, we should also provide some filters for common Japanese processing steps such as removing punctuation, normalizing half-width characters, and filtering out stopwords. And here’s the result I got using the Sen-based Tokenizer:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>バカ
です
よろしく
お願い
いたし
ます
</code></pre></div></div>
<p>Looks good! Cheers!</p>
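<p>As for the half-width normalization filter mentioned above, one simple way to sketch it is Unicode NFKC normalization from the JDK. This is a stand-in I picked for illustration, not something Sen or Lucene provides in this form, and the class name <code class="highlighter-rouge">WidthNormalizeSketch</code> is hypothetical. Note that NFKC also folds other compatibility characters, so it may change more than just character width.</p>

```java
import java.text.Normalizer;

public class WidthNormalizeSketch {
    // NFKC maps half-width katakana to full-width katakana (composing
    // the voiced sound mark) and full-width Latin letters to ASCII.
    static String normalizeWidth(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        System.out.println(normalizeWidth("ﾊﾞｶ"));          // half-width katakana input
        System.out.println(normalizeWidth("Ｌｕｃｅｎｅ")); // full-width Latin input
    }
}
```

<p>In a real analyzer this step would run on each token (or on the input text before tokenization), so that the half-width and full-width spellings of a word are indexed as the same term.</p>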
<p><strong>UPDATE</strong>
I was told that <a href="https://code.google.com/p/mecab/">MeCab</a> is more popular in the Japanese IT industry. I recommend trying it out if Sen does not meet your needs.</p>
Writing an application on Mac OS X with Mono C#
2013-04-30T00:00:00+00:00
http://lith.me/code/2013/04/30/writing-an-application-on-mac-os-x-with-mono-c
<p>I have a Windows Forms application written in C#, and I thought it would be cool if I could migrate it to Mac OS with <a href="http://www.mono-project.com/Main_Page">Mono</a>, which claims to provide portable C# across Windows, Linux and Mac OS.</p>
<p>First I installed <a href="http://www.go-mono.com/mono-downloads/download.html">Mono SDK</a> and <a href="http://xamarin.com/studio">Xamarin Studio</a>, and was surprised to find that it could import the Visual Studio project directly. Then came the disaster. Strange errors came one after another, and most of them didn’t make sense at all. As I figured out later, the errors were almost impossible to fix because of Mono’s poor support for some Windows Forms controls such as <code class="highlighter-rouge">System.Windows.Forms.WebBrowser</code> and <code class="highlighter-rouge">System.Windows.Forms.NotifyIcon</code>.</p>
<p>So I had to give up the UI code based on <code class="highlighter-rouge">Windows.Forms</code> and look for a substitute. <a href="http://xamarin.com/mac">Xamarin.Mac</a>, which is built on <a href="http://www.mono-project.com/MonoMac">MonoMac</a>, seems to be a suitable option. Xamarin provides a good Hello World example for newcomers to get started. The tutorial walks through creating a basic Xamarin.Mac application, demonstrating the development toolchain, including Xamarin Studio and Xcode, as well as explaining the basic structure of a Xamarin.Mac application (which is very different from that of a Windows Forms application).</p>
<p>Developing with Xamarin.Mac is such a headache, because you can seldom find any examples outside the Xamarin documentation. However, the application is built on the same UI controls as Cocoa applications, so it looks as native as an Objective-C application.</p>
<p>The application cannot run on a bare Mac without the <a href="http://www.go-mono.com/mono-downloads/download.html">Mono Runtime</a>, which makes it less attractive, since every end user needs to install the Mono Runtime to run the application.</p>