
\addvspace {1em}
\contentsline {figure}{\numberline {1.1}{\ignorespaces A sample document image containing three languages: Sanskrit, Hindi and English. When an individual English or Hindi \textsc {ocr} is run on this document, the recognition output is not accurate. A multilingual \textsc {ocr} containing a script identification module, however, can recognize the text with high accuracy.}}{2}
\contentsline {figure}{\numberline {1.2}{\ignorespaces To move towards a ``paperless office'', millions of documents in several scripts and languages must be digitized. Because of the limitations of existing \textsc {ocr} systems, however, the script and language of a document must be known beforehand. Hence, a script identification module is added to the \textsc {ocr} system; it identifies the script and language at word or line level before passing the text to the corresponding script- or language-specific \textsc {ocr}.}}{3}
\addvspace {1em}
\contentsline {figure}{\numberline {2.1}{\ignorespaces The basic \textsc {lbp} operator. The figure shows the circular $(8, 1)$, $(16,2)$ and $(8,2)$ neighborhoods. Pixel values are bilinearly interpolated whenever a sampling point is not at the center of a pixel. Figure source\nobreakspace {}\cite {LBPOjala2002}.}}{8}
\contentsline {figure}{\numberline {2.2}{\ignorespaces Histogram of Gradients computation by recording the gradient orientation at edges. Figure courtesy\nobreakspace {}\cite {Vibhor13}.}}{9}
\contentsline {figure}{\numberline {2.3}{\ignorespaces \textbf {\textit {k}-Means Clustering}. Example data points, and the clusters computed by \textit {k}-means clustering. Figure courtesy\nobreakspace {}\cite {junejaThesis}.}}{10}
\contentsline {figure}{\numberline {2.4}{\ignorespaces A recurrent neural network, unrolled.\nobreakspace {}\cite {colahSite}}}{13}
\contentsline {figure}{\numberline {2.5}{\ignorespaces The repeating module in a standard \textsc {RNN} contains a single layer\nobreakspace {}\cite {colahSite}}}{14}
\contentsline {figure}{\numberline {2.6}{\ignorespaces \textbf {Preservation of gradient information by \textsc {LSTM}.} The states of the input, forget, and output gates are displayed below, to the left of, and above the hidden layer node, respectively; the node corresponds to a single memory cell. For simplicity, the gates are either entirely open (`$O$') or closed (`-'). The memory cell `remembers' the first input as long as the forget gate is open and the input gate is closed, and the sensitivity of the output layer can be switched on and off by the output gate without affecting the cell.\nobreakspace {}\cite {gravesThesis}}}{15}
\contentsline {figure}{\numberline {2.7}{\ignorespaces The repeating module in an \textsc {LSTM} contains four interacting layers.\nobreakspace {}\cite {colahSite}}}{15}
\addvspace {1em}
\contentsline {figure}{\numberline {3.1}{\ignorespaces A typical example of a street scene image captured in a multilingual country, e.g. India. Our goal in this chapter is to localize the text and answer ``what script is this?'' to facilitate reading text in scene images.}}{17}
\contentsline {figure}{\numberline {3.2}{\ignorespaces (a) A few example images from the \textsc {ilst} dataset we introduce. We provide the ground-truth text bounding boxes, script and text for the images. (b) A few cropped word images from our dataset. The dataset can be used for a variety of problems, including recognition and text localization.}}{18}
\contentsline {subfigure}{\numberline {(a)}{\ignorespaces {}}}{18}
\contentsline {figure}{\numberline {3.3}{\ignorespaces Strokes are atomic units of scripts. We show some representative strokes of the following scripts (top to bottom): Hindi, Kannada, Malayalam, Tamil and Telugu. Our method yields strokes that are representative and discriminative enough for a cropped image.}}{21}
\contentsline {figure}{\numberline {3.4}{\ignorespaces Method overview: the figure depicts the feature computation process. First, we extract local features from the images and cluster these features to obtain local histograms of visual words. Then, we cluster the histograms of visual words to obtain a representation of words in the form of strokes.}}{22}
\contentsline {figure}{\numberline {3.5}{\ignorespaces Confusion matrix on ILST cropped words. Our method achieves a script identification accuracy of 88.67\% on the introduced dataset.}}{25}
\contentsline {figure}{\numberline {3.6}{\ignorespaces Success and failure cases. Despite high variations in the dataset, our method correctly identifies the script of scene text images. The ``Success'' column shows correctly classified word images; wrongly classified words are shown in the ``Failure'' column, along with the recognized script in red boxes.}}{26}
\contentsline {figure}{\numberline {3.7}{\ignorespaces An example end-to-end script identification result of our method. We localize the text boxes in images using the methods of\nobreakspace {}\cite {GomezK14} and\nobreakspace {}\cite {tessOCR}, and then apply our method to identify the script of each text box.}}{28}
\addvspace {1em}
\contentsline {figure}{\numberline {4.1}{\ignorespaces Script and language identified at word level in document snippets written in Roman-script based languages (first row) and Indic scripts (second row). In the first row, red, green and blue rectangles denote German, French and Spanish languages, respectively. In the second row, violet, orange and brown rectangles denote Hindi, Telugu and Malayalam scripts, respectively. Unlike past approaches, we propose a method that identifies the script and language at word and line level using Recurrent Neural Networks (\textsc {rnn}s).}}{30}
\contentsline {figure}{\numberline {4.2}{\ignorespaces The architecture for \textsc {rnn} based script and language identification. From left to right, the segmented line and word from the document images are horizontally divided into two parts. Then, sequence features are computed from sliding windows, $w$. Here, $m$ is the number of sliding windows and $n$ is the number of features, $f$, computed from a single window. These features are then given as input to the \textsc {lstm} cells of the \textsc {rnn} to identify the script and language of the current line/word image.}}{33}
\contentsline {figure}{\numberline {4.3}{\ignorespaces The sequence features are computed from sliding windows, $w$. Here, $m$ is the number of sliding windows and $n$ is the number of features, $f$, computed from a single window. These features are then given as input to the \textsc {lstm} cells of the \textsc {rnn} to identify the script and language of the current line/word image.}}{34}
\contentsline {figure}{\numberline {4.4}{\ignorespaces Script identification results: some failure cases in script identification at word level. In the first row, the first column shows Kannada words identified as Telugu and the second column shows Telugu words identified as Kannada. In the second row, the first column shows Gurumukhi words identified as Hindi and the second column shows Hindi words identified as Gurumukhi. Similarly, in the third row, the first column shows Bangla words identified as Assamese and vice versa in the second column.}}{37}
\contentsline {figure}{\numberline {4.5}{\ignorespaces Confusion matrix for script identification at word level. Blank cells in the matrix denote predictions that are less than 0.40\%.}}{38}
\contentsline {figure}{\numberline {4.6}{\ignorespaces Language identification results: some failure cases for language identification at word level for both the Indian and Roman-script based datasets. In the first row, the first column shows French words identified as Spanish and the second column shows Spanish words identified as French. In the second row, the first column shows German words identified as French and the second column shows French words identified as German. In the third row, the first column shows Marathi words identified as Hindi, and vice versa in the second column. In the fourth row, the first column shows Assamese words identified as Manipuri, and vice versa in the second column.}}{39}
\addvspace {1em}