Work with BOM unicode files #7

GoogleCodeExporter · 2016-01-07T09:57:12Z

The plugin cannot process Unicode files that start with a byte order mark 
(BOM). There is an easy fix: commons-io, which the project already uses, 
can handle files of this type.

Changes in pom.xml:
@@ -70,11 +70,11 @@
                                <groupId>org.apache.maven.plugins</groupId>
                                <artifactId>maven-compiler-plugin</artifactId>
                                <configuration>
-                                       <source>1.4</source>
-                                       <target>1.4</target>
+                                       <source>1.6</source>
+                                       <target>1.6</target>
                                </configuration>
                        </plugin>
@@ -99,7 +99,7 @@
                <dependency>
                        <groupId>commons-io</groupId>
                        <artifactId>commons-io</artifactId>
-                       <version>1.3.1</version>
+                       <version>2.4</version>
                </dependency>

Changes in AbstractDBMojo.java:
@@ -19,6 +19,8 @@
 import java.util.List;
 import java.util.zip.GZIPInputStream;

+import org.apache.commons.io.ByteOrderMark;
+import org.apache.commons.io.input.BOMInputStream;
 import org.apache.commons.lang.StringUtils;
 import org.apache.maven.plugin.AbstractMojo;
 import org.apache.maven.plugin.MojoExecutionException;
@@ -233,14 +240,13 @@

         // check encoding
         checkEncoding();
-
+
         // our file reader
-        Reader reader;
-        reader = new InputStreamReader(ips, scriptEncoding);
-
+        Reader reader = inputStreamToReaderBOM(ips);
+
         // create SQL Statement
         Statement st = con.createStatement();
-
+
         StringBuffer sql = new StringBuffer();
         String line;
         BufferedReader in = new BufferedReader(reader);
@@ -323,17 +328,16 @@
             ips = new GZIPInputStream(ips);
             getLog().info(" file is gz compressed, using gzip stream");
         }
-
+
         // check encoding
         checkEncoding();
-
+
         // our file reader
-        Reader reader;
-        reader = new InputStreamReader(ips, scriptEncoding);
-
+        Reader reader = inputStreamToReaderBOM(ips);
+
         // create SQL Statement
         Statement st = con.createStatement();
-
+
         StringBuffer sql = new StringBuffer();
         String line;
         BufferedReader in = new BufferedReader(reader);


+
+    private Reader inputStreamToReaderBOM(InputStream in) throws IOException {
+        BOMInputStream bOMInputStream = new BOMInputStream(in,
+            ByteOrderMark.UTF_16LE, ByteOrderMark.UTF_16BE,
+            ByteOrderMark.UTF_32LE, ByteOrderMark.UTF_32BE);
+        ByteOrderMark bom = bOMInputStream.getBOM();
+
+        String charsetName = bom == null ? scriptEncoding : bom.getCharsetName();
+
+        return new InputStreamReader(new BufferedInputStream(bOMInputStream), charsetName);
+    }
+
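
For readers without commons-io at hand, the idea behind the helper above can be sketched with the JDK alone: peek at the first bytes, pick the charset a BOM announces, otherwise fall back to the configured encoding. This is only an illustration under stated assumptions, not the plugin's code and not how `BOMInputStream` is implemented. The class `BomSketch` and its methods `openReader` and `decode` are hypothetical names. The sketch also checks the UTF-8 BOM, which the BOM list in the patch does not include.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;

// Hypothetical illustration class, not part of the plugin.
class BomSketch {

    // True if the first n bytes read into head begin with the given BOM bytes.
    private static boolean startsWith(byte[] head, int n, int... bom) {
        if (n < bom.length) return false;
        for (int i = 0; i < bom.length; i++) {
            if ((head[i] & 0xFF) != bom[i]) return false;
        }
        return true;
    }

    // Returns a Reader that skips a leading BOM and decodes with the charset
    // it announces; without a BOM, fallbackEncoding is used instead.
    static Reader openReader(InputStream in, String fallbackEncoding) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 4);
        byte[] head = new byte[4];
        int n = 0;
        while (n < head.length) {              // a single read() may return fewer bytes
            int r = pin.read(head, n, head.length - n);
            if (r < 0) break;
            n += r;
        }

        String charset = fallbackEncoding;
        int bomLength = 0;
        // 4-byte BOMs are tested first: the UTF-32LE BOM begins with the UTF-16LE BOM.
        if (startsWith(head, n, 0xFF, 0xFE, 0x00, 0x00)) { charset = "UTF-32LE"; bomLength = 4; }
        else if (startsWith(head, n, 0x00, 0x00, 0xFE, 0xFF)) { charset = "UTF-32BE"; bomLength = 4; }
        else if (startsWith(head, n, 0xEF, 0xBB, 0xBF)) { charset = "UTF-8"; bomLength = 3; }
        else if (startsWith(head, n, 0xFF, 0xFE)) { charset = "UTF-16LE"; bomLength = 2; }
        else if (startsWith(head, n, 0xFE, 0xFF)) { charset = "UTF-16BE"; bomLength = 2; }

        // Push back everything that was read beyond the BOM.
        if (n > bomLength) pin.unread(head, bomLength, n - bomLength);
        return new InputStreamReader(pin, charset);
    }

    // Convenience wrapper for demonstration: decodes a whole byte array to a String.
    static String decode(byte[] data, String fallbackEncoding) {
        try {
            Reader reader = openReader(new ByteArrayInputStream(data), fallbackEncoding);
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[256];
            int r;
            while ((r = reader.read(buf)) != -1) sb.append(buf, 0, r);
            return sb.toString();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'S', 'E', 'L', 'E', 'C', 'T'};
        System.out.println(decode(withBom, "ISO-8859-1"));
    }
}
```

As in the patch, the configured encoding applies only when no BOM is found, so existing scripts without a BOM are read as before.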

Original issue reported on code.google.com by [email protected] on 2 Feb 2015 at 7:00

GoogleCodeExporter added Priority-Medium Type-Defect auto-migrated labels Jan 7, 2016
