character encoding – All inclusive Charset to avoid java.nio.charset.MalformedInputException: Input length = 1?

You probably want to have a list of supported encodings. For each file, try each encoding in turn, maybe starting with UTF-8. Every time you catch the MalformedInputException, try the next encoding.
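The fallback loop described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class name CharsetFallback, the method decodeWithFallback, and the two-entry candidate list are all assumptions made for the example. ISO-8859-1 is placed last because it accepts any byte sequence, so nothing listed after it would ever be tried.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

class CharsetFallback {
    // Illustrative candidate list (an assumption): strict charsets first,
    // ISO-8859-1 last because it accepts every byte sequence.
    static final List<Charset> CANDIDATES =
            List.of(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

    // Returns the text decoded with the first candidate that accepts all bytes.
    static String decodeWithFallback(byte[] bytes) {
        CharacterCodingException last = null;
        for (Charset cs : CANDIDATES) {
            try {
                // A fresh decoder reports malformed input by default,
                // so a wrong guess surfaces as an exception.
                return cs.newDecoder().decode(ByteBuffer.wrap(bytes)).toString();
            } catch (CharacterCodingException e) {
                last = e; // MalformedInputException lands here; try the next charset
            }
        }
        throw new IllegalArgumentException("No candidate charset decoded the input", last);
    }

    public static void main(String[] args) {
        // 0xE9 is "e with acute accent" in ISO-8859-1 but a truncated sequence in UTF-8.
        byte[] latin1Bytes = {'c', 'a', 'f', (byte) 0xE9};
        System.out.println(decodeWithFallback(latin1Bytes));
    }
}
```

For a file, the byte array could come from Files.readAllBytes; for large files you would probably want a streaming variant instead of reading everything into memory.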

Creating a BufferedReader with Files.newBufferedReader:

Files.newBufferedReader(Paths.get("a.txt"), StandardCharsets.UTF_8);

When the application runs, it may throw the following exception:

java.nio.charset.MalformedInputException: Input length = 1

But

new BufferedReader(new InputStreamReader(new FileInputStream("a.txt"), "utf-8"));

works well.

The difference is that the former uses the CharsetDecoder default action:

The default action for malformed-input and unmappable-character errors is to report them.

whereas the latter uses the REPLACE action:

cs.newDecoder().onMalformedInput(CodingErrorAction.REPLACE).onUnmappableCharacter(CodingErrorAction.REPLACE)
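To see the two actions side by side, the sketch below passes an explicitly configured CharsetDecoder to InputStreamReader and reads the same bytes once with REPLACE and once with REPORT. The class name DecoderActions, the helper readFirstLine, and the sample bytes are assumptions chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class DecoderActions {
    // Reads the first line of the bytes as UTF-8, handling bad input per `action`.
    static String readFirstLine(byte[] bytes, CodingErrorAction action) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), decoder))) {
            return reader.readLine();
        } catch (IOException e) {
            // MalformedInputException is an IOException; rethrow unchecked for brevity.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 0x93 on its own is not a valid UTF-8 sequence.
        byte[] bytes = {'a', (byte) 0x93, 'b'};

        // REPLACE substitutes U+FFFD for the bad byte and keeps going.
        String replaced = readFirstLine(bytes, CodingErrorAction.REPLACE);
        System.out.println(replaced.equals("a\uFFFDb")); // prints true

        // REPORT fails with MalformedInputException: Input length = 1.
        try {
            readFirstLine(bytes, CodingErrorAction.REPORT);
        } catch (UncheckedIOException e) {
            System.out.println(e.getCause());
        }
    }
}
```

Choosing REPLACE hides the problem rather than fixing it: the file still is not valid UTF-8, and every replaced byte shows up as U+FFFD in the decoded text.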

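ISO-8859-1 is an all-inclusive charset, in the sense that it is guaranteed not to throw MalformedInputException: every one of the 256 possible byte values maps to a character. So it is good for debugging, even if your input is not in this charset, although the decoded characters may then be wrong. For example: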

req.setCharacterEncoding("ISO-8859-1");

I had some double-right-quote/double-left-quote characters in my input, and both US-ASCII and UTF-8 threw MalformedInputException on them, but ISO-8859-1 worked.
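The claim that ISO-8859-1 never rejects input can be checked with a small sketch like the one below, which strictly decodes all 256 byte values under three charsets. The class name Latin1AcceptsEverything and its helper methods are hypothetical names used only for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Latin1AcceptsEverything {
    // True if the charset decodes the bytes with errors reported, not replaced.
    static boolean decodesStrictly(byte[] bytes, Charset cs) {
        try {
            cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Every possible byte value, 0x00 through 0xFF, once each.
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        System.out.println(decodesStrictly(all, StandardCharsets.ISO_8859_1)); // prints true
        System.out.println(decodesStrictly(all, StandardCharsets.UTF_8));      // prints false
        System.out.println(decodesStrictly(all, StandardCharsets.US_ASCII));   // prints false
    }
}
```

Decoding without an exception does not mean the text is right. If the quote characters were written as Windows-1252 bytes 0x93 and 0x94 (an assumption about the input described above), ISO-8859-1 turns them into the control characters U+0093 and U+0094 rather than curly quotes.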
