[Bug]: JSONinput trim doesn't remove non breakable space (NBSP) #7229

@PhilipCubix

Apache Hop version?

2.17.0

Java version?

21.0.9.10

Operating system

Windows

What happened?

Hi,

We had a JSON file containing the following line: a city name followed by 26 non-breaking spaces (identified using https://www.babelstone.co.uk/Unicode/whatisit.html):
"city": "GAYE                          ",

These should have been trimmed. I looked into how Hop trims strings and found:

public static boolean isSpace(char c) { return c == ' ' || c == '\t' || c == '\r' || c == '\n' || Character.isWhitespace(c); }

in: hop/core/src/main/java/org/apache/hop/core/Const.java

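This function appears to be unused, but it suggests that the intent in Hop aligns (partially) with what I'd expect: trimming should remove not only the plain space but also other whitespace characters. Note, however, that `Character.isWhitespace(c)` returns false for non-breaking spaces (U+00A0, U+2007, U+202F), so this helper on its own would not match NBSP either; `Character.isSpaceChar(c)` does.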
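
To check how the JDK classifies NBSP, here is a small standalone sketch. The class name `WhitespaceCheck` is made up for illustration; `isSpace` is copied from the `Const.java` snippet quoted above, and everything else is plain JDK.

```java
final class WhitespaceCheck {
    // U+00A0 NO-BREAK SPACE, the character found in the JSON file.
    static final char NBSP = '\u00A0';

    // Copy of the helper quoted from Const.java, reproduced here so the sketch is self-contained.
    static boolean isSpace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n' || Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        // isWhitespace explicitly excludes non-breaking spaces.
        System.out.println("isWhitespace(NBSP) = " + Character.isWhitespace(NBSP)); // false
        // isSpaceChar follows the Unicode space separator category, which includes NBSP.
        System.out.println("isSpaceChar(NBSP)  = " + Character.isSpaceChar(NBSP));  // true
        System.out.println("isSpace(NBSP)      = " + isSpace(NBSP));                // false
    }
}
```

So a trim built on `isSpace` as written would still leave NBSP in place; a check based on `Character.isSpaceChar` would be needed to catch it.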

However: I was also able to find:

```java
public Object convertDataFromString(
...
  // Trimming
  StringBuilder strpol;
  switch (trimType) {
    case IValueMeta.TRIM_TYPE_LEFT:
      strpol = new StringBuilder(pol);
      while (!strpol.isEmpty() && strpol.charAt(0) == ' ') {
        strpol.deleteCharAt(0);
      }
      pol = strpol.toString();
      break;
    case IValueMeta.TRIM_TYPE_RIGHT:
      strpol = new StringBuilder(pol);
      while (!strpol.isEmpty() && strpol.charAt(strpol.length() - 1) == ' ') {
        strpol.deleteCharAt(strpol.length() - 1);
      }
      pol = strpol.toString();
      break;
    case IValueMeta.TRIM_TYPE_BOTH:
      strpol = new StringBuilder(pol);
      while (!strpol.isEmpty() && strpol.charAt(0) == ' ') {
        strpol.deleteCharAt(0);
      }
      while (!strpol.isEmpty() && strpol.charAt(strpol.length() - 1) == ' ') {
        strpol.deleteCharAt(strpol.length() - 1);
      }
      pol = strpol.toString();
      break;
    default:
      break;
  }
```
in hop/core/src/main/java/org/apache/hop/core/row/value/ValueMetaBase.java (lines 4562 to 4593)

This code does not use that function from the `Const` class; it performs a very basic space-only trim instead.
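
The effect of a space-only comparison can be reproduced in isolation. The sketch below lifts the loop from the `TRIM_TYPE_RIGHT` branch into a hypothetical standalone method (`trimRightSpaceOnly` is not a Hop API) and runs it on a value padded with plain spaces and one padded with NBSP. It only shows what the loop itself does; it does not confirm that the JSONinput transform goes through this code path.

```java
final class SpaceOnlyTrimRepro {
    // Same loop shape as the quoted TRIM_TYPE_RIGHT branch: only U+0020 is removed.
    static String trimRightSpaceOnly(String pol) {
        StringBuilder strpol = new StringBuilder(pol);
        while (strpol.length() > 0 && strpol.charAt(strpol.length() - 1) == ' ') {
            strpol.deleteCharAt(strpol.length() - 1);
        }
        return strpol.toString();
    }

    public static void main(String[] args) {
        String withSpaces = "GAYE   ";               // three plain spaces
        String withNbsp = "GAYE\u00A0\u00A0\u00A0";  // three non-breaking spaces

        System.out.println(trimRightSpaceOnly(withSpaces).length()); // 4
        System.out.println(trimRightSpaceOnly(withNbsp).length());   // 7, NBSP is left untouched
    }
}
```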

I cannot say with certainty whether this code is what causes the JSONinput transform to trim its input without removing non-breaking spaces, but this is as far as I have been able to pin it down.
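
If that code path does turn out to be responsible, one possible direction is a trim that tests each character with a broader predicate. The following is only a sketch under that assumption: `NbspTrim`, `isTrimmable`, and `trimBoth` are hypothetical names, not existing Hop code, and widening the predicate would also strip tabs and line breaks, which the current space-only trim keeps.

```java
final class NbspTrim {
    // Hypothetical predicate: JDK whitespace plus Unicode space separators such as NBSP.
    static boolean isTrimmable(char c) {
        return Character.isWhitespace(c) || Character.isSpaceChar(c);
    }

    // Trims both ends by moving two indices inward instead of deleting characters one by one.
    static String trimBoth(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isTrimmable(s.charAt(start))) {
            start++;
        }
        while (end > start && isTrimmable(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        // The value from the report: a city name followed by 26 non-breaking spaces.
        String city = "GAYE" + "\u00A0".repeat(26);
        System.out.println("[" + trimBoth(city) + "]"); // prints [GAYE]
    }
}
```

Note that `String.strip()` would not be a drop-in alternative here, since it relies on `Character.isWhitespace` and therefore leaves NBSP in place as well.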

Issue Priority

Priority: 2

Issue Component

Component: Transforms
