Skip to content

Provide Struct data type support for oracle plugin#659

Open
vanshikaagupta22 wants to merge 7 commits into
data-integrations:developfrom
cloudsufi:oracle-structsupport
Open

Provide Struct data type support for oracle plugin#659
vanshikaagupta22 wants to merge 7 commits into
data-integrations:developfrom
cloudsufi:oracle-structsupport

Conversation

@vanshikaagupta22

@vanshikaagupta22 vanshikaagupta22 commented May 6, 2026

Copy link
Copy Markdown
Contributor

This change adds native support for resolving Oracle STRUCT types (Object Types) into CDAP RECORD schemas. By querying the ALL_TYPE_ATTRS metadata table, the schema builder dynamically processes complex structures with support for up to 4 levels of nesting. Internal attributes are first translated to standard SQL data types via a dedicated mapper before being converted into the final CDAP schema.
When reading the data, the implementation overrides the setField() function, providing a custom implementation to properly extract and map individual custom object attributes according to requirements.

Note: Users will require explicit EXECUTE privileges on the custom Oracle object types to successfully resolve the schema.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for resolving Oracle STRUCT types into CDAP RECORD schemas by querying the ALL_TYPE_ATTRS metadata table. The implementation includes recursive resolution for nested structures and an updated mapping for primitive Oracle types. Feedback highlights several critical issues: the metadata query lacks an OWNER filter which could lead to incorrect schema resolution in multi-schema environments; fully qualified type names containing dots will cause IllegalArgumentException when creating CDAP records; and there are logic errors in the primitive type mapping, specifically regarding Oracle-specific type naming conventions and an invalid comparison between Java class names and SQL type strings.

Comment on lines +177 to +180
String sql = "SELECT * FROM ALL_TYPE_ATTRS WHERE TYPE_NAME = ? ORDER BY ATTR_NO";

try (PreparedStatement stmt = connection.prepareStatement(sql)) {
stmt.setString(1, typeName.substring(typeName.lastIndexOf('.') + 1));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The query to ALL_TYPE_ATTRS should include a filter on the OWNER column. Without it, if multiple schemas define a type with the same name, the query will return attributes for all of them, leading to an incorrect or corrupted schema.

Additionally, the typeName provided by the JDBC driver is often fully qualified (e.g., OWNER.TYPE_NAME). You should parse the owner from the typeName string if present, and fallback to the provided schemaName otherwise.

if (attrSchema != null) {
fields.add(Schema.Field.of(attrName, attrSchema));
} else {
Schema nestedSchema = getStructSchema(connection, schemaName, attrTypeName);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

When recursing for nested STRUCT types, the schemaName should be updated to the owner of the attribute's type to ensure the subsequent query filters correctly. This owner can be retrieved from the ATTR_TYPE_OWNER column in the ALL_TYPE_ATTRS result set.

schemaName, typeName));
}

return Schema.recordOf(typeName, fields);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

CDAP Schema.recordOf requires the record name to be a valid identifier (matching [A-Za-z_][A-Za-z0-9_]*). If typeName is fully qualified (e.g., MY_SCHEMA.MY_TYPE), this call will throw an IllegalArgumentException due to the dot. Consider using only the simple type name part for the record name.

    String recordName = typeName.contains(".") ? typeName.substring(typeName.lastIndexOf('.') + 1) : typeName;
    return Schema.recordOf(recordName, fields);

Comment on lines +211 to +244
case "TIMESTAMP WITH TZ":
return isTimestampOldBehavior ? Schema.of(Schema.Type.STRING) : Schema.of(Schema.LogicalType.TIMESTAMP_MICROS);
case "TIMESTAMP WITH LTZ":
return getTimestampLtzSchema();
case "TIMESTAMP":
return Schema.of(Schema.LogicalType.DATETIME);
case "DATE" :
return Schema.of(Schema.LogicalType.DATE);
case "BINARY FLOAT":
case "FLOAT":
return Schema.of(Schema.Type.FLOAT);
case "BINARY DOUBLE":
case "DOUBLE":
return Schema.of(Schema.Type.DOUBLE);
case "BFILE":
case "RAW":
case "LONG RAW":
return Schema.of(Schema.Type.BYTES);
case "INTERVAL DAY TO SECOND":
case "INTERVAL YEAR TO MONTH":
case "VARCHAR2":
case "VARCHAR":
case "CHAR":
case "CLOB":
case "BLOB":
case "LONG":
return Schema.of(Schema.Type.STRING);
case "INTEGER":
return Schema.of(Schema.Type.INT);
case "NUMBER":
case "DECIMAL":
// FLOAT and REAL are returned as java.sql.Types.NUMERIC but with value that is a java.lang.Double
if (Double.class.getTypeName().equals(typeName)) {
return Schema.of(Schema.Type.DOUBLE);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There are two issues in mapPrimitiveOracleType:

  1. Type Name Mismatches: The strings used in the switch (e.g., "TIMESTAMP WITH TZ", "BINARY FLOAT") likely do not match the values returned by Oracle in ALL_TYPE_ATTRS.ATTR_TYPE_NAME. For example, Oracle typically uses "TIMESTAMP WITH TIME ZONE", "BINARY_FLOAT", and "BINARY_DOUBLE". Please verify these against the Oracle metadata documentation.
  2. Logic Error: At line 243, Double.class.getTypeName().equals(typeName) compares the Java class name string ("java.lang.Double") with the SQL type name (e.g., "NUMBER"), which will always be false. Since metadata is not available here to check the column class, this check should probably be removed or replaced with a check on the SQL type name if applicable.

# Conflicts:
#	oracle-plugin/src/main/java/io/cdap/plugin/oracle/OracleSourceSchemaReader.java
#	oracle-plugin/src/test/java/io/cdap/plugin/oracle/OracleSchemaReaderTest.java
@vishwasvaidya-cloudsufi vishwasvaidya-cloudsufi force-pushed the oracle-structsupport branch 2 times, most recently from a11f5cc to af67d36 Compare June 15, 2026 10:01
return getBfileBytes(bfile, columnName);
}

public static byte[] getBfileBytes(Object bfile, String columnName) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this static?

}
if (attrValue instanceof BigDecimal) {
BigDecimal bigDecimal = (BigDecimal) attrValue;
if (Schema.LogicalType.DECIMAL.equals(fieldSchema.getLogicalType())) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create a separate method for handling bigdecimal

*/
private static final Map<String, Integer> DATA_TYPE_MAP = new HashMap<>();
static {
DATA_TYPE_MAP.put("TIMESTAMP WITH LOCAL TZ", TIMESTAMP_LTZ);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does oracle stores this type as TIMESTAMP WITH LOCAL TIMEZONE also, verify once

StructuredRecord.Builder builder = StructuredRecord.builder(schema);

for (int index = 0; index < attributes.length; index++) {
Schema.Field field = fields.get(index);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normal columns are resilient to schema/result-set ordering differences because they are matched by name (resultSet.findColumn(field.getName())).
STRUCT attributes rely on Struct.getAttributes() order exactly matching schema.getFields() order. If a user edits the output schema (drops, renames, or reorders an attribute — all legitimate in CDAP), this silently writes values into the wrong fields, or throws IndexOutOfBoundsException when counts differ.

recordBuilder.set(field.getName(), attrValue.toString());
return;
}
if ("oracle.sql.NCLOB".equals(attrClassName)) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oracle.sql.NCLOB implements java.sql.Clob, so the instanceof Clob branch (getSubString) is evaluated before the "oracle.sql.NCLOB".equals(className) branch (toString()). The NCLOB-specific branch is unreachable.

Assert.assertEquals(expectedSchemaFields.get(1).getSchema(), actualSchemaFields.get(1).getSchema());
}

@Test

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write test cases for all data types and multiple level of structs also.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants