KeyError: '_id' CRITICAL ('Exception writing records', KeyError('_id')) #48

Open
waleedarshad-vf opened this issue Jul 6, 2020 · 6 comments

Comments

@waleedarshad-vf

waleedarshad-vf commented Jul 6, 2020

Mongo to Redshift. I got this error; what could be the possible reasons?

ERROR Exception writing records
Traceback (most recent call last):
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 300, in write_batch
    {'version': target_table_version})
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/sql_base.py", line 808, in write_batch_helper
    for table_batch in denest.to_table_batches(schema, key_properties, records):
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/denest.py", line 21, in to_table_batches
    key_properties)
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/denest.py", line 47, in _get_streamed_table_schemas
    key_prop_schemas[key] = schema['properties'][key]
KeyError: '_id'
CRITICAL ('Exception writing records', KeyError('_id'))
Traceback (most recent call last):
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 300, in write_batch
    {'version': target_table_version})
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/sql_base.py", line 808, in write_batch_helper
    for table_batch in denest.to_table_batches(schema, key_properties, records):
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/denest.py", line 21, in to_table_batches
    key_properties)
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/denest.py", line 47, in _get_streamed_table_schemas
    key_prop_schemas[key] = schema['properties'][key]
KeyError: '_id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/macbookpro/.virtualenvs/target-redshift/bin/target-redshift", line 10, in <module>
    sys.exit(cli())
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_redshift/__init__.py", line 55, in cli
    main(args.config)
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_redshift/__init__.py", line 49, in main
    target_tools.main(redshift_target)
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 28, in main
    stream_to_target(input_stream, target, config=config)
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 77, in stream_to_target
    raise e
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 64, in stream_to_target
    line
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 152, in _line_handler
    state_tracker.flush_stream(line_data['stream'])
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/stream_tracker.py", line 41, in flush_stream
    self._write_batch_and_update_watermarks(stream)
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/stream_tracker.py", line 67, in _write_batch_and_update_watermarks
    self.target.write_batch(stream_buffer)
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_redshift/redshift.py", line 72, in write_batch
    return PostgresTarget.write_batch(self, nullable_stream_buffer)
  File "/Users/macbookpro/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 309, in write_batch
    raise PostgresError(message, ex)
target_postgres.exceptions.PostgresError: ('Exception writing records', KeyError('_id'))
@waleedarshad-vf (Author)

@dmosorast again, your help is needed.

@bonysimon007

@waleedarshad-vf Did you find a solution?

@waleedarshad-vf (Author)

@bonysimon007 No. We moved to Stitch Data, the hosted enterprise version, and that worked.

@JackMcKew

From the looks of this article, my best guess is that it stems from the schema not being specified in the catalog.json:

https://titanwolf.org/Network/Articles/Article?AID=b4eb664e-70f4-430c-b568-bdd0d32bcf35#gsc.tab=0
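
That would line up with the traceback above: denest.py resolves each key property against schema['properties'], so a catalog whose streams carry an empty schema makes the '_id' lookup fail. Here is a minimal sketch of the failing step; the schema value is illustrative, reconstructed from the traceback rather than taken from the tap's actual output:

# Sketch of target_postgres/denest.py, _get_streamed_table_schemas
# (per the traceback above); the schema below is a hypothetical
# stand-in for a stream whose catalog entry defines no columns.
schema = {'type': 'object', 'properties': {}}
key_properties = ['_id']

key_prop_schemas = {}
for key in key_properties:
    # Raises KeyError('_id'): the key property is absent from 'properties'
    key_prop_schemas[key] = schema['properties'][key]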

@JackMcKew

I managed to get around this by using https://github.com/transferwise/pipelinewise-tap-mongodb and https://github.com/transferwise/pipelinewise-target-postgres

I wrote some Python code to edit the catalog produced by tap-mongodb --config config/singer/tap-mongodb-config.json --discover > catalog.json:

import json
import pathlib

input_file = 'catalog.json'
output_file = 'mongo/catalog.json'

with open(input_file) as f:
    data = json.load(f)

for stream in data['streams']:
    # Mark every stream as selected, using full-table replication
    stream_metadata = stream['metadata'][0]
    stream_metadata['metadata']['selected'] = True
    stream_metadata['metadata']['replication-method'] = 'FULL_TABLE'

    # https://github.com/singer-io/getting-started/blob/master/docs/DISCOVERY_MODE.md
    # https://github.com/singer-io/tap-mongodb/issues/48
    # Give each stream an explicit schema so the target can resolve the
    # '_id' key property and land the whole document in one column.
    stream['schema']['properties'] = {
        "_id": {
            "type": ["string", "null"]
        },
        "document": {
            "type": ["object", "array", "string", "null"]
        }
    }

pathlib.Path(output_file).parent.mkdir(parents=True, exist_ok=True)
with open(output_file, 'w') as json_file:
    json.dump(data, json_file)

This syncs each entire document into a single JSONB column in PostgreSQL.
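
As a quick sanity check (assuming the edited catalog was written to mongo/catalog.json as in the script above), you can verify that every stream is selected and carries the two-property schema before wiring the tap to the target:

import json

with open('mongo/catalog.json') as f:
    catalog = json.load(f)

for stream in catalog['streams']:
    metadata = stream['metadata'][0]['metadata']
    # Every stream should be selected for full-table replication...
    assert metadata['selected'] is True
    assert metadata['replication-method'] == 'FULL_TABLE'
    # ...and expose only the _id and document properties
    assert set(stream['schema']['properties']) == {'_id', 'document'}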

If you then want to expand the JSONB column into a full table in PostgreSQL, that can be done with SQL statements. Note that camelCase document keys from MongoDB cause odd behaviour, since PostgreSQL identifiers are conventionally snake_case; I handled this by defining the json_keys_to_snake_case function given here: https://rudism.com/converting-json-keys-to-snake-case-in-postgresql/.

-- Stage a copy of each document with its keys converted to snake_case
CREATE TEMPORARY TABLE snake_case (document jsonb);

INSERT INTO snake_case
SELECT json_keys_to_snake_case
FROM <SCHEMA_NAME>.<TABLE_NAME> m, json_keys_to_snake_case(m.document);

-- Expand the staged JSONB documents into typed columns
INSERT INTO <SCHEMA_NAME>.<TABLE_NAME> (
    _id,
    document_key1,
    document_key2
)
SELECT
    t."_id",
    t."document_key1",
    t."document_key2"
FROM
    snake_case m
    CROSS JOIN LATERAL
    jsonb_to_record(m.document) AS t(
        _id TEXT,
        document_key1 <DATA_TYPE>,
        document_key2 <DATA_TYPE>
    );

DROP TABLE IF EXISTS snake_case;

Here <SCHEMA_NAME> is the schema target-postgres wrote the data to, <TABLE_NAME> is the table the data landed in, and document_key1 and document_key2 are example keys from the MongoDB collection.
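
If a collection has many keys, writing the column lists by hand gets tedious. The helper below is a hypothetical convenience (the build_expand_sql name and its interface are mine, not part of target-postgres) that renders the expanding INSERT from a mapping of column names to SQL types:

def build_expand_sql(schema_name, table_name, columns):
    """Render the JSONB-expanding INSERT from {column_name: sql_type}."""
    col_names = ', '.join(columns)
    select_list = ', '.join(f't."{name}"' for name in columns)
    record_defs = ', '.join(f'{name} {sql_type}'
                            for name, sql_type in columns.items())
    return (
        f'INSERT INTO {schema_name}.{table_name} ({col_names}) '
        f'SELECT {select_list} FROM snake_case m '
        f'CROSS JOIN LATERAL jsonb_to_record(m.document) AS t({record_defs});'
    )

# Example: two columns pulled out of the MongoDB documents
print(build_expand_sql('public', 'my_collection',
                       {'_id': 'TEXT', 'document_key1': 'INTEGER'}))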

parisholley added commits to parisholley/tap-mongodb that referenced this issue Oct 21, 2021