I’m implementing DynamoDB in our project. We have to store large data strings, so we split each string into small pieces and insert multiple rows that differ in only one attribute value: the part of the string. One column (the range key) contains the part number. Inserting and selecting data works perfectly fine for both small and large strings.

The problem is deleting an item. I read that to delete an item you need to specify its primary key (hash key, or hash key and range key, depending on the table). But what if I want to delete the items that have a particular value for one of the attributes? Do I need to scan (scan, not query) the entire table and run a delete or batch delete for each row? Or is there another solution that doesn’t need two queries? I’m trying to avoid scanning the entire table: I think we will have about 100 to 1,000 million rows in it, so scanning would be very slow.
Thanks for your help.
There is no way to delete an arbitrary element in DynamoDB. You do indeed need to know the hash_key and the range_key. If a query does not fit your needs here (i.e. you do not even know the hash_key), then you’re stuck.
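For the case where the hash_key is known (for example, if all parts of one string share a hash_key and differ only in the part-number range key, which is an assumption about the table in the question), a Query followed by batched deletes avoids the full scan. Here is a minimal sketch of that idea. It assumes `client` is a boto3 low-level DynamoDB client (`boto3.client("dynamodb")`); the function name and the table and attribute names passed in are hypothetical.

```python
def delete_all_parts(client, table_name, hash_name, range_name, hash_value):
    """Delete every item that shares one hash key value.

    hash_value uses the low-level attribute format, e.g. {"S": "doc-42"}.
    Returns the number of items deleted.
    """
    deleted = 0
    query_args = {
        "TableName": table_name,
        "KeyConditionExpression": "#h = :h",
        "ExpressionAttributeNames": {"#h": hash_name, "#r": range_name},
        "ExpressionAttributeValues": {":h": hash_value},
        # Fetch only the key attributes; the payload is not needed to delete.
        "ProjectionExpression": "#h, #r",
    }
    while True:
        page = client.query(**query_args)
        keys = [
            {hash_name: item[hash_name], range_name: item[range_name]}
            for item in page.get("Items", [])
        ]
        # BatchWriteItem accepts at most 25 requests per call.
        for i in range(0, len(keys), 25):
            pending = [{"DeleteRequest": {"Key": k}} for k in keys[i:i + 25]]
            while pending:
                # Production code should back off before retrying.
                resp = client.batch_write_item(RequestItems={table_name: pending})
                unprocessed = resp.get("UnprocessedItems", {}).get(table_name, [])
                deleted += len(pending) - len(unprocessed)
                pending = unprocessed
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return deleted
        query_args["ExclusiveStartKey"] = last_key
```

A call might look like `delete_all_parts(client, "Strings", "id", "part", {"S": "doc-42"})`. This is still two kinds of request (a query, then deletes), but the query only touches the items under one hash_key rather than the whole table.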
The best option would be to rethink your data modeling: build a custom index or do a ‘lazy delete’.
To achieve a ‘lazy delete’, use a table as a queue of elements to delete. Periodically, run an EMR job over it to perform all the deletes as a batch in a single scan operation. It’s really not the best solution, but it’s the only way I can think of to avoid re-modeling.
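To make the ‘lazy delete’ idea more concrete, here is a rough sketch that uses a plain Python script in place of an EMR job (that substitution is mine, not part of the suggestion above). It assumes a hypothetical queue table whose items each carry the string attribute value to purge, and that `client` is a boto3 low-level DynamoDB client. All function, table and attribute names are made up for illustration.

```python
def scan_pages(client, **scan_args):
    """Yield every item of a Scan, following LastEvaluatedKey pagination."""
    while True:
        page = client.scan(**scan_args)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return
        scan_args["ExclusiveStartKey"] = last_key


def load_pending(client, queue_table, attr_name):
    """Collect the queued attribute values (assumed to be of type S)."""
    return {
        item[attr_name]["S"]
        for item in scan_pages(client, TableName=queue_table)
        if attr_name in item
    }


def drain_lazy_deletes(client, data_table, key_names, attr_name, pending):
    """Scan data_table once and delete items whose attr_name is in pending.

    Returns the number of delete requests issued.
    """
    names = list(dict.fromkeys([*key_names, attr_name]))
    placeholders = {f"#p{i}": name for i, name in enumerate(names)}
    deleted = 0
    for item in scan_pages(
        client,
        TableName=data_table,
        # Read only the keys and the filter attribute to keep the scan cheap.
        ProjectionExpression=", ".join(placeholders),
        ExpressionAttributeNames=placeholders,
    ):
        if item.get(attr_name, {}).get("S") in pending:
            client.delete_item(
                TableName=data_table,
                Key={k: item[k] for k in key_names},
            )
            deleted += 1
    return deleted
```

The point of queueing is that one pass over the big table serves every pending delete at once, so the cost of the scan is paid per batch rather than per delete request. Clearing the queue table after a successful pass is left out of the sketch.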
TL;DR: There is no real way to do this, only workarounds. I highly recommend that you re-model at least part of your data.