Question:
I use the following query to create my table.
```sql
create table t1 (url varchar(250) unique);
```
Then I insert about 500 URLs, twice. I expected that the second time I added the URLs, no new entries would show up in my table. Instead, the count returned by this query doubles:
```sql
select count(*) from t1;
```
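One way to confirm that the second load really created duplicate rows is to compare `count(*)` with `count(distinct url)` and list the URLs that appear more than once. The sketch below runs that SQL against an in-memory SQLite database through Python's `sqlite3` module, purely so it can be executed locally. The table is created without `unique` to mimic a database that accepts duplicates, and the three sample URLs are made up for illustration.

```python
import sqlite3

# In-memory stand-in for t1. UNIQUE is deliberately left out so that,
# as in the question, duplicate rows are accepted.
conn = sqlite3.connect(":memory:")
conn.execute("create table t1 (url varchar(250))")

# Hypothetical sample data: three distinct URLs, loaded twice.
urls = ["http://www.google.com", "http://example.com", "http://example.org"]
for _ in range(2):
    conn.executemany("insert into t1 (url) values (?)", [(u,) for u in urls])

# Total rows vs. distinct URLs: if they differ, the table holds duplicates.
total, distinct = conn.execute(
    "select count(*), count(distinct url) from t1"
).fetchone()
print(total, distinct)  # 6 3

# List each duplicated URL together with how many copies it has.
dupes = conn.execute(
    "select url, count(*) from t1 group by url having count(*) > 1 order by url"
).fetchall()
for url, n in dupes:
    print(url, n)
```

The same two `select` statements can be run as-is in psql against the real table.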
What I want is for a URL that is already in my table to be skipped when I try to add it again.
Have I declared something incorrectly in my table declaration?
I am using Redshift from AWS.
Sample
```
urlenrich=# insert into seed(url, source) select 'http://www.google.com', '1';
INSERT 0 1
urlenrich=# select * from seed;
          url          | wascrawled | source | date_crawled
-----------------------+------------+--------+--------------
 http://www.google.com |          0 |      1 |
(1 row)

urlenrich=# insert into seed(url, source) select 'http://www.google.com', '1';
INSERT 0 1
urlenrich=# select * from seed;
          url          | wascrawled | source | date_crawled
-----------------------+------------+--------+--------------
 http://www.google.com |          0 |      1 |
 http://www.google.com |          0 |      1 |
(2 rows)
```
Output of \d seed:
```
urlenrich=# \d seed
                     Table "public.seed"
    Column    |            Type             | Modifiers
--------------+-----------------------------+-----------
 url          | character varying(250)      |
 wascrawled   | integer                     | default 0
 source       | integer                     | not null
 date_crawled | timestamp without time zone |
Indexes:
    "seed_url_key" UNIQUE, btree (url)
```
Answer:
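Figured out the problem: Amazon Redshift does not enforce uniqueness constraints. UNIQUE (as well as PRIMARY KEY and FOREIGN KEY) is accepted in the table definition and used as a hint by the query planner, but nothing prevents duplicate rows from being inserted.
This is explained here:
http://docs.aws.amazon.com/redshift/latest/dg/t_Defining_constraints.html
Amazon has said they may change this at some point.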
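Because the constraint is only informational, duplicates have to be filtered out by the load itself. A common pattern is to load each batch into a staging table and then copy across only the URLs that are not already present. The sketch below shows that pattern on SQLite through Python's `sqlite3`, only so it can be run locally; the staging table `t1_staging` and the helper `load_batch` are hypothetical names. On Redshift you would more likely fill the staging table with `COPY`, but the `insert ... select distinct ... where not exists` step is plain SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Target table with no enforced UNIQUE, mirroring Redshift's behaviour.
conn.execute("create table t1 (url varchar(250))")
# Hypothetical staging table that receives each raw batch.
conn.execute("create table t1_staging (url varchar(250))")


def load_batch(conn, urls):
    """Hypothetical helper: insert only URLs not already in t1."""
    with conn:  # commit on success, roll back if any statement fails
        conn.execute("delete from t1_staging")
        conn.executemany(
            "insert into t1_staging (url) values (?)", [(u,) for u in urls]
        )
        # DISTINCT drops repeats within the batch itself;
        # NOT EXISTS skips URLs that are already stored in t1.
        conn.execute(
            """
            insert into t1 (url)
            select distinct s.url
            from t1_staging s
            where not exists (select 1 from t1 where t1.url = s.url)
            """
        )


urls = ["http://www.google.com", "http://example.com", "http://example.com"]
load_batch(conn, urls)
load_batch(conn, urls)  # the second load adds nothing
count = conn.execute("select count(*) from t1").fetchone()[0]
print(count)  # 2
```

This only guards against duplicates when loads run one at a time; two loaders running concurrently could still both insert the same new URL.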
Update (11/21/2013):
Amazon RDS has added support for PostgreSQL. If you need UNIQUE and other constraints to actually be enforced, a PostgreSQL RDS instance is now the best way to go.
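On an engine that enforces UNIQUE, the original `create table` works as intended: a duplicate insert fails, and `ON CONFLICT DO NOTHING` turns that failure into a silent skip. Note that `ON CONFLICT` arrived in PostgreSQL 9.5, later than this update, so it assumes a newer PostgreSQL version. The sketch below demonstrates the behaviour with Python's `sqlite3`, assuming the bundled SQLite is 3.24 or newer (the first version to accept the same `on conflict do nothing` clause); it is a local stand-in, not a PostgreSQL session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same definition as in the question; here UNIQUE is actually enforced.
conn.execute("create table t1 (url varchar(250) unique)")
conn.execute("insert into t1 (url) values ('http://www.google.com')")

# A plain duplicate insert is rejected by the constraint.
try:
    conn.execute("insert into t1 (url) values ('http://www.google.com')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True

# ON CONFLICT DO NOTHING skips duplicates instead of raising
# (PostgreSQL 9.5+, SQLite 3.24+).
urls = ["http://www.google.com", "http://example.com"]
conn.executemany(
    "insert into t1 (url) values (?) on conflict do nothing",
    [(u,) for u in urls],
)
count = conn.execute("select count(*) from t1").fetchone()[0]
print(count)  # 2
```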