Skip-over/ignore duplicate rows on insert

2020-02-17 09:08发布

I have the following tables:

DataValue

DateStamp    ItemId   Value
----------   ------   -----
2012-05-22   1        6541
2012-05-22   2        12321
2012-05-21   3        32

tmp_holding_DataValue

DateStamp    ItemId   Value
----------   ------   -----
2012-05-22   1        6541
2012-05-22   4        87
2012-05-21   5        234

DateStamp and ItemId are the primary key columns.

I'm doing an insert which runs periodically throughout the day (in a stored procedure):

insert into DataValue(DateStamp, ItemId, Value)
select DateStamp, ItemId, Value from tmp_holding_DataValue;

This moves data from the holding table (tmp_holding_DataValue) across into the main data table (DataValue). The holding table is then truncated.

The problem is that as in the example, the holding table could contain items which already exist in the main table. Since the key will not allow duplicate values the procedure will fail.

One option would be to put a where clause on the insert proc, but the main data table has 10 million+ rows, and this could take a long time.

Is there any other way to get the procedure to just skip-over/ignore the duplicates as it tries to insert?

4条回答
疯言疯语
2楼-- · 2020-02-17 09:41
INSERT dbo.DataValue(DateStamp, ItemId, Value)
SELECT DateStamp, ItemId, Value 
FROM dbo.tmp_holding_DataValue AS t
WHERE NOT EXISTS (SELECT 1 FROM dbo.DataValue AS d
WHERE DateStamp = t.DateStamp
AND ItemId = t.ItemId);
查看更多
老娘就宠你
3楼-- · 2020-02-17 09:47

In SQL Server 2008+:

MERGE
INTO    dataValue dv
USING   tmp_holding_DataValue t
ON      t.dateStamp = dv.dateStamp
        AND t.itemId = dv.itemId
WHEN NOT MATCHED THEN
INSERT  (dateStamp, itemId, value)
VALUES  (dateStamp, itemId, value)
/*
WHEN MATCHED THEN
UPDATE
        value = t.value
*/
-- Uncomment above to rewrite duplicates rather than ignore them
查看更多
4楼-- · 2020-02-17 09:57

I ran into a similar requirement that ended up throwing the same duplicate key error, and then the idea was to select multiple columns that are distinct (Primary) while returning also other columns, check:

INSERT INTO DataValue(DateStamp, ItemId, Value)
SELECT DISTINCT DateStamp, ItemId, MAX(Value) AS Value
FROM tmp_holding_DataValue
GROUP BY DateStamp, ItemId

In fact, the goal could be accomplished without Distinct as well since the aggregate function MAX will pick a single value.

查看更多
冷血范
5楼-- · 2020-02-17 10:00

You could assign the PK as Ignore Duplicate Key = Yes. Then it will just give a warning duplicate key ignored and continue. I am not guessing. I tested this.

What I found is that I cannot do this is SMSS. Have to drop and recreate the index via script. But you can right click on the index, select drop and recreate, and then just change Ignore Duplicate Key = Yes. For me SMSS did not immediately show the change.

IF  EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[PKallowDup]') AND name = N'PK_PKallowDup')
ALTER TABLE [dbo].[PKallowDup] DROP CONSTRAINT [PK_PKallowDup]
GO

USE [test]
GO

/****** Object:  Index [PK_PKallowDup]    Script Date: 05/22/2012 10:23:13 ******/
ALTER TABLE [dbo].[PKallowDup] ADD  CONSTRAINT [PK_PKallowDup] PRIMARY KEY CLUSTERED 
(
    [PK] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = ON, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

Or I think you could use an outer join

INSERT dbo.DataValue(DateStamp, ItemId, Value)
SELECT t.DateStamp, t.ItemId, t.Value 
  FROM dbo.tmp_holding_DataValue AS t 
  left join dbo.DataValue AS d
    on d.DateStamp = t.DateStamp
   AND d.ItemId = t.ItemId
 WHERE d.DateStamp is null 
   and d.ItemId    in null
查看更多
登录 后发表回答