-->

Are there any good reasons to have a database tabl

2020-06-06 05:04发布

问题:

Although I'm guilty of this crime, it seems to me there can't be any good reason for a table to not have an identity field primary key.

Pros: - whether you want to or not, you can now uniquely identify every row in your table which previously you could not do - you can't do sql replication without a primary key on your table

Cons: - an extra 32 bits for each row of your table

Consider for example the case where you need to store user settings in a table in your database. You have a column for the setting name and a column for the setting value. No primary key is necessary, but having an integer identity column and using it as your primary key seems like a best practice for any table you ever create.

Are there other reasons besides size that every table shouldn't just have an integer identity field?

回答1:

Sure, an example in a single-database solution is if you have a table of countries, it probably makes more sense to use the ISO 3166-1-alpha-2 country code as the primary key as this is an international standard, and makes queries much more readable (e.g. CountryCode = 'GB' as opposed to CountryCode = 28). A similar argument could be applied to ISO 4217 currency codes.

In a SQL Server database solution using replication, a UNIQUEIDENTIFIER key would make more sense as GUIDs are required for some types of replication (and also make it much easier to avoid key conflicts if there are multiple source databases!).



回答2:

The most clear example of a table that doesn't need a surrogate key is a many-to-many relation:

CREATE TABLE Authorship (
  author_id INT NOT NULL,
  book_id   INT NOT NULL,
  PRIMARY KEY (author_id, book_id),
  FOREIGN KEY (author_id) REFERENCES Authors (author_id),
  FOREIGN KEY (book_id) REFERENCES Books (book_id)
);

I also prefer a natural key when I design a tagging system:

CREATE TABLE Tags (
  tag VARCHAR(20) PRIMARY KEY
);

CREATE TABLE ArticlesTagged (
  article_id INT NOT NULL,
  tag        VARCHAR(20) NOT NULL,
  PRIMARY KEY (article_id, tag),
  FOREIGN KEY (article_id) REFERENCES Articles (article_id),
  FOREIGN KEY (tag) REFERENCES Tags (tag)
);

This has some advantages over using a surrogate "tag_id" key:

  • You can ensure tags are unique, without adding a superfluous UNIQUE constraint.
  • You prevent two distinct tags from having the exact same spelling.
  • Dependent tables that reference the tag already have the tag text; they don't need to join to Tags to get the text.


回答3:

Every table should have a primary key. It doesn't matter if it's an integer, GUID, or the "setting name" column. The type depends on the requirements of the application. Ideally, if you are going to join the table to another, it would be best to use a GUID or integer as your primary key.



回答4:

Yes, there are good reasons. You can have semantically meaningful true keys, rather than articificial identity keys. Also, it is not a good idea to have a seperate autoincrementing primary key for a Many-Many table. There are some reasons you might want to choose a GUID.

That being said, I typically use autoincrementing 64bit integers for primary keys.



回答5:

Every table should have a primary key. But it doesn't need to be a single field identifier. Take for example in a finance system, you may have the primary key on a journal table being the Journal ID and Line No. This will produce a unique combination for each row (and the Journal ID will be a primary key in its own table)

Your primary key needs to be defined on how you are going to link the table to other tables.



回答6:

I don't think every table needs a primary key. Sometimes you only want to "connect" the contents of two tables - via their primary key.

So you have a table like users and one table like groups (each with primary keys) and you have a third table called users_groups with only two colums (user and group) where users and groups are connected with each other.

For example a row with user = 3 and group = 6 would link the user with primary key 3 to the group with primary key 6.



回答7:

One reason not to have primary key defined as identity is having primary key defined as GUIDs or populated with externally generated values.

In general, every table that is semantically meaningful by itself should have primary key and such key should have no semantic meaning. A join table that realizes many-to-many relationship is not meaningful by itself and so it doesn't need such primary key (it already has one via its values).



回答8:

To be a properly normalised table, each row should only have a single identifiable key. Many tables will already have natural keys, such a unique invoice number. I agree, especially with storage being so cheap, there is little overhead in having an autonumber/identity key on all tables, but in this instance which is the real key.

Another area where I personally don't use this approach if for reference data, where typically we have a Description and a Value

Code, Description
'L', 'Live'
'O', 'Old'
'P', 'Pending'

In this situation making code a primary key ensures no duplicates, and is more human readable.



回答9:

The key difference (sorry) between a natural primary key and a surrogate primary key is that the value of the natural key contains information whereas the value of a surrogate key doesn't.

Why is this important? Well a natural primary key is by definition guaranteed to be unique, but its value is not usually guaranteed to stay the same. When it changes, you have to update it in multiple places.

A surrogate key's value has no real meaning and simply serves to identify that row, so it never needs to be changed. It is a feature of the model rather than the domain itself.

So the only place I would say a surrogate key isn't appropriate is in an association table which only contains columns referring to rows in other tables (most many-to-many relations). The only information this table carries is the association between two (or more) rows, and it already consists solely of surrogate key values. In this case I would choose a composite primary key.

If such a table had bag semantics, or carried additional information about the association, I would add a surrogate key.



回答10:

A primary key is ALWAYS a good idea. It allows for very fast and easy joining of tables. It aides external tools that can read system tables to make join allowing less skilled people to create their own queries by drag-and-drop. It also makes the implementation of referential integrity a breeze and that is a good idea from the get go.



回答11:

I know for sure that some very smart people working for web giants do this. While I don't know why their own reasons, I know 2 cases where PK-less tables make sense:

  • Importing data. The table is temporary. Insertions and whole table scans need to be as fast as possible. Also, we need to accept duplicate records. Later we will clean the data, but the import process needs to work.
  • Analytics in a DBMS. Identifying a row is not useful - if we need to do it, it is not analytics. We just need a non-relational, redundant, horrible blob that looks like a table. We will build summary tables or materialized views by writing proper SQL queries.

Note that these cases have good reasons to be non-relational. But normally your tables should be relational, so... yes, they need a primary key.