
Talk:Join (SQL)

From Wikipedia, the free encyclopedia

New comments at the bottom of the page, please.

SemiJoins


What about semijoins?

From the article:
A semi join is an efficient join method where first the join attributes of one table are collected and reported to the second one. It was first reported in 1981. It can be improved with a Bloom filter (hashing).
Yeah, it could do with a lot more detail. Neilc 13:59, 16 Apr 2005 (UTC)
The problem with the current description for semi joins is that it doesn't say anything at all: How is something reported? What does "collect" mean in the above sentence? And how exactly would the improvement with hashing work? I think that we either expand this to something comprehensible or remove it. --Stolze 21:50, 14 April 2007 (UTC)

I removed the semi-join stuff from the article because it didn't say anything meaningful. --Stolze 11:52, 8 May 2007 (UTC)
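For readers who land here looking for the SQL-level idea: a relational semi join returns each row of one table that has at least one match in another, once, without adding any columns from the second table. In SQL it is commonly written with EXISTS (or IN). The sketch below uses Python's built-in sqlite3 module and assumed sample rows modeled on the article's employee and department tables to contrast it with an inner join. It illustrates the relational operator only, not the distributed-query semijoin strategy that the removed text seems to describe.

```python
import sqlite3

# Assumed sample rows, modeled on the article's employee/department tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
INSERT INTO department VALUES
  (31, 'Sales'), (33, 'Engineering'), (34, 'Clerical'), (35, 'Marketing');
INSERT INTO employee VALUES
  ('Rafferty', 31), ('Jones', 33), ('Heisenberg', 33),
  ('Robinson', 34), ('Smith', 34), ('Williams', NULL);
""")

# Semi join: departments that have at least one employee, each listed once.
semi = conn.execute("""
    SELECT d.DepartmentName
    FROM department AS d
    WHERE EXISTS (SELECT 1 FROM employee AS e
                  WHERE e.DepartmentID = d.DepartmentID)
    ORDER BY d.DepartmentID
""").fetchall()

# Inner join for contrast: a department repeats once per matching employee.
inner = conn.execute("""
    SELECT d.DepartmentName
    FROM department AS d
    JOIN employee AS e ON e.DepartmentID = d.DepartmentID
""").fetchall()

print(semi)
print(len(inner))
```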

Images for join types


It would be good if someone would add images to help people visualize how each type of join is evaluated. I'm thinking that there should be drawings with the overlapping rings, highlighting the solution set. I would be willing to create them, however, I do not have experience loading images into Wikipedia. DBBell 21:08, 14 December 2005 (UTC)

I hope you don't mix this up with set operations like UNION, EXCEPT, and INTERSECT? If not, you could draw the images and we will incorporate them. --Stolze 12:19, 22 April 2007 (UTC)

A visual image might be to consider customers and orders. A left outer join would provide a set of all customers, whether or not the customer had any orders, PLUS the order records for those customers that had orders. It is possible that a customer might have no orders, so the business might want to delete him or her from the database, or send them a catalog.

A right outer join on the orders would be useful for that business to correct possible errors, since an order without a customer to pay for it would be a problem. This type of join would produce a set of records for all orders whether or not there was a customer related to it, PLUS the customer that ‘owns’ the order, so to speak.

In either type of outer join, it is only necessary to check for nulls in the related table to find ONLY those records that DO NOT have a related record in the related table.

Hope this helps? Foresight2008 (talk) 03:52, 26 July 2008 (UTC)
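The null-check technique described above can be sketched with Python's sqlite3 module and hypothetical customer and orders tables (the table names, columns and rows are all assumptions). The right outer join for orders is written as a left outer join with the tables swapped, which returns the same rows. The NULL test is made on a key column, because a key is never NULL in a real row, so a NULL there can only come from the outer join.

```python
import sqlite3

# Hypothetical customer/orders tables; order 103 points at a customer that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cho');
INSERT INTO orders VALUES (100, 1), (101, 1), (102, 3), (103, 9);
""")

# Customers without orders: keep left-joined rows whose order side is NULL.
no_orders = conn.execute("""
    SELECT c.Name
    FROM customer AS c
    LEFT OUTER JOIN orders AS o ON o.CustomerID = c.CustomerID
    WHERE o.OrderID IS NULL
""").fetchall()

# Orders without a customer: the same pattern with the roles swapped.
orphan_orders = conn.execute("""
    SELECT o.OrderID
    FROM orders AS o
    LEFT OUTER JOIN customer AS c ON c.CustomerID = o.CustomerID
    WHERE c.CustomerID IS NULL
""").fetchall()

print(no_orders, orphan_orders)
```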

here is an image: http://stackoverflow.com/a/4715847/4880924 --Kangaroo5Aust (talk) 00:35, 22 November 2015 (UTC)

Spanish language wikipedia has pretty images [1]. Donalus (talk) 13:23, 13 September 2017 (UTC)

I added some of the Venn Diagrams from the Spanish language page. They also have an image that is similar to the one posted by Kangaroo5Aust, but it needs to be translated into English.[2] Donalus (talk) 15:55, 13 September 2017 (UTC)

Surely these Venn diagrams are incorrect. The sets are disjoint, so there can be no intersection. For example, when A is the set of employees and B is the set of departments, the set of elements x where x is a member of A and x is also a member of B is {}. If A is the set of rows in table A and B is the set of rows in table B, then the intersection of A with B is {}, since no row appears in both tables. Joins create tuples, each of which is a concatenation of an A row with a B row, so it would be appropriate to consider sets of tuples instead. In the diagrams shown, A_tuple would be the set of A-B tuples formed from the rows of A that have a matching row in B, together with the rows of A that have no match in B, padded with nulls for the B columns. B_tuple would be the complementary set built from the rows of B. The intersection of A_tuple and B_tuple would then be the joined rows with matching non-null values from both A and B. STACS-BM (talk) 22:10, 9 February 2024 (UTC)

I removed the image inner join, since it illustrated the concept with just an intersection. While the intersection of the sets being joined is important, it's not what an inner join is. Simply using an illustration of an intersection makes a false equivalence with sets, while values in a table do not have to be unique. Ivirshup (talk) 07:36, 22 December 2020 (UTC)
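A small sketch of why an inner join is not a set intersection, using Python's sqlite3 module and hypothetical tables a and b with repeated key values (all data assumed). Each output row is an a-row concatenated with a b-row, so a key that appears twice on one side and three times on the other yields 2 × 3 = 6 joined rows, while the intersection of the key values contains that key only once.

```python
import sqlite3

# Hypothetical tables a and b; key 1 occurs twice in a and three times in b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (k INTEGER, tag TEXT);
CREATE TABLE b (k INTEGER, tag TEXT);
INSERT INTO a VALUES (1, 'a1'), (1, 'a2'), (2, 'a3');
INSERT INTO b VALUES (1, 'b1'), (1, 'b2'), (1, 'b3'), (3, 'b4');
""")

# Inner join: every a-row is paired with every b-row that has the same key.
joined = conn.execute("SELECT a.tag, b.tag FROM a JOIN b ON a.k = b.k").fetchall()

# Set intersection of the key values, for contrast.
common_keys = conn.execute("SELECT k FROM a INTERSECT SELECT k FROM b").fetchall()

print(len(joined), common_keys)
```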

References

Left Outer Join


Just a quick question. Was it necessary to have a "select _distinct_ *" in the code example for left outer joins?

It is not necessary. DISTINCT is a concept orthogonal to joins. Therefore, I removed this keyword.

For those of us who work daily with SQL but have to look up the definition of 'orthogonal': "select distinct" should be assumed to be a crutch to fix a problem that may be solved more efficiently by other changes. Some experts with more experience than me recommend searching for any code that uses 'select distinct' as a quick means of identifying SQL that should be worked on to improve system efficiency. — Preceding unsigned comment added by 63.234.230.254 (talk) 16:04, 11 August 2014 (UTC)

Equi-Join Columns


I believe this might be an error. From the article:

The resulting joined table contains two columns named DepartmentID, one from table Employee and one from table Department.

If I'm not mistaken, this would be incorrect behavior under the SQL standard. I believe this is how it worked in earlier versions of MySQL, and possibly other DBMSs, but MySQL now follows the standard and includes only one instance of the join column.

Also, under Natural Joins, the order of the columns given may be incorrect (once again, possibly this is the actual order in some implementations). I believe the spec calls for the join columns to come first, and then the other columns of each table in turn.

I am basing this on the [MySQL docs], the part which begins:

The columns of a NATURAL join or a USING join may be different from previously. Specifically, redundant output columns no longer appear, and the order of columns for SELECT * expansion may be different from before...

Won't edit, as I don't think implementation docs are necessarily a good source for the standard, and I may be misunderstanding. DocAvid (talk) 21:03, 10 January 2008 (UTC)

You are correct, according to ISO/IEC 9075-2:2003 natural joins do suppress duplicate columns. This is from Section 7.7 (Syntax rules):
"7) If there is at least one corresponding join column, then let SLCC be a <select list> 
of <derived column>s of the form

COALESCE ( TA.C, TB.C ) AS C

for every column C that is a corresponding join column, taken in order of their
ordinal positions in RT1."

It's stating that each join column C appears only once in the select list, as the coalesced value of the same-named columns from table A and table B. SqlPac (talk) 05:51, 6 February 2008 (UTC)
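A quick way to see the column suppression in practice, sketched with Python's sqlite3 module and assumed rows modeled on the article's tables. SQLite's documentation states that for a USING join (and therefore a NATURAL join) the right-hand copy of each join column is omitted from `SELECT *`. The column order SQLite produces is not necessarily the one the standard text above prescribes, so the sketch only counts columns.

```python
import sqlite3

# Assumed sample rows, modeled on the article's employee/department tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
INSERT INTO department VALUES (31, 'Sales'), (33, 'Engineering');
INSERT INTO employee VALUES ('Rafferty', 31), ('Jones', 33);
""")

def column_names(sql):
    """Return the result column names of a query (from cursor.description)."""
    return [col[0] for col in conn.execute(sql).description]

on_cols = column_names(
    "SELECT * FROM employee JOIN department "
    "ON employee.DepartmentID = department.DepartmentID")
using_cols = column_names(
    "SELECT * FROM employee JOIN department USING (DepartmentID)")
natural_cols = column_names(
    "SELECT * FROM employee NATURAL JOIN department")

print(on_cols)       # ON keeps both DepartmentID columns
print(using_cols)    # USING keeps one
print(natural_cols)  # NATURAL keeps one
```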

It should be clearer what is meant by "A" and "B" in the article, i.e. which part of the sample query is the A table and which is the B table. (Better yet, call the tables A and B to make it completely clear.) —Preceding unsigned comment added by 70.66.236.249 (talk) 22:26, 19 March 2008 (UTC)

Since SQL returns sets by default, the results would be DISTINCT because the duplicates cannot exist in a proper SET, as I understand it. Foresight2008 (talk) 03:42, 26 July 2008 (UTC) CL

The SQL standard deviates from the relational model in several ways even if it is supposed to be the standard language of relational databases, and all databases I know of deviate from the SQL standard. All DBMSs I know of will NOT allow you to create a table where multiple columns have the same name like "CREATE TABLE bad (col INT, col INT);" but most WILL let you run a query that results in multiple columns with the same name like "SELECT 1 AS col, 2 AS col;". More info at Relational_model#SQL_and_the_relational_model. 139.80.112.182 (talk) 21:38, 11 December 2008 (UTC)

Equi Joins


From the article:

"SQL:2003 does not have a specific syntax to express equi-joins, but some database engines provide a shorthand syntax: for example, MySQL and PostgreSQL support USING(DepartmentID) in addition to the ON ... syntax."

This is not true. ISO/IEC 9075-2:2003 (E) defines the using syntax in 7.7, Joined tables:

<join specification> ::= <join condition> | <named columns join>
<join condition> ::= ON <search condition>
<named columns join> ::= USING <left paren> <join column list> <right paren>

This section should not have another example than the one used throughout the article. It is confusing to see a natural join demonstrated twice with two similar but different datasets. — Preceding unsigned comment added by 68.132.255.43 (talk) 15:39, 5 June 2017 (UTC)

Implicit outer joins


Does this article mean that SQL 2003 doesn't support implicit left/right joins using *= and =* syntax? 71.4.124.241 (talk) 18:10, 14 April 2008 (UTC)

ANSI/ISO SQL has never had that implicit outer join syntax. 155.4.126.237 (talk) 07:47, 24 May 2017 (UTC)

Full outer joins by left joins


I believe the example of how to construct a full outer join using only left joins is incorrect (the first one, using left outer and right outer joins, seems good). There is nothing ensuring that the two tables being unioned have the same column ordering. The correct fix would be to make the second left join use the same column ordering as the first left join: instead of doing

UNION
SELECT *
FROM   department

doing

UNION
SELECT employee.*, department.*
FROM   department

You probably need a similar fix for the case of only using right outer joins. I verified that the command as shown does not work as stated in SQLite (version 3.4). Note that I am not a SQL expert though.
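A sketch of the suggested fix, using Python's sqlite3 module and assumed rows modeled on the article's tables: both branches select employee.* followed by department.*, so the columns line up. UNION ALL is used here because the second branch keeps only departments that have no employee, so no row can appear twice and there is nothing to de-duplicate.

```python
import sqlite3

# Assumed sample rows, modeled on the article's employee/department tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
INSERT INTO department VALUES
  (31, 'Sales'), (33, 'Engineering'), (34, 'Clerical'), (35, 'Marketing');
INSERT INTO employee VALUES
  ('Rafferty', 31), ('Jones', 33), ('Heisenberg', 33),
  ('Robinson', 34), ('Smith', 34), ('Williams', NULL);
""")

# Full outer join emulated with two left joins. Both branches list the
# employee columns first, then the department columns.
full = conn.execute("""
    SELECT employee.*, department.*
    FROM employee
    LEFT JOIN department ON employee.DepartmentID = department.DepartmentID
    UNION ALL
    SELECT employee.*, department.*
    FROM department
    LEFT JOIN employee ON employee.DepartmentID = department.DepartmentID
    WHERE employee.DepartmentID IS NULL
""").fetchall()

for row in full:
    print(row)
```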

Foreign key error


The Sample tables section mentions the use of a foreign key:

"In the following tables, department.DepartmentID is the primary key, while employee.DepartmentID is a foreign key."

This is incorrect given the definition of a foreign key [1], as it mentions: "The values in one row of the referencing columns must occur in a single row in the referenced table. Thus, a row in the referencing table cannot contain values that don't exist in the referenced table (except potentially NULL)."

Unfortunately, this seems to be the case, as the article itself says: "On the other hand, the employee 'Jasper' has no link to any currently valid Department in the Department Table." and as such this would violate the referential integrity constraint.

The examples are excellent, but I think it shouldn't be mentioned that it is a foreign key. I don't think it matters to the example anyway, because the presence of a foreign key is not required.

[1] http://en.wikipedia.org/wiki/Foreign_key —Preceding unsigned comment added by Svenmathijssen (talk • contribs) 10:04, 19 May 2008 (UTC)

Yeah. Employee.DepartmentID is not a real foreign key, so the article should be changed. Aaron Schulz 14:00, 19 May 2008 (UTC)
Foreign keys make no difference when joining. You declare foreign keys to ensure data consistency (on INSERT/UPDATE/DELETE). No need to confuse less experienced users here by talking about foreign keys. 155.4.126.237 (talk) 07:48, 24 May 2017 (UTC)
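A sketch of the referential-integrity point above, using Python's sqlite3 module (SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`). The DepartmentID value 36 for the Jasper row is an assumption; any value with no matching department behaves the same way. Declaring the constraint makes the database reject that row, so sample data containing such an employee cannot also declare employee.DepartmentID as a foreign key. The joins themselves are unaffected by the declaration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE department (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
CREATE TABLE employee (
    LastName TEXT,
    DepartmentID INTEGER REFERENCES department (DepartmentID)
);
INSERT INTO department VALUES
  (31, 'Sales'), (33, 'Engineering'), (34, 'Clerical'), (35, 'Marketing');
""")

conn.execute("INSERT INTO employee VALUES ('Rafferty', 31)")   # parent row exists
conn.execute("INSERT INTO employee VALUES ('Williams', NULL)")  # NULL is allowed

try:
    # Assumed value: there is no department 36.
    conn.execute("INSERT INTO employee VALUES ('Jasper', 36)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("Jasper rejected:", rejected)
```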

Computation of Cartesian product


From the article:

"The SQL-engine computes the Cartesian product of all records in the tables. That is, processing combines each record in table A with every record in table B. Only those records in the joined table that satisfy the join predicate remain."

I seriously doubt the SQL engine will explicitly compute the Cartesian product for such a query, especially in the presence of a clustered primary key on Department.DepartmentID further optimized by a sort order on Employee.DepartmentID (as the example suggests). The result can be thought of as a Cartesian product with the unnecessary tuples removed, but the engine will benefit from using the index. —Preceding unsigned comment added by Svenmathijssen (talk • contribs) 10:21, 19 May 2008 (UTC)

Yeah, it really would use the index and scan while joining with each row, otherwise it would be hella slow. Aaron Schulz 13:56, 19 May 2008 (UTC)
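The distinction drawn above, between what the query means and how the engine evaluates it, can be sketched with Python's sqlite3 module and assumed rows modeled on the article's tables. Filtering the full Cartesian product by the join predicate gives exactly the same rows as the inner join, which is why the product works as a definition even though engines normally avoid materializing it.

```python
import sqlite3

# Assumed sample rows, modeled on the article's employee/department tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
INSERT INTO department VALUES
  (31, 'Sales'), (33, 'Engineering'), (34, 'Clerical'), (35, 'Marketing');
INSERT INTO employee VALUES
  ('Rafferty', 31), ('Jones', 33), ('Heisenberg', 33),
  ('Robinson', 34), ('Smith', 34), ('Williams', NULL);
""")

# Definition: Cartesian product, then keep rows satisfying the join predicate.
by_definition = sorted(conn.execute("""
    SELECT e.LastName, d.DepartmentName
    FROM employee AS e CROSS JOIN department AS d
    WHERE e.DepartmentID = d.DepartmentID
""").fetchall())

# Ordinary inner join syntax.
inner = sorted(conn.execute("""
    SELECT e.LastName, d.DepartmentName
    FROM employee AS e JOIN department AS d
      ON e.DepartmentID = d.DepartmentID
""").fetchall())

# Size of the full product: 6 employees x 4 departments.
product_size = conn.execute(
    "SELECT COUNT(*) FROM employee CROSS JOIN department").fetchone()[0]

print(product_size, len(inner), by_definition == inner)
```

To see the strategy SQLite actually picks, either query can be prefixed with EXPLAIN QUERY PLAN; the plan text differs between SQLite versions, so it is not reproduced here.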

Full outer join emulations


All emulations provided should be corrected to use UNION ALL instead of UNION as it is now. The reason:

1) The queries work in these special cases only because the rows returned by the left join and by the right join are unique. With duplicate rows, these emulations would give an incorrect result.

2) Queries with UNION (rather than UNION ALL) need more database resources, because the database must filter out duplicates.

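Example using the data as provided in the article (SQL*Plus displays a NULL DepartmentName as an empty line, so one row in each listing below looks blank):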

SELECT DepartmentName from employee
FULL OUTER JOIN Department 
ON employee.DepartmentID = department.DepartmentID;

DEPARTMENTNAME
---------------
Sales
Engineering
Engineering
Clerical
Clerical
Marketing
Marketing


8 rows selected.

Union gives us only unique rows:

SELECT DepartmentName from employee
LEFT OUTER JOIN Department
ON employee.DepartmentID = department.DepartmentID
UNION
SELECT DepartmentName from employee
RIGHT OUTER JOIN Department
ON employee.DepartmentID = department.DepartmentID
WHERE employee.DepartmentID IS NULL;

DEPARTMENTNAME
---------------
Clerical
Engineering
Marketing
Sales


5 rows selected.

Union all gives us correct result:

SELECT DepartmentName from employee
LEFT OUTER JOIN Department
ON employee.DepartmentID = department.DepartmentID
UNION ALL
SELECT DepartmentName from employee
RIGHT OUTER JOIN Department
ON employee.DepartmentID = department.DepartmentID
WHERE employee.DepartmentID IS NULL;

DEPARTMENTNAME
---------------
Sales
Engineering
Engineering
Clerical
Clerical
Marketing
Marketing


8 rows selected.

Gintsp (talk) 16:39, 28 July 2008 (UTC)
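The same comparison can be reproduced with Python's sqlite3 module, using assumed rows modeled on the current article tables (six employees, one with a NULL department, and four departments). With this data the full outer join has 7 rows rather than the 8 shown above, but the effect is the same: UNION collapses the repeated department names, while UNION ALL keeps every row.

```python
import sqlite3

# Assumed sample rows, modeled on the article's employee/department tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
INSERT INTO department VALUES
  (31, 'Sales'), (33, 'Engineering'), (34, 'Clerical'), (35, 'Marketing');
INSERT INTO employee VALUES
  ('Rafferty', 31), ('Jones', 33), ('Heisenberg', 33),
  ('Robinson', 34), ('Smith', 34), ('Williams', NULL);
""")

# Every employee with its department name (NULL for Williams).
left_part = """
    SELECT DepartmentName FROM employee
    LEFT OUTER JOIN department ON employee.DepartmentID = department.DepartmentID
"""
# Departments with no employee, written as a left join with the tables
# swapped, because SQLite only supports RIGHT JOIN from version 3.39.0 onward.
right_part = """
    SELECT DepartmentName FROM department
    LEFT OUTER JOIN employee ON employee.DepartmentID = department.DepartmentID
    WHERE employee.DepartmentID IS NULL
"""

with_union = conn.execute(left_part + " UNION " + right_part).fetchall()
with_union_all = conn.execute(left_part + " UNION ALL " + right_part).fetchall()

print(len(with_union), len(with_union_all))
```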



About natural joins


User Mckaysalisbury under history page said: "natural joins are superior. Relational algebra doesn't even have a "join on", if you really think natural is prone to errors, that means you're using it wrong. Maybe we should axe the entire section?"

I completely disagree with this statement, and I think the section should be kept. Writing SQL statements is not the same as dealing with relational algebra; even more, many people writing SQL statements have never thought about relational algebra. OK, probably that's bad, but that's real life :) Also, natural joins ARE really evil, with many possible side effects, as I've explained in my blog post, and should be used with great caution. Joe Celko, one of the authors of the SQL-89 and SQL-92 standards, said: "Frankly, NATURAL JOIN was a bad idea. Any change to a table can suddenly add or remove a column in the join on the fly. You do not know what you have done until you see both table declarations. Much better to have explicit column names."

Gintsp (talk) 13:08, 15 September 2008 (UTC)
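Celko's point about schema changes can be illustrated with a short sketch (Python's sqlite3 module, assumed rows modeled on the article's tables; the Location column is hypothetical). Adding a same-named column to both tables silently adds it to the NATURAL JOIN condition. Because the new columns are NULL, and NULL = NULL is not true, the join then returns no rows at all, with no error.

```python
import sqlite3

# Assumed sample rows, modeled on the article's employee/department tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
INSERT INTO department VALUES
  (31, 'Sales'), (33, 'Engineering'), (34, 'Clerical'), (35, 'Marketing');
INSERT INTO employee VALUES
  ('Rafferty', 31), ('Jones', 33), ('Heisenberg', 33),
  ('Robinson', 34), ('Smith', 34), ('Williams', NULL);
""")

def natural_count():
    """Number of rows produced by the NATURAL JOIN right now."""
    return conn.execute(
        "SELECT COUNT(*) FROM employee NATURAL JOIN department").fetchone()[0]

before = natural_count()  # joins on DepartmentID only

# A later, unrelated schema change: both tables gain a column with the same name.
conn.execute("ALTER TABLE employee ADD COLUMN Location TEXT")
conn.execute("ALTER TABLE department ADD COLUMN Location TEXT")

after = natural_count()   # now also requires Location = Location, which is never true for NULLs

print(before, after)
```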

Please, if you are going to add a section on a type of join, please show the SQL code. This is, after all, a page on SQL joins, not the concept of "join" in general (if such a thing exists). Without any sample code, I have no clue how to implement this. Thank you. — Preceding unsigned comment added by 146.142.1.10 (talk) 21:59, 5 November 2015 (UTC)

MATHS?


I skimmed through and found no maths.

Why didn't anyone explain the joins from the correct relational point of view, which is formal/mathematical? —Preceding unsigned comment added by 193.136.19.11 (talk) 09:55, 2 October 2008 (UTC)

Response: for the theory behind SQL, see the relational algebra articles, e.g. Join_(relational_algebra). —Preceding unsigned comment added by 86.173.64.233 (talk) 14:54, 11 December 2008 (UTC)

Don't you think it would be a good idea to add this link to the "see also" section in the article? —Preceding unsigned comment added by 85.232.225.130 (talk) 17:54, 13 November 2009 (UTC)
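For the relational-algebra view asked about above, the usual textbook definitions can be summarized as follows (a sketch in standard notation, not taken from the article). R and S are relations, × is the Cartesian product (read (r, s) as the concatenated tuple), σ is selection and π is projection. For the natural join, C_1, …, C_k are the attributes R and S have in common, and the projection keeps one copy of each.

```latex
R \times S = \{\, (r, s) \mid r \in R,\ s \in S \,\}

R \bowtie_{\theta} S = \sigma_{\theta}(R \times S)

R \bowtie S = \pi_{\mathrm{attr}(R) \cup \mathrm{attr}(S)}\bigl(\sigma_{R.C_1 = S.C_1 \,\wedge\, \cdots \,\wedge\, R.C_k = S.C_k}(R \times S)\bigr)
```

In SQL terms, the θ-join corresponds to JOIN ... ON θ (or a WHERE clause over a cross join), and the natural join to NATURAL JOIN.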

Sentence doesn't make sense?


This sentence under "Natural Joins" doesn't make sense to me. I'm not sure what the intended meaning is, it seems like two sentences have run together (see emphasis):

An error-message such as "ORA-25155: column used in NATURAL join cannot have qualifier" is an error to help prevent or reduce the problems that could occur may encourage checking and precise specification of the columns named in the query, and can also help in providing compile time checking (instead of errors in query).

Xurizaemon (talk) 22:51, 12 March 2009 (UTC)

I Disagree with Reviewer's View that the Content Should Be Simplified


This article is very good. There is no end to the abundance of dumbed-down, impractically simplified introductions to SQL on the net, and as far as I can tell this article is pitched at exactly the mid level of SQL. The top of this page implies this is what it is aiming for. If you want advanced, follow the path of the abstract math that underlies the language. Nothing about this is advanced. If you work in SQL, you run across others' SQL queries that are loaded with these kinds of joins. As an example, I am writing this addition because I was spinning my wheels on the FULL OUTER JOIN construct on no fewer than three SQL interpreters.

It is hard to find a reference that presents joins as comprehensively and concisely as this article; few SQL tutorials on the net are this clear on the topic. I think it would be a real shame to dumb this article down (simplify it).

--Ttreker (talk) 11:29, 25 April 2010 (UTC)

I also believe this is a very good article -- it's at the perfect level for me. — Preceding unsigned comment added by 174.6.76.165 (talk) 19:30, 25 September 2013 (UTC)

Additional Concern on Natural Joins from the Object Oriented Perspective


Natural join usage encourages naming conventions that run counter to the clean implementation of structured object oriented relational code wrappers.

If you are working in a relational system with an object-oriented wrapper to your relational objects, natural join naming conventions hinder solid polymorphic design. From an object perspective, you don't want primary key id fields named something different on each object. You want all of your objects responding to the same message (having the same method name), e.g. .id. This way if you have a heterogeneous collection of arbitrary objects, you know that you can send any of them the .id message and get its primary integer key back. In other words, it doesn't matter what table the record is from, you always know .id will return the primary key. An example of just one of the many benefits of this approach is that if you combine this with a single sequence generator (generating unique key field values) for all of your id fields on all your relational tables, then it becomes easy to reference any arbitrary relational object by its id value alone, without reference to its table name even (building an application to achieve this is trivial). In other words, it is as if each object had its own UPC code. How ineffective would it be to reuse UPCs for each class of object in a grocery or department store? It should be clear to most people that this principle transfers naturally to software objects, at least where they map to real world objects.

From this perspective it is also redundant, for example, to have a department record with a field that calls it department_id or the like. With all deference to the survivors of the Monty Python troupe, this is an example of "the department of redundancy department."
The OO approach:

department.id

The Monty Python approach:

department.department_id

The latter naming convention is encouraged by the desire to employ natural joins.

In a data model supporting an OO wrapper, the convention of naming foreign key fields for the table that they reference is powerful. In this convention table names are *always* singular. This allows a foreign key field to simply be named for the table it references. The field name itself is singular or plural depending on the nature of the relationship between the tables, i.e. does it reference one, or many related objects.

With this paradigm, relational code wrapper generators can easily be written that read the database schema dynamically and generate object wrappers for relational records, tables, and result sets. If one has followed the conventions described above, the object code becomes immensely readable, and easy to write in an object oriented fashion. With such a consistent naming convention, the code wrappers written to implement joins (and other relational features) can be written on one superclass from which classes for each specific table can inherit. Adherence to such naming conventions allows, for example, for hiding the structure of intermediate tables used in many-to-many relationships. The underlying SQL necessary to support these structures can be encapsulated in the superclass, which leverages the consistent naming conventions to achieve this.

I could go on and on, actually. The power of this abstracted, polymorphic, and encapsulated approach is immense. It leads to a situation where the underlying relational structure melds seamlessly into the encapsulating code. This approach lends itself well to applications mapping software objects to the real world, and makes it easy for even novice developers to knock out some powerful applications without worrying about the mundane details of the structure of their queries. Many common queries can be auto generated dynamically in the superclass in this paradigm. More complex queries can be dealt with using naturalistic construction of "predicate" objects insulated from the nuances of SQL. This facilitates thinking problems through from the perspective of the business case being solved and not the underlying SQL. At the same time, all the power of this underlying SQL is fully leveraged.

Natural join field naming requirements are highly counter to this paradigm. They simply make a mess of it.

There are certainly situations where the above-outlined approach may be more of a hindrance than a help. There are no absolute rules in software development. But in applications that map real world objects to software objects, it is an extremely powerful approach. These kinds of systems are extremely common in our modern day world.

SQL developers may quickly zero in on the downside that there are a lot of duplicate field names across tables, which makes ad hoc SQL queries using "select * from tab1, tab2" prone to confusion. However, in a properly structured system, these queries are not required as often as it may seem. And for ad hoc queries, many SQL user interfaces automatically disambiguate duplicate column names on the fly without user intervention. Duplicate column names are a common issue in SQL regardless of any OO wrapper structure.

So how might someone make this point in this article in a succinct manner. Here is a shot:

"The use of Natural Joins in SQL encourages field naming conventions which present complications for object oriented wrappers. There is a fundamentally inherent asymmetry between primary key fields and foreign key fields which suggests that they should not be named the same. As an example, from an object-oriented standpoint, all objects should respond polymorphically to an ".id" message by returning their primary key. Foreign key fields should be named for the table they address. This is counter to the implied requirement by Natural Joins that a primary key field and a foreign key field be named the same."

--Ttreker (talk) 03:48, 26 April 2010 (UTC)

I enjoyed your article, Ttreker, very informative. I disagree with your conclusion, however, as the wiki article is not about the OO paradigm or even about keys. Instead, I suggest something like this:

"Note the use of either JOIN--USING or NATURAL JOIN in SQL requires the field naming convention used above. An alternative naming convention, such as used in these table creations,
CREATE TABLE department
(
 ID INT,
 Name VARCHAR(20)
);
CREATE TABLE employee
(
 LastName VARCHAR(20),
 Department INT
);

and which is perhaps just as popular, wouldn't work with JOIN--USING or NATURAL JOIN queries. It would require the ON form of query:

SELECT *
FROM employee JOIN department
  ON employee.Department = department.ID;

"

Martin F — Preceding unsigned comment added by 174.6.76.165 (talk) 20:10, 25 September 2013 (UTC)

Making content accessible to a wider audience is not tantamount to "dumbing down."


I can't tell whom that comment was for --possibly because what was being commented on was removed. But why see it as a zero-sum game? The art of writing is making the article accessible to a wider audience without making it less accessible to others. A few of the Wall Street Journal's Economics editors are masters of this: In a detailed article about the economy they will insert a concise elementary sentence --such as one would see in a primer-- clarifying what the "Fed" is, or what the inter-bank rate is, etc. This opens the article to a MUCH wider audience, yet it never registers with expert readers, who just fly over it without even noticing. It's not a distraction for experts.

Regarding: "Nothing about this is advanced. If you work in SQL..." Why write this article only for those who work with SQL? Such people have books on the subject, often acquired in school or other training. What do they need a Wiki article for?

Note that one of the reviewers (above) is a patent attorney. Who am I to say that a dairy farmer would not find this topic useful or enlightening? He somehow might even come up with a cure for cancer after reading this and some other articles? :^) Who knows?

CousinJohn (talk) 20:04, 26 May 2010 (UTC)

Are the insert examples really necessary?


I think the sql examples are mostly useful, but I wonder if the section demonstrating inserting data is really necessary. The article already gives visual tables of the data to be used in the examples. I'm not certain how showing how to insert this data really adds anything. The article is about joins after all, not data insertion. Removing them would make the article flow better I think.

MaxMahem (talk) 03:18, 19 November 2010 (UTC)
Feel free to skip that section to preserve your flow. Cutting and pasting the code saves time and errors in constructing the sample tables. —Preceding unsigned comment added by 69.171.176.210 (talk) 04:33, 4 January 2011 (UTC)

First mention of join


I have changed the type in the first mention of "join", as required by MOS:LEAD. (Also, as a reader, I found the Courier type confusing.) In my opinion, the article is mostly about the concept of a "join", not the syntax of a particular language. (I believe that joins are used in languages other than SQL.) It might be better to move the article to something like "Database join" and leave "Join (SQL)" as a redirect. If you do, remember the disamb page. --RoyGoldsmith (talk) 16:20, 24 January 2011 (UTC)

Motivation


It would be nice if there were an explanation for the typical motivation for using joins as opposed to a single table. I'm a newbie with respect to database design, so this was the (missing) information I was seeking when looking up the article.

Database normalization. Jpatokal (talk) 02:34, 23 May 2011 (UTC)

Right outer join


The current text for right outer join states that they're rarely used because they can be replaced by left outer joins, but it doesn't provide any examples of why they might ever be used.

One situation where I have occasionally found them useful is when a single table needs to be left outer joined to two tables that are inner joined to each other. In this situation (assuming that the joins are all simple ID matches) you cannot simply left join table1 to table2 and then inner join table2 to table3, because any table1 rows with no match in table2 will be removed when the inner join of the table1+table2 result to table3 fails. The most common workaround I've seen is to make the join to table3 a left join as well. That works in the sense that table1+table2 rows with null information for table2 also receive null information for table3, but it also keeps table2 rows that have no match in table3, attaching null table3 information to table1+table2 results that do have a table2 component, which the intended inner join between table2 and table3 would have excluded.

In my mind it is a superior solution to first join table2 to table3, and then to right outer join the table2+table3 combination to table1. Another option is to join table2 to table3 in a subquery and then left join the results of that subquery to table1, but this makes the query more complicated than necessary. --210.11.199.87 (talk) 05:05, 24 March 2011 (UTC)
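The three variants discussed above can be compared with a small sketch in Python's sqlite3 module, using hypothetical tables table1, table2 and table3 (table2 row 20 points at a table3 row that does not exist). The right outer join form preferred above, (table2 JOIN table3) RIGHT OUTER JOIN table1, needs SQLite 3.39.0 or later, so the sketch uses the subquery form that is also mentioned; both express "inner join table2 to table3 first, then outer join the result to table1".

```python
import sqlite3

# Hypothetical tables; table2 row 20 references table3 row 200, which does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, table1_id INTEGER, table3_id INTEGER);
CREATE TABLE table3 (id INTEGER PRIMARY KEY);
INSERT INTO table1 VALUES (1), (2), (3);
INSERT INTO table2 VALUES (10, 1, 100), (20, 2, 200);
INSERT INTO table3 VALUES (100);
""")

def rows(sql):
    return conn.execute(sql).fetchall()

# (a) Left join, then inner join: table1 rows 2 and 3 are lost.
left_then_inner = rows("""
    SELECT t1.id, t2.id, t3.id
    FROM table1 AS t1
    LEFT JOIN table2 AS t2 ON t2.table1_id = t1.id
    JOIN table3 AS t3 ON t3.id = t2.table3_id
    ORDER BY t1.id
""")

# (b) Two left joins: keeps table2 row 20 even though it has no table3 match.
left_then_left = rows("""
    SELECT t1.id, t2.id, t3.id
    FROM table1 AS t1
    LEFT JOIN table2 AS t2 ON t2.table1_id = t1.id
    LEFT JOIN table3 AS t3 ON t3.id = t2.table3_id
    ORDER BY t1.id
""")

# (c) Inner join table2 to table3 first (in a subquery), then left join to table1.
inner_first = rows("""
    SELECT t1.id, x.t2_id, x.t3_id
    FROM table1 AS t1
    LEFT JOIN (SELECT t2.id AS t2_id, t2.table1_id, t3.id AS t3_id
               FROM table2 AS t2 JOIN table3 AS t3 ON t3.id = t2.table3_id) AS x
      ON x.table1_id = t1.id
    ORDER BY t1.id
""")

for label, result in [("a", left_then_inner), ("b", left_then_left), ("c", inner_first)]:
    print(label, result)
```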

Self join example


I think the example given in the Self Join section -- finding pairings of employees in the same country -- is too unreal to be helpful, and request that it be replaced with the fairly standard parent-child example of listing all employees along with the name of their manager. Matthew C. Clarke 07:33, 20 May 2011 (UTC)[reply]
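A sketch of the parent-child self join suggested above, listing each employee with the name of their manager. It uses Python's sqlite3 module and a hypothetical employee table with EmployeeID, Name and ManagerID columns. The left outer join keeps the top-level employee, whose ManagerID is NULL.

```python
import sqlite3

# Hypothetical employee table in which ManagerID refers to another employee row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (EmployeeID INTEGER PRIMARY KEY, Name TEXT, ManagerID INTEGER);
INSERT INTO employee VALUES
  (1, 'Ada', NULL), (2, 'Ben', 1), (3, 'Cy', 1), (4, 'Dee', 2);
""")

# Self join: alias e is the employee, alias m is the same table acting as manager.
pairs = conn.execute("""
    SELECT e.Name, m.Name AS Manager
    FROM employee AS e
    LEFT OUTER JOIN employee AS m ON e.ManagerID = m.EmployeeID
    ORDER BY e.EmployeeID
""").fetchall()

for name, manager in pairs:
    print(name, "reports to", manager or "nobody")
```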

Joining more than two tables - Comments on Inner Joins


I wonder if this article should have a section about joining three or more tables. I seem to remember that there's a faster way to do this that doesn't come from a Cartesian product of the three tables, but from the (smaller) Cartesian products of two of the tables, the results of which are processed in another Cartesian product with the third table. This speeds things up. Does anybody know any more about this? —MiguelM (talk) 03:10, 7 July 2011 (UTC)

Most real-world queries involve more than two tables. Have never even imagined a Cartesian product of three tables produced in a single operation. Don't think that a three-way inner join exists except in theory. The execution cost could be enormous compared to other methods. Three tables inner joined would probably be processed in order, with the inner joined results from the first pair inner joined to the last table. The execution plans produced by database managers would bear this out (my experience is with MSSQL). The actual order of the inner joins may be subject to change by the database's optimizer based on statistics. If one of the inner joins is more selective (reduces the record set to a relatively small number) than the other, the optimizer may reverse the order to reduce cost. This would not be obvious unless you review the actual execution plan but the results should be the same.

I work every day creating complex SQL queries in a reporting database and don't even bother to count the number of tables joined together in a query. A dozen would be typical, 20-30 in a single view or stored procedure is not surprising. Tables range from a handful of rows to 3-billion, four fields per table to 120 or more, multi-gigabyte sizes for many tables.

When querying a reporting database I've never been concerned about cartesian products (except when making a mistake in the join fields). Using MSSQL and no doubt other databases, the optimizer takes care of the details.
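The pairwise evaluation described above can be illustrated with a small sketch (Python's sqlite3 module, assumed rows modeled on the article's tables plus a hypothetical third table named building). The same three-table inner join, written in a different table order or with the first pair computed explicitly as a derived table, returns the same rows, which is what leaves the optimizer free to pick the order.

```python
import sqlite3

# Assumed sample rows; the building table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
CREATE TABLE building (DepartmentID INTEGER, Building TEXT);
INSERT INTO department VALUES
  (31, 'Sales'), (33, 'Engineering'), (34, 'Clerical'), (35, 'Marketing');
INSERT INTO employee VALUES
  ('Rafferty', 31), ('Jones', 33), ('Heisenberg', 33),
  ('Robinson', 34), ('Smith', 34), ('Williams', NULL);
INSERT INTO building VALUES (31, 'North'), (33, 'East'), (35, 'West');
""")

def sorted_rows(sql):
    return sorted(conn.execute(sql).fetchall())

# Tables joined in the order employee, department, building.
r1 = sorted_rows("""
    SELECT e.LastName, d.DepartmentName, b.Building
    FROM employee AS e
    JOIN department AS d ON e.DepartmentID = d.DepartmentID
    JOIN building AS b ON b.DepartmentID = d.DepartmentID
""")

# The employee-department pair computed first as a derived table.
r2 = sorted_rows("""
    SELECT ed.LastName, ed.DepartmentName, b.Building
    FROM (SELECT e.LastName, d.DepartmentID, d.DepartmentName
          FROM employee AS e JOIN department AS d
            ON e.DepartmentID = d.DepartmentID) AS ed
    JOIN building AS b ON b.DepartmentID = ed.DepartmentID
""")

# Tables written in the opposite order.
r3 = sorted_rows("""
    SELECT e.LastName, d.DepartmentName, b.Building
    FROM building AS b
    JOIN department AS d ON b.DepartmentID = d.DepartmentID
    JOIN employee AS e ON e.DepartmentID = d.DepartmentID
""")

print(r1 == r2 == r3, r1)
```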

This article is very misleading in that it encourages inner joins to the exclusion of others. Who says they are the most common or default? The wording says "for applications". I have never read that anywhere else, and I have spent many hours studying books written by leading experts.

My method for years has been to typically use left outer joins (MSSQL) and inner joins rarely, only where tests have proven that NULL values won't eliminate data/returned rows along with related information from other tables. My database doesn't enforce referential integrity, and that design decision is not something I control. Right outer and full outer joins are pretty rare but have come up. I use UNIONs of queries many, many times.

Inner Joins are NOT always faster than Left outer joins. Any complex query has many other considerations than just slamming in inner joins everywhere. Have seen inner joins kill a server as it matched up two tables with tens of millions of records before filtering the data and returning a list of fields. Speed of processing isn't the only consideration.

If you use an inner join without realizing that a table has NULL values in the join keys, the database will SILENTLY DISCARD those rows from both sides of the join (NULL never equals anything, not even another NULL)! No error message. You will get enough data to think the result is correct, but it will be wrong in a small way every time!! Or, if there are a lot of NULLs, it will be wrong in a big way!!!
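A minimal sketch of this pitfall, using Python's built-in sqlite3 module; the orders/customers tables and their rows are hypothetical. Because NULL = NULL does not evaluate to true, the inner join drops order 2 and also the customer row whose key is NULL, with no error, while a left join keeps order 2 and fills the customer columns with NULL.

```python
import sqlite3

# Hypothetical tables; order 2 and the 'Unknown' customer have NULL keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 10), (2, NULL), (3, 20);
    INSERT INTO customers VALUES (10, 'Ann'), (20, 'Bob'), (NULL, 'Unknown');
""")

# Inner join: rows with NULL join keys never match, so they vanish silently.
inner = conn.execute("""
    SELECT o.order_id, c.name
    FROM orders AS o
    INNER JOIN customers AS c ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# Left join: every order is kept; unmatched ones get NULL (None) for name.
left = conn.execute("""
    SELECT o.order_id, c.name
    FROM orders AS o
    LEFT JOIN customers AS c ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

print(inner)  # [(1, 'Ann'), (3, 'Bob')]
print(left)   # [(1, 'Ann'), (2, None), (3, 'Bob')]
```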

When joining a large number of tables you need to know your data: table record sizes, the expected number of rows selected, and whether indexes are available that cover all of the fields you want returned, so the database doesn't have to plow through the entire table. Then the query can be structured using a variety of methods to "get the number of records down to the smallest number as soon as possible". After that, everything is efficient, or if it isn't, it doesn't matter.

Nested queries, correlated subqueries, derived tables, common table expressions, functions, temp tables, custom indexed tables, and multiple sequenced queries in a stored procedure are all ways to structure complex queries. They all belong in your bag of tricks, along with the knowledge of where each is best used. Learn as you go: figure out how to use the tools to get cost figures, and record time and cost results as you make changes. The query will get faster (or you will have to backtrack, so always make multiple backups).
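As a small illustration of two of these techniques, the sketch below (Python's built-in sqlite3 module; the sales/regions tables and rows are hypothetical) writes the same query once with a derived table and once with a common table expression. Both filter and aggregate the sales rows before the join, in the spirit of getting the row count down early, and both return the same result. Whether a particular engine plans the two forms identically is not something this example shows.

```python
import sqlite3

# Hypothetical reporting tables; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, year INTEGER, amount INTEGER);
    INSERT INTO sales VALUES
        ('north', 2023, 100), ('north', 2024, 150),
        ('south', 2024, 80),  ('south', 2024, 20),
        ('west',  2023, 60);
    CREATE TABLE regions (region TEXT, manager TEXT);
    INSERT INTO regions VALUES ('north', 'Kim'), ('south', 'Lee'), ('west', 'Ray');
""")

# Derived table: filter and aggregate first, then join the small result.
derived = conn.execute("""
    SELECT r.manager, t.total
    FROM regions AS r
    INNER JOIN (
        SELECT region, SUM(amount) AS total
        FROM sales
        WHERE year = 2024
        GROUP BY region
    ) AS t ON t.region = r.region
    ORDER BY r.manager
""").fetchall()

# Common table expression: the same logic, named up front with WITH.
cte = conn.execute("""
    WITH t AS (
        SELECT region, SUM(amount) AS total
        FROM sales
        WHERE year = 2024
        GROUP BY region
    )
    SELECT r.manager, t.total
    FROM regions AS r
    INNER JOIN t ON t.region = r.region
    ORDER BY r.manager
""").fetchall()

print(derived == cte)  # True
print(cte)             # [('Kim', 150), ('Lee', 100)]
```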

Everyone seems to want faster code, perhaps to support transaction processing, which typically involves a small number of tables and fields and updates a minimal number of rows in a controlled process. This isn't NASCAR. Speed is important, but accuracy is more important (though efficiency still IS important).

Natural Join section is unclear

I added a clarifying sentence to the section on Natural Joins, explaining why they're dangerous. Without it, it's not at all clear what the difference is between natural joins and equijoins, and it made me want to avoid both of them. —MiguelM (talk) 03:46, 7 July 2011 (UTC)

implicit "OUTER" for LEFT JOIN?

  • Is LEFT OUTER JOIN the same as LEFT JOIN in standard SQL?
  • Do MySQL, PostgreSQL, etc. comply with this standard?
  • Why are there different symbols for "LEFT JOIN" (⋉) and "LEFT OUTER JOIN" (⟕)?

I am confused. :-( --RokerHRO (talk) 12:10, 13 July 2011 (UTC)

The OUTER keyword is optional, i.e. LEFT JOIN means LEFT OUTER JOIN, and RIGHT JOIN means RIGHT OUTER JOIN. (In both cases there is an "outer" table.) — Preceding unsigned comment added by 192.71.97.254 (talk) 09:17, 19 October 2011 (UTC)
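A quick way to check this, sketched with Python's built-in sqlite3 module (SQLite, like MySQL and PostgreSQL, accepts both spellings); the customers/orders tables are hypothetical. On the symbol question above: in relational algebra ⋉ usually denotes a left semijoin, which is a different operation from ⟕ (left outer join). A semijoin returns each left-hand row that has at least one match, once, with no columns from the right-hand table; in SQL it is commonly written with EXISTS, as in the last query.

```python
import sqlite3

# Hypothetical tables: Cy has no orders, Ann has two.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (100, 1), (101, 1), (102, 2);
""")

def rows(sql):
    return conn.execute(sql).fetchall()

# LEFT JOIN and LEFT OUTER JOIN are two spellings of the same operation.
left_join = rows("""
    SELECT c.name, o.id FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""")
left_outer_join = rows("""
    SELECT c.name, o.id FROM customers AS c
    LEFT OUTER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""")

# Left semijoin: customers with at least one order, each listed once,
# and no columns taken from orders.
semijoin = rows("""
    SELECT c.name FROM customers AS c
    WHERE EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)
    ORDER BY c.id
""")

print(left_join == left_outer_join)  # True
print(left_join)  # [('Ann', 100), ('Ann', 101), ('Bob', 102), ('Cy', None)]
print(semijoin)   # [('Ann',), ('Bob',)]
```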

I found this article to be confusing

Why are there two natural join sections? Why does it not use the example introduced at the top consistently? — Preceding unsigned comment added by 84.154.15.166 (talk) 19:14, 7 December 2014 (UTC)

Please add a section about "LATERAL JOIN"

See https://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.3#LATERAL_JOIN — Preceding unsigned comment added by 89.246.199.104 (talk) 11:31, 30 May 2015 (UTC)

Natural join is described in two places now

It is described both by itself and inside Inner Join. The two should be merged. ★NealMcB★ (talk) 00:46, 23 December 2015 (UTC)

NATURAL JOIN is just a special case of either INNER JOIN or OUTER JOIN. Keep things together, i.e. no main section for NATURAL JOIN. The JOIN USING syntax should also be placed under INNER and OUTER JOIN. 155.4.126.237 (talk) 07:48, 24 May 2017 (UTC)

Natural join does not have an SQL example

It would be great to have an SQL example for a natural join. I was trying to distinguish between a natural join and an inner join: it seems like an inner join is a superset of a natural join. Is it perhaps less efficient? All the other types of joins have examples; please add an SQL example for natural join. SystemBuilder (talk) 18:06, 5 May 2016 (UTC)

A NATURAL JOIN can be either an INNER JOIN (select ... from t1 NATURAL JOIN t2) or an OUTER JOIN (select ... from t1 NATURAL LEFT JOIN t2). 155.4.126.237 (talk) 07:49, 24 May 2017 (UTC)
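A minimal sketch of both forms, using Python's built-in sqlite3 module. The employee/department rows are illustrative only (Williams, with no department, and Marketing, with no employees, are hypothetical). NATURAL JOIN matches on every column name the two tables share, which here is only DepartmentID, so it returns the same rows as INNER JOIN ... USING (DepartmentID); it is shorthand for an equi-join on the shared column names rather than a different kind of join. The danger is that any other column name the tables happen to share would silently become part of the join condition too.

```python
import sqlite3

# Illustrative rows; the only column name shared by both tables is DepartmentID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
    CREATE TABLE department (DepartmentID INTEGER, DepartmentName TEXT);
    INSERT INTO employee VALUES ('Rafferty', 31), ('Jones', 33), ('Williams', NULL);
    INSERT INTO department VALUES (31, 'Sales'), (33, 'Engineering'), (35, 'Marketing');
""")

def rows(sql):
    return conn.execute(sql).fetchall()

# NATURAL JOIN is an inner join on all shared column names.
natural_inner = rows("""
    SELECT LastName, DepartmentID, DepartmentName
    FROM employee NATURAL JOIN department ORDER BY LastName
""")
# The same join spelled out with USING.
using_inner = rows("""
    SELECT LastName, DepartmentID, DepartmentName
    FROM employee INNER JOIN department USING (DepartmentID) ORDER BY LastName
""")
# NATURAL LEFT JOIN is the outer-join form: unmatched employees are kept.
natural_left = rows("""
    SELECT LastName, DepartmentID, DepartmentName
    FROM employee NATURAL LEFT JOIN department ORDER BY LastName
""")

print(natural_inner == using_inner)  # True
print(natural_inner)  # [('Jones', 33, 'Engineering'), ('Rafferty', 31, 'Sales')]
print(natural_left)   # [('Jones', 33, 'Engineering'), ('Rafferty', 31, 'Sales'), ('Williams', None, None)]
```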

External links modified

Hello fellow Wikipedians,

I have just modified one external link on Join (SQL). Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 12:40, 26 April 2017 (UTC)

Example queries in section "Inner join" are not equivalent

In section Inner join, the statement, "The following example is equivalent to the previous one, but this time using implicit join notation:" is false because the two example queries are not equivalent and do not produce the same result. In order to make the two queries equivalent, the second query should read:

SELECT employee.LastName, employee.DepartmentID, department.DepartmentName 
FROM employee, department
WHERE employee.DepartmentID = department.DepartmentID;

and the query result should read:

Employee.LastName   Employee.DepartmentID   Department.DepartmentName
Robinson            34                      Clerical
Jones               33                      Engineering
Smith               34                      Clerical
Heisenberg          33                      Engineering
Rafferty            31                      Sales

--RedViking20200702 (talk) 06:57, 11 August 2020 (UTC)

I corrected the second query as I described in my previous comment.

--RedViking20200702 (talk) 07:03, 11 August 2020 (UTC)
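To check that the corrected implicit-join query matches the explicit INNER JOIN, here is a sketch using Python's built-in sqlite3 module, loaded with only the five employees and three departments that appear in the result table above. It also shows that dropping the WHERE condition from the comma-separated FROM list yields a Cartesian product (5 × 3 = 15 rows).

```python
import sqlite3

# Only the rows visible in the corrected result table above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
    CREATE TABLE department (DepartmentID INTEGER, DepartmentName TEXT);
    INSERT INTO employee VALUES
        ('Rafferty', 31), ('Jones', 33), ('Heisenberg', 33),
        ('Robinson', 34), ('Smith', 34);
    INSERT INTO department VALUES
        (31, 'Sales'), (33, 'Engineering'), (34, 'Clerical');
""")

def rows(sql):
    # Sort so the comparison does not depend on row order.
    return sorted(conn.execute(sql).fetchall())

explicit = rows("""
    SELECT employee.LastName, employee.DepartmentID, department.DepartmentName
    FROM employee
    INNER JOIN department ON employee.DepartmentID = department.DepartmentID
""")

implicit = rows("""
    SELECT employee.LastName, employee.DepartmentID, department.DepartmentName
    FROM employee, department
    WHERE employee.DepartmentID = department.DepartmentID
""")

# Without the WHERE condition the comma join is a Cartesian product.
cross = rows("""
    SELECT employee.LastName, employee.DepartmentID, department.DepartmentName
    FROM employee, department
""")

print(explicit == implicit)       # True
print(len(implicit), len(cross))  # 5 15
```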

Join index section

As of 2022, the section on join indices probably needs attention from someone knowledgeable in the matter. Specific issues:

  1. It places great emphasis on summarising the capabilities of specific products, and given that it cites the year 2012, there is a definite risk that this information is now outdated.
  2. It fails to explain how a join index is any different from an ordinary index. The way it describes these indices — that they are updated when the underlying tables change, that they cover a subset of the columns, etc. — sounds very much like the way an ordinary index works. I suspect that the key point is that a join index combines data from several tables, whereas ordinary indices would not, but the article should state the matter explicitly.

90.129.219.218 (talk) 10:32, 21 March 2022 (UTC)

Great Info!
Needs clarification at the start of the topic for what is considered TABLE "A" and TABLE "B".
Is it:
- Table A is the first table listed (e.g. from TableA join TableB ON ... )
or
- Table A is defined in the ON clause (e.g. On ( TableA.field1 = TableB.field2) )
It would also be helpful to have info about multi-table joins. Is the collective "table" of previous joins considered TableA for follow-on joins? 12.200.137.200 (talk) 21:29, 14 September 2024 (UTC)
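On the question of which table is "A": in FROM TableA LEFT JOIN TableB ON ..., the left (preserved) table is the one written before LEFT JOIN, and the order of the operands inside ON makes no difference. When each join is followed by its own ON clause, a chain of joins is logically evaluated left to right, so the result of the earlier joins acts as the left input of the next one. A minimal sketch with Python's built-in sqlite3 module; the table_a/table_b/table_c tables and their rows are hypothetical.

```python
import sqlite3

# Hypothetical tables; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, name TEXT);
    CREATE TABLE table_b (a_id INTEGER, c_id INTEGER);
    CREATE TABLE table_c (id INTEGER, tag TEXT);
    INSERT INTO table_a VALUES (1, 'one'), (2, 'two');
    INSERT INTO table_b VALUES (1, 7), (3, 8);
    INSERT INTO table_c VALUES (7, 'seven');
""")

def rows(sql):
    return conn.execute(sql).fetchall()

# The left table is the one written before LEFT JOIN;
# swapping the operands inside ON changes nothing.
a_first = rows("""
    SELECT a.name, b.c_id FROM table_a AS a
    LEFT JOIN table_b AS b ON a.id = b.a_id ORDER BY a.id
""")
a_first_swapped_on = rows("""
    SELECT a.name, b.c_id FROM table_a AS a
    LEFT JOIN table_b AS b ON b.a_id = a.id ORDER BY a.id
""")
# Swapping the tables around LEFT JOIN does change the result.
b_first = rows("""
    SELECT a.name, b.c_id FROM table_b AS b
    LEFT JOIN table_a AS a ON a.id = b.a_id ORDER BY b.a_id
""")

# A chain of joins: table_a LEFT JOIN table_b is the left input for table_c.
chained = rows("""
    SELECT a.name, c.tag FROM table_a AS a
    LEFT JOIN table_b AS b ON b.a_id = a.id
    LEFT JOIN table_c AS c ON c.id = b.c_id
    ORDER BY a.id
""")
# The same thing with the first join made explicit as a derived table.
nested = rows("""
    SELECT ab.name, c.tag FROM (
        SELECT a.id, a.name, b.c_id FROM table_a AS a
        LEFT JOIN table_b AS b ON b.a_id = a.id
    ) AS ab
    LEFT JOIN table_c AS c ON c.id = ab.c_id
    ORDER BY ab.id
""")

print(a_first == a_first_swapped_on)  # True
print(a_first)   # [('one', 7), ('two', None)]
print(b_first)   # [('one', 7), (None, 8)]
print(chained == nested)  # True
print(chained)   # [('one', 'seven'), ('two', None)]
```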

"Join (SQL" listed at Redirects for discussion

An editor has identified a potential problem with the redirect Join (SQL and has thus listed it for discussion. This discussion will occur at Wikipedia:Redirects for discussion/Log/2022 October 27#Join (SQL until a consensus is reached, and readers of this page are welcome to contribute to the discussion. Steel1943 (talk) 19:41, 27 October 2022 (UTC)