SQL-The Least You Need To Know
Notably, most computer programs that interact with database use SQL for that purpose, no matter what programming language they use.
SQL doesn't have to be embedded in computer programs. Programmers and non-programmers alike routinely write SQL queries and run them directly against the database. At some point, almost all TNG administrators will need to run an ad-hoc query. Even if someone else writes the query for you, you should at least have a general understand of SQL.
Most SQL queries are written as a single statement with several "clauses". Queries often include conditional functions (much like those in Microsoft Excel formulas). Much less often they contain conditional clauses, which are really out of the scope of this article.
All queries start with a keyword that is a verb that indications what kind of operation the query will perform. By far, the most common query operation is SELECT, which reads and displays data. In natural languages, we probably wouldn't think of UPDATE, INSERT, DROP, or ALTER operations as queries, but in SQL, they are. For readability, SQL keywords are typically written in uppercase letters, but they don't have to be.
SELECT queries essentially always produce a grid (well, a table, but not a database table) where rows represent result records and columns represent result values. In the simplest queries,
- The rows represent records from a database table, and
- The columns are just the values of database fields.
As queries get more complicated
- The rows represent what we might call logical records that are assembled from more than one database table, and
- The columns may contain computed values rather than direct field values.
- must start with a SELECT clause that lists the values to display (in columns),
- And then specify one FROM clause that specifies the (first) table to use.
SELECT queries (really, of all types) may also have
- JOIN clauses that pull other tables into the query,
- A WHERE clause that uses boolean expressions to control which records to display (in rows),
- An ORDER BY clause to specify the sort order
- And several others, only a couple of which will be touched on here.
Example 1 - People & Events
This example includes queries that examine the People table and specific events for selected people. It introduces and describes SELECT, FROM, ORDER BY, INNER JOIN...ON, field and table aliases, composite keys, concatenation functions, TNG date fields, and TNG foreign key fields that can link to different table, depending on the field value.
Example 2 - Multiple Parents
The goal: Find people who have more than one set of parents. Typically, this would happen when birth and adoptive parents are both in the database, or when a parent remarries and the researcher wants to mark the new spouse as a step-parent. This example includes queries that examine the People table and specific events for selected people. It introduces and describes SELECT, FROM, ORDER BY, INNER JOIN...ON, field and table aliases, composite keys, the COUNT function, subqueries, and TNG foreign key fields that can link to different table, depending on the field value.
gotta be simple and quick
simple and quick
DROP, TRUNCATE, and ALTER Queries
VERY simple and quick
In MySQL, the letter case of tablenames and fieldnames does not matter. (That is if you define a field named "personID", then "personid" would refer to the same field.) In other database products, letter case may be significant, in which case "personid" and "personID" would be different fields. Or a database product might require all table names to be uppercase. Or one product might allow underscores in tables, and another might even allow spaces.
Since SQL doesn't care about the letter case of tablenames, SQL statements that express a tablename with the "wrong" letter case will be perfectly legal and work just fine.
Aspects OF TNG's database
TNG's database does not have formally defined foreign keys, which means that TNG SQL queries (whether embedded in PHP programs, or executed independently) must be more explicit about which fields in one table match which fields in another. For instance
- The fields Families.husband and Families.wife are expected to contain personID values, which in combination with Families.gedcom, serve as foreign keys to People.
- But a rogue query could story any other value (that matches the fields' datatype and length) in those fields. Bad values would generally not cause an error; they would simply cause a JOIN from Families.wife or Families.husband not to match a People record.
- It is entirely up to the PHP programmer and SQL query-writer to make sure that the right fields are used in JOIN clauses and that they have the expected values.
But TNG can also take advantage of the fact that those relationships are not formalized. For instance,
- Events.persfamID field can have have personID or familyID values; that is, one field (well, in combination with Events.gedcom) can support links to two different tables.
- In Citations.eventID can have values other than Event table keys. Some values in the Citations.eventID field have other purposes, and don't point to table's primary key.
Well, where I say "people p" in my queries, the p is an alias for "people". As result I can use "p" instead of "people" everywhere else in the query. I find aliases to be extremely useful. They frequently make it easier to read the query. Beside, having to write the full tablename everywhere is a drag.
I admit that sometimes using explicit table names does make it easier to understand what the query is doing. So, I try to remember to think about other humans and use the full tablename.
Note that not all programmers use alias. For example, most of Darrin's SQL queries do not. (That may be the case for historical reasons; I don't know. But it's definitely true that his queries do not generally need to be readable.)
Here's a simple example, using the first few lines of Query 2-3 above:
My version, with aliases:
SELECT f.familyID, h.lastname Husband, w.lastname Wife, ee.tag, e.eventdate, e.eventplace FROM tng_families f INNER JOIN tng_events e ON e.persfamID=f.familyID AND e.gedcom=f.gedcom INNER JOIN tng_eventtypes ee ON ee.eventtypeID=e.eventtypeID LEFT JOIN tng_people h ON h.personID=f.husband AND h.gedcom=f.gedcom LEFT JOIN tng_people w ON w.personID=f.wife AND w.gedcom=f.gedcom
Here, "f" is an alias for families, "e" for events, "ee" for eventtypes, and both "h" and"w" for people. Both "h" and "w" are used for people because the query uses the people table twice - once for the husband, and once for the wife. When there are two instances of a table in a query, you have to use alias to eliminate ambiguity. But the others are not required; they are just handy.
Using explicit table names when aliases are not required, the query might look like this:
SELECT tng_families.familyID, h.lastname Husband, w.lastname Wife, tng_eventtypes.tag, tng_events.eventdate, tng_events.eventplace FROM tng_families INNER JOIN tng_events ON tng_events.persfamID=tng_families.familyID AND tng_events.gedcom=tng_families.gedcom INNER JOIN tng_eventtypes ON tng_eventtypes.eventtypeID = tng_events.eventtypeID LEFT JOIN people h ON h.personID = tng_families.husband AND h.gedcom = tng_families.gedcom LEFT JOIN people w ON w.personID = tng_families.wife AND w.gedcom = tng_families.gedcom
I'll use aliases every time.
Even More Details
Now, I'm going go dig even deeper in to SQL to make a few additional points that affect the verbosity and/or clarity of SQL. I can't say that it is essential for you to know them, but they might affect your ability to understand other SQL queries.
- Formally, you use the AS keyword to define aliases: "FROM families AS f", "INNER JOIN events AS e", etc.
- In English, tablenames are usually expressed with an initial capital letter. I'll use that notion in all English text below.
- In SQL, fieldnames, tablenames, and keywords do not have to use the correct letter case. For example
INNER JOIN events ON events.persfamID=families.familyID AND events.gedcom=families.gedcom
could be expressed as
inner join Events ON Events.persfamID=Familes.familyID and Events.gedcom=Families.gedcom.persfamID
- Tablenames don't have to be attached to fieldnames when a fieldname exists in only one of the tables in the query. For instance , in the SELECT clause, the fieldname "familyID" is used only in the Families table, "tag" occurs only in Eventtypes, and eventdate and eventplace occur only in the Events table. Thus, the first four lines of the query could be
SELECT familyID, h.lastname Husband, w.lastname Wife, tag, eventdate, eventplace FROM families INNER JOIN events ON events.persfamID=familyID AND events.gedcom=families.gedcom INNER JOIN eventtypes ON eventtypes.eventtypeID=events.eventtypeID
because "persfamID" occurs in Events and Citations, "eventtypeID" occurs only in Eventtypes, and "gedcom" occurs in almost every TNG table.
Of course, to avoid explict use of tablenames (or aliases), the programmer has to know the names of every field in every table that is used in the query. Otherwise, how could programmers know that they can say "familyID" instead of "families.familyID"? In truth "familyID is used in other tables in the database. But those other tables are not relevant to this query.
INNER JOIN vs LEFT JOIN
INNER JOIN is used to connect Families to Events because, if there are no Event records for a given Family, well, we don't need to show that Family at all.
INNER JOIN is used to connect Events to Eventtypes because we know that there has to be a match. The Events table does not contain the event tag. Instead, the event type is represented by the "eventtypeID" field, which is a number that points to the Eventtypes table. Why? Well, this may not be the ideal example, but if we want to change census events (CENS) to residence events (RESI), we only have to change CENS to RESI in one record in the Eventtypes table. We don't have to touch the Events table at all.
LEFT JOIN is used to connect the husband and wife in a Families record because we want to see event records even if we don't know who the husband wife it. That is, when the husband or wife in a family is unknown, the husband or wife field will be empty and will not link to a family record. If we used an INNER JOIN, the absence of a Person record for a husband or wife would remove the Family from the results.
More specifically, if you back to Query 2, you'll see
LEFT JOIN citations c ON c.eventID=e.eventID<br> WHERE c.citationID IS NULL...
Here, the LEFT JOIN is looking up all citations for a given event. If there are no such citations, the LEFT JOIN will produce a table row in which, in the absence of any Citation records, the citationID value would be null. And since those are the very events we want, the WHERE clause has the condition "c.citationID IS NULL". Hence we want to see events where the ci would (and, as would all if we cared about other fields in the citation table, theall fields in the citation recordwe're still going to get a result row for that event, but the value of the citationID in the results will be null (which is different from blank or empty) we're matching a citation record with an event the If you go back to the Query 2, you'
The personID is sufficient to identify a person record in the query results; the person's name is not essential. It is there only for the possible convenience of a human who is using the results. That is, a human who wants to resolve citationless events can use the personID to look up the relevant person records. But the name of a person the database could allow the human to focus on people who are more important for some reason or another. people that llows the human to recognize certain people humans can use the familyIDwants to open a record in the People table in order to resolve citationless events, needed at all. That is, the personID is suffient to look up a person in the database to try to resolve events without citations, The person's name is there only for the convenience of humans who read the results. In this query, the name may not be relewants to know the names of each person who has events without citations. (Well, humans who reads the results may not want to know the name of each person in the query results; they may only want to be able to recognize some names.) of Maybe not to know the names of each person. citationless"person who has citationlessis reading the query results. If the husband or wife in a given family is unknown, the husband or wife field would be empty, and would not link to a People record. In this query, the familyID is sufficient to identify a family record; the the husband's and wife's last name are useful only to help a human recognize a family.
Finally, the essential difference between INNER JOIN and LEFT JOIN is that, with INNER JOIN, if the ON condition fails, no result row will appear in the table. With OUTER JOIN, a table row will result no matter whether the ON condition matches, and the missing record will be populated with NULL values.