Incremental elimination using predicate negation
Application requirements sometimes include the need to match consumers with products. A common solution is to create a consumer table having a varchar2 column which will contain a string of predicates or a where clause. The consumer’s where clause is tested against the attributes of a product. The process of matching might be implemented in any of the following modes:
- Tell me if this consumer matches this product
- Tell me all of the consumers that match this product
- Tell me all of the products that match this consumer
In any case, if the where clause evaluation results in TRUE, the matching row will be returned in the result set. In other words the steps are;
- execute a SELECT statement containing the consumer’s where clause
- determine FOUND or NOT FOUND
- and if FOUND, consider it to be a match
This approach works fine in some situations, but what if the product is built gradually over a period of time? For example, assume we have an adoption agency for children. The child is the product. The Prakesh family ( the consumer ) is waiting to receive a child, but they have a set of ‘child matching’ requirements which we have expressed as a where clause. It looks like this:
(ths = 'N' and hair_color = 'blond' and autosomal_recessive = 'No')
Now babyX is born bald as a billiard ball, so we don’t yet know the hair_color. The results of the THS test will be known in one week and the autosomal_recessive test in 2 weeks. The Prakesh family thinks that babyX might be right for them but they won’t know for sure until all of the child data is available. A few days passes and babyX has enough fuzz on her head to see that she has blond hair. So, we have one piece of data, and this one piece of data has the potential to eliminate the Prakesh family as parents of babyX. In the database we have a record for babyX that looks like
Baby_ID | THS | Hair Color | Autosomal Recessive |
123 |
| blond |
|
We want to see if our new data value for hair color eliminates the Prakesh family as candidates for babyX, so we build our SELECT statement like this:
SELECT 'matched'
FROM TbaBy
WHERE Baby_Id = 123
AND ths = 'N'
AND Hair_Color = 'blond'
AND AutoSoMal_Recessive = 'No';
Of course, the result will be no match. The problem is clear; if we are going to apply the where_clause before all of the data is gathered, then we need to allow for NULL values in our where clause. We-code the where_clause so that it looks like this:
SELECT 'matched'
FROM TbaBy
WHERE (ths = 'N'
OR ths IS NULL )
AND (Hair_Color = 'blond'
OR Hair_Color IS NULL )
AND (AutoSoMal_Recessive = 'No'
OR AutoSoMal_Recessive IS NULL );
Now, given this new where clause, as soon as a new data item is written to the baby record, we can run the where clause each time we receive more data on the baby .and we will be able to tell the Prakesh family the moment that any one of their criteria is not met.
In essence, by adding the “or is null” to the where_clause, we have changed the question from
- Does this baby match these parents?
to - Does this baby NOT match these parents?
Adding the IS NULL test is simple as long as either the left-hand side predicate or the right-hand side predicate is a constant. It becomes a little trickier when both left-hand and right-hand sides are both variables.
Using the predicate negation technique
Predicate negation reverses the logic of the where clause so that any row that would not be returned is now returned and visa versa.
CREATE TABLE PRODUCT ( PRODID NUMBER(5), PRODSIZE INTEGER, COLOR VARCHAR2(10 BYTE), FABRIC VARCHAR2(10 BYTE), CONSTRAINT PRODUCT_PK PRIMARY KEY (PRODID) ); SET DEFINE OFF; INSERT INTO PRODUCT (PRODID, PRODSIZE, COLOR, FABRIC) VALUES (10, 14, NULL, 'cotton'); INSERT INTO PRODUCT (PRODID, PRODSIZE, COLOR, FABRIC) VALUES (20, 14, 'red', 'wool'); INSERT INTO PRODUCT (PRODID, PRODSIZE, COLOR, FABRIC) VALUES (30, 15, 'white', 'cotton'); COMMIT; |
-- the following selects assumes that all attribute
-- values have been provided ( no nulls ). A NULL value in any of the criteria
-- will result in no_data_found, hence no match.
SELECT *
FROM Product
WHERE ProdSize = 14
AND ((Color = 'red'
AND Fabric = 'cotton')
OR (Fabric = 'wool'));
-- by reversing the logic of the where clause and reversing
-- the interpretation of FOUND vs NOT FOUND, NULL values will not
-- be considered a reason for rejecting a match. If a record is returned
-- it means the product does not match the criteria.
SELECT ProdId
FROM Product
WHERE (ProdSize != 14
OR ((Color != 'red'
OR Fabric != 'cotton')
AND (Fabric != 'wool')));
-- to prove that the result is the inverse of the prior result
-- we anti join to the complete table.
SELECT *
FROM Product
WHERE ProdId NOT IN (SELECT ProdId
FROM Product
WHERE (ProdSize != 14
OR ((Color != 'red'
OR Fabric != 'cotton')
AND (Fabric != 'wool'))));
Predicate negation can be accomplished by changing
OR into AND = into != > into <= < into >= IN into NOT IN BETWEEN into NOT BETWEEN AND into OR |
To learn more about the formal rules of predicate negation see
2 comments:
Unfortunately, your code does not solve the problem of nulls.
a null color will not be returned for a clause of either
where color = 'red'
or a clause of
where color != 'red
both will return false for a null
Thanks for the comment dombrooks. You have a good point. I can see now that I was not clear toward the end. The goal of predicate negation is to allow us to change the meaning of null in a business sense. Instead of null meaning "we don't have a match" we want null to mean "we do have a match" To change the meaning of NULL we have to change the question we are asking from "can we state with certainty that this record (row) does fulfill the consumer's requirements" to "can we state with certainty that this record (row)does not fulfill the consumer's requirements". The result of our SELECT statement changes meaning from "NOT_FOUND means no match" to "NOT_FOUND means match" So, as you correctly stated, if the color is NULL then "where color = 'red' will return no rows and where color != 'red' will also return no rows. It's the fact that we negated the predicates that allows us to interpret the not_found differently.
I will try to fix my post. Thanks for your help.
Mike
Post a Comment