Saturday, October 6, 2007

Incremental elimination using predicate negation

Application requirements sometimes include the need to match consumers with products. A common solution is to create a consumer table having a varchar2 column which will contain a string of predicates or a where clause. The consumer’s where clause is tested against the attributes of a product. The process of matching might be implemented in any of the following modes:

  • Tell me if this consumer matches this product
  • Tell me all of the consumers that match this product
  • Tell me all of the products that match this consumer

In any case, if the where clause evaluation results in TRUE, the matching row will be returned in the result set. In other words the steps are;

  • execute a SELECT statement containing the consumer’s where clause
  • determine FOUND or NOT FOUND
  • and if FOUND, consider it to be a match

This approach works fine in some situations, but what if the product is built gradually over a period of time? For example, assume we have an adoption agency for children. The child is the product. The Prakesh family ( the consumer ) is waiting to receive a child, but they have a set of ‘child matching’ requirements which we have expressed as a where clause. It looks like this:

(ths = 'N' and hair_color = 'blond' and autosomal_recessive = 'No')

Now babyX is born bald as a billiard ball, so we don’t yet know the hair_color. The results of the THS test will be known in one week and the autosomal_recessive test in 2 weeks. The Prakesh family thinks that babyX might be right for them but they won’t know for sure until all of the child data is available. A few days passes and babyX has enough fuzz on her head to see that she has blond hair. So, we have one piece of data, and this one piece of data has the potential to eliminate the Prakesh family as parents of babyX. In the database we have a record for babyX that looks like

Baby_ID

THS

Hair Color

Autosomal Recessive

123

 

blond

 

We want to see if our new data value for hair color eliminates the Prakesh family as candidates for babyX, so we build our SELECT statement like this:

SELECT 'matched'
FROM   TbaBy
WHERE  Baby_Id = 123
      
AND ths = 'N'
      
AND Hair_Color = 'blond'
      
AND AutoSoMal_Recessive = 'No';

Of course, the result will be no match. The problem is clear; if we are going to apply the where_clause before all of the data is gathered, then we need to allow for NULL values in our where clause. We-code the where_clause so that it looks like this:

SELECT 'matched'
FROM   TbaBy
WHERE  (ths = 'N'
        
OR ths IS NULL )
      
AND (Hair_Color = 'blond'
            
OR Hair_Color IS NULL )
      
AND (AutoSoMal_Recessive = 'No'
            
OR AutoSoMal_Recessive IS NULL );

Now, given this new where clause, as soon as a new data item is written to the baby record, we can run the where clause each time we receive more data on the baby .and we will be able to tell the Prakesh family the moment that any one of their criteria is not met.

In essence, by adding the “or is null” to the where_clause, we have changed the question from

  • Does this baby match these parents?
    to
  • Does this baby NOT match these parents?

Adding the IS NULL test is simple as long as either the left-hand side predicate or the right-hand side predicate is a constant. It becomes a little trickier when both left-hand and right-hand sides are both variables.

Using the predicate negation technique

Predicate negation reverses the logic of the where clause so that any row that would not be returned is now returned and visa versa.

CREATE TABLE PRODUCT
(
  PRODID   
NUMBER(5),
  PRODSIZE 
INTEGER,
  COLOR    
VARCHAR2(10 BYTE),
  FABRIC   
VARCHAR2(10 BYTE),
 
CONSTRAINT PRODUCT_PK
 
PRIMARY KEY
 
(PRODID)
);

SET DEFINE OFF;
INSERT INTO PRODUCT   (PRODID, PRODSIZE, COLOR, FABRIC)
 
VALUES   (10, 14, NULL, 'cotton');
INSERT INTO PRODUCT   (PRODID, PRODSIZE, COLOR, FABRIC)
 
VALUES   (20, 14, 'red', 'wool');
INSERT INTO PRODUCT   (PRODID, PRODSIZE, COLOR, FABRIC)
 
VALUES   (30, 15, 'white', 'cotton');
COMMIT;

 

-- the following selects assumes that all attribute
-- values have been provided ( no nulls ). A NULL value in any of the criteria
-- will result in no_data_found, hence no match.
SELECT *
FROM   Product
WHERE  ProdSize = 14
      
AND ((Color = 'red'
            
AND Fabric = 'cotton')
            
OR (Fabric = 'wool'));

-- by reversing the logic of the where clause and reversing
-- the interpretation of FOUND vs NOT FOUND, NULL values will not
-- be considered a reason for rejecting a match. If a record is returned
-- it means the product does not match the criteria.
SELECT ProdId
FROM   Product
WHERE  (ProdSize != 14
        
OR ((Color != 'red'
              
OR Fabric != 'cotton')
            
AND (Fabric != 'wool')));

-- to prove that the result is the inverse of the prior result
-- we anti join to the complete table.
SELECT *
FROM   Product
WHERE  ProdId NOT IN (SELECT ProdId
                     
FROM   Product
                     
WHERE  (ProdSize != 14
                              
OR ((Color != 'red'
                                    
OR Fabric != 'cotton')
                                  
AND (Fabric != 'wool'))));

Predicate negation can be accomplished by changing

 
OR  into  AND
=   into  !=
>  into  <=
<  into  >=
IN  into  NOT IN
BETWEEN  into  NOT BETWEEN
AND into OR

To learn more about the formal rules of predicate negation see

De_Morgan’s theorem

 

2 comments:

DomBrooks said...

Unfortunately, your code does not solve the problem of nulls.

a null color will not be returned for a clause of either
where color = 'red'
or a clause of
where color != 'red

both will return false for a null

Unknown said...

Thanks for the comment dombrooks. You have a good point. I can see now that I was not clear toward the end. The goal of predicate negation is to allow us to change the meaning of null in a business sense. Instead of null meaning "we don't have a match" we want null to mean "we do have a match" To change the meaning of NULL we have to change the question we are asking from "can we state with certainty that this record (row) does fulfill the consumer's requirements" to "can we state with certainty that this record (row)does not fulfill the consumer's requirements". The result of our SELECT statement changes meaning from "NOT_FOUND means no match" to "NOT_FOUND means match" So, as you correctly stated, if the color is NULL then "where color = 'red' will return no rows and where color != 'red' will also return no rows. It's the fact that we negated the predicates that allows us to interpret the not_found differently.

I will try to fix my post. Thanks for your help.
Mike

Labels