throbber
[3B2-9] man2012040078.3d
`
`5/11/012
`
`17:26 Page 78
`
`Anecdotes
`
`Early History of SQL
`
`Donald D. Chamberlin
`
`Ray Boyce and I first met E.F. (Ted) Codd at a symposium
`he organized at the IBM T.J. Watson Research Center in
`Yorktown Heights, New York, in 1972. Ray and I were
`both recent hires at the Watson Center. I had recently
`completed my PhD at Stanford University, and Ray had
`completed his at Purdue University. We were members
`of a recently reorganized IBM group that was looking
`for a mission. At that time, Ted Codd was a computer
`scientist at IBM’s San Jose Research Laboratory and
`was proposing a new way of organizing data that he
`called the ‘‘relational data model.’’
`One of the most important research areas in com-
`puter science in the early 1970s was the development
`of systems and languages for handling what computer
`scientists call persistent data. This term denotes data
`that remains in a computer system indefinitely, until
`it is explicitly deleted. Systems for managing persistent
`data were spreading quickly in the business world. A
`database management language proposed by the Coda-
`syl Data Base Task Group (DBTG)1 was receiving a lot of
`attention. Ray and I spent some time studying this lan-
`guage, learning concepts such as ‘‘currency indicators’’
`and ‘‘set occurrence selection.’’ With a little practice,
`we learned how to represent a database query in the
`form of a program that navigated through a network
`of pointers to find the desired information.
`
`Designing a Relational Language
`For Ray and me, our exposure to the relational data
`model at Codd’s research symposium was a revelation.
`For the first time, we could see how a query that would
`require a complex program in the DBTG language
`could be reduced to a few simple lines using one of
`Codd’s relational languages. It became a game for the
`two of us to invent queries and challenge each other
`to express them in various query languages.
`One of the queries that came out of this game was as
`follows: ‘‘Find names of employees who earn more than
`their managers.’’ The query was based on a three-
`column employee table. Each row of the table represented
`an employee and contained a name, a salary, and the
`
`Table 1. Employee.
`
`Name
`
`Smith
`Jones
`Baker
`Nelson
`
`Salary
`
`45,000
`40,000
`50,000
`55,000
`
`Manager
`
`Harker
`Smith
`Smith
`Baker
`
`Editor: Craig Partridge
`
`name of the employee’s manager. (This is a simple ex-
`ample. In a real application, employees would be iden-
`tified by some unique identifier such as an employee
`number.) Table 1 shows the structure of the table
`with four example rows.
`The third row of the table indicates that Baker’s sal-
`ary is $50,000 and Baker’s manager is Smith. The first
`row indicates that Smith’s salary is $45,000, so Baker
`earns more than his manager. Similarly, Nelson’s salary
`is $55,000, but Nelson’s manager is Baker, who earns
`$50,000, so Nelson also earns more than his manager.
`The result of the query, based on these four sample
`rows, is Baker and Nelson.
`In his research papers, Codd introduced two rela-
`tional query languages, called Relational Algebra2 and
`Relational Calculus (also known as the Data Sublanguage
`Alpha3). Relational Algebra consists of several operators,
`usually represented by symbols such as those in Figure 1.
`Using these operators, the query about well-paid
`employees could be represented as in Figure 2a.
`Codd’s Relational Calculus was based on a notation
`used in formal logic, using an existential quantifier 9
`(meaning ‘‘for each’’) and a universal quantifier 8
`(meaning ‘‘for all’’). Similar to Relational Algebra, Rela-
`tional Calculus could represent the well-paid employee
`query compactly (see Figure 2b).
`Ray and I were impressed by how compactly Codd’s
`languages could represent complex queries. However,
`at the same time, we believed that it should be possible
`to design a relational language that would be more
`accessible to users without formal training in mathe-
`matics or computer programming. We believed that
`barriers to widespread acceptance of Codd’s languages
`existed on two levels. The first barrier came from the
`mathematical notation, which was hard to enter at a
`keyboard. This barrier was superficial and could be eas-
`ily dealt with by replacing symbols with keywords—for
`example, replacing p with ‘‘project’’ and 8 with ‘‘for
`all.’’ The more difficult barrier was at the semantic
`level. The basic concepts of Codd’s languages were
`adapted from set theory and symbolic logic. This was
`natural given Codd’s background as a mathematician,
`but Ray and I hoped to design a relational language
`based on concepts that would be familiar to a wider
`population of users. We also hoped to extend the lan-
`guage to encompass database updates and administra-
`tive tasks such as the creation of new tables and
`views, which had traditionally been outside the scope
`of a query language.
`After attending Codd’s symposium, Ray and I spent
`the next year experimenting with language designs.
`
`78 IEEE Annals of the History of Computing
`
`Published by the IEEE Computer Society
`
`1058-6180/12/$31.00 c 2012 IEEE
`
`BlackBerry Corporation Exhibit 1025, pg. 1
`
`

`

`[3B2-9] man2012040078.3d
`
`5/11/012
`
`17:26 Page 79
`
`Our first attempt, called Square,4 was based
`on the notion of mapping and used a sub-
`script notation that was difficult to type.
`When we moved to the San Jose Research
`Laboratory in 1973 to join the System R proj-
`ect, we began work on another new language
`that we called Sequel. Sequel allowed the
`well-paid-employee query to be represented
`in a readable form free from mathematical
`concepts and symbols, as in Figure 2c.
`Ray and I hoped that, with a little practice,
`users could learn to read queries like this al-
`most as though they were English prose.
`This example query might be read as follows:
`‘‘Find an employee (let’s call him ‘e’) and
`another employee (let’s call him ‘m’) where
`e’s manager matches m’s name (in other
`words, e’s manager is m) and e’s salary is
`greater than m’s salary (in other words,
`e earns more than his manager); then print
`e’s name (for every such employee).’’
`It is important to note that the Sequel ver-
`sion of this query describes the information it
`is looking for but does not provide a detailed
`plan for how to find this information. This is
`why Sequel is called a declarative (rather than
`a procedural) language. Translating the de-
`clarative query statement into a detailed
`plan for processing the query is the job of
`an optimizing compiler.
`From the beginning, Sequel was intended to
`be used both for data manipulation (querying
`and updating data) and for data definition (cre-
`ation of tables, views, and assertions). To em-
`phasize this duality, Ray and I wrote two
`
`Figure 1. Examples of Relational Algebra operators.
`
`papers: ‘‘SEQUEL: A Structured English Query
`Language,’’5 and ‘‘Using a Structured English
`Query Language as a Data Definition Facility.’’6
`As luck would have it, the first of these papers
`became quite well known, while the second
`was never published outside IBM.
`In 1974, about one month after present-
`ing a paper on Sequel at a technical confer-
`ence in Ann Arbor, Michigan, Ray Boyce
`died suddenly of a ruptured brain aneurysm
`at the age of 26, leaving behind a wife of
`five years and a 10-month-old daughter.
`I often remember how much I enjoyed work-
`ing with Ray. We had a seamless partnership.
`In his brief career, Ray collaborated with Ted
`Codd on the Boyce-Codd Normal Form7 and
`with me on Sequel. I think that Ray would
`have been pleased to see the impact that his
`ideas have had on the world.
`After Ray’s untimely death, the Sequel lan-
`guage continued to evolve as a part of the
`System R project at San Jose Research Labora-
`tory. System R was installed on an experi-
`mental basis in three IBM customer sites,
`and a more complete Sequel language design
`was published in 1976,8 based in part on the
`experience collected by early users. In 1977,
`
`Figure 2. Three versions of the query, ‘‘Find names of employees who earn more than their managers.’’
`
`October–December 2012 79
`
`BlackBerry Corporation Exhibit 1025, pg. 2
`
`

`

`[3B2-9] man2012040078.3d
`
`5/11/012
`
`17:26 Page 80
`
`Anecdotes
`
`The SQL standard
`
`has been helpful in
`
`providing a mechanism
`
`for the controlled
`
`evolution of the
`
`language.
`
`because of a trademark issue, the name
`Sequel was shortened to SQL.
`
`The SQL Standard
`Commercial implementations of SQL, such
`as Oracle and DB2, began to appear in the
`late 1970s and early 1980s. By 1986, a stan-
`dard language definition called ‘‘Database
`Language SQL’’ had been formally adopted
`by the ANSI and ISO standards groups.9 A
`conformance test suite for the SQL standard
`was developed by the National Institute of
`Standards and Technology, and many SQL-
`based products were validated by this test
`suite between 1988 and 1996. New versions
`of the SQL standard were published in
`1996, 1999, 2003, 2006, and 2008.
`The SQL standard has been helpful in
`providing a mechanism for the controlled
`evolution of the language, providing a
`forum in which both users and implement-
`ers have a voice. Over the years, the evolv-
`ing standard has corrected many of the
`initial deficiencies of SQL and has added
`many new features, including outer joins,
`table expressions,
`recursion,
`triggered
`actions, user-defined types and functions,
`and online analytic processing (OLAP)
`functions. In the hands of the ANSI X3H2
`Committee, the definition of SQL has
`evolved from a 12-page research paper to
`an international standard comprising hun-
`dreds of pages.
`SQL has been a more successful query lan-
`guage than Ray Boyce and I had any reason
`to expect in 1974. I think that the most im-
`portant ingredient in this success was Ted
`Codd’s breakthrough work in defining the
`relational data model and raising the level
`of abstraction with which users could inter-
`act with stored data. While SQL has been
`criticized as departing from some of Codd’s
`original principles, experience has shown
`
`the language to be simple enough to learn
`easily and expressive enough to do useful
`work. Other important contributions to the
`success of SQL include the following:
` The language benefited greatly from hav-
`ing robust early implementations on
`multiple platforms. System R provided a
`multiuser implementation with transac-
`tional semantics and a sophisticated opti-
`mizing compiler. At roughly the same
`time, Oracle implemented the language
`on the widely used Unix platform.
` By combining query, update, and admin-
`istrative tasks into a single language, SQL
`makes it easy for authorized users to mod-
`ify database schemas and install new
`applications while the system is running.
`These tasks had traditionally been per-
`formed by specialized database admin-
`istrators during system maintenance
`intervals. Making every user his own data-
`base administrator removed an important
`bottleneck from application development.
` The SQL standard provided a common
`ecosystem in which vendors could de-
`velop competing implementations, tools,
`and training materials. The standard also
`reassured users that they would not be-
`come dependent on a single software ven-
`dor, although vendors had a regrettable
`tendency to implement different subsets
`of the standard and to include proprietary
`features.
`
`Criticisms of SQL
`languages, SQL has
`Like most successful
`attracted its share of criticism, which has
`tended to focus on the following issues.
`
`Orthogonality and Completeness. The ear-
`liest versions of SQL lacked support for
`some aspects of the relational data model,
`including primary keys and referential integ-
`rity. The early language also lacked orthogon-
`ality because it did not allow subqueries to be
`used in place of named tables and it failed to
`provide a way to name the columns of a
`query result. All these serious deficiencies
`were corrected by the 1992 version of the
`SQL standard, illustrating the helpful influ-
`ence of the standards process.
`
`Nulls. SQL supports a ‘‘null value’’ that rep-
`resents a data item that is missing or inappli-
`cable. The null value is not comparable to
`any other value, and for this reason, SQL
`implements a three-valued logic in which a
`
`80 IEEE Annals of the History of Computing
`
`BlackBerry Corporation Exhibit 1025, pg. 3
`
`

`

`[3B2-9] man2012040078.3d
`
`5/11/012
`
`17:26 Page 81
`
`search condition might be neither true nor
`false. Nulls and three-valued logic were
`both introduced by Ted Codd in his early
`papers, and Rule 3 of Codd’s famous ‘‘Twelve
`Rules’’10 requires a relational database system
`to support nulls. Various writers have com-
`plained that nulls and three-valued logic
`make queries more confusing and optimiza-
`tion more difficult. Researchers have pro-
`posed other approaches to the problem of
`missing data, but none of these approaches
`is without disadvantages. SQL lets users spec-
`ify, on a column-by-column basis, where
`nulls are permitted and where they are pro-
`hibited. Over the years, nulls have proven
`useful
`in the design of various features,
`such as outer join, that have been added dur-
`ing the evolution of the language.
`
`Duplicates. Unlike Codd’s original defini-
`tion of the relational data model, SQL per-
`mits duplicate rows to exist, both in a
`database table and in a query result. SQL
`also allows users to selectively prohibit dupli-
`cate rows in a table or in a query result. The
`intent of this approach is to give users con-
`trol over the potentially expensive process
`of duplicate elimination. In some applica-
`tions, duplicate rows might be meaningful—
`for example, in a point-of-sale system, a cus-
`tomer might purchase several identical items
`in the same transaction. In other applica-
`tions, such as printing an address list, dupli-
`cate values could be unexpected but users
`might prefer not to pay the cost of detecting
`and eliminating them. As in the case of nulls,
`the SQL approach provides users with tools
`to control duplicate rows according to the
`needs of specific applications. The cost of
`sorting a million records to detect duplicates
`seemed more significant in 1974 than it does
`today, when an ordinary laptop has thou-
`sands of times more memory and processing
`power than a mainframe computer of the
`1970s.
`
`Impedence Mismatch. SQL was designed to
`be used both as a stand-alone language for
`interactive queries and as an application de-
`velopment language for online transaction
`processing (OLTP). This is a helpful unifica-
`tion of concepts, but in OLTP applications,
`SQL is usually embedded in (or called from)
`a host programming language such as C or
`Java. Often the data types of the host language
`are not the same as those of SQL, and the host
`language is usually more procedural, whereas
`SQL is more declarative. The resulting
`
`SQL was designed to be
`
`used both as a stand-
`
`alone language for
`
`interactive queries and
`
`as an application
`
`development language
`
`for OLTP.
`
`‘‘impedance mismatch’’ tends to increase ap-
`plication complexity and interfere with
`global optimization. One approach to this
`problem has been the development of com-
`putationally complete SQL-based scripting
`languages such as Persistent Stored Modules
`(PSM).11
`
`Legacy
`When Ray and I were designing Sequel in
`1974, we thought that the predominant use
`of the language would be for ad-hoc queries
`by planners and other professionals whose
`domain of expertise was not primarily data-
`base management. We wanted the language
`to be simple enough that ordinary people
`could ‘‘walk up and use it’’ with a minimum
`of training. Over the years, I have been sur-
`prised to see that SQL is more frequently used
`by trained database specialists to implement re-
`petitive transactions such as bank deposits,
`credit card purchases, and online auctions.
`I am pleased to see the language used in a vari-
`ety of environments, even though it has not
`proved to be as accessible to untrained users
`as Ray and I originally hoped.
`Looking back at my experience at IBM Re-
`search in the 1970s, I feel fortunate to have
`been working at that place and time. Ted
`Codd, Ray Boyce, and the System R team
`were wonderful people to work with, and
`the impact of our work has been gratifying.
`I am grateful for having had the opportunity
`to participate in this work.
`
`References
`1. ‘‘CODASYL Data Base Task Group,’’ April 71
`Report, ACM, 1971.
`2. E.F. Codd introduced the operators of relational
`algebra in ‘‘Relational Completeness of Data
`
`October–December 2012 81
`
`BlackBerry Corporation Exhibit 1025, pg. 4
`
`

`

`[3B2-9] man2012040078.3d
`
`5/11/012
`
`17:26 Page 82
`
`Anecdotes
`
`Base Sublanguages,’’ IBM Research Report RJ
`987, Mar. 1972. Several versions of the rela-
`tional algebra exist, all of which include some
`version of the operators used in this paper.
`There is no recognized standard notation for
`these operators. The notation used in this article
`is taken from H. Garcia-Molina, J. Ullman, and
`J. Widom, Database Systems: the Complete Book,
`Prentice-Hall, 2002, pp. 189–237.
`3. E.F. Codd, ‘‘A Data Base Sublanguage Founded
`on the Relational Calculus,’’ Proc. ACM SIGFIDET
`Workshop on Data Description, Access, and Con-
`trol, ACM Press, 1971, pp. 35–68.
`4. R. Boyce et al., ‘‘Specifying Queries as Relational
`Expressions: the SQUARE Sublanguage,’’ Comm.
`ACM, vol. 18, no. 11, 1975, pp. 621–628.
`5. D. Chamberlin and R. Boyce, ‘‘SEQUEL: A Struc-
`tured English Query Language,’’ Proc. ACM
`SIGFIDET Workshop on Data Description, Access,
`and Control, ACM Press, 1974, pp. 249–264.
`See also http://www.almaden.ibm.com/cs/
`people/chamberlin/sequel-1974.pdf.
`6. R. Boyce and D. Chamberlin, ‘‘Using a Structured
`English Query Language as a Data Definition Fa-
`cility,’’ IBM Research Report RJ1318, Dec. 1973.
`7. The Boyce-Codd Normal Form is a database de-
`sign discipline taught in most advanced
`
`textbooks on database management. For exam-
`ple, see R. Ramakrishnan and J. Gehrke, Data-
`base Management Systems, 3rd ed., McGraw
`Hill, 2003, pp. 615–617.
`8. D. Chamberlin et al., ‘‘SEQUEL 2: A Unified
`Approach to Data Definition, Data Manipulation,
`and Control.’’ IBM J. Research and Development,
`vol. 20, Nov. 1976, p. 560.
`9. See the ANSI/ISO/IEC 9075-1, Information
`technology - Database languages - SQL - Part
`1: Framework (SQL/Framework); ANSI/ISO/IEC
`9075-2, Information technology - Database
`languages - SQL - Part 2: Foundation (SQL/
`Foundation); and so on at http://www.ansi.org
`or http://www.iso.ch.
`10. E.F. Codd., ‘‘Does Your DBMS Run by the Rules?’’
`Computer World, vol. 21, Oct. 1985. See also
`http://en.wikipedia.org/wiki/Codd’s_12_rules.
`11. ANSI/ISO/IEC 9075-4, Database Language SQL,
`Part 4: Persistent Stored Modules (SQL/PSM);
`http://www.ansi.org.
`
`Donald D. Chamberlin is an adjunct professor
`at the University of California, Santa Cruz. His work
`on SQL and System R at IBM has been recognized
`by the ACM SIGMOD Innovation Award and by
`the Computer History Museum. Contact him at
`chamberlin.don@gmail.com.
`
`82 IEEE Annals of the History of Computing
`
`BlackBerry Corporation Exhibit 1025, pg. 5
`
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket