|   | 
 
 PostgreSQL 
 PostgreSQL is a free 
 software object-relational 
 database server (database management system), released under the flexible BSD-style 
 license. It offers an alternative to other open-source database systems (such as MySQL, Firebird, 
 and MaxDB), as 
 well as proprietary 
 systems such as Oracle, 
 IBM's DB2 and Microsoft 
 SQL Server. 
 The official pronunciation of "PostgreSQL" is "Post-Gress-Q-L" (listen 
 here (5.6k 
 MP3)). 
 History 
 PostgreSQL has had a lengthy evolution, starting with the Ingres project at UC Berkeley. The project lead, Michael 
 Stonebraker had left Berkeley to commercialize Ingres in 1982, but eventually returned to academia. After returning to Berkeley in 1985, 
 Stonebraker started a post-Ingres project to address the problems with contemporary database systems that had become increasing clear during the 
 early 1980s. 
 The resulting project, named Postgres, aimed at introducing the 
 minimum number of features needed to add complete support for types. These 
 features included the ability to define types, but also the ability to 
 fully describe relationships – which up until this time had been widely 
 used but maintained entirely by the user. In Postgres the database 
 "understood" relationships, and could retrieve information in related 
 tables in a natural way using rules. 
 Starting in 1986 the team released a number of papers describing the 
 basis of the system, and by 1988 the project had a prototype version up 
 and running. Version 1 was released to a small number of users in June 1989, followed by 
 Version 2 with a re-written rules system in June 1990. 1991's Version 3 
 re-wrote the rules system again, but also added support for multiple 
 storage managers, and an improved query engine. By 1993 there were a huge 
 number of users and the project was being overwhelmed with requests for 
 support and features. After releasing a Version 4 primarily as a cleanup, 
 the project ended. 
 Although the Postgres project had officially ended, the BSD license 
 (under which Postgres was released) enabled Open Source 
 developers to obtain copies and develop the system further. In 1994 two UC 
 Berkeley graduate students, Andrew Yu and Jolly Chen, added a SQL language 
 interpreter to replace the earlier QUEL system Ingres 
 had been based on, creating Postgres95. The code was subsequently released 
 to the web to find its own way in the world. Postgres95 was an open source 
 descendant of this original Berkeley code. In 1996 it was decided that the 
 project should be renamed; in order to reflect the database's new SQL 
 query language, Postgres95 was renamed to PostgreSQL. The first 
 PostgreSQL release was version 6.0. The software has been subsequently 
 maintained by a group of database developers and volunteers from around 
 the world, coordinated via the Internet. Since 6.0, many subsequent 
 releases have been made, and many improvements have been made to the 
 system; as of this writing, the current release series is 7.4, with 7.5 
 expected in mid to late 2004. 
 Although the license allowed for the commercialization of Postgres, the 
 Postgres code was not developed commercially with the same rapidity as 
 Ingres, which is somewhat surprising considering the advantages the 
 product offered. The main offshoot was created when Michael Stonebraker 
 and Paula Hawthorn, an original Ingres team member who moved from Ingres, 
 formed Illustra 
 Information Technologies to commercialize Postgres. 
 Illustra's product was first introduced in 1991, where it was used in 
 the Sequoia 2000 project late that year. By 1995 the product had 
 added an ability to write plug-in modules they referred to as DataBlades. 
 Unlike other plug-in technologies, with DataBlades external authors could 
 write code to create new low-level datatypes, and tell the database how to 
 store, index and manage it. For instance, one of the most popular 
 DataBlades was used to create a time-series, a list of one 
 particular variable over time, often with gaps. For instance, the price of 
 a stock over time changes, but there are times, like weekends, where the 
 data does not change and there is no entry. Traditional databases have 
 difficultly handling this sort of task; while they can find a record for a 
 particular date, finding the one that is "active" in one of the gaps is 
 time consuming. With the Time Series DataBlade, this was fast and 
easy. 
 DataBlades were incredibly successful and started to generate 
 considerable industry "buzz", eventually leading Informix to 
 purchase the company outright in 1996. Industry insiders claimed that it 
 would not be possible to merge the two products, but in fact this was 
 fairly easy because both were based on the original Ingres code and 
 concepts. Informix released their Illustra-based Universal Server 
 in 1997, leaving them in an unchallenged position in terms of technical 
 merit. 
 Description 
 A cursory examination of PostgreSQL might suggest that the system 
 resembles other database systems quite closely. PostgreSQL uses the SQL 
 language to run queries on data that is organized as a series of tables 
 with foreign keys linking related data together. The primary 
 advantage of PostgreSQL over some of its competitors is best described as 
 programmability: PostgreSQL makes it much easier to build 
 real-world applications using data taken from the database. 
 The SQL data stores simple data types in "flat" tables, requiring the 
 user to gather up related information using queries. This contrasts with 
 the way the data itself ends up being used, typically in a high-level 
 language with rich data types where all of the related data is considered 
 to be a complete unit of its own, typically referred to as a record 
 or object depending on the language. 
 Converting information from the SQL world into the programming world is 
 difficult because the two simply have very different models of the way 
 data is organized. This problem is widely known as impedance 
 mismatch in the industry, and mapping from one model to the other 
 typically takes up about 40% of a project's time. A number of mapping 
 solutions, typically referred to as object-relational 
 mapping, are on the market, but they tend to be expensive and have 
 problems of their own, notably performance. 
 PostgreSQL can solve many of these issues directly in the database. 
 PostgreSQL allows the user to define new types based out of the normal SQL 
 types, allowing the database itself to understand complex data. For 
 instance, you can define an address to consist of several 
 strings for things like street number, city and country. From that point 
 on one can easily create tables containing all the fields needed to hold 
 an address with a single line. 
 PostgreSQL also allows types to include inheritance, one of the major 
 concepts in object-oriented 
 programming. For instance, one could define a post_code 
 type, and then create us_zip_code and 
 canadian_postal_code based on them. Addresses could then be 
 specialized for us_address and canadian_address, 
 including specialized rules to validate the data in each case. 
 Another very useful feature is that PostgreSQL can directly understand 
 the relationships that exist between tables. People in the real world 
 typically have several addresses, which in the relational model is stored 
 by placing the addresses in one table and the rest of the user information 
 in another. The addresses are "related" to a particular user by storing 
 some unique information, say the user's name, in the address table itself. 
 In order to find all the addresses for "Bob Smith", the user writes a 
 query that "joins" the data back together, by selecting a particular name 
 from the users table and then searching for that name in the address 
 table. Doing a search for all the users in New York is somewhat complex, 
 requiring the database to find all the user names in the address table, 
 then search the user table for those users. A typical search might look 
 like this: 
 SELECT u.* FROM user u, address a WHERE a.city='New York' and 
 a.user_name=u.user_name 
 In PostgreSQL the relationship between users and addresses can be 
 explicitly defined. Once defined the address becomes a property of the 
 user, so the search above can be simplified greatly to: 
 SELECT * FROM user WHERE address.city='New York' 
 This code requires no "join": the database itself understands the 
 user.address relationship. A related example shows the power of types, if 
 one uses Postgres to do: 
 SELECT address FROM user 
 The results will be broken out automatically, returning only those 
 addresses for users, not those for companies or other objects that might 
 use the address table. 
 Finally the programming of the database itself is greatly enhanced due 
 to functions. Most SQL systems allow you to write a stored 
 procedure, a block of SQL code that can be called in other SQL 
 statements. However SQL itself is a poor programming language, and complex 
 logic is very difficult to create. Worse, many of the most basic 
 operations in a programming language, like branching and looping, are not 
 supported in SQL itself. Instead each vendor has written their own 
 extensions to the SQL language to add these features, which are not cross 
 platform. 
 In PostgreSQL you can write the logic in most any language you choose, 
 and the list of supported languages is growing with every release. The 
 code is then inserted into the server as a function, a small 
 wrapper that makes the code appear as if it were a stored procedure. In 
 this way SQL code can call (for instance) C code and vice-versa, 
 dramatically increasing simplicity and performance. 
 These advantages add up to making PostgreSQL easily the most advanced 
 database system from a programming perspective, which is one reason for 
 the success of Illustra. Using PostgreSQL can dramatically reduce overall 
 programming time on many projects, with the advantages growing with 
 project complexity. 
 Features 
 Some features of PostgreSQL rarely found in other relational 
 databases include: 
 
 - User-defined types 
 
 - User-defined operators 
 
 - Availability of multiple stored 
 procedure languages, including C, 
 SQL, Perl, Python, 
 Tcl, Ruby, 
 Parrot, 
 shell 
 script, or the native PL/PgSQL 
 
 - Support for geographic objects via PostGIS 
 
 - Concurrency managed via a Multi-Version Concurrency Control (MVCC) 
 design, which ensures excellent performance even under heavy concurrent 
 access circumstances 
 
 - Table inheritance 
 
 - Rules -- a way of implementing server-side logic that allows the 
 application developer to modify the "query tree" of an incoming query 
 
 - Expressional indexes -- an index that is created on an expression, 
 not necessarily a single column from a table. 
 
 - Partial indexes -- an index that is only created on a portion of a 
 table. This saves disk space and improves performance if only part of 
 the table actually requires an index.
  
 In addition, PostgreSQL supports almost all the constructs expected 
 from an enterprise-level database, including: 
 
 - Referential 
 integrity constraints including foreign key constraints, column 
 constraints, and row checks 
 
 - Triggers 
 
 - Views 
 
 - Outer joins 
 
 - Sub-selects 
 
 - Transactions 
 
 - Strong compliance with the SQL standard (SQL92, SQL1999) 
 
 - Encrypted connections via SSL 
 
 - Binary and textual large-object storage 
 
 - Online backup 
 
 - Domains
  
 Shortcomings 
 One can loosely separate the shortcomings PostgreSQL into: 
 
 - a lack of some advanced features 
 
 - architectural features which cause undesirable side effects
  
 PostgreSQL lacks a good replication solution. Third-party packages 
 exist which address this, however. Replication is used in situations where 
 a single database server cannot keep up with the workload or the cost of 
 database downtime is very high, or both. Of lesser importance are 
 point-in-time recovery, nested transactions and updateable views. 
 In the second group of concerns a need for periodic 
 VACUUMing should be noted. Due to MVCC, when a row is updated 
 or deleted the old version of the row is kept since it may still be in use 
 by another transaction, or the transaction which modified it may rollback. 
 Vacuuming is the process of marking as stale data which the DBMS is 
 certain that will no longer be needed. Old versions of PostgreSQL required 
 an exclusive lock on a table to perform the VACUUM but in 
 recent versions VACUUMing can be performed concurrently with 
 normal database access. This significantly reduces the impact of 
 VACUUMing on other database activity. Nevertheless, it is 
 still considered good practice to schedule VACUUMs during 
 periods of low database load. Failure to VACUUM/VACUUM 
 ANALYZE regularly may result in additional disk space consumption 
 (since stale row versions will not be reused) and in performance loss 
 (because saved table statistics become outdated). In addition, the entire 
 database must be VACUUMed at least once every billion 
 transactions, or else no new transactions may be performed. 
 Standard aggregate functions, such as COUNT, also perform 
 unusually poorly compared to some other database systems in cases where 
 the aggregate is applied to a large portion of a table. Specifically, some 
 other databases will make use of indexes or system metadata to process a 
 query such as SELECT COUNT(*) FROM table; efficiently. Due to 
 the design of PostgreSQL, this optimization cannot easily be 
 implemented. 
 Also, considerable demand exists for a native port to the Microsoft 
 Windows environment. PostgreSQL can run under Windows using the Cygwin Unix 
 emulation library, but offers less than optimal performance and suffers 
 from a complicated installation. The upcoming 7.5 release is expected to 
 include a native Win32 port. 
  
 External links
 
 Newsgroups and Mailing Lists Discussions
 
 | 
  |