One thing that trips me up about postgres are schemas. What is the conceptual difference between a "schema" and a "database"?

In PostgreSQL, the hierarchy is: host > cluster > database > schema > object.

By host, I mean a server. A host can have many database clusters. Usually it has just one. To have more than one, you would have to have more than one PostgreSQL instance running, each listening on a separate port. The default port is 5432. I don't recommend more than one cluster per host. I just mention it as possible.

A cluster can have many databases.

A database can have many schemas.

A schema can have many objects. By object, the most familiar is a table, but there are other kinds of objects: views, functions, custom types, sequences. An object cannot exist directly in the database. It must be part of a schema. The default schema is called "public". Traditionally in other databases, there was a schema for each user. So if jdoe logs in, his default schema is also called jdoe. In fact in other databases this is the only schema a user can have. You cannot make more schemas and name them whatever you wish.

The advantage of a schema over a database is that you can make a query that uses objects in different schemas.

  select *
  from schema1.table1
      join schema2.table2 on table1.col = table2.col
If the tables were in different databases, then you could not combine them as easily. I think you would have to resort to Foreign Data Wrappers.

I have gotten a long way, with many applications over many years, with one host, one cluster, one database, and many schemas.

> I don't recommend more than one cluster per host. I just mention it as possible.

I actually use this as a cheap data backup mechanism. I have a primary database cluster on SSD, and two replicas on two separate harddrives. All running as three postgresql clusters on the same machine.

The harddrives run intentionally on different filesystems, so that filesystem bug will not eat all my DB data. (in case someone wants to suggest raid ;))

This seems complicated and would still fail if something happens to the host hardware itself right? What about projects like wal-e/wal-g that continuously archive to cloud storage?

https://github.com/wal-e/wal-e

https://github.com/wal-g/wal-g