Discussion:
[Gramps-devel] Creating new PostgreSQL database
Thomas Guyot-Sionnest
2017-07-13 13:45:33 UTC
Permalink
Hi,

Has anyone been successful at creating new PostgreSQL databases?

Besides Gramps insisting on replacing my en_CA locale to en_GB (which I
had to install for database creation to work, but more about this at the
bottom of this post), I encountered a few issues as well:

1. Port being ignored - I assume no ones uses custom ports, but my fix
goes further and converts all database.* as keyword args so I can also
set other things in config.ini such as sslmode='verify-full'

2. Empty db detection fails as the query in
gramps/plugins/db/dbapi/postgresql.py uses the wrong parameter type

3. Gramps trying to create BLOB columns - I added a _hack_query line to
convert to bitea


The error I'm getting now is this - I haven't had time to dig deeper...


2017-07-13 09:36:36.451: ERROR: dbloader.py: line 110: relation "person"
does not exist

Traceback (most recent call last):
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
919, in _create_secondary_columns
% (field, table_name))
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/postgresql.py",
line 92, in execute
self.__cursor.execute(sql, args, **kwargs)
psycopg2.ProgrammingError: column "gender" does not exist
LIGNE 1 : SELECT gender FROM person LIMIT 1
^


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/gramps/gui/dbloader.py", line
196, in read_file
force_python_upgrade)
File "/usr/lib/python3/dist-packages/gramps/gen/db/generic.py", line
591, in load
self._initialize(directory)
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
124, in _initialize
self._create_schema()
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
223, in _create_schema
self._create_secondary_columns()
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
927, in _create_secondary_columns
% (table_name, field, sql_type))
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/postgresql.py",
line 92, in execute
self.__cursor.execute(sql, args, **kwargs)
psycopg2.ProgrammingError: relation "person" does not exist



Now back to locale, I see that gramps creates a collation for the user's
locale... I assume multiple users with different locales may connect. Is
that OK or shall we force the locale, ex by adding it to
settings.ini/.py? (i.e. force everyone on a single collation)

If we allow multiple collations, then right now gramps drops the old
collation and creates a new one on connect, so I guess we should only
create new collations and leave the others alone? I haven't really
player with SQL collations anyway so I'd really like some advice here.

Regards,
--
Thomas
Thomas Guyot-Sionnest
2017-07-13 13:49:08 UTC
Permalink
I forgot to add all the fixes mentioned below are in my branch here:

https://github.com/dermoth/gramps/commits/pgsql_fix

Regards,
Post by Thomas Guyot-Sionnest
Hi,
Has anyone been successful at creating new PostgreSQL databases?
Besides Gramps insisting on replacing my en_CA locale to en_GB (which I
had to install for database creation to work, but more about this at the
1. Port being ignored - I assume no ones uses custom ports, but my fix
goes further and converts all database.* as keyword args so I can also
set other things in config.ini such as sslmode='verify-full'
2. Empty db detection fails as the query in
gramps/plugins/db/dbapi/postgresql.py uses the wrong parameter type
3. Gramps trying to create BLOB columns - I added a _hack_query line to
convert to bitea
The error I'm getting now is this - I haven't had time to dig deeper...
2017-07-13 09:36:36.451: ERROR: dbloader.py: line 110: relation "person"
does not exist
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
919, in _create_secondary_columns
% (field, table_name))
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/postgresql.py",
line 92, in execute
self.__cursor.execute(sql, args, **kwargs)
psycopg2.ProgrammingError: column "gender" does not exist
LIGNE 1 : SELECT gender FROM person LIMIT 1
^
File "/usr/lib/python3/dist-packages/gramps/gui/dbloader.py", line
196, in read_file
force_python_upgrade)
File "/usr/lib/python3/dist-packages/gramps/gen/db/generic.py", line
591, in load
self._initialize(directory)
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
124, in _initialize
self._create_schema()
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
223, in _create_schema
self._create_secondary_columns()
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
927, in _create_secondary_columns
% (table_name, field, sql_type))
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/postgresql.py",
line 92, in execute
self.__cursor.execute(sql, args, **kwargs)
psycopg2.ProgrammingError: relation "person" does not exist
Now back to locale, I see that gramps creates a collation for the user's
locale... I assume multiple users with different locales may connect. Is
that OK or shall we force the locale, ex by adding it to
settings.ini/.py? (i.e. force everyone on a single collation)
If we allow multiple collations, then right now gramps drops the old
collation and creates a new one on connect, so I guess we should only
create new collations and leave the others alone? I haven't really
player with SQL collations anyway so I'd really like some advice here.
Regards,
--
Thomas
John Ralls
2017-07-13 14:55:18 UTC
Permalink
Post by Thomas Guyot-Sionnest
Hi,
Has anyone been successful at creating new PostgreSQL databases?
Besides Gramps insisting on replacing my en_CA locale to en_GB (which I
had to install for database creation to work, but more about this at the
1. Port being ignored - I assume no ones uses custom ports, but my fix
goes further and converts all database.* as keyword args so I can also
set other things in config.ini such as sslmode='verify-full'
2. Empty db detection fails as the query in
gramps/plugins/db/dbapi/postgresql.py uses the wrong parameter type
3. Gramps trying to create BLOB columns - I added a _hack_query line to
convert to bitea
The error I'm getting now is this - I haven't had time to dig deeper...
2017-07-13 09:36:36.451: ERROR: dbloader.py: line 110: relation "person"
does not exist
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
919, in _create_secondary_columns
% (field, table_name))
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/postgresql.py",
line 92, in execute
self.__cursor.execute(sql, args, **kwargs)
psycopg2.ProgrammingError: column "gender" does not exist
LIGNE 1 : SELECT gender FROM person LIMIT 1
^
File "/usr/lib/python3/dist-packages/gramps/gui/dbloader.py", line
196, in read_file
force_python_upgrade)
File "/usr/lib/python3/dist-packages/gramps/gen/db/generic.py", line
591, in load
self._initialize(directory)
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
124, in _initialize
self._create_schema()
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
223, in _create_schema
self._create_secondary_columns()
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
927, in _create_secondary_columns
% (table_name, field, sql_type))
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/postgresql.py",
line 92, in execute
self.__cursor.execute(sql, args, **kwargs)
psycopg2.ProgrammingError: relation "person" does not exist
Now back to locale, I see that gramps creates a collation for the user's
locale... I assume multiple users with different locales may connect. Is
that OK or shall we force the locale, ex by adding it to
settings.ini/.py? (i.e. force everyone on a single collation)
If we allow multiple collations, then right now gramps drops the old
collation and creates a new one on connect, so I guess we should only
create new collations and leave the others alone? I haven't really
player with SQL collations anyway so I'd really like some advice here.
It’s calling sqlite3_create_collation() which operates on the connection, not the database. Every run of Gramps gets its own for the locale in effect.

Gramps prevents you from using en_CA because it doesn’t have an en_CA translation. You could work around that by copying /usr/share/locale/en_GB/LC_MESSAGES/gramps.mo to /usr/share/locale/en_CA/LC_MESSAGES/.

Regards,
John Ralls
Nick Hall
2017-07-13 15:30:30 UTC
Permalink
Post by John Ralls
It’s calling sqlite3_create_collation() which operates on the connection, not the database. Every run of Gramps gets its own for the locale in effect.
For postgres, we use the SQL "CREATE COLLATION" command which is not
available in sqlite. It uses the locale specified in the LANG
environment variable, or en_US.utf8 if empty.

Can we get a better value from the GrampsLocale?

Nick.
John Ralls
2017-07-13 16:49:57 UTC
Permalink
It’s calling sqlite3_create_collation() which operates on the connection, not the database. Every run of Gramps gets its own for the locale in effect.
For postgres, we use the SQL "CREATE COLLATION" command which is not available in sqlite. It uses the locale specified in the LANG environment variable, or en_US.utf8 if empty.
Can we get a better value from the GrampsLocale?
Nick,

Oops, should have done a grep -i.

Grampslocale.py sets LC_COLLATE so you *could* use that (using LANG is wrong, it’s the fallback value used when neither LC_ALL nor the specific LC_FOO variable is set) but getenv() is relatively expensive compared to grampslocale.collation.

You don’t really want to have a single collation named “glocale”, especially in pgsql where they’re added to the database because the collation should depend on the locale being used for the query. Reports can be run in locales different from the UI locale and if the SQL query is doing the collation then it needs to use the locale for the report. A set of collations named for each locale would be better. Postgresql already provides that for all of the locales available on the machine it’s running on so you probably don’t need to use CREATE COLLATION. See https://www.postgresql.org/docs/current/static/collation.html <https://www.postgresql.org/docs/current/static/collation.html>.

Regards,
John Ralls
Thomas Guyot-Sionnest
2017-07-13 17:08:57 UTC
Permalink
Post by John Ralls
Post by Nick Hall
It’s calling sqlite3_create_collation() which operates on the
connection, not the database. Every run of Gramps gets its own for
the locale in effect.
For postgres, we use the SQL "CREATE COLLATION" command which is not
available in sqlite. It uses the locale specified in the LANG
environment variable, or en_US.utf8 if empty.
Can we get a better value from the GrampsLocale?
Nick,
Oops, should have done a grep -i.
Grampslocale.py sets LC_COLLATE so you *could* use that (using LANG is
wrong, it’s the fallback value used when neither LC_ALL nor the
specific LC_FOO variable is set) but getenv() is relatively expensive
compared to grampslocale.collation.
You don’t really want to have a single collation named “glocale”,
especially in pgsql where they’re added to the database because the
collation should depend on the locale being used for the query.
Reports can be run in locales different from the UI locale and if the
SQL query is doing the collation then it needs to use the locale for
the report. A set of collations named for each locale would be better.
Postgresql already provides that for all of the locales available on
the machine it’s running on so you probably don’t need to use CREATE
COLLATION.
See https://www.postgresql.org/docs/current/static/collation.html
I tried to match the system collations naively but it didn't seem to
work... This is the result of my idea mentioned elsewhere (hacking up
glocale only for postgresql), plus using LC_COLLATE as you suggested:

https://github.com/dermoth/gramps/commit/7963623574d4e6ecd568c4ed7a28c0981fde41b9

Regards,
--
Thomas
Nick Hall
2017-07-13 18:52:20 UTC
Permalink
Post by Thomas Guyot-Sionnest
Post by John Ralls
Grampslocale.py sets LC_COLLATE so you *could* use that (using LANG
is wrong, it’s the fallback value used when neither LC_ALL nor the
specific LC_FOO variable is set) but getenv() is relatively expensive
compared to grampslocale.collation.
You don’t really want to have a single collation named “glocale”,
especially in pgsql where they’re added to the database because the
collation should depend on the locale being used for the query.
Reports can be run in locales different from the UI locale and if the
SQL query is doing the collation then it needs to use the locale for
the report. A set of collations named for each locale would be
better. Postgresql already provides that for all of the locales
available on the machine it’s running on so you probably don’t need
to use CREATE COLLATION. See
https://www.postgresql.org/docs/current/static/collation.html
I tried to match the system collations naively but it didn't seem to
work... This is the result of my idea mentioned elsewhere (hacking up
https://github.com/dermoth/gramps/commit/7963623574d4e6ecd568c4ed7a28c0981fde41b9
I have created an alternative suggestion:

https://github.com/gramps-project/gramps/pull/434

Unfortunately the characters "." and "-" are not valid in a sqlite
collation, so I have kept it as "glocale".

For PostgreSQL, I have used glocale.collation as John suggested, but I
haven't installed PostgreSQL to test it.

Nick.
Thomas Guyot-Sionnest
2017-07-13 20:54:31 UTC
Permalink
Post by Nick Hall
https://github.com/gramps-project/gramps/pull/434
Unfortunately the characters "." and "-" are not valid in a sqlite
collation, so I have kept it as "glocale".
For PostgreSQL, I have used glocale.collation as John suggested, but I
haven't installed PostgreSQL to test it.
What does glocale.collation looks like exactly? (well I can find out
too...) I was concerned with some slight difference between system
collations names (ex utf8 vs UTF-8), and also even after installing all
locales and restarting PostgreSQL I do not see all locales in my system
catalog, but I can create them fine (with the one side effect I noticed:
I have to include the public schema name to avoid clashes with the
system catalog on create/drop...)

I think the safest option is still to create collations in PostgreSQL...
I will merge both patches when I have a chance and submit a PR.

For sqlite, have you tried quoting the collation names? The SQLite API
does not say anything about valid characters, and to use . and - in
PostgreSQL collations I had to double-quote them... That could unify
both interfaces...

Regards,
--
Thomas
Nick Hall
2017-07-13 22:34:00 UTC
Permalink
Post by Thomas Guyot-Sionnest
For sqlite, have you tried quoting the collation names? The SQLite API
does not say anything about valid characters, and to use . and - in
PostgreSQL collations I had to double-quote them... That could unify
both interfaces...
I can double-quote collation names in a SELECT statement, but the python
bindings prevent me from creating a collation name containing any
character that is not alphanumeric or an underscore.

Nick.
John Ralls
2017-07-14 00:08:57 UTC
Permalink
For sqlite, have you tried quoting the collation names? The SQLite API does not say anything about valid characters, and to use . and - in PostgreSQL collations I had to double-quote them... That could unify both interfaces...
I can double-quote collation names in a SELECT statement, but the python bindings prevent me from creating a collation name containing any character that is not alphanumeric or an underscore.
The locale.collation value should be something like fr_FR.UTF8 and you can lose the .UTF8 part, it's redundant.

Regards,
John Ralls
Thomas Guyot-Sionnest
2017-07-13 16:39:10 UTC
Permalink
Post by John Ralls
Post by Thomas Guyot-Sionnest
Now back to locale, I see that gramps creates a collation for the user's
locale... I assume multiple users with different locales may connect. Is
that OK or shall we force the locale, ex by adding it to
settings.ini/.py? (i.e. force everyone on a single collation)
If we allow multiple collations, then right now gramps drops the old
collation and creates a new one on connect, so I guess we should only
create new collations and leave the others alone? I haven't really
player with SQL collations anyway so I'd really like some advice here.
It’s calling sqlite3_create_collation() which operates on the connection, not the database. Every run of Gramps gets its own for the locale in effect.
Gramps prevents you from using en_CA because it doesn’t have an en_CA translation. You could work around that by copying /usr/share/locale/en_GB/LC_MESSAGES/gramps.mo to /usr/share/locale/en_CA/LC_MESSAGES/.
This is on PostgreSQL, so what it does is:

locale = os.environ.get('LANG', 'en_US.utf8')
self.execute("DROP COLLATION IF EXISTS glocale")
self.execute("CREATE COLLATION glocale (LOCALE = '%s')" % locale)

This is global, then the dbapi uses that glocale collation on queries,
ex from dbapi.py:

"ORDER BY page COLLATE glocale"

I think if we wanted to support per-user collations, we should name each
collation the same as the system's collation being used, ex the
postgresql connect code would do:

locale = os.environ.get('LANG', 'en_US.utf8')
self.execute('DROP COLLATION IF EXISTS "%s"' % locale)
self.execute("""CREATE COLLATION "%s" (LOCALE = '%s')" %
(locale, locale))

Then each query would embed the collation name in the query.

I haven't checked how compatible this is with sqlite. A less intrusive
approach is to use _hack_query to replace glocale only for postgresql.

Which option do you think sounds the best? I could quickly hack up the
2nd approach and submit a PR if you'd like to evaluate something more
concrete...

Regards,
--
Thomas
Nick Hall
2017-07-13 15:51:23 UTC
Permalink
Post by Thomas Guyot-Sionnest
1. Port being ignored - I assume no ones uses custom ports, but my fix
goes further and converts all database.* as keyword args so I can also
set other things in config.ini such as sslmode='verify-full'
2. Empty db detection fails as the query in
gramps/plugins/db/dbapi/postgresql.py uses the wrong parameter type
3. Gramps trying to create BLOB columns - I added a _hack_query line to
convert to bitea
Yes. At least one other person uses the port setting. However, the
configuration involves editing the "setting.py" file copied into the
database directory. I have never likes this and prefer your approach.
We should only need the "settings.ini" file.

Also, I think that all the common options should be configurable from
the GUI.

Errors in (2) and (3) have already been fixed in PR 421. See:

https://github.com/gramps-project/gramps/pull/421


Nick.
Thomas Guyot-Sionnest
2017-07-13 17:27:34 UTC
Permalink
Post by Nick Hall
Post by Thomas Guyot-Sionnest
1. Port being ignored - I assume no ones uses custom ports, but my fix
goes further and converts all database.* as keyword args so I can also
set other things in config.ini such as sslmode='verify-full'
2. Empty db detection fails as the query in
gramps/plugins/db/dbapi/postgresql.py uses the wrong parameter type
3. Gramps trying to create BLOB columns - I added a _hack_query line to
convert to bitea
Yes. At least one other person uses the port setting. However, the
configuration involves editing the "setting.py" file copied into the
database directory. I have never likes this and prefer your
approach. We should only need the "settings.ini" file.
Also, I think that all the common options should be configurable from
the GUI.
https://github.com/gramps-project/gramps/pull/421
Thanks for pointing it out, I was developing on master instead of
maintenance/gramps50.

My branch is now based on the maintenance branch, so the two duplicate
commits were dropped.

https://github.com/dermoth/gramps/commits/pgsql_fix

I wan submit a PR if we're all happy with it... I'm still having the
other issue though, could be caused because of my change to the SQL
isolation mode, let me try that.

Regards,
--
Thomas
Thomas Guyot-Sionnest
2017-07-13 18:04:59 UTC
Permalink
Post by Thomas Guyot-Sionnest
The error I'm getting now is this - I haven't had time to dig deeper...
2017-07-13 09:36:36.451: ERROR: dbloader.py: line 110: relation "person"
does not exist
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
919, in _create_secondary_columns
% (field, table_name))
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/postgresql.py",
line 92, in execute
self.__cursor.execute(sql, args, **kwargs)
psycopg2.ProgrammingError: column "gender" does not exist
LIGNE 1 : SELECT gender FROM person LIMIT 1
^
File "/usr/lib/python3/dist-packages/gramps/gui/dbloader.py", line
196, in read_file
force_python_upgrade)
File "/usr/lib/python3/dist-packages/gramps/gen/db/generic.py", line
591, in load
self._initialize(directory)
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
124, in _initialize
self._create_schema()
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
223, in _create_schema
self._create_secondary_columns()
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
927, in _create_secondary_columns
% (table_name, field, sql_type))
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/postgresql.py",
line 92, in execute
self.__cursor.execute(sql, args, **kwargs)
psycopg2.ProgrammingError: relation "person" does not exist
Following up on this, the fix I found so far is to commit after table
creation and again after secondary column creation... I think the
exception that are part of the normal setup process causes PostgreSQL to
rollback and drop previously created tables.

The problem I see with this is that the Person table is used to detect
if the database is complete, so a partial configuration will leave
begins a broken database.

The solutions I'm looking into:

1. Make PostgreSQL driver not rollback on exceptions, if possible.

2. Add a check_index method to test indexes as well.

3. Use savepoints

I will submit a PR when ready.

Regards,
--
Thomas
Thomas Guyot-Sionnest
2017-07-14 06:12:12 UTC
Permalink
Post by Thomas Guyot-Sionnest
Post by Thomas Guyot-Sionnest
The error I'm getting now is this - I haven't had time to dig deeper...
2017-07-13 09:36:36.451: ERROR: dbloader.py: line 110: relation "person"
does not exist
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
919, in _create_secondary_columns
% (field, table_name))
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/postgresql.py",
line 92, in execute
self.__cursor.execute(sql, args, **kwargs)
psycopg2.ProgrammingError: column "gender" does not exist
LIGNE 1 : SELECT gender FROM person LIMIT 1
^
File "/usr/lib/python3/dist-packages/gramps/gui/dbloader.py", line
196, in read_file
force_python_upgrade)
File "/usr/lib/python3/dist-packages/gramps/gen/db/generic.py", line
591, in load
self._initialize(directory)
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
124, in _initialize
self._create_schema()
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
223, in _create_schema
self._create_secondary_columns()
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/dbapi.py", line
927, in _create_secondary_columns
% (table_name, field, sql_type))
File
"/usr/lib/python3/dist-packages/gramps/plugins/db/dbapi/postgresql.py",
line 92, in execute
self.__cursor.execute(sql, args, **kwargs)
psycopg2.ProgrammingError: relation "person" does not exist
Following up on this, the fix I found so far is to commit after table
creation and again after secondary column creation... I think the
exception that are part of the normal setup process causes PostgreSQL to
rollback and drop previously created tables.
The problem I see with this is that the Person table is used to detect
if the database is complete, so a partial configuration will leave
begins a broken database.
1. Make PostgreSQL driver not rollback on exceptions, if possible.
2. Add a check_index method to test indexes as well.
3. Use savepoints
I will submit a PR when ready.
I could not beat it - even using savepoints and removing the rollback in
postgresql's execute() did not help. I ran a trace and there were no
other rollbacks, but the transaction was still gone after the first
exception.

I'm not sure if this is a bug, but for now the only option besides
having backend-specific code to detect columns is to commit early.
Option #2 could help explaining the error but the current code does not
skip existing tables anyway.

Added to PR #435 (https://github.com/gramps-project/gramps/pull/435)
along with the other PostgreSQL fixes.

Regards,
--
Thomas
Nick Hall
2017-07-14 22:03:03 UTC
Permalink
Post by Thomas Guyot-Sionnest
I'm not sure if this is a bug, but for now the only option besides
having backend-specific code to detect columns is to commit early.
This isn't a good idea. The schema creation should be within a single
transaction.

When the schema is created we create the tables, then we add some
columns dynamically, and finally we create the indexes. The problem is
that some of the columns created dynamically are already in the tables.
I suggest that we remove them.

Nick.

Loading...