Discussion:
Gramps 5.0: Tree Views
Doug Blank
2015-08-15 13:37:17 UTC
Permalink
Devs,

I've finally had some time to explore Nick's view-models branch [1] on
master. I think that this is the way forward; let me explain why.

Currently, and for some time now, all tree views in Gramps have worked through
a complicated set of callbacks. The idea is that instead of loading the
entire set of data into memory (the model), they load and unload only
the visible data. Thus, as you hover your mouse or scroll up and down, the
database is constantly queried to show the visible data.

Nick's prototype removes those complications, trading a longer start up
time (all data is loaded into the model first) for no further
activity---there is no reason for any callbacks as you view the data.

The original choice was fine at the time: Gtk models/trees weren't very
good, computers didn't typically have as much memory as they do now, and we
had a single, local database connection. All of these are now different.

Also:

1. I think that there are bugs in the current system; it has a
Least-Recently-Used (LRU) cache that is supposed to prevent further
database access, but I see a lot of activity. Even with that fixed, there
would still be a lot of code running as you hover/scroll. Nick's prototype
removes all of this.

2. We can optimize these model loads at the DB level. Currently, we must
access BSDDB one record at a time. But with other backends, we can get
groups of data with one db access. So, we could add methods like
db.load_person_model(model) to the database (see the sketch after this
list). It wouldn't change the access time for BSDDB, but other backends
could be sped up substantially.

3. Loading all of the data into memory may not be a viable solution for
everyone (limited memory) or for certain tree sizes (millions of records).
We can have a max-count option over which we could present the view in
"pages" so that we only load a subset of the data, with "prev" and "next"
buttons. We could also have a "select" column to keep track of selected
items no longer on the current page.

4. I turned on some messaging in master with Nick's prototype whenever the
database was accessed. Briefly, we access the database way too often,
sometimes repeatedly for the same information. We have treated db access as
if it were free, and it isn't, even when it is pretty fast. Removing these
unnecessary queries will make the system faster, for all backends.

5. Nick's prototype isn't done---there are still some linear scans that we
can improve on, either with better code or with additional database
indexes. For example, we don't need to scan the full people table for a
list of surnames; we can get that from db.get_surname_list().
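
To make point 2 concrete, here is a minimal sketch of what a backend-provided
bulk loader might look like. It assumes a SQL-style backend with a person
table of (handle, surname, given_name) columns; neither the table layout nor
the method exists in Gramps today, so treat it purely as an illustration:

    def load_person_model(self, model):
        # One query instead of one get_person_from_handle() call per row.
        # The column layout (handle, surname, given_name) is invented here.
        cursor = self.connection.cursor()
        cursor.execute("SELECT handle, surname, given_name FROM person")
        for handle, surname, given_name in cursor.fetchall():
            model.append([handle, surname, given_name])

A BSDDB backend could provide the same method as a plain loop over its record
cursor, so the view code would not need to know which backend is in use.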

This is a big change, and one that we can keep refining. (The two-pane
views are interesting, but that is not the main point.) I've brought the
branch over to a master branch [2].

Let's merge this and keep refining.

-Doug

[1] - https://github.com/Nick-Hall/gramps/tree/view-models
[2] - https://github.com/gramps-project/gramps/tree/view-models
Nick Hall
2015-08-15 14:42:12 UTC
Permalink
My prototype was written as a proof of concept. It was never intended
to be merged into master.

A model-view-controller architecture is used by Gtk. The TreeView
widget is the view-controller component. Two models are provided:
ListStore and TreeStore.

https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller

At the moment we display every object in the Gramps list views. Using
the standard models consumed large amounts of memory and the views were slow
to load. This is why we wrote our own models. The "callbacks" mentioned
in recent posts are just the communication between the model and the
TreeView. The only difference is the model used. The number of calls
will be the same. Having said that, the Gtk models are written in C,
better tested, and likely to have fewer bugs. The Gtk models also access
data in memory rather than in the database.
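
For anyone not familiar with the standard models, here is a minimal,
self-contained sketch of the approach: build a Gtk.ListStore up front and
hand it to the TreeView, so the get_value calls it makes later are served
from memory. The sample rows are invented for illustration:

    import gi
    gi.require_version('Gtk', '3.0')
    from gi.repository import Gtk

    # Build the model up front; later lookups by the TreeView hit memory,
    # not the database.
    store = Gtk.ListStore(str, str)  # surname, given name
    for surname, given in [("Hall", "Nick"), ("Blank", "Doug")]:
        store.append([surname, given])

    view = Gtk.TreeView(model=store)
    for col, title in enumerate(["Surname", "Given name"]):
        view.append_column(
            Gtk.TreeViewColumn(title, Gtk.CellRendererText(), text=col))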

I would like to see us switch to using the standard models. However,
there are some points to consider:

1. The selectors use the same models as the views. Slow loading for
selectors is not acceptable.
2. The time taken to build a model is proportional to both the number of
rows and the number of columns.
3. The search bar searches the column contents. Filters search the
database.

To get good performance from Gtk models, we should limit the number of
rows displayed. Perhaps we should only display the results of a search?

Displaying two flat views instead of one tree view is also a way of
reducing the rows displayed. Indexes can be used to speed up
performance here. I remember that Tim raised concerns about this approach.

Should the search bar use filters? It would certainly make the code neater.

Perhaps we should also update the selectors. Would adding tabs for
bookmarks and recently used objects be useful?

I think we need some proper design here, rather than just merging code
from a quickly written prototype.


Nick.
Doug Blank
2015-08-15 15:19:15 UTC
Permalink
Nick, thanks for the additional info. Some comments below.
Post by Nick Hall
My prototype was written as a proof of concept. It was never intended
to merge it into master.
Ok, I guess I would amend the suggestion: merge a version of the
prototype. It doesn't seem that far away from what is actually needed for
the basic implementation.
Post by Nick Hall
A model-view-controller architecture is used by Gtk. The TreeView
ListStore and TreeStore.
https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller
At the moment we display every object in the Gramps list views. Using
the standard models used large amounts of memory and the views were slow
to load. This is why we wrote our own models. The "callbacks" mentioned
in recent posts are just the communication between the model and the
TreeView. The only difference is the model used. The number of calls
will be the same. Having said that, the Gtk models are written in C,
better tested, and likely to have less bugs. The Gtk models also access
data in memory rather than in the database.
There are *a lot* of repeated calls and database access in the current
implementation, even for just hovering over the visible rows. I have
verified with master and different database backends that there is no
additional access with your new views, and there is no delay as one hovers
or scrolls. So, the number of calls is not the same.
Post by Nick Hall
I would like to see us switch to using the standard models. However,
1. The selectors use the same models as the views. Slow loading for
selectors is not acceptable.
2. The time taken to build a model is proportional to both the number or
rows and the number of columns.
What is the solution to these two issues?
Post by Nick Hall
3. The search bar searches the column contents. Filters search the
database.
To get good performance from Gtk models, we should limit the number of
rows displayed. Perhaps we should only display the results of a search?
I don't think we can do that. At least, at first, I would rather just see a
drop-in replacement for current views, but using the standard models.
Post by Nick Hall
Displaying two flat views instead of one tree view is also a way of
reducing the rows displayed. Indexes can be used to speed up
performance here. I remember that Tim raised concerns about this approach.
Those would fundamentally change the way that one uses the views (you can't
select two people with different surnames, for example), but I think they
would make nice additions.
Post by Nick Hall
Should the search bar use filters? It would certainly make the code neater.
Yes... the search bar needs some major rethink, both in UI and
implementation.
Post by Nick Hall
Perhaps we should also update the selectors. Would adding tabs for
bookmarks and recently used objects be useful?
Yes.
Post by Nick Hall
I think we need some proper design here, rather than just merging code
from a quickly written prototype.
Yes, of course, but let's do this step-by-step. First, let's replace the
current views as they currently work, but with better UI loading
indicators. Then we can work on a variety of ways to limit the size of the
in-memory data.

How far away is the prototype from simply replacing the current views? What
other design work is needed for that? Is this something that you can
implement? This is really important, and will solve many users' issues.

Thanks!

-Doug
Nick Hall
2015-08-15 15:32:28 UTC
Permalink
I think we are talking about something different here. The Gtk TreeView
uses the TreeModel interface. The calls that the TreeView makes to this
interface are not within our control. They will be made when the mouse
hovers over a row or when the widget is resized.

In the prototype, the only difference is that calls to get_value will
access memory instead of the database. The actual number of calls will
remain the same.


Nick.
Doug Blank
2015-08-15 15:50:21 UTC
Permalink
Post by Nick Hall
A model-view-controller architecture is used by Gtk. The TreeView
widget is the view-controller component. Two models are provided:
ListStore and TreeStore.
https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller
At the moment we display every object in the Gramps list views. Using
the standard models used large amounts of memory and the views were slow
to load. This is why we wrote our own models. The "callbacks" mentioned
in recent posts are just the communication between the model and the
TreeView. The only difference is the model used. The number of calls
will be the same. Having said that, the Gtk models are written in C,
better tested, and likely to have less bugs. The Gtk models also access
data in memory rather than in the database.
There are *a lot* of repeated calls and database access in the current
implementation, even for just hovering over the visible rows. I have
verified with master and different database backends that there is no
additional access with your new views, and there is no delay as one hovers
or scrolls. So, the number of calls is not the same.
I think we are talking about something different here. The Gtk TreeView
uses the TreeModel interface. The calls that the TreeView makes to this
interface are not within our control. They will be made when the mouse
hovers over a row or when the widget is resized.
In the prototype, the only difference is that calls to get_value will
access memory instead of the database. The actual number of calls will
remain the same.
Yes, that makes sense. The current system treats the database as memory
(with the LRU attempting to mitigate the number of actual db access calls).
We want to reduce the database access.

-Doug
Nick Hall
2015-08-15 15:40:23 UTC
Permalink
Post by Nick Hall
I think we need some proper design here, rather than just merging code
from a quickly written prototype.
Yes, of course, but let's do this step-by-step. First, let's replace
the current views as they currently work, but with better UI loading
indicators. Then we can work on a variety of ways to limit the size of
the in-memory data.
How far away is the prototype from just replacing the current views?
What other design needs to be made for that? Is this something that
you can implement? This is really important, and will solve many users
issues.
The code needs to be neatened up. Also we should only load the columns
that are necessary.

As I said earlier, the prototype code was never intended to be merged
into master. It was never actually finished.


Nick.
Doug Blank
2015-08-15 15:53:16 UTC
Permalink
Post by Nick Hall
I think we need some proper design here, rather than just merging code
from a quickly written prototype.
Yes, of course, but let's do this step-by-step. First, let's replace the
current views as they currently work, but with better UI loading
indicators. Then we can work on a variety of ways to limit the size of the
in-memory data.
How far away is the prototype from just replacing the current views? What
other design needs to be made for that? Is this something that you can
implement? This is really important, and will solve many users issues.
The code needs to be neatened up. Also we should only load the columns
that are necessary.
As I said earlier, the prototype code was never intended to be merged into
master. It was never actually finished.
I understand... let's finish it :) The branch in master should make it easy
to refine, test, and apply. Is that something that you can work into your
plans?

-Doug
Doug Blank
2015-08-15 19:25:30 UTC
Permalink
Post by Doug Blank
Post by Nick Hall
I think we need some proper design here, rather than just merging code
from a quickly written prototype.
Yes, of course, but let's do this step-by-step. First, let's replace the
current views as they currently work, but with better UI loading
indicators. Then we can work on a variety of ways to limit the size of the
in-memory data.
How far away is the prototype from just replacing the current views? What
other design needs to be made for that? Is this something that you can
implement? This is really important, and will solve many users issues.
The code needs to be neatened up. Also we should only load the columns
that are necessary.
As I said earlier, the prototype code was never intended to be merged
into master. It was never actually finished.
I understand... let's finish it :) The branch in master should make it
easy to refine, test, and apply. Is that something that you can work into
your plans?
Hmmm, using a Very Large Tree, loading the prototype views is really
bad, performance-wise. I'm getting over 2 minutes for 153,529 people in the
prototype and just about 20 seconds in the old version. I think we should
only make this change if we can get the load time down to something closer
to the old version. I wonder now if the LRU isn't the source of the
issues...

-Doug
Doug Blank
2015-08-15 21:50:30 UTC
Permalink
Post by Doug Blank
Post by Doug Blank
Post by Nick Hall
I think we need some proper design here, rather than just merging code
from a quickly written prototype.
Yes, of course, but let's do this step-by-step. First, let's replace the
current views as they currently work, but with better UI loading
indicators. Then we can work on a variety of ways to limit the size of the
in-memory data.
How far away is the prototype from just replacing the current views?
What other design needs to be made for that? Is this something that you can
implement? This is really important, and will solve many users issues.
The code needs to be neatened up. Also we should only load the columns
that are necessary.
As I said earlier, the prototype code was never intended to be merged
into master. It was never actually finished.
I understand... let's finish it :) The branch in master should make it
easy to refine, test, and apply. Is that something that you can work into
your plans?
Hmmm, using a Very Large Tree, loading into the prototype views are really
bad, performance-wise. I'm getting over 2 minutes for 153,529 people in the
prototype and just about 20 seconds in the old version. I think we should
only make this change if we can get the load time down to something closer
to the old version. I wonder now if the LRU isn't the source of the
issues...
Here are some timing experiments for my Very Large Tree, using various
backends, with Nick's prototype compared to the old implementation:

Prototype Tree Views:
                 DBAPI    BSDDB    DictDB
People (153k)    31s      11s      29s
Family (56k)     16s      55s      1m12s
Events (362k)    7m33s    7m03s    8m00s

Old Tree Views:
                 DBAPI    BSDDB    DictDB
People (153k)    13s      11s      13s
Family (56k)     4s       2s       3s
Events (362k)    37s      9s       9s

This is really hard to beat, except for the current slow-scrolling problem
(and a lot of complicated code). I'm wondering if we shouldn't look into
the root of the slow-scrolling problem...

Now I understand your focus, Nick, on trying to get the loaded data into a
smaller subset.

-Doug
Josip
2015-08-15 22:24:28 UTC
Permalink
Post by Doug Blank
Here are some timing experiments for my Very Large Tree, using various
DBAPI BSDDB DictDB
People (153k) 31s 11s 29s
Family (56k) 16s 55s 1m12s
Events (362k) 7m33s 7m03s 8m00s
DBAPI BSDDB
People (153k) 13s 11s 13s
Family (56k) 4s 2s 3s
Events (362k) 37s 9s 9s
Isn't all that too slow?
Can't the data be added to the tree from a separate thread, or multiple
threads, in smaller logical chunks? For example, in the People view, add the
list of surnames first and, once they are displayed, add the first names for
each one, etc.
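
One common Gtk pattern along these lines, sketched below, is to append rows
in small batches from an idle callback so the UI stays responsive while the
model fills; whether it fits the Gramps custom models is an open question.
The helper and chunk size here are only illustrative:

    from gi.repository import GLib

    CHUNK = 500  # rows appended per idle callback; tune as needed

    def populate_in_chunks(store, rows):
        """Append rows (an iterator of model rows) to store without
        blocking the UI: each idle callback adds at most CHUNK rows."""
        def add_chunk():
            for _ in range(CHUNK):
                try:
                    store.append(next(rows))
                except StopIteration:
                    return False  # finished; remove the idle handler
            return True  # more rows remain; run again on the next idle
        GLib.idle_add(add_chunk)
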
--
Josip

Doug Blank
2015-08-16 02:12:44 UTC
Permalink
Post by Josip
Post by Doug Blank
Here are some timing experiments for my Very Large Tree, using various
DBAPI BSDDB DictDB
People (153k) 31s 11s 29s
Family (56k) 16s 55s 1m12s
Events (362k) 7m33s 7m03s 8m00s
DBAPI BSDDB
People (153k) 13s 11s 13s
Family (56k) 4s 2s 3s
Events (362k) 37s 9s 9s
Isn't all that too slow?
Can't data be added to tree from separate thread or multiple threads in
smaller logical chunks. For example in People view add list of surnames
and when they are displayed for each one add firstname etc.
I think you are right, Josip. Currently, these views are a bit hit-or-miss
as to whether each column is cached or not. If a column is not cached, it
can be a big slowdown. For example, getting the color for each column, for
each row, was hitting the database dozens of times for every scroll or mouse
hover over the data.

But I think we can make it so that only visible data is retrieved, and then
only once.

Nick's point about focusing on what one wants to see is valid, but let's
fix the caching issue and see how bad it is when it is working like it is
supposed to.

What I have found so far:

* only the people view had any caching at all
* tag_color was not cached, and was producing most of the traffic
* people view had only specific columns cached... if you included others,
then no caching for them
* I think we can cache all columns for all tables for all view purposes

There is still the fact that there are a lot of database accesses before
the table is made. I haven't looked into that yet. But it should be the
case (as Josip suggests) that each view should be able to load instantly...
just looking up the few records that are showing.

-Doug
Nick Hall
2015-08-15 23:17:32 UTC
Permalink
Post by Doug Blank
Here are some timing experiments for my Very Large Tree, using various
DBAPI BSDDB DictDB
People (153k) 31s 11s 29s
Family (56k) 16s 55s 1m12s
Events (362k) 7m33s 7m03s 8m00s
DBAPI BSDDB
People (153k) 13s 11s 13s
Family (56k) 4s 2s 3s
Events (362k) 37s 9s 9s
This is really hard to beat, except for the current slow-scrolling
problem (and a lot of complicated code). I'm wondering if we shouldn't
look into the root of the slow-scrolling problem...
Now I understand your focus, Nick, on trying to get the loaded data
into a smaller subset.
Yes. The current implementation is fast with large views. We use the
LRU cache in an attempt to minimise database access.

Does the user really want to see a list with 50,000 or more rows in it?
Any list with more than 1000 rows should start ringing alarm bells.
Have we really got the UI right? What is the user actually trying to
achieve in the view?

The selectors will have different requirements to the views, but they
use the same models at the moment.

Some years ago someone posted about The Confucius Challenge:

http://www.tamurajones.net/TheConfuciusChallenge.xhtml

We should aim to make Gramps work well with really large trees.


Nick.
Doug Blank
2015-08-15 23:35:53 UTC
Permalink
Post by Doug Blank
Here are some timing experiments for my Very Large Tree, using various
DBAPI BSDDB DictDB
People (153k) 31s 11s 29s
Family (56k) 16s 55s 1m12s
Events (362k) 7m33s 7m03s 8m00s
DBAPI BSDDB
People (153k) 13s 11s 13s
Family (56k) 4s 2s 3s
Events (362k) 37s 9s 9s
This is really hard to beat, except for the current slow-scrolling problem
(and a lot of complicated code). I'm wondering if we shouldn't look into
the root of the slow-scrolling problem...
Now I understand your focus, Nick, on trying to get the loaded data into a
smaller subset.
Yes. The current implementation is fast with large views. We use the LRU
cache in an attempt to minimise database access.
I've just found a couple of errors in using the LRU's on the people
views... two missing cached items, one on tag_color, and one on person.
I'll fix those in Gramps 4.2 as soon as I get this all tested. Should make
a big difference.
Post by Nick Hall
Does the user really want to see a list with 50,000 or more rows in it?
Any list with more than 1000 rows should start ringing alarm bells. Have
we really got the UI right? What is the user actually trying to achieve in
the view?
Good point.
Post by Nick Hall
The selectors will have different requirements to the views, but they use
the same models at the moment.
http://www.tamurajones.net/TheConfuciusChallenge.xhtml
We should aim to make Gramps work well with really large trees.
Indeed! I'd like to try to meet that challenge with Gramps 5.0.

-Doug
Nick Hall
2015-08-16 11:24:18 UTC
Permalink
Post by Nick Hall
Yes. The current implementation is fast with large views. We use
the LRU cache in an attempt to minimise database access.
I've just found a couple of errors in using the LRU's on the people
views... two missing cached items, one on tag_color, and one on
person. I'll fix those in Gramps 4.2 as soon as I get this all tested.
Should make a big difference.
The columns chosen to be cached are the ones that involve the most
database access. They are the birth and death columns in the person
view which involve accessing the event table.

You could experiment by caching the entire row or expanding the cache
size. However, when the user scrolls out of the cached rows the
scrolling will slow down again.

Nick.
Doug Blank
2015-08-16 12:41:08 UTC
Permalink
Post by Nick Hall
Yes. The current implementation is fast with large views. We use the LRU
cache in an attempt to minimise database access.
I've just found a couple of errors in using the LRU's on the people
views... two missing cached items, one on tag_color, and one on person.
I'll fix those in Gramps 4.2 as soon as I get this all tested. Should make
a big difference.
The columns chosen to be cached are the ones that involve the most
database access. They are the birth and death columns in the person view
which involve accessing the event table.
You could experiment by caching the entire row or expanding the cache
size. However, when the user scrolls out of the cached rows the scrolling
will slow down again.
I have a solution that caches all of the database accesses; each item
visible is only looked up once (with the last 250 rows in the Least
Recently Used queue).

Like you say, as you move the scrollbar, you will see new data, it is
looked up, and cached.

I am also experimenting with making the LRU size much, much larger (10k or
100k) than it was previously (250). I think that this makes sense, because
Nick's prototype loads all of the data first into the model. In this
version (with a large cache size) the model is small, but the data will be
moved in and out of cache. Of course, we can make this a config option, and
the user can decide how much space-time tradeoff they wish to make.
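
For reference, the idea is essentially a bounded, ordered dictionary keyed by
object handle. This rough sketch (not the actual Gramps cache code) shows the
behaviour that a configurable size would control:

    from collections import OrderedDict

    class SimpleLRU:
        """Rough sketch of an LRU cache keyed by object handle; the real
        Gramps implementation differs, this only illustrates the
        space-time tradeoff a configurable maxsize controls."""
        def __init__(self, maxsize=10000):
            self.maxsize = maxsize
            self.data = OrderedDict()

        def get(self, handle):
            value = self.data.pop(handle, None)
            if value is not None:
                self.data[handle] = value  # re-insert as most recently used
            return value

        def put(self, handle, value):
            self.data.pop(handle, None)
            self.data[handle] = value
            if len(self.data) > self.maxsize:
                self.data.popitem(last=False)  # evict least recently used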

Currently, there is a linear scan to get the raw data of the table into the
model. But along the lines of Josip's suggestion, that data is not required
before loading... we only need space to put it in later. In that way, we only
need a mapping from row number to the handle (in sorted order) to get the
raw data. This can be done on demand (like the cached items) or in the
background, with the first N being pre-populated. That should give an
instant view in the GUI. We can try that next.

I think this large-cache fix (all db data, large number of entries) will
work, but will need a lot of testing to see if anything breaks (making sure
cache is cleaned, updated, etc). Probably a Gramps 5.0 only "fix".

-Doug
Doug Blank
2015-08-16 14:46:28 UTC
Permalink
Post by Doug Blank
Post by Nick Hall
Yes. The current implementation is fast with large views. We use the
Post by Nick Hall
LRU cache in an attempt to minimise database access.
I've just found a couple of errors in using the LRU's on the people
views... two missing cached items, one on tag_color, and one on person.
I'll fix those in Gramps 4.2 as soon as I get this all tested. Should make
a big difference.
The columns chosen to be cached are the ones that involve the most
database access. They are the birth and death columns in the person view
which involve accessing the event table.
You could experiment by caching the entire row or expanding the cache
size. However, when the user scrolls out of the cached rows the scrolling
will slow down again.
I have a solution that caches all of the database accesses; each item
visible is only looked up once (with the last 250 rows in the Least
Recently Used queue).
Like you say, as you move the scrollbar, you will see new data, it is
looked up, and cached.
I am also experimenting with making the LRU size much, much larger (10k or
100k) than it was previously (250). I think that this makes sense, because
Nick's prototype loads all of the data first into the model. In this
version (with a large cache size) the model is small, but the data will be
moved in and out of cache. Of course, we can make this a config option, and
the user can decide how much space-time tradeoff they wish to make.
Currently, there is a linear scan to get the raw data of the table into
the model. But along Josip's suggestion, these are not required before
loading... only that there is space to put them later. In that way, we only
need a mapping from row number to the handle (in sorted order) to get the
raw data. This can be done on demand (like the cached items) or in the
background, with the first N being pre-populated. That should give an
instant view in the GUI. We can try that next.
I think this large-cache fix (all db data, large number of entries) will
work, but will need a lot of testing to see if anything breaks (making sure
cache is cleaned, updated, etc). Probably a Gramps 5.0 only "fix".
I have committed the new large cache fix to master [1] and noted in the bug
report [2]. Details:

1. All database access, and some other costly operations
(serialize/unserialize, etc), are now cached in all views.
2. The cache has a dict for each object handle that saves LRU columns (by
col number) and other values (by name); a sketch of this structure follows
the list.
3. LRU Cache size is currently set to 10k
4. There should be no database access when hovering over visible data;
moving to data not seen before will trigger all of the standard database
access
5. There is still a linear startup for all views. This means that the
larger the table, the longer it takes to load. These times won't have
changed from those reported yesterday (using old views).
6. This is basically a "paged view" with a scrollbar.
7. It would be trivial now to add "next" and "prev" buttons and limit the
records shown to chunks, with a faster start time (linear, but only for a
limited number). Nothing else would need to change.
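
To make point 2 a little more concrete, the per-handle cache is roughly
shaped like the dictionary below; the handle, values, and key names are
invented for illustration and are not the actual Gramps data structure:

    # Illustrative shape of the per-handle cache described in point 2.
    cache = {
        "a1b2c3d4handle": {
            0: "Smith",               # column 0 value, keyed by column number
            1: "John",                # column 1 value
            "tag_color": "#a0a0a0",   # other cached values, keyed by name
        },
    }

    def cached_value(handle, key, compute):
        """Return a cached value, computing and storing it on a miss."""
        row = cache.setdefault(handle, {})
        if key not in row:
            row[key] = compute()      # the only place the database is hit
        return row[key]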

If this works well, it *could* be considered for going into 4.2.1... there
are a lot of changes, but they are all self-contained
to gramps/gui/views/treemodels/.

Thanks all for explaining the issue, and persistently following the
problem. This is a better solution than attempting to load all up front.

-Doug

[1]
https://github.com/gramps-project/gramps/commit/9b93a812d39bad20d254c781cde2ab42c569800e
[2] https://gramps-project.org/bugs/view.php?id=8377
Josip
2015-08-16 16:04:43 UTC
Permalink
Post by Doug Blank
I have committed the new large cache fix to master [1] and noted in the
1. All database access, and some other costly operations
(serialize/unserialize, etc), are now cached in all views.
2. The cache has a dict for each object handle that saves LRU columns
(by col number), and other values (by name)
3. LRU Cache size is currently set to 10k
4. There should be no database access when hovering over visible data;
moving to data not seen before will trigger all of the standard database
access
5. There is still a linear startup for all views. This means that the
larger the table, the longer it takes to load. These times won't have
changed from those reported yesterday (using old views).
6. This is basically a "paged view" with a scrollbar.
6. It would trivial now to "next", "prev" buttons and limit the records
shown to chunks, with a faster start time (linear, but only for a
limited number). Nothing else would need to change.
If this works well, it *could* be considered for going into 4.2.1...
there are a lot of changes, but they are all self-contained
to gramps/gui/views/treemodels/.
I think this doesn't fix the slow scrolling problem.
The TreeView is still slow if a node has more than a few hundred children;
with bigger data the view is unusable.
--
Josip

Doug Blank
2015-08-16 18:46:51 UTC
Permalink
Post by Josip
Post by Doug Blank
I have committed the new large cache fix to master [1] and noted in the
1. All database access, and some other costly operations
(serialize/unserialize, etc), are now cached in all views.
2. The cache has a dict for each object handle that saves LRU columns
(by col number), and other values (by name)
3. LRU Cache size is currently set to 10k
4. There should be no database access when hovering over visible data;
moving to data not seen before will trigger all of the standard database
access
5. There is still a linear startup for all views. This means that the
larger the table, the longer it takes to load. These times won't have
changed from those reported yesterday (using old views).
6. This is basically a "paged view" with a scrollbar.
6. It would trivial now to "next", "prev" buttons and limit the records
shown to chunks, with a faster start time (linear, but only for a
limited number). Nothing else would need to change.
If this works well, it *could* be considered for going into 4.2.1...
there are a lot of changes, but they are all self-contained
to gramps/gui/views/treemodels/.
I think this doesn't fix slow scroling problem.
TreeView is still slow if node have more then few hundred children, with
bigger data view is unusable.
Can you describe a bit more about how you are testing?

You using an example gramps tree? Which one?
What OS? What kind of computer/harddrive? BSDDB backend? How does it work
with DictionaryDB or DBAPI?
Which view exactly? What Gtk libraries do you have?
If you don't scroll, but just click on a place in the scrollbar, does it
load that view reasonably?

We may have to disable the scrollbar when the size is deemed to be too big,
mostly to manage expectations. I am fairly certain that it is doing the
minimum number of disk accesses.

On the bug report there is an additional conversation about "real_path"...
I didn't explore that, nor any Gtk issue.

-Doug
Doug Blank
2015-08-16 18:53:03 UTC
Permalink
Post by Doug Blank
Post by Josip
Post by Doug Blank
I have committed the new large cache fix to master [1] and noted in the
1. All database access, and some other costly operations
(serialize/unserialize, etc), are now cached in all views.
2. The cache has a dict for each object handle that saves LRU columns
(by col number), and other values (by name)
3. LRU Cache size is currently set to 10k
4. There should be no database access when hovering over visible data;
moving to data not seen before will trigger all of the standard database
access
5. There is still a linear startup for all views. This means that the
larger the table, the longer it takes to load. These times won't have
changed from those reported yesterday (using old views).
6. This is basically a "paged view" with a scrollbar.
6. It would trivial now to "next", "prev" buttons and limit the records
shown to chunks, with a faster start time (linear, but only for a
limited number). Nothing else would need to change.
If this works well, it *could* be considered for going into 4.2.1...
there are a lot of changes, but they are all self-contained
to gramps/gui/views/treemodels/.
I think this doesn't fix slow scroling problem.
TreeView is still slow if node have more then few hundred children, with
bigger data view is unusable.
Can you describe a bit more about how you are testing?
You using an example gramps tree? Which one?
What OS? What kind of computer/harddrive? BSDDB backend? How does it work
with DictionaryDB or DBAPI?
Which view exactly? What Gtk libraries do you have?
If you don't scroll, but just click on a place in the scrollbar, does it
load that view reasonably?
Also, you may want to close sidebars and bottombars... those can cause
additional events to fire.

-Doug
Josip
2015-08-16 20:31:32 UTC
Permalink
Post by Josip
I think this doesn't fix slow scroling problem.
TreeView is still slow if node have more then few hundred children, with
bigger data view is unusable.
Can you describe a bit more about how you are testing?
You using an example gramps tree? Which one?
What OS? What kind of computer/harddrive? BSDDB backend? How does it
work with DictionaryDB or DBAPI?
Which view exactly? What Gtk libraries do you have?
If you don't scroll, but just click on a place in the scrollbar, does it
load that view reasonably?
We may have to disable the scrollbar when the size is deemed to be too
big, mostly to manage expectations. I am fairly certain that it is doing
the minimum number of disk accesses.
On the bug report there is an additional conversation about
"real_path"... I didn't explore that, nor any Gtk issue.
Gramps Settings:
----------------
python : 3.4.3
gramps : 5.0.0-9b93a81
gtk++ : 3.16.6
pygobject : 3.16.2
pango : 1.37.2
bsddb : 6.1.0
bsddb.db : 6.0.30
cairo : 1.14.2
pycairo : 1.10.1
osmgpsmap : 1.0
GExiv2 : 0.10
ICU : 55.1
PyICU : 1.9.2
o.s. : win32

sqlite3-3.8.11.1
sqlite3-2.6.0

Windows-8
i7-4700MQ CPU @ 2.40GHz
ST1000LM024 1TB HN M101MBB SATA
(http://www.seagate.com/files/staticfiles/support/docs/samsung-ds/100698122c.pdf)


Testing with your places2 on the bsddb and dbapi backends.
The biggest problem is scrolling by dragging the slider.
The problem is visible in any category which has a treeview list:
people 2K, sources 4K and places 60K; ranging from noticeable lag and
jerkiness to a UI freeze.

Changing between flat view and tree view, like changing to a category which
has them, is very slow. The problem is even greater because it blocks the
UI. Gramps completely freezes for a long time if it was closed on the place
view.

The sidebar and bottombar have no big impact on scrollbar scrolling, or even
on scrolling through rows with the arrow keys; that should also be faster.

So after all I think that the problem lies in the model/tree implementation,
as in the "real_path" bug ticket.
--
Josip

Doug Blank
2015-08-17 11:17:11 UTC
Permalink
On Sun, Aug 16, 2015 at 4:31 PM, Josip <***@pisoj.com> wrote:

[snip]

Post by Josip
So after all I think that the problem lies in the model/tree implementation,
as in the "real_path" bug ticket.
Yes, I agree that this could be the dominating factor for many
environments... that code was probably causing more trouble than any of the
other problems. Latest master should help with this issue. So now, master
has code to limit database access while scrolling/mouse-hover, to limit node
traversal while scrolling/mouse-over, and to speed up treeview load times
(especially when no filter is in place).

-Doug
Enno Borgsteede
2015-08-16 22:56:47 UTC
Permalink
Doug,
Post by Doug Blank
On the bug report there is an additional conversation about
"real_path"... I didn't explore that, nor any Gtk issue.
Right, and IMO that's where you need to start, because the loops in
treebasemodel:do_get_path() are causing the delay. They're not a problem
when do_get_path() is only called when a user updates or deletes a
row, but since 4.0, it's called for every line that is scrolled, and you
can't really compensate for that with a faster database, no matter how
big the cache.

When a Gtk.TreeStore is used like Nick built in the early version of his
prototype, before splitting the person view, there is no need for this
implementation of do_get_path() or no need for it to be called that
often, and the scroll delay is gone.

Load times are longer then, though, and like Josip I wonder if there is a
way to delay loading, or put it in a thread or something. When I open
the person tree view, only one surname is expanded, so in theory, all
persons under other surnames can be loaded later, when those are
expanded, so that a long load will only occur when I choose expand all.
This assumes that there is a callback for the expansion of a surname
node, or the equivalent action in citations, or places.
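
The usual Gtk technique for that, sketched below, is to give each surname
node a single dummy child so the expander arrow appears, and to fill in the
real children from a "row-expanded" handler on first expansion. Whether this
can be combined with the current Gramps tree models is exactly the open
question; the lookup function here is a placeholder:

    import gi
    gi.require_version('Gtk', '3.0')
    from gi.repository import Gtk

    store = Gtk.TreeStore(str)

    def fetch_persons_for(surname):
        # Placeholder for the real database lookup.
        return ["%s, Person %d" % (surname, i) for i in range(3)]

    def add_surname(surname):
        it = store.append(None, [surname])
        store.append(it, ["loading..."])  # dummy child; makes the row expandable
        return it

    def on_row_expanded(view, tree_iter, path):
        first = store.iter_children(tree_iter)
        if store.get_value(first, 0) == "loading...":
            store.remove(first)  # drop the dummy, then load the real children
            for person in fetch_persons_for(store.get_value(tree_iter, 0)):
                store.append(tree_iter, [person])

    view = Gtk.TreeView(model=store)
    view.connect("row-expanded", on_row_expanded)
    for name in ("Borgsteede", "Hall", "Blank"):
        add_surname(name)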

regards,

Enno


Doug Blank
2015-08-17 09:52:52 UTC
Permalink
Post by Enno Borgsteede
Doug,
Post by Doug Blank
On the bug report there is an additional conversation about
"real_path"... I didn't explore that, nor any Gtk issue.
Right, and IMO that's where you need to start, because the loops in
treebasemodel:do_get_path() are causing the delay.
Indeed... I just applied a fix to master that hopefully alleviates this
somewhat by caching the results. I wouldn't be surprised if this is the
reason that Gramps 4.0 crashes for some---that code was being run so much
by the system. Please test it out and see if it improves things, and make
sure it doesn't cause any unwanted side-effects.
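
For those following along, the change is conceptually just memoization:
remember the path computed for a node and reuse it until the model changes.
A rough sketch of the idea follows; the helper names are assumptions, not the
actual treebasemodel code:

    class PathCacheMixin:
        """Sketch: cache node -> Gtk.TreePath results so repeated
        do_get_path() calls during scrolling avoid rescanning the tree."""
        def __init__(self):
            self._path_cache = {}

        def do_get_path(self, iter_):
            node = self.get_node_from_iter(iter_)   # assumed helper on the model
            path = self._path_cache.get(node)
            if path is None:
                path = self._calc_path(node)        # the original, expensive scan
                self._path_cache[node] = path
            return path

        def clear_path_cache(self):
            # Must be called whenever rows are added, removed, or moved.
            self._path_cache.clear()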

-Doug
Josip
2015-08-17 11:22:18 UTC
Permalink
Post by Doug Blank
Indeed... I just applied a fix to master that hopeful alleviates this
somewhat by caching the results
It is much better now for big trees.
Before, with places2.gramps, which has over 60K places, scrolling was
impossible as Gramps would quickly freeze.

The question still remains whether we need to override do_get_path in the
way we do now, or whether we could simply return Gtk.TreePath(0) and put the
real path calculation in another method that is called only when needed.

If we don't need the real path while scrolling, then your solution is just
masking the problem. We still make a huge number of unnecessary
calculations; we just do them a lot faster now.

I will try #8377 now in current gramps42 to see how it works.
(It seems to me that you refuse to go that way)
--
Josip

Doug Blank
2015-08-17 11:35:38 UTC
Permalink
Post by Josip
Post by Doug Blank
Indeed... I just applied a fix to master that hopeful alleviates this
somewhat by caching the results
It is much better now for big trees.
Before with places2.gramps which have over 60K places scrolling was
imposible as Gramps was quickly get frozen.
Great! Thanks for testing!
Post by Josip
Question still remain if we need to override do_get_path in such a way
as we do now or to simply return Gtk.TreePath(0) and real path
calculation put in another method which be called only when needed.
If we don't need real path in scrolling then your solution just masking
the problem. We still make huge amount of unnecessary calculations just
this time we do it a lot faster.
I will try #8377 now in current gramps42 to see how it works.
(It seems to me that you refuse to go that way)
That patch on #8377 suggested the caching solution. The patch is a bit
invasive, and would require us always to remember which method to use. The
caching solution in master should be almost as fast (just a dictionary/hash
lookup more) and will work in all situations, and always gives the correct
answer rather than no answer---which could cause problems. Neither of these
solutions keeps the method from being called, but both make the calls fast.

-Doug
Josip
2015-08-17 11:36:04 UTC
Permalink
Post by Josip
Post by Doug Blank
Indeed... I just applied a fix to master that hopeful alleviates this
somewhat by caching the results
It is much better now for big trees.
Before with places2.gramps which have over 60K places scrolling was
imposible as Gramps was quickly get frozen.
Question still remain if we need to override do_get_path in such a way
as we do now or to simply return Gtk.TreePath(0) and real path
calculation put in another method which be called only when needed.
If we don't need real path in scrolling then your solution just masking
the problem. We still make huge amount of unnecessary calculations just
this time we do it a lot faster.
I will try #8377 now in current gramps42 to see how it works.
(It seems to me that you refuse to go that way)
I forgot to say that memory usage grows quickly as you scroll.
--
Josip

Doug Blank
2015-08-17 11:44:13 UTC
Permalink
Post by Josip
Post by Josip
Post by Doug Blank
Indeed... I just applied a fix to master that hopeful alleviates this
somewhat by caching the results
It is much better now for big trees.
Before with places2.gramps which have over 60K places scrolling was
imposible as Gramps was quickly get frozen.
Question still remain if we need to override do_get_path in such a way
as we do now or to simply return Gtk.TreePath(0) and real path
calculation put in another method which be called only when needed.
If we don't need real path in scrolling then your solution just masking
the problem. We still make huge amount of unnecessary calculations just
this time we do it a lot faster.
I will try #8377 now in current gramps42 to see how it works.
(It seems to me that you refuse to go that way)
I forgot to say that memory usage quickly grows with each scrolling
Yes, by design. I'll turn that into a user-configurable setting so one can
control the space-time tradeoff. Cache is set to 10k items currently.

-Doug
Doug Blank
2015-08-17 12:18:25 UTC
Permalink
Post by Doug Blank
Post by Josip
Post by Josip
Post by Doug Blank
Indeed... I just applied a fix to master that hopeful alleviates this
somewhat by caching the results
It is much better now for big trees.
Before with places2.gramps which have over 60K places scrolling was
imposible as Gramps was quickly get frozen.
Question still remain if we need to override do_get_path in such a way
as we do now or to simply return Gtk.TreePath(0) and real path
calculation put in another method which be called only when needed.
If we don't need real path in scrolling then your solution just masking
the problem. We still make huge amount of unnecessary calculations just
this time we do it a lot faster.
I will try #8377 now in current gramps42 to see how it works.
(It seems to me that you refuse to go that way)
I forgot to say that memory usage quickly grows with each scrolling
Yes, by design. I'll turn that into a user-configurable setting so one can
control the space-time tradeoff. Cache is set to 10k items currently.
Done; default cache size is now 1k, but you can change it with:

./Gramps.py --config=interface.treemodel-cache-size:5000

The larger it is, the more view-data and iter-paths are stored in memory
without having to hit the database or query the nodes for iter paths.
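
For completeness, and assuming the new key is registered in the same way as
other interface settings, the same value should also be readable from code,
something like:

    # Read the cache size the user configured (illustrative; assumes the
    # 'interface.treemodel-cache-size' key has been registered).
    from gramps.gen.config import config

    cache_size = config.get('interface.treemodel-cache-size')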

If anyone has suggestions on what a good default size would be, please
advise.

-Doug
Enno Borgsteede
2015-08-17 13:11:55 UTC
Permalink
Doug,
Post by Enno Borgsteede
Doug,
Post by Doug Blank
On the bug report there is an additional conversation about
"real_path"... I didn't explore that, nor any Gtk issue.
Right, and IMO that's where you need to start, because the loops in
treebasemodel:do_get_path() are causing the delay.
Indeed... I just applied a fix to master that hopeful alleviates this
somewhat by caching the results. I wouldn't be surprised if this is
the reason that Gramps 4.0 crashes for some---that code was being run
so much by the system. Please test it out and see if it improves
things, and make sure it doesn't cause any unwanted side-effects.
Like Josip, I see that scrolling is faster. I imported my own tree a
couple of times so that I have about 110,000 persons now. 3.4.9 still
works much smoother though, meaning that the slider follows the mouse
pointer much faster, making much smaller increments when I drag it.

I did some bulk deletes of a few dozen persons to see what happens,
trying to corrupt the cache, and also a couple of surname changes, so
that persons would be moved to another place in the tree. And during one
of these I got:
g_value_get_int: assertion 'G_VALUE_HOLDS_INT (value)' failed
Gtk.main()
/tmp/buildd/gtk+3.0-3.10.8~8+qiana/./gtk/gtktreeview.c:5411
(gtk_tree_view_bin_draw): assertion `has_next' failed.
There is a disparity between the internal view of the GtkTreeView,
and the GtkTreeModel. This generally means that the model has changed
without letting the view know. Any display from now on is likely to
be incorrect.
2015-08-17 14:38:37.139: ERROR: grampsapp.py: line 107: Unhandled
exception
File
"/home/enno/gramps-enno/gramps/gui/views/treemodels/treebasemodel.py",
line 955, in do_get_iter
node = self.nodemap.node(node.children[_index][1])
IndexError: list index out of range
2015-08-17 14:38:37.181: ERROR: grampsapp.py: line 107: Unhandled
exception
File
"/home/enno/gramps-enno/gramps/gui/views/treemodels/treebasemodel.py",
line 955, in do_get_iter
node = self.nodemap.node(node.children[_index][1])
IndexError: list index out of range
^\Afgesloten
I also noticed that on a surname change, the focus is lost. When I have
a person with Mac Alpine in the suffix, and I move that to the surname,
I expect the view to move to the changed person that's now in the Mac
Alpine surname group, as it does in 3.4.9. That never happens, and
reminds me of things I saw in my earlier experiments, when I messed with
empty callbacks myself. This suggests that on such a change an empty or
non-existent path is returned.

I also think that, with the right use of Gtk.TreeStore, we should be
able to eliminate the sort of lookup that's done here, or at least make
it more efficient. I agree with Josip that adding a cache is not a real
cure, and Nick's early prototype shows that it can be avoided.

The only part that I don't know yet is how to speed up loading.

regards,

Enno
Doug Blank
2015-08-17 13:38:27 UTC
Permalink
Post by Enno Borgsteede
Doug,
Post by Enno Borgsteede
Doug,
Post by Doug Blank
On the bug report there is an additional conversation about
"real_path"... I didn't explore that, nor any Gtk issue.
Right, and IMO that's where you need to start, because the loops in
treebasemodel:do_get_path() are causing the delay.
Indeed... I just applied a fix to master that hopeful alleviates this
somewhat by caching the results. I wouldn't be surprised if this is the
reason that Gramps 4.0 crashes for some---that code was being run so much
by the system. Please test it out and see if it improves things, and make
sure it doesn't cause any unwanted side-effects.
Like Josip, I see that scrolling is faster. I imported my own tree a
couple of times so that I have about 110,000 persons now. 3.4.9 still works
much smoother though, meaning that the slider follows the mouse pointer
much faster, making much smaller increments when I drag it.
Thanks for trying this.
Post by Enno Borgsteede
I did some bulk deletes of a few dozen persons to see what happens, trying
to corrupt the cache, and also a couple of surname changes, so that persons
would be moved to another place in the tree. And during one of these
[snip]

That should be fixed in latest master (may have also been an issue prior to
these changes).
Post by Enno Borgsteede
I also noticed that on a surname change, the focus is lost. When I have a
person with Mac Alpine in the suffix, and I move that to the surname, I
expect the view to move to the changed person that's now in the Mac Alpine
surname group, as it does in 3.4.9. That never happens, and reminds me of
things I saw in my earlier experiments, when I messed with empty callbacks
myself. This suggests that on such a change an empty or non existing path
is returned.
Yes, something is wrong with that... I will take a look. It may be that I need
to clear the path cache earlier, and maybe disable it for a short period.
Post by Doug Blank
I also think that, with the right use of Gtk.TreeStore, we should be able
to eliminate the sort of lookup that's done here, or at least make it more
efficient. I agree with Josip that adding a cache is not a real cure, and
Nick's early prototype shows that it can be avoided.
Yes, I agree that if we can prevent Gtk from making all of these spurious
requests, that would be better. Even with this cache, if I just wave my
mouse over a treeview, I can get my CPU usage up to about 35%. Not ideal,
but at least it isn't 100%.

I'll leave the Gtk development up to someone with more experience with the
models, stores, and views.
Post by Doug Blank
The only part that I don't know yet is how to speed up loading.
I'm looking over the loading details, and have polished some. Three
possibilities to explore: start with a limited set of data (two panes,
alphabetical subset, etc), page view (prev/next), and bulk model loads by
the DBMS layer.

-Doug
Post by Doug Blank
regards,
Enno
Enno Borgsteede
2015-08-18 12:03:20 UTC
Permalink
Doug,
Post by Enno Borgsteede
I also noticed that on a surname change, the focus is lost. When I
have a person with Mac Alpine in the suffix, and I move that to
the surname, I expect the view to move to the changed person
that's now in the Mac Alpine surname group, as it does in 3.4.9.
That never happens, and reminds me of things I saw in my earlier
experiments, when I messed with empty callbacks myself. This
suggests that on such a change an empty or non existing path is
returned.
Yes, something is wrong with that... I will take a look. It may be that I
need to clear the path cache earlier, and maybe disable it for a short
period.
I tried again after you updated master yesterday, and found that focus
change is OK.
Post by Enno Borgsteede
I also think that, with the right use of Gtk.TreeStore, we should
be able to eliminate the sort of lookup that's done here, or at
least make it more efficient. I agree with Josip that adding a
cache is not a real cure, and Nick's early prototype shows that it
can be avoided.
Yes, I agree that if we can prevent Gtk from making all of these
spurious requests, that would be better. Even with this cache, if I
just wave my mouse over a treeview, I can get my CPU usage up to about
35%. Not ideal, but at least it isn't 100%.
I think that gtk-3 does what it was designed for, and we can't easily
prevent that. The big difference that I see with gtk-2 is that on mouse
over, I only see calls to on_get_value in my 3.4.9, and do_get_path in
4.x. The former requires a single row access, while the latter loops
through persons and surnames to find the path. So, when you move the
mouse over the 10th person under the 100th surname in the tree, you need
110 row accesses to find that person's path as [99, 9].

This happens on mouse over and scrolling, while in 3.4.9 on_get_path is
only called when the focus changes to a person, which is after an edit,
and for some reason that I still don't understand, also on expand all.
It doesn't happen on a single expand, so I have no idea why it would be
necessary for all.
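
To make the arithmetic above concrete, here is a toy sketch (invented
data, not the actual treebasemodel code) of the two access patterns:

    # Toy data: 100 surname groups with 10 person handles each.
    groups = [("Surname%03d" % i,
               ["handle-%03d-%d" % (i, j) for j in range(10)])
              for i in range(100)]
    handle_to_name = {h: name for name, handles in groups for h in handles}

    def on_get_value_style(handle):
        # One dictionary access per displayed cell.
        return handle_to_name[handle]

    def do_get_path_style(surname, handle):
        # Scan the surname groups for the group index, then that group's
        # rows for the person's index: about 110 row accesses for the
        # 10th person under the 100th surname.
        for g_index, (name, handles) in enumerate(groups):
            if name == surname:
                for p_index, h in enumerate(handles):
                    if h == handle:
                        return [g_index, p_index]
        return None

    print(do_get_path_style("Surname099", "handle-099-9"))   # -> [99, 9]
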
Post by Enno Borgsteede
I'll leave the Gtk development up to someone with more experience with
the models, stores, and views.
Me too, sort of, but I find that having no experience may have helped,
because it made me add print statements everywhere, and be amazed.

If I understand things right, the tree view is a nested doubly linked
list, and that means that occasionally you need a linear search to find
a person in that tree, where the search time is proportional to the
person's position. What I question is the amount or frequency of such
searches. If Gramps 3.4.9 only calls get_value on mouse over, I see no
reason why Gramps 4.x would need a get_path there. I assume that gtk
knows the path, because it knows what's on screen, and if there is
something like an object handle stored in a display row, all you need to
do is retrieve the person or citation with that handle. That's get_value,
not get_path.

If you delete a person, you also know where that person is on screen
(path), and because of the double linked list, that delete is a simple
operation. In my mind, only an insert needs a (linear) search, and that
happens on the creation of a new person, or a name change, where a
change of the given name(s) is the simplest, because the person stays in
the same surname node. Here too, a get_path does not seem to be
necessary, but because it's only done once (or twice) in 3.4.9, its
cost is low, so I don't worry about that.
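
As a sketch of how even that insert search need not be linear: if the
rows inside a surname group are kept sorted (plain string order here;
real sort keys would use locale rules), a binary search finds the insert
position.

    from bisect import bisect_left

    group = ["Borgsteede, Anna", "Borgsteede, Jan", "Borgsteede, Piet"]

    def insert_person(group, display_name):
        position = bisect_left(group, display_name)   # O(log n) search
        group.insert(position, display_name)          # add the new row
        return position

    print(insert_person(group, "Borgsteede, Enno"))   # -> 1
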
Post by Enno Borgsteede
The only part that I don't know yet is how to speed up loading.
I'm looking over the loading details, and have polished some. Three
possibilities to explore: start with a limited set of data (two panes,
alphabetical subset, etc), page view (prev/next), and bulk model loads
by the DBMS layer.
When I start 3.4.9 with the person tree view, I see all surnames, and
all persons named Borgsteede, because I set myself as the home person.
And IMO that itself is a limited set of data, because the persons with
the other surnames don't need to be loaded. As far as I'm concerned,
they can be easily loaded when I expand another surname, or click expand
all, in which case, I think it's perfectly acceptable for Gramps to show
a sort of progress bar on the latter, like it already does sometimes.

This idea is based on the assumption that there is a callback for an
expand event, which looks quite logical to me, but I don't know enough
of gtk-3 to be sure. An alternative is that, knowing the home person,
Gramps itself adds the surname nodes, and all persons for the home
person's surname, and delays the other ones, filling the rest of the
tree in another thread, like Josip demonstrated in his file selector.
That is much like what a file manager is doing, when one looks at a
directory that is being filled by Dropbox downloading new files, so it
must be in gtk-3 somewhere, I guess. I know that the file manager that I
use is based on gtk-3, and it can handle that, and scroll fast, even
when thumbnails need to be shown in my pictures directories.
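
One Gtk-friendly variant of that "fill the rest later" idea is to append
the remaining groups in small batches from idle callbacks on the main
loop, so the model is only touched from the main thread. A rough sketch
with invented helper names, not Gramps code:

    from gi.repository import GLib

    def fill_remaining_groups(store, surnames, people_for_surname, batch=25):
        # store: a Gtk.TreeStore(str); surnames and people_for_surname
        # are stand-ins for whatever the database layer provides.
        pending = list(surnames)

        def add_batch():
            for surname in pending[:batch]:
                parent = store.append(None, [surname])
                for name in people_for_surname(surname):
                    store.append(parent, [name])
            del pending[:batch]
            return bool(pending)   # True -> run again on the next idle

        GLib.idle_add(add_batch)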

regards,

Enno
Doug Blank
2015-08-18 12:32:24 UTC
Permalink
Thanks, Enno... this is a great summary of what is wrong and how we could
fix it.

To summarize where master is currently:

* added caching to the db-layer in all views
* added gtk do_get_path caching in all treeviews
* rewrote view loading with no-filters to be 2x faster
* removed extra signal that caused GUI to be re-loaded twice when switching
with views open

These changes probably make Gramps more usable with tables up to about 500k
records, depending on the computer and database backend.

Comments inline below:

On Tue, Aug 18, 2015 at 8:03 AM, Enno Borgsteede <***@gmail.com> wrote:

[snip]

I think that gtk-3 does what it was designed for, and we can't easily
Post by Enno Borgsteede
prevent that. The big difference that I see with gtk-2 is that on mouse
over, I only see calls to on_get_value in my 3.4.9, and do_get_path in 4.x.
The former requires a single row access, while the latter loops through
persons and surnames to find the path. So, when you move the mouse over the
10th person under the 100th surname in the tree, you need 110 row accesses
to find that person's path as [99, 9].
This may be a bug in gtk3. Like you describe, many of the calls to
do_get_path shouldn't be needed. Can someone report/explore that
possibility?

[snip]
Post by Enno Borgsteede
When I start 3.4.9 with the person tree view, I see all surnames, and all
persons named Borgsteede, because I set myself as the home person. And IMO
that itself is a limited set of data, because the persons with the other
surnames don't need to be loaded. As far as I'm concerned, they can be
easily loaded when I expand another surname, or click expand all, in which
case, I think it's perfectly acceptable for Gramps to show a sort of
progress bar on the latter, like it already does sometimes.
This idea is based on the assumption that there is a callback for an
expand event, which looks quite logical to me, but I don't know enough of
gtk-3 to be sure. An alternative is that, knowing the home person, Gramps
itself adds the surname nodes, and all persons for the home person's
surname, and delays the other ones, filling the rest of the tree in another
thread, like Josip demonstrated in his file selector. That is much like
what a file manager is doing, when one looks at a directory that is being
filled by Dropbox downloading new files, so it must be in gtk-3 somewhere,
I guess. I know that the file manager that I use is based on gtk-3, and it
can handle that, and scroll fast, even when thumbnails need to be shown in
my pictures directories.
I think that this is the ultimate solution: make a treeview that looks like
it does today (expand/collapse) but make it work like Nick's two-level
selection. That is, only load the expanded portions when they are needed,
but leave them in the view.

I think that it could be really fast by doing the following:

1. Load the surnames from db.get_surname_list(). This is precomputed in all
database backends, so it doesn't require a scan through all people. Similar
first-level browse data could be maintained for all treeviews. (A sketch
follows after this list.)

2. Make all of the surnames be expandable, but don't put anything in them.

3. As a person is selected, add them to the treeview
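
A rough sketch of the lazy loading behind these steps, filling a surname
group the first time it is expanded. The two data helpers are stand-ins
(the first for db.get_surname_list(), the second an invented per-surname
fetch), and a dummy child row makes Gtk draw the expander before the
group is loaded:

    import gi
    gi.require_version("Gtk", "3.0")
    from gi.repository import Gtk

    def get_surname_list():                    # stand-in for the db call
        return ["Borgsteede", "Mac Alpine", "Smith"]

    def people_for_surname(surname):           # invented on-demand fetch
        return ["%s, Person %d" % (surname, i) for i in range(1, 4)]

    PLACEHOLDER = "loading..."

    store = Gtk.TreeStore(str)
    for surname in get_surname_list():
        parent = store.append(None, [surname])
        store.append(parent, [PLACEHOLDER])    # dummy child -> expander

    view = Gtk.TreeView(model=store)
    view.append_column(Gtk.TreeViewColumn("Name", Gtk.CellRendererText(),
                                          text=0))

    def on_row_expanded(treeview, treeiter, path):
        model = treeview.get_model()
        child = model.iter_children(treeiter)
        if model.get_value(child, 0) == PLACEHOLDER:
            # First expansion of this surname: load its people and drop
            # the dummy row.
            for name in people_for_surname(model.get_value(treeiter, 0)):
                model.append(treeiter, [name])
            model.remove(child)

    view.connect("row-expanded", on_row_expanded)

    window = Gtk.Window(title="Lazy surname groups")
    window.add(view)
    window.connect("destroy", Gtk.main_quit)
    window.show_all()
    Gtk.main()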

For flat views, the only other item (besides a possible Gtk3 fix) is to
make the initial loading of handles faster or delayed. Of course, all of
the changes in master will help with flat views (and treeviews), but
perhaps not enough to make Confucius happy.

I'm about out of time for this summer, so if anyone is interested in
working on any aspect of this, please let us know.

-Doug
Post by Enno Borgsteede
regards,
Enno
Tim Lyons
2015-08-18 16:43:00 UTC
Permalink
Post by Doug Blank
I'm about out of time for this summer, so if anyone is interested in
working on any aspect of this, please let us know.
I appreciate all the work you have done, but could you please publish a
timing comparison for the latest version of Gramps master, with
before-and-after comparisons similar to this:

Prototype Tree Views:
                 DBAPI    BSDDB    DictDB
People (153k)    31s      11s      29s
Family (56k)     16s      55s      1m12s
Events (362k)    7m33s    7m03s    8m00s

Old Tree Views:
                 DBAPI    BSDDB    DictDB
People (153k)    13s      11s      13s
Family (56k)     4s       2s       3s
Events (362k)    37s      9s       9s


and this:

Here are the corrected stats, on new and old XML:

              XML/old places   XML/new places   JSON/new
--------------------------------------------------------
gramps42      4m20s
bsddb         45m27s           4m37s            2m10s
dbapi/sqlite  50s              48s              30s
dictionary    24s              20s              26s







Doug Blank
2015-08-18 18:53:21 UTC
Permalink
Post by Tim Lyons
Post by Doug Blank
I'm about out of time for this summer, so if anyone is interested in
working on any aspect of this, please let us know.
I appreciate all the work you have done, but could you please publish a
timing comparison for the latest version of Gramps master, with comparisons
So far, the *view load* changes don't affect timings on my development
environment too much... I have an i7 with SSD. However, those with slower
hard drive response will probably see bigger differences. I could try to
find a larger GEDCOM... perhaps the GedFan ones... just takes a long time
to import.

Scroll/mouse-movement timings will be much different between gramps42 and
master. In fact, Enno has suggested that we try to port this to 4.2 [1]. I
am trying to figure out how to measure the scrolling improvements...

In any event, here are the timings on my machine:

Views
                   People/Group   People/Flat   Events
gramps42           11s            9s            13s
gramps50/bsddb     10s            7s            10s
gramps50/dbapi     19s            14s           24s
gramps50/dictdb    12s            10s           20s

Records
People: 153,529
Events: 362,661

Note: why is dictdb so slow comparatively speaking? Possibly because it has
to do all sorting itself, whereas the others have DB structures for that.

-Doug

[1] - https://gramps-project.org/bugs/view.php?id=8377#bugnotes
Doug Blank
2015-08-21 15:23:58 UTC
Permalink
On Tue, Aug 18, 2015 at 2:53 PM, Doug Blank <***@gmail.com> wrote:

Earlier, I posted these timings on a large database, with the current/past code:
Post by Doug Blank
Views
                   People/Group   People/Flat   Events
gramps42           11s            9s            13s
gramps50/bsddb     10s            7s            10s
gramps50/dbapi     19s            14s           24s
gramps50/dictdb    12s            10s           20s
Records
People: 153,529
Events: 362,661
Tim noticed that the dbapi row (in these tests using sqlite) was "twice as
slow" as the bsddb row. At first, I thought that could be some
constant overhead, but after looking more deeply, it looks suspiciously
close to *exactly* twice as slow, perhaps because something is being
done twice (and even the in-memory dictdb is slower than BSDDB).

I looked at the dbapi code at three levels: the generic level (same for all
backends other than BSDDB), the dbapi level (where SQL queries are
defined), and the sqlite-specific level. I found optimizations to make at
all levels, including 21 queries that were doing a full table scan (rather
than using indexes). After making some changes, I verified that no further
unnecessary full scans are being made, and the number of some repetitive
queries was reduced by 66%. Here are the latest timings on master [a93872a]
with my large test database:

                   People/Group   People/Flat   Family   Events
gramps42           10s            26s           2s       11s
gramps50/bsddb     8s             6s            2s       8s
gramps50/dbapi     18s            16s           4s       23s
gramps50/dictdb    10s            10s           2s       10s


Unfortunately, the times are largely the same. [The gramps42 People/Flat
may be an anomaly, but I have seen some times fluctuate in a bimodal
fashion, either quick or slow.] The view loading is one of the few places
where our bsddb code does better than the new code. I suspect that this may
be at the algorithm level (in the generic code), and will keep exploring.
For example, I know it isn't a linear search at the DB-layer, but it might
be a linear search in Python code somewhere.
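
For reference, the full table scans mentioned above can be spotted with
SQLite's EXPLAIN QUERY PLAN. A small sketch with an invented schema (the
real dbapi tables may differ):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE person "
                "(handle TEXT PRIMARY KEY, surname TEXT, blob BLOB)")

    query = "SELECT blob FROM person WHERE surname = ?"

    def show_plan(label):
        print(label)
        for row in con.execute("EXPLAIN QUERY PLAN " + query,
                               ("Borgsteede",)):
            # The detail column reads 'SCAN ...' for a full table scan
            # and 'SEARCH ... USING INDEX ...' once an index is usable.
            print("  ", row)

    show_plan("without index:")
    con.execute("CREATE INDEX person_surname ON person (surname)")
    show_plan("with index:")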

Gramps 4.2.1 will have new code (currently in git) for dealing with the
scrolling and mouse hover Gtk.TreeView issues reported in #8377 [1].

-Doug

[1] - https://gramps-project.org/bugs/view.php?id=8377
Enno Borgsteede
2015-08-28 13:56:12 UTC
Permalink
Doug,
Post by Doug Blank
Tim noticed that the dbapi row (in these tests using sqlite) was "twice
as slow" as the bsddb row. At first, I thought that could be some
constant overhead, but after looking more deeply, it looks suspiciously
close to *exactly* twice as slow, perhaps because something is being
done twice (and even the in-memory dictdb is slower than BSDDB).
I looked at the dbapi code at three levels: the generic level (same
for all backends other than BSDDB), the dbapi level (where SQL queries
are defined), and the sqlite-specific level. I found optimizations to
make at all levels, including 21 queries that were doing a full table
scan (rather than using indexes). After making some changes, I
verified that no further unnecessary full scans are being made, and
the number of some repetitive queries was reduced by 66%. Here are the
latest timings:
                   People/Group   People/Flat   Family   Events
gramps42           10s            26s           2s       11s
gramps50/bsddb     8s             6s            2s       8s
gramps50/dbapi     18s            16s           4s       23s
gramps50/dictdb    10s            10s           2s       10s
Unfortunately, the times are largely the same. [The gramps42
People/Flat may be an anomaly, but I have seen some times fluctuate in
a bimodal fashion, either quick or slow.] The view loading is one of
the few places where our bsddb code does better than the new code. I
suspect that this may be at the algorithm level (in the generic code),
and will keep exploring. For example, I know it isn't a linear search
at the DB-layer, but it might be a linear search in Python code somewhere.
I'm working back through my email, which gave me a week to think about
this, and I wonder whether this is caused by extra copying of data in
the DB API layer. This is just a wild guess, without looking at the code.

As far as I know, views are filled by reading all
persons/families/events sequentially, so indexes don't play a role at
all. It's just getting blobs from the database, copying column data to
the view, and letting the view do the sorting, is that right?

If it is, I see no other way than to minimize or even eliminate the DB
API, and think about direct access to sqlite, or dictdb, if that can be
written to disc. And for sqlite, I would really like to see real columns
in that, so that I can use real SQL to look for duplicate places, etc.
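
As a sketch of what such real columns would allow (the schema here is
invented for illustration, not the current Gramps layout), finding
duplicate place names becomes a single SQL query:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE place (handle TEXT PRIMARY KEY, name TEXT)")
    con.executemany("INSERT INTO place VALUES (?, ?)",
                    [("p1", "Amsterdam"), ("p2", "Amsterdam"),
                     ("p3", "Leiden")])

    rows = con.execute("SELECT name, COUNT(*) FROM place "
                       "GROUP BY name HAVING COUNT(*) > 1").fetchall()
    for name, count in rows:
        print(count, name)    # -> 2 Amsterdam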

I'm still using 3.4, because my places are too chaotic for 4.2, and I find
that in practice BSDDB is quite stable here, as long as I don't start two
instances of Gramps. And even if I do, recovery actually works.
Converting to 5.0 is therefore only interesting if it gives me a
functional advantage, like SQL for all columns.

regards,

Enno
Doug Blank
2015-08-28 14:25:24 UTC
Permalink
Tim Lyons
2015-08-16 20:45:26 UTC
Permalink
Post by Nick Hall
Displaying two flat views instead of one tree view is also a way of
reducing the rows displayed. Indexes can be used to speed up
performance here. I remember that Tim raised concerns about this approach.
Yes, I really don't like the idea of changing to two flat views. As Doug
points out, it would "fundamentally change the way that one uses the views
(can't select two people with different surnames, for example)". It reminds
me of one of the MS Windows File Explorer views that I always found very
hard to use.

Tim.



Nick Hall
2015-08-16 21:07:54 UTC
Permalink
Post by Tim Lyons
Post by Nick Hall
Displaying two flat views instead of one tree view is also a way of
reducing the rows displayed. Indexes can be used to speed up
performance here. I remember that Tim raised concerns about this approach.
Yes, I really don't like the idea of changing to two flat views. As Doug
points out, it would "fundamentally change the way that one uses the views
(can't select two people with different surnames, for example)". It reminds
me of one of the MS Windows File Explorer views that I always found very
hard to use.
Thanks for reminding me.

It was one of the reasons I abandoned the prototype. I regarded it as a
failed experiment and was surprised that it was being considered for
merging into master.


Nick.


Enno Borgsteede
2015-08-16 21:21:57 UTC
Permalink
Nick,
It was one of the reasons I abandoned the prototype. I regarded it as
a failed experiment and was surprised that it was being considered for
merging into master.
In an early version, you built a basic single pane tree view that loaded
all persons/citations in a Gtk Tree, and that one did not have the
scroll delay. Is that code still available? Can I find it by going back
through some file history?

I remember that it increased loading time for my person tree quite a
bit, but the tree view itself scrolled as fast as it does in 3.4.9, i.e.
no slowdown while scrolling down, regardless of tree size.

regards,

Enno


Nick Hall
2015-08-16 21:54:41 UTC
Permalink
Post by Enno Borgsteede
It was one of the reasons I abandoned the prototype. I regarded it as
a failed experiment and was surprised that it was being considered for
merging into master.
In an early version, you built a basic single pane tree view that loaded
all persons/citations in a Gtk Tree, and that one did not have the
scroll delay. Is that code still available? Can I find it by going back
through some file history?
I remember that it increased loading time for my person tree quite a
bit, but the tree view itself scrolled as fast as it does in 3.4.9, i.e.
no slowdown while scrolling down, regardless of tree size.
Yes. It is still available in my fork.

The load times were slow, but the scrolling was fast. The two listview
experiment was an attempt to reduce the number of rows loaded.

Nick.


Enno Borgsteede
2015-08-17 13:34:23 UTC
Permalink
Nick,
Post by Nick Hall
Post by Enno Borgsteede
It was one of the reasons I abandoned the prototype. I regarded it as
a failed experiment and was surprised that it was being considered for
merging into master.
In an early version, you built a basic single pane tree view that loaded
all persons/citations in a Gtk Tree, and that one did not have the
scroll delay. Is that code still available? Can I find it by going back
through some file history?
I remember that it increased loading time for my person tree quite a
bit, but the tree view itself scrolled as fast as it does in 3.4.9, i.e.
no slowdown while scrolling down, regardless of tree size.
Yes. It is still available in my fork.
In the view-models branch? I noticed that you changed the tree base
model to use Gtk.TreeStore, and then changed the person and citation
views. Can I test it by reversing the changes in the latter, and leaving
the base model change in place?
Post by Nick Hall
The load times were slow, but the scrolling was fast. The two listview
experiment was an attempt to reduce to number of rows loaded.
I remember that, and I hope that there is another way for that, like
only loading rows when a user expands a top level element. I mean, when
I open the person tree view, I see all surnames, and only my own
expanded, so the number of display rows is probably just a few hundred
or a few thousand at that time, and I wouldn't have a problem waiting for an
expand all, as long as single surname expands are reasonably fast. Same
for other tree views of course.

regards,

Enno


Doug Blank
2015-08-16 21:38:10 UTC
Permalink
Post by Tim Lyons
Post by Nick Hall
Displaying two flat views instead of one tree view is also a way of
reducing the rows displayed. Indexes can be used to speed up
performance here. I remember that Tim raised concerns about this approach.
Yes, I really don't like the idea of changing to two flat views. As Doug
points out, it would "fundamentally change the way that one uses the views
(can't select two people with different surnames, for example)". It reminds
me of one of the MS Windows File Explorer views that I always found very
hard to use.
Thanks for reminding me.
It was one of the reasons I abandoned the prototype. I regarded it as a
failed experiment and was surprised that it was being considered for
merging into master.
I only considered the two-pane views as part of the prototype. We may still
need to have something like that, just to limit the data. If there were a
checkbox column for selection as the first column, it would be different
for users, but perhaps not too different (would work like GMail, if you are
familiar with it).

As to the other part of the prototype (load all data on start), that would
take a while to load, but would operate fast afterwards (and would get the
expected behaviour of a scrollbar). It is just that taking 8 minutes on a
Core i7 with an SSD drive and a local database is too long to load a view
with 362k records. I think it would be smart to remove the scrollbar on
large tables, to indicate that this is a page-oriented view. I think that
part of the prototype may be useful, considering:

I am now looking at view load times. I just cut in half the time to load
all treeviews in master (people tree, citation tree, and places). Before,
it was traversing the data twice, and having to look up the data three
times when a filter is in effect (it is down to only two lookups per item
for filters, and only once for no filters). I think the view loading code
can be further refined (e.g., we can pass in raw data to a filter to match
on it, rather than looking it up again).

And, we could add db methods to allow those databases that can operate on
multiple records (SQL) to operate all at once.
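
A sketch of what such a bulk method could look like for a SQL backend,
with an invented table layout: one SELECT fills the model instead of one
query per handle.

    def load_person_rows(con, liststore):
        # con: an open DB-API connection (e.g. sqlite3);
        # liststore: e.g. a Gtk.ListStore(str, str) backing a flat view.
        cursor = con.execute("SELECT handle, surname, given FROM person "
                             "ORDER BY surname, given")
        for handle, surname, given in cursor:
            liststore.append([handle, "%s, %s" % (surname, given)])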

-Doug
Post by Tim Lyons
Nick.