Oakies Blog Aggregator

Jonathan Lewis

Massive Delete

The question of how to delete 25 million rows from a table of one billion came up on the ODC database forum recently. With changes in the numbers of rows involved it’s a question that keeps coming back, and I wrote a short series for AllthingsOracle a couple of years ago that discusses the issue. This note is just a catalogue of links to the articles:

There is an error in part 2 in the closing paragraphs – it says that the number of index entries deleted varies “from just one to 266”; in fact it varies from 181 to 266.
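For context, below is a minimal sketch of the kind of batched-delete loop the series discusses, assuming a hypothetical BIG_TABLE and date predicate – the articles weigh this pattern against alternatives, so treat it as an illustration rather than a recommendation:

declare
  l_deleted pls_integer;
begin
  loop
    -- delete at most 100,000 rows per pass (hypothetical predicate)
    delete from big_table
    where  created < date '2016-01-01'
    and    rownum <= 100000;
    l_deleted := sql%rowcount;
    commit;  -- intermediate commit: less undo held, but restartability needs thought
    exit when l_deleted = 0;
  end loop;
end;
/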

 

connor_mc_d

The AskTOM data model

I popped out a tweet yesterday in Throwback Thursday style showing the date of the first question we took on AskTOM – 18 years ago! Many entire computer systems don’t even last that long, and AskTOM hasn’t really needed to change that much in those 18 years. We’ve added a new skin, added the ability to have multiple AskTOM experts on the answer team, and of course our new-for-2018 Office Hours program, which gives everyone free access to experts inside Oracle Corporation.

John replied to my tweet asking if we could show the ERD.


So here it is – I’ve taken out a few things here and there that don’t have any impact on the public facing part of AskTOM. And I’ve included some notes underneath about each table’s purpose in AskTOM.

[ERD diagram of the AskTOM data model]

ATE_ADMINS

The administrators on AskTOM. There are various privilege levels such as being able to see the admin pages, view unanswered questions and actually answer them.

ATE_HOME_MESSAGES

Any messages that we want to appear on the Questions home screen, such as upcoming conferences etc.

ATE_SUPPORTING_FILES

Files that support the answering of questions. We’ll sometimes link in scripts, images etc to better answer a question. They are all stored in here, in the database of course.

ATE_PRESENTATIONS

The presentations and other resources that you can download from the Resources page.


ATE_FEEDBACK

Our recently added facility to allow people to give us feedback on the AskTOM application.


ATE_SUBMITTED_QUESTIONS

The bread and butter of AskTOM. Your questions are stored here, along with our answer.

ATE_POSTING_STATUS

[image: question counts by posting status]

When you submit a question, we record it as “New, Never Read”. Once we start working on it, we mark it as “Read, Not Answered” so that another AskTOM administrator does not pick it up as well. We might then have a little back-and-forth with the original submitter if we need more detail (which is a polite way of saying they didn’t read the Question Guidelines, and hence we didn’t get a test case, sample data, nicely formatted code, etc.). :) Hence the “Need More Info” and “More Info Supplied” statuses.

Once we’ve answered the question, if we think the general community will benefit it will be “Answered, Published”, which puts it up on the home page; otherwise it’s “Answered, Not Published”, which means the original submitter will see it, but no-one else. As you can see from the numbers in the picture (taken today), for the 18,500 questions you can see on AskTOM, there’s another 5,500 we’ve answered that don’t make the front page.

“No Further Action” is what happens when we ask for more information and the submitter decides to ghost us, or if a submitter decides the best way to communicate with us is via a torrent of insults. Either way, we ain’t bothering with that question again. :)

ATE_QUESTION_REVIEWS

The review comments from the community, and our follow-ups for each question that we answer.


ATE_REVIEW_STATUSES

We read every single review that comes into AskTOM. They come in with a status of “New”, we’ll “Assign” them to an administrator, and once we’re done with it, we mark it “Complete”. Yup, that’s over 125,000 reviews we’ve taken.


ATE_QUESTION_STATUS_HIST

We track our own performance on AskTOM so that we can better serve you all. So every time we change a question status, e.g. from “New” to “Need More Info” to “Answered”, we log the time at which the status change occurred, so we can review it later to see how we’re tracking.
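A row-level trigger is one plausible way such a history table could be populated; the column names below are illustrative assumptions, not the real AskTOM schema:

create or replace trigger trg_question_status_hist
after update of status_id on ate_submitted_questions
for each row
begin
  -- log every status transition with a timestamp
  insert into ate_question_status_hist
    ( question_id, old_status_id, new_status_id, changed_on )
  values
    ( :old.question_id, :old.status_id, :new.status_id, systimestamp );
end;
/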

ATE_OBSOLETE_QUESTIONS

18 years of taking and answering questions means that some of the answers may no longer be correct or up to date. If a reviewer brings this to our attention, we’ll record it here as an action for an administrator to look at and rectify. If you’re wondering why it is not just a flag on the question, that’s because a single question may become obsolete more than once as time marches on.


ATE_QUESTION_VIEWS

Every time you click on a question in AskTOM, we record that fact. Why? Because that drives the metrics on knowing which questions are most sought after by the community. We get hundreds of thousands of question views every week in AskTOM.

ATE_QUESTIONS_TOP20

This drives our “Popular Questions” page on AskTOM.
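A table like that could plausibly be refreshed from the view counts with something along these lines (column names are illustrative assumptions only):

insert into ate_questions_top20 ( question_id, view_count )
select question_id, cnt
from  ( select   question_id, count(*) cnt
        from     ate_question_views
        where    viewed_on > sysdate - 7  -- most viewed in the last week
        group by question_id
        order by cnt desc )
where rownum <= 20;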


ATE_QUESTIONS_SOCIAL_MEDIA

Whenever we tweet out about a particular AskTOM question, or pop it on Facebook, we log it here – because we don’t want to be spamming you. :)

And that’s all there is to it. All driven by SQL, PL/SQL and of course the latest and greatest version of Application Express.

pete

Grants WITH GRANT

The ability to grant privileges such as SELECT, DELETE or EXECUTE on database objects such as tables, views and procedures is the cornerstone of giving other users or schemas granular access to objects. I say granular....[Read More]
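As a quick refresher on the topic in the title: a grant made WITH GRANT OPTION lets the grantee pass the object privilege on to others. A minimal sketch with hypothetical schemas A, B and C:

-- connected as A, who owns the table
grant select on a.customers to b with grant option;

-- connected as B: re-grant A's object to C
grant select on a.customers to c;

-- note: for object privileges, revoking the grant from B also cascades to C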

Posted by Pete On 07/06/18 At 06:58 PM

connor_mc_d

“Call me!” Many many times!

Some readers might recall that classic Blondie track “Call me”. Of course, some readers might be wishing that I wouldn’t harp on about great songs from the 80s. But bear with me; there is a (very tenuous) link to this post. If you haven’t heard the song, you can jump to the chorus right here. Go on, I’ll wait until you get back. :)

This golden oldie is relevant when it comes to dealing with object types in PL/SQL, and in particular, when you are using them in a SQL-related context.  To set the scene, I’ll start with a trivial example – creating a simple object, utilizing that object in a function, and then using a simple SELECT-FROM-DUAL on that function to check that it works.


SQL> create or replace
  2  type three_values as object
  3   ( val_1 int,
  4     val_2 int,
  5     val_3 int
  6   );
  7  /

Type created.

SQL>
SQL> create or replace
  2  function f return three_values is
  3  begin
  4     return three_values(1,2,3);
  5  end;
  6  /

Function created.

SQL>
SQL> select f from dual;

F(VAL_1, VAL_2, VAL_3)
-----------------------------------------
THREE_VALUES(1, 2, 3)

So far so good. But as is often the case, the requirement is not to get the object as an entity in its own right, but to get at the scalar values within the object. Hence we wrap that query as an inline view inside an outer query to extract the individual columns:


SQL> select x.c.val_1, x.c.val_2, x.c.val_3
  2  from
  3  (select f c from dual ) x;

   C.VAL_1    C.VAL_2    C.VAL_3
---------- ---------- ----------
         1          2          3

1 row selected.

But let’s explore that a little more carefully. Observation of the SQL text might lead us to think that

  • we ran the inner query,
  • saved the result we’ve already seen into a temporary result of type THREE_VALUES,
  • then extracted the elements via the outer SELECT.

But now I’ll make a small amendment to the function. I’ll introduce a package variable so that we can track executions:


SQL> create or replace package pkg as
  2    x int := 0;
  3  end;
  4  /

Package created.

SQL>
SQL> create or replace
  2  function f return three_values is
  3  begin
  4     pkg.x:= pkg.x+1;
  5     return three_values(1,2,3);
  6  end;
  7  /

Function created.

As you would expect, a simple SELECT-FROM-DUAL results in the function being executed once.


SQL> select f from dual;

F(VAL_1, VAL_2, VAL_3)
-----------------------------------------------------
THREE_VALUES(1, 2, 3)

1 row selected.

SQL> set serverout on
SQL> exec dbms_output.put_line('calls='||pkg.x);
calls=1

PL/SQL procedure successfully completed.

As before, I’ll now wrap that query in another SELECT to extract the column elements. I’ll reset my package variable to start the count again from zero.


SQL> exec pkg.x := 0;

PL/SQL procedure successfully completed.

SQL> select x.c.val_1, x.c.val_2, x.c.val_3
  2  from
  3  (select f c from dual ) x;

   C.VAL_1    C.VAL_2    C.VAL_3
---------- ---------- ----------
         1          2          3

1 row selected.

SQL> set serverout on
SQL> exec dbms_output.put_line('calls='||pkg.x);
calls=3

PL/SQL procedure successfully completed.

Notice the difference. The function was called three times. If that function was doing some “heavy lifting” then those additional executions might be a cause for concern. The number of calls to a function referenced from SQL has always been indeterminate, so this is nothing new, but it still might catch you by surprise. References throughout the SQL to those function scalar values can bump the count up even more:


SQL> exec pkg.x := 0;

PL/SQL procedure successfully completed.

SQL> select x.c.val_1, x.c.val_2, x.c.val_3
  2  from
  3  (select f c from dual ) x
  4  where x.c.val_1 = 1 and x.c.val_2 = 2;

   C.VAL_1    C.VAL_2    C.VAL_3
---------- ---------- ----------
         1          2          3

1 row selected.

SQL> set serverout on
SQL> exec dbms_output.put_line('calls='||pkg.x);
calls=5

But conversely, you cannot simply assume that every reference will result in an additional execution. For example, adding an ORDER BY clause containing references does not increment the count.


SQL> exec pkg.x := 0;

PL/SQL procedure successfully completed.

SQL> select x.c.val_1, x.c.val_2, x.c.val_3
  2  from
  3  (select f c from dual ) x
  4  where x.c.val_1 = 1 and x.c.val_1+10 = 11
  5  order by x.c.val_1, x.c.val_2, x.c.val_3;

   C.VAL_1    C.VAL_2    C.VAL_3
---------- ---------- ----------
         1          2          3

1 row selected.

SQL> set serverout on
SQL> exec dbms_output.put_line('calls='||pkg.x);
calls=5

PL/SQL procedure successfully completed.

A 10053 trace on any of these queries reveals the reason why we can expect to see multiple executions. When I parse my initial query


select x.c.val_1, x.c.val_2, x.c.val_3
from
(select f c from dual ) x

then scrolling through the 10053 trace, I’ll end up with this:


Final query after transformations:******* UNPARSED QUERY IS *******
SELECT SYS_OP_ATG("SCOTT"."F"(),1,2,2) "C.VAL_1",SYS_OP_ATG("SCOTT"."F"(),2,3,2) "C.VAL_2",SYS_OP_ATG("SCOTT"."F"(),3,4,2) "C.VAL_3" 
FROM "SYS"."DUAL" "DUAL"

You can see that the query has been transformed to have explicit calls of the function for each column we requested.

Can the multiple calls be avoided? Yes. There are a couple of simple options to explore here. The first is to use a scalar subquery, which takes advantage of the query-level caching the database performs to avoid repeated execution of query sub-components. Here are the previous examples repeated, with a scalar subselect to access the function.


SQL> exec pkg.x := 0;

PL/SQL procedure successfully completed.

SQL> select x.c.val_1, x.c.val_2, x.c.val_3
  2  from
  3  (select ( select f from dual ) c
  4   from dual ) x;

   C.VAL_1    C.VAL_2    C.VAL_3
---------- ---------- ----------
         1          2          3

1 row selected.

SQL>
SQL> exec dbms_output.put_line('calls='||pkg.x);
calls=1

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL>
SQL> exec pkg.x := 0;

PL/SQL procedure successfully completed.

SQL> select x.c.val_1, x.c.val_2, x.c.val_3
  2  from
  3  (select ( select f from dual ) c
  4   from dual ) x
  5  where x.c.val_1 = 1 and x.c.val_2 = 2;

   C.VAL_1    C.VAL_2    C.VAL_3
---------- ---------- ----------
         1          2          3

1 row selected.

SQL>
SQL> exec dbms_output.put_line('calls='||pkg.x);
calls=1

PL/SQL procedure successfully completed.

Alternatively, since we are querying the object as though it were a table, we can reflect this by adding a nested table object type and returning that from the function:


SQL> create or replace
  2  type three_val_list as
  3   table of three_values
  4  /

Type created.

SQL>
SQL> create or replace
  2  function f return three_val_list is
  3  begin
  4     pkg.x:= pkg.x+1;
  5     return three_val_list(three_values(1,2,3));
  6  end;
  7  /

Function created.

SQL> exec pkg.x := 0;

PL/SQL procedure successfully completed.

SQL> select * from f();

     VAL_1      VAL_2      VAL_3
---------- ---------- ----------
         1          2          3

1 row selected.

SQL> exec dbms_output.put_line('calls='||pkg.x);
calls=1

PL/SQL procedure successfully completed.

Before you get carried away and start screaming “Performance Bug!” from the rooftops, the moral of the story here is probably to first ask yourself: Is there a valid reason for referencing my OBJECT data types via SQL? Very often I see the instantiation of any variable (complex or scalar) being done with a SELECT-FROM-DUAL. Don’t do it – it is almost never needed.
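For example, instantiating a variable needs nothing more than a direct assignment in PL/SQL; a small sketch using the THREE_VALUES type from above:

declare
  v three_values;
begin
  -- unnecessary: a context switch to SQL just to call the constructor
  -- select three_values(1,2,3) into v from dual;

  -- better: assign directly in PL/SQL
  v := three_values(1,2,3);
end;
/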

But, if you have truly embraced the Object-Relational features and are genuinely gathering object types from SQL statements, then consider some of the workarounds above to avoid excessive function calls.

fritshoogland

Quick install of prometheus, node_exporter and grafana

This blogpost is a follow-up to an earlier blogpost of mine; the difference is that the install method described here is far easier, using an Ansible playbook to do most of the installation.

1. Install git and ansible via EPEL:

# yum -y localinstall https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
# yum -y install ansible git

2. Clone my ‘prometheus_node_exp_grafana_install’ repository:

# git clone https://gitlab.com/FritsHoogland/prometheus_node_exp_grafana_install.git

3. Run the prometheus.yml playbook to install prometheus, node_exporter and grafana:

# cd prometheus_node_exp_grafana_install
# ansible-playbook prometheus.yml

Now all you need to do is set up grafana (at port 3000) to talk to prometheus to fetch data.
If you are interested, the memory usage dashboard can be imported, dashboard id 2747.
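If you would rather script that last step too, the data source can be added through Grafana’s HTTP API; a sketch assuming the default admin/admin credentials and prometheus on its default port 9090:

# curl -s -u admin:admin -H 'Content-Type: application/json' \
    -X POST http://localhost:3000/api/datasources \
    -d '{"name":"prometheus","type":"prometheus","url":"http://localhost:9090","access":"proxy","isDefault":true}'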

In fact, it seems node_exporter has renamed the operating system statistics, so the old Linux memory dashboard no longer worked. I updated it and uploaded a new version of the memory usage dashboard.

dbakevlar

Time is Gold

So many have asked me when I’m starting at Microsoft, and the official date is now Monday, June 11th. Many also wonder what my upper limits are on how much I can handle. Well folks, it looks like we’ve reached them!

I was supposed to start this last Monday, but as my last day approached, it became all the more clear that there was a need to change my ending and start dates. Individual deadlines had pushed out, requirements had changed, and I was left feeling a bit overwhelmed, because as most know, I don’t do anything the easy way. A major change? I’ll make three or four at once! I’m fearless… and sometimes a bit crazy in my endeavors.

As anyone knows, “the best laid plans of mice and [wo]men often go astray”, and so it is for me as well. I needed time to get one kid into her apartment, move another to his father’s before college, and then we needed to sell or donate 95% of our belongings. Why? Because, as many of you know, Tim and I are selling our house and moving into a 42 ft fifth wheel. Oh, that brand new fifth wheel? Yes, I wanted to remodel it beforehand to make sure it would support our lifestyle, including remote working. That included chained generators, proper work areas and multiple WiFi options. These don’t come easy, and along with all of that, I decided to change jobs.

Limit reached!

pete

GDPR

I posted a couple of days ago my slides from the recent UKOUG Northern Technology day in Leeds, where I spoke about GDPR for the Oracle DBA. I said then that I am also preparing a service line for helping....[Read More]

Posted by Pete On 06/06/18 At 03:10 PM

jeremy.schneider

This Week in PostgreSQL – May 31

Since last October I’ve been periodically writing up summaries of interesting content I see on the internet related to PostgreSQL (generally blog posts). My original motivation was just to learn more about PostgreSQL – but I’ve started sharing them with a few colleagues and received positive feedback.  Thought I’d try posting one of these digests here on the Ardent blog – who knows, maybe a few old readers will find it interesting? Here’s the update that I put together last week – let me know what you think!


Hello from California!

Part of my team is here in Palo Alto and I’m visiting for a few days this week. You know… for all the remote work I’ve done over the years, I still really value this in-person, face-to-face time. These little trips from Seattle to other locations where my teammates physically sit are important to me.

This is also part of the reason I enjoy user groups and conferences so much. They’re opportunities to meet with other PostgreSQL users in real life. In fact – at this very moment – one of the most important PostgreSQL conferences in the world is happening: PgCon! Having attended a few other conferences over the past year, I’m holding down the fort in the office this week in order to send a bunch of other teammates… but you can be sure I’m keeping an eye on twitter. :)

https://www.pgcon.org/2018/
https://twitter.com/search?l=&q=pgcon%20OR%20pgcon_org

=====

In the meantime, let’s get busy with the latest updates from the PostgreSQL virtual world. First of all, I think the biggest headline is that (just in time for PgCon) we have the first official beta version of PostgreSQL 11! The release announcement headlines with major partitioning enhancements, more parallelism, a feature to speed up SQL execution by compiling certain operations on-demand into native machine code (JIT/Just-In-Time compilation), and numerous SQL enhancements. You can also read the first draft of the release notes. This is the time to start testing and give feedback to the development community!

https://www.postgresql.org/about/news/1855/
https://www.postgresql.org/docs/devel/static/release-11.html

Closely related to this, there’s one other really big headline that I’m excited about: the new AWS RDS Preview Environment. You can now try out the new pg11 features ahead of time with a single click! In part because the development community is so awesome, the first database available in the RDS Preview Environment is PostgreSQL. And the official PostgreSQL 11 beta release is _already_ available on RDS!! Personally I’m hoping that this benefits the community by getting more people to test and give feedback on new features being built for PostgreSQL 11. I hope it will make a great database even better.

https://aws.amazon.com/rds/databasepreview/
https://forums.aws.amazon.com/ann.jspa?annID=5788 (pg11 beta announcement)

Outside of the RDS and PG11-related stuff, I saw one other headline that I thought might be worth mentioning. On May 29, IBM published a blog post that caught my attention, featuring EnterpriseDB as an IBM partner on their Private Cloud deployments. You might not realize just how much PostgreSQL is being used and sold by IBM… but there’s Compose, ElephantSQL, and now EDB in the mix.

https://www.ibm.com/blogs/think/2018/05/next-enterprise-platform/

Part of the reason I took note of this was that I remember just last November when HPE ran a similar announcement, partnering with EDB on their on-premise subscription-based GreenLake platform.

https://news.hpe.com/hpe-expands-pay-per-use-it-to-key-business-workloads/

So it seems to me that EDB is doing some nice work at building up the PostgreSQL presence in the enterprise world – which I’m very happy to see. To be clear, this isn’t necessarily new… does anyone remember vPostgres?

https://blogs.vmware.com/vsphere/2016/03/getting-comfortable-with-vpostgres-and-the-vcenter-server-appliance-part-1.html

Nonetheless, it feels to me like the ball is moving forward. It feels like PostgreSQL maturity and adoption are continually progressing at a very healthy pace.

=====

Moving on from headlines, let’s get to the real stuff – the meaty technical articles. :)

First up, who likes testing and benchmarking? One of my favorite activities, truth be told! So I can’t quite convey just how excited I am about the preview release of Kevin Closson’s pgio testing kit. For those unfamiliar, Kevin has spent years refining his approach for testing storage through database I/O paths. Much work was done in the past with Oracle databases, and he calls his method SLOB. I’m excited to start using this kit for exploring the limits of storage through PostgreSQL I/O paths too.

https://kevinclosson.net/2018/05/23/sneak-preview-of-pgio-the-slob-method-for-postgresql-part-iv-how-to-reduce-the-amount-of-memory-in-the-linux-page-cache-for-testing-purposes/

Right after Kevin published that post, Franck Pachot followed up with a short article using pgio to look at the impact of the ext4 “sync” filesystem option (made relevant by the recently disclosed flaws in how PostgreSQL has been interacting with Linux’s implementation of fsync).

https://blog.dbi-services.com/postgres-the-fsync-issue-and-pgio-the-slob-method-for-postgresql/

In addition to Kevin’s release of pgio, I also saw three other generally fun technical articles. First, Kaarel Moppel from Cybertec published an article showing much lower-than-expected impact of pl/pgsql triggers on a simple pgbench execution. Admittedly, I want to poke around at this myself, having seen a few situations where the impact seemed higher. Great article – and it certainly got some circulation on twitter.

https://www.cybertec-postgresql.com/en/are-triggers-really-that-slow-in-postgres/

Next, Sebastian Insausti has published an article explaining PostgreSQL streaming replication. What I appreciate the most about this article is how Sebastian walks through the history of how streaming replication was developed. That context is so important and helpful!

https://severalnines.com/blog/postgresql-streaming-replication-deep-dive

Finally, the requisite Vacuum post.  :)  This month we’ve got a nice technical article from Sourabh Ghorpade on the Gojek engineering team. Great high-level introduction to vacuuming in general, and a good story about how their team narrowly averted an “xid wraparound” crisis.

https://blog.gojekengineering.com/postgres-autovacuum-tuning-394bb99fe2c0

=====

We’ve been following Dimitri Fontaine’s series on PostgreSQL data types. Well sadly (but inevitably) he has brought the series to a close. On May 24, Dimitri published a great summary of the whole data type series – this definitely deserves to be on every developer’s short list of PostgreSQL bookmarks!

https://tapoueh.org/blog/2018/05/postgresql-data-types/

But while we’re talking about data types, there were two more related articles worth pointing out this time around. First, Berend Tober from SeveralNines published a nice article back on the 9th about the serial data type in PostgreSQL. This is an integer type that automatically comes with not-null constraints and auto-assignment from a sequence.

https://severalnines.com/blog/overview-serial-pseudo-datatype-postgresql
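For illustration, a serial column is roughly shorthand for an explicit sequence plus a default (the expanded form uses a different table name here so both statements can run side by side):

create table t ( id serial primary key );

-- approximately what the above expands to:
create sequence t2_id_seq;
create table t2 (
  id integer not null default nextval('t2_id_seq') primary key
);
alter sequence t2_id_seq owned by t2.id;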

Secondly, Hans-Jürgen Schönig from Cybertec gives a nice overview of mapping data types from Oracle to PostgreSQL. He has a little paragraph in there about mapping Oracle numbers to PostgreSQL integers and numerics. That little paragraph probably deserves triple-bold emphasis: automatic mapping of every number column to PostgreSQL numeric has been the cause of many, many performance woes in PostgreSQL databases! (A small illustration follows the link.)

https://www.cybertec-postgresql.com/en/mapping-oracle-datatypes-to-postgresql/
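To make that concrete, here is a hedged sketch of the difference (the right mapping always depends on the actual data):

-- Oracle source:  create table orders ( id number(10), qty number(5), amount number(12,2) );

-- naive mapping: every NUMBER becomes numeric (arbitrary precision, slower arithmetic and joins)
create table orders_naive  ( id numeric, qty numeric, amount numeric );

-- better: native integer types where the scale is zero, numeric only where decimals are needed
create table orders_better ( id bigint, qty integer, amount numeric(12,2) );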

=====

For something that might be relevant to both developers and DBAs, I have a couple of articles about SQL tuning. First, Brian Fehrle has written a great tuning introduction for the severalnines blog. His starting point is pg_stat_activity and the EXPLAIN SQL command; exactly the same as my starting point. :) (A sample first-look query follows the link.)

https://severalnines.com/blog/why-postgresql-running-slow-tips-tricks-get-source
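For reference, the sort of first-look query that starting point implies (columns as found in PostgreSQL 9.6 and later):

select pid, usename, state, wait_event_type, wait_event,
       now() - query_start as runtime, query
from   pg_stat_activity
where  state <> 'idle'
order  by runtime desc nulls last;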

Next up, Louise Grandjonc from France has published a series of articles called “understanding EXPLAIN”. Did you ever wonder why there are _two_ numbers reported for the execution time of each step? Louise answers this question and many more in these four articles! (A tiny illustration follows the links.)

http://www.louisemeta.com/blog/explain/
http://www.louisemeta.com/blog/explain-2/
http://www.louisemeta.com/blog/explain-3/
http://www.louisemeta.com/blog/explain-4/
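Spoiler for the question above: each plan node reports “actual time=a..b”, where the first number is the time to produce the first row and the second the time to produce the last. Illustrative (invented) output:

explain analyze select * from t order by id;
-- Sort ... (actual time=12.3..15.8 rows=10000 loops=1)
--   12.3 ms before the first row is available (the sort must complete first)
--   15.8 ms until the last row has been returned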

=====

Moving down the stack a little more, there were two articles about monitoring that seem worth passing along. Datadog has put out a lot of articles about monitoring recently. I hadn’t mentioned it before, but Emily Chang gave us yet another one back on April 12. As usual, I’m impressed with the level of quality in this thorough article which is specifically focused on PostgreSQL on RDS. I especially appreciated the key metrics, things I’ve used myself to characterize workloads.

https://www.datadoghq.com/blog/aws-rds-postgresql-monitoring/

Earlier I mentioned the severalnines blog post about replication – and the pgDash team published a nice article on May 2 about monitoring replication. They give another nice general architectural overview of replication as well.

https://pgdash.io/blog/monitoring-postgres-replication.html

=====

In my last update, I closed with a few articles about pgpool on the severalnines blog. It seemed worth mentioning that they have published a third and final article for the series.

https://severalnines.com/blog/guide-pgpool-part-three-hints-observations

Also, I spotted an article about pgpool on the AWS database blog too. While focused on Aurora PostgreSQL, there’s plenty to be learned about using pgpool with regular PostgreSQL here.

https://aws.amazon.com/blogs/database/a-single-pgpool-endpoint-for-reads-and-writes-with-amazon-aurora-postgresql/

Along those lines, most people know about the other popular PostgreSQL connection pooler, pgbouncer. This is the connection pooler which is used by Gulcin Yildirim’s near-zero-downtime ansible-based upgrade tool, pglupgrade. He’s written a few articles about his process recently, and being a big ansible fan I’ve been following along.

https://blog.2ndquadrant.com/near-zero-downtime-automated-upgrades-postgresql-clusters-cloud/
https://blog.2ndquadrant.com/near-zero-downtime-automated-upgrades-postgresql-clusters-cloud-part-ii/

But I wonder if the landscape is about to change? Yesterday the Yandex team announced that they have built and released a new load balancer to address limitations in pgpool and pgbouncer. I’ll be very interested to see what happens with odyssey!

https://www.postgresql.org/message-id/C9D1137E-F2A7-4307-B758-E5ED5559FFCA@simply.name (announcement)
https://github.com/yandex/odyssey

And that wraps things up for this edition.

Have a great week and keep learning!
