Oakies Blog Aggregator

dbakevlar

Use Azure CLI…I Beg You…

Azure CLI made me feel right at home after working with Oracle’s Enterprise Manager CLI (EMCLI). The syntax is simple and powerful, and it provides an interface to manage Azure infrastructure from the command line, scripting out complex processing that would otherwise involve a lot of time in the user interface.
https://i1.wp.com/dbakevlar.com/wp-content/uploads/2018/10/azurecli.jpg

I’d love to start promoting it to more DBAs and infrastructure folks, and not just for creating databases and a few servers/containers/VMs, but for the entire stack.  With that goal in mind, there will be a lot of follow-up blog posts to this one, but let’s start with a few tips and tricks, along with a 101.

1. Download the Azure CLI Client

Download Azure CLI to your desktop; it’s really easy.  Just follow the defaults and install it on your desktop.  There’s no need to restart, and it’s readily available from the command prompt, (cmd).

2.  Get a Real Script Editor

If you think you’ll get by with Notepad for your script editor, don’t even try it.  Get a proper text or script editor that tracks lines of code, can handle multiple scripting formats, etc.  If you need a suggestion, I am using Sublime Text and it does the trick.

3. Test Your Installation

Logging into Azure is really easy if you’re using Azure Active Directory.  Just open up a Command Prompt, (cmd from the Start menu) and type in the following:

az login

An authorization window will prompt you to choose which of your AD accounts you’re using for Azure, and then it will authorize and proceed.  You’ll see the following in the command prompt window once it’s finished.

"You have logged in. Now let us find all the subscriptions to which you have access..."

It will also show you your subscriptions that you have access to in Azure and then return to the prompt.  Congratulations, you’re now ready to deploy via the CLI!
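
If your login has access to more than one subscription, it’s worth confirming which one is active before you deploy anything.  These are standard az account commands; the subscription name below is just a placeholder:

az account list --output table
az account set --subscription "<subscription_name>"
az account show --output table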

4.  Perform a Few Test Deployments

Then get started with it by deploying a few test VMs, SQL Databases and maybe a container or two.

You can deploy a VM pretty easily with just a bit of information:

C:\> az vm create -n <vm_name> -g <resource_group> --image UbuntuLTS --generate-ssh-keys

SSH key files 'C:\Users\xxxxxxxxxxxx\.ssh\id_rsa' and 'C:\Users\xxxxxxxxxxxxxxxx\.ssh\id_rsa.pub' have been generated under ~/.ssh to allow SSH access to the VM. If using machines without permanent storage, back up your keys to a safe location.

- Running ..

C:\EDU_Docker> az vm list -g <resource_group>

C:\EDU_Docker> az vm delete -n <vm_name> -g <resource_group>   #I like to add the group, too.

The CLI will still ask you to verify if you want to delete the resource, but once you confirm, it will remove it and you’ll be back to clean.
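
And because the goal here is the whole stack, not just a VM or two, here’s a rough sketch of scripting out a resource group and an Azure SQL Database from that same prompt.  All of the names are placeholders, and the location and service objective are just examples, so adjust them to your own environment:

az group create -n <resource_group> -l eastus
az sql server create -n <sql_server> -g <resource_group> -l eastus -u <admin_user> -p <admin_password>
az sql db create -n <db_name> -g <resource_group> -s <sql_server> --service-objective S0
az sql db list -g <resource_group> -s <sql_server> --output table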

The more people that use the CLI, the more robust it will become, and the more powerful you become as an infrastructure specialist in Azure.  Come on, help a girl out here;  I can’t blog about this all on my own…

Jonathan Lewis

Join Cardinality – 2

In the previous note I posted about Join Cardinality I described a method for calculating the figure that the optimizer would give for the special case where you had a query that:

  • joined two tables
  • used a single-column to join on equality
  • had no nulls in the join columns
  • had a perfect frequency histogram on the columns at the two ends of the join
  • had no filter predicates associated with either table

The method simply said: “Match up rows from the two frequency histograms, multiply the corresponding frequencies” and I supplied a simple SQL statement that would read and report the two sets of histogram data, doing the arithmetic and reporting the final cardinality for you. In an update I also added an adjustment needed in 11g (or, you might say, removed in 12c) where gaps in the histograms were replaced by “ghost rows” with a frequency that was half the lowest frequency in the histogram.

This is a nice place to start as the idea is very simple, and it’s likely that extensions of the basic idea will be used in all the other cases we have to consider. There are 25 possibilities that could need separate testing – though only 16 of them ought to be relevant from 12c onwards. Oracle allows for four kinds of histograms – in order of how precisely they describe the data they are:

  • Frequency – with a perfect description of the data
  • Top-N (a.k.a. Top-Frequency) – which describes all but a tiny fraction (ca. one bucket’s worth) of data perfectly
  • Hybrid – which can (but doesn’t usually, by default) describe up to 2,048 popular values perfectly and gives an approximate distribution for the rest
  • Height-balanced – which can (but doesn’t usually, by default) describe at most 1,024 popular values with some scope for misinformation.

Finally, of course, we have the general case of no histogram, using only 4 numbers (low value, high value, number of rows, number of distinct values) to give a rough picture of the data – and the need for histograms appears, of course, when the data doesn’t look anything like an even distribution of values between the low and high with close to “number of rows”/”number of distinct values” for each value.

So there are 5 possible statistical descriptions for the data in a column – which means there are 5 * 5 = 25 possible options to consider when we join two columns, or 4 * 4 = 16 if we label height-balanced histograms as obsolete and ignore them (which would be a pity because Chinar has done some very nice work explaining them).

Of course, once we’ve worked out a single-column equijoin between two tables there are plenty more options to consider:  multi-column joins, joins involving range-based predicates, joins involving more than 2 tables, and queries which (as so often happens) have predicates which aren’t involved in the joins.

For the moment I’m going to stick to the simplest case – two tables, one column, equality – and comment on the effects of filter predicates. It seems to be very straightforward, as I’ll demonstrate with a new model:

rem
rem     Script:         freq_hist_join_03.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem

execute dbms_random.seed(0)

create table t1(
        id      number(8,0),
        n0040   number(4,0),
        n0090   number(4,0),
        n0190   number(4,0),
        n0990   number(4,0),
        n1      number(4,0)
)
;

create table t2(
        id      number(8,0),
        n0050   number(4,0),
        n0110   number(4,0),
        n0230   number(4,0),
        n1150   number(4,0),
        n1      number(4,0)
)
;

insert into t1
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   40) + 1                   n0040,
        mod(rownum,   90) + 1                   n0090,
        mod(rownum,  190) + 1                   n0190,
        mod(rownum,  990) + 1                   n0990,
        trunc(30 * abs(dbms_random.normal))     n1
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid WordPress format issue
;

insert into t2
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   50) + 1                   n0050,
        mod(rownum,  110) + 1                   n0110,
        mod(rownum,  230) + 1                   n0230,
        mod(rownum, 1150) + 1                   n1150,
        trunc(30 * abs(dbms_random.normal))     n1
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid WordPress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1 for columns n1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 1 for columns n1 size 254'
        );
end;
/

You’ll notice that in this script I’ve created empty tables and then populated them. This is because of an anomaly that appeared in 18.3 when I used “create as select”, and it should allow the results from 18.3 to be an exact match for 12c. You don’t need to pay much attention to the Nxxx columns; they were there so I could experiment with a few variations in the selectivity of filter predicates.

Given the purpose of the demonstration I’ve gathered histograms on the column I’m going to use to join the tables (called n1 in this case), and here are the summary results:


TABLE_NAME           COLUMN_NAME          HISTOGRAM       NUM_DISTINCT NUM_BUCKETS
-------------------- -------------------- --------------- ------------ -----------
T1                   N1                   FREQUENCY                119         119
T2                   N1                   FREQUENCY                124         124

     VALUE  FREQUENCY  FREQUENCY      PRODUCT
---------- ---------- ---------- ------------
         0       2488       2619    6,516,072
         1       2693       2599    6,999,107
         2       2635       2685    7,074,975
         3       2636       2654    6,995,944
...
       113          1          3            3
       115          1          2            2
       116          4          3           12
       117          1          1            1
       120          1          2            2
                                 ------------
sum                               188,114,543

We’ve got frequency histograms, and we can see that they don’t have a perfect overlap. I haven’t printed every single line from the cardinality query, just enough to show you the extreme skew, a few gaps, and the total. So here are three queries with execution plans:


set serveroutput off

alter session set statistics_level = all;
alter session set events '10053 trace name context forever';

select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
and     t1.n0990 = 20
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));


select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
and     t1.n0990 = 20
and     t2.n1150 = 25
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

I’ve queried the pure join – the count was exactly the 188,114,543 predicted by the cardinality query, of course – then I’ve applied a filter to one table, then to both tables. The first filter n0990 = 20 will (given the mod(,990) definition) identify one row in 990 from the original 100,000 in t1; the second filter n1150 = 25 will identify one row in 1,150 from t2. That’s filtering down to 101 rows and 87 rows respectively from the two tables. So what do we see in the plans:


-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:23.47 |     748 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:23.47 |     748 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |    188M|    188M|00:00:23.36 |     748 |  6556K|  3619K| 8839K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100K|    100K|00:00:00.01 |     374 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.01 |     374 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")



-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.02 |     748 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.02 |     748 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |    190K|    200K|00:00:00.02 |     748 |  2715K|  2715K| 1647K (0)|
|*  3 |    TABLE ACCESS FULL| T1   |      1 |    101 |    101 |00:00:00.01 |     374 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    100K|    100K|00:00:00.01 |     374 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")
   3 - filter("T1"."N0990"=20)



-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |     748 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |     748 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |    165 |    165 |00:00:00.01 |     748 |  2715K|  2715K| 1678K (0)|
|*  3 |    TABLE ACCESS FULL| T2   |      1 |     87 |     87 |00:00:00.01 |     374 |       |       |          |
|*  4 |    TABLE ACCESS FULL| T1   |      1 |    101 |    101 |00:00:00.01 |     374 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")
   3 - filter("T2"."N1150"=25)
   4 - filter("T1"."N0990"=20)


The first execution plan shows an estimate of 188M rows – but we’ll have to check the trace file to confirm whether that’s only an approximate match to our calculation, or whether it’s an exact match. So here’s the relevant pair of lines:


Join Card:  188114543.000000 = outer (100000.000000) * inner (100000.000000) * sel (0.018811)
Join Card - Rounded: 188114543 Computed: 188114543.000000

Yes, the cardinality calculation and the execution plan estimates match perfectly. But there are a couple of interesting things to note. First, Oracle seems to be deriving the cardinality by multiplying the individual cardinalities of the two tables with a figure it calls “sel” – the thing that Chinar Aliyev has labelled Jsel, the “Join Selectivity”. Secondly, Oracle can’t do arithmetic (or, removing tongue from cheek, the value it reports for the join selectivity is printed to only 6 decimal places but stored to far greater precision). What is the Join Selectivity, though? It’s the figure we derive from the cardinality SQL divided by the cardinality of the Cartesian join of the two tables – i.e. 188,114,543 / (100,000 * 100,000).
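
Just to put a number on that for this example, the division is trivial enough to do in SQL (this is only an arithmetic cross-check, not anything the optimizer executes):

select
        round(188114543 / (100000 * 100000), 10)        join_selectivity
from
        dual
;

That returns 0.0188114543, which matches the sel value in the trace line above once you allow for the 6 decimal place truncation.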

With the clue from the first trace file, can we work out why the second and third plans show 190K and 165 rows respectively? How about this – multiply the filtered cardinalities of the two separate tables, then multiply the result by the join selectivity:

  • 1a)   n0990 = 20: gives us 1 row in every 990.    100,000 / 990 = 101.010101…    (echoing the rounded execution plan estimate).
  • 1b)   100,000 * (100,000/990) * 0.0188114543 = 190,014.69898989…    (which is in the ballpark of the plan and needs confirmation from the trace file).

 

  • 2a)   n1150 = 25: gives us 1 row in every 1,150.    100,000 / 1,150 = 86.9565217…    (echoing the rounded execution plan estimate)
  • 2b)   (100,000/990) * (100,000/1,150) * 0.0188114543 = 165.2301651..    (echoing the rounded execution plan estimate).

Cross-checking against extracts from the 10053 trace files:


Join Card:  190014.689899 = outer (101.010101) * inner (100000.000000) * sel (0.018811)
Join Card - Rounded: 190015 Computed: 190014.689899

Join Card:  165.230165 = outer (86.956522) * inner (101.010101) * sel (0.018811)
Join Card - Rounded: 165 Computed: 165.230165

Conclusion.

Remembering that we’re still looking at very simple examples with perfect frequency histograms: it looks as if we can work out a “Join Selectivity” (Jsel) – the selectivity of a “pure” unfiltered join of the two tables – by querying the histogram data then use the resulting value to calculate cardinalities for simple two-table equi-joins by multiplying together the individual (filtered) table cardinality estimates and scaling by the Join Selectivity.
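
As a quick sanity check of that claim against the third query, the whole calculation fits on one line (again, this is just arithmetic echoing the trace file figures, not a general-purpose formula):

select
        (100000/990) * (100000/1150) * 0.0188114543     estimated_join_card
from
        dual
;

which comes out at roughly 165.23, matching the “Join Card” line in the trace file and the E-Rows value of 165 in the plan.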

Acknowledgements

Most of this work is based on a document written by Chinar Aliyev in 2016 and presented at the Hotsos Symposium the same year. I am most grateful to him for responding to a recent post of mine and getting me interested in spending some time to get re-acquainted with the topic. His original document is a 35 page pdf file, so there’s plenty more material to work through, experiment with, and write about.

 

connor_mc_d

Partial Indexes–Take Care With Truncate

Partial indexes are a very cool feature that came along with Oracle 12c. The capability at partition level to control index existence allows for a couple of obvious use cases:

1) You index the data in recent partitions only, because small amounts of data are aggressively searched by applications and/or users, but not the older data because the queries for older data are either less frequent or are more analytical in nature.

2) You index the data in older partitions only, because you are putting the recent data into the in-memory store so almost any kind of access is very fast, but you do not want to swamp the entire in-memory store with older data that people rarely query.  But you still want reasonable access performance on that old data.

Whatever your use-case, there is currently an outstanding issue with partial indexes that you need to be aware of. (Note: This is not correct functionality and will be fixed in a future release). If you truncate a table, then the partial index dictionary definition may not be correctly observed for partitions.

Here’s an example:


SQL> create table t ( x int, y int )
  2  partition by range ( x )
  3  (
  4    partition p1 values less than ( 1000 ) indexing on,
  5    partition p2 values less than ( 2000 ) indexing off
  6  );
 
Table created.
 
SQL> insert into t
  2  select rownum, rownum from dual connect by level < 2000;
 
1999 rows created.
 
SQL> create index ix on t ( x ) local indexing partial;
 
Index created.
 
SQL> select segment_name, partition_name
  2  from user_segments
  3  where segment_name = 'IX';
 
SEGMENT_NAME                   PARTITION_NAME
------------------------------ ------------------------------
IX                             P1
 
SQL> select partition_name, status
  2  from user_ind_partitions
  3  where index_name = 'IX';
 
PARTITION_NAME                 STATUS
------------------------------ --------
P1                             USABLE
P2                             UNUSABLE

So far so good… partial indexing is working as expected. Then I truncate the table:


SQL> truncate table t;
 
Table truncated.
 
SQL>
SQL> select segment_name, partition_name
  2  from user_segments
  3  where segment_name = 'IX';
 
SEGMENT_NAME                   PARTITION_NAME
------------------------------ ------------------------------
IX                             P1
IX                             P2
 
SQL> select partition_name, status
  2  from user_ind_partitions
  3  where index_name = 'IX';
 
PARTITION_NAME                 STATUS
------------------------------ --------
P1                             USABLE
P2                             USABLE
 

And now we seem to have dropped our knowledge of partial indexing, and the second index partition springs into existence. The dictionary for the table remains unaffected:


SQL> select partition_name, indexing
  2  from   user_tab_partitions
  3  where  table_name = 'T';
 
PARTITION_NAME                 INDE
------------------------------ ----
P1                             ON
P2                             OFF

The workaround is easy. You simply re-issue the partial indexing command:


SQL> alter table t modify partition p2 indexing off;

Table altered.

SQL>
SQL> select segment_name, partition_name
  2  from user_segments
  3  where segment_name = 'IX';

SEGMENT_NAME                   PARTITION_NAME
------------------------------ ---------------------------
IX                             P1

But obviously, you need to remember to do this.
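
If you’re worried you might forget, a quick dictionary cross-check along these lines should flag any partition where the table says “indexing off” but a partial local index partition has become usable again. This is only a sketch against the USER_ views, and it assumes the index partitions share the table partition names (the default for local indexes):

select  tp.table_name, tp.partition_name, pi.index_name, ip.status
from    user_tab_partitions  tp,
        user_part_indexes    pi,
        user_ind_partitions  ip
where   pi.table_name     = tp.table_name
and     pi.indexing       = 'PARTIAL'
and     ip.index_name     = pi.index_name
and     ip.partition_name = tp.partition_name
and     tp.indexing       = 'OFF'
and     ip.status         = 'USABLE';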

dbakevlar

Using Microsoft Flows to Automate RSS Feeds

Now, everyone knows how I like to automate everything, and for those who have known me since I started sharing content, I pretty much cried a thousand tears when the personalized news source Prism disappeared.

I’ve been working with RSS feed aggregators to send me content to read each day, but I get frustrated with having to dig it out of my spam folder or not being able to get to the links, so I wanted to try something new.

Microsoft Flow

I automated a lot of tasks at my previous jobs with IFTTT (If This, Then That) recipes, and after arriving at Microsoft I was thrilled when I saw my first demo of Microsoft Flow.  It’s a growing product that has connectors to a ton of applications and features, plus a custom API to build out automated workflows for whatever your heart desires.  What I’m about to do isn’t going to require anything THAT custom, so no worries.

Change Flow

With my time in the Analytics and AI space, my content source is going to change, so I have a few new blogs that I want to keep up to date with.  For our example, we’ll use one of my recent favorites, Chris Webb’s BI blog.  As we eat our own dogfood here at Microsoft, I’d like to be notified via email when he has a new blog post and have it provided to “my personal team of one” in Microsoft Teams.  (I’ve found Teams excellent as a self-project-management program, like Trello, Outlook Calendar and Slack all in one.)  You’ll need a Microsoft Teams account to do one of the steps I’ve added, but it’s not necessary if you just want to email the notification of a new post to yourself.

After logging into my Microsoft Flow account, I choose a new template and type “RSS” into the Search bar.  I double-click on the RSS Feed Notification template and it displays the following page:

https://i0.wp.com/dbakevlar.com/wp-content/uploads/2018/10/flow7.jpg

Feed the URL

If you don’t know how to get the feed URL from a blog you like, click on the browser settings and choose either “Developer Tools” or “Source Code” (or something similar, depending on the browser).  Do a search in the code for “RSS” and you’ll quickly come across the URL for the RSS feed.  By default it’s commonly the blog URL followed by /feed/, but it’s always better to check.
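
If you want to sanity-check the feed URL before you paste it into Flow, a quick request from any command line will tell you whether it actually serves a feed (the URL below is a placeholder for whichever blog you’re following):

curl -sI https://<blog_url>/feed/

A 200 response with an RSS/XML content type is what you’re looking for.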

https://i1.wp.com/dbakevlar.com/wp-content/uploads/2018/10/flow2.jpg

The rest of the information is just my Team name, which shows in the drop down list.  I could send this to any of the Teams I’m part of, but this is for me, so my own little team of one will do.  I dynamically populate the blog summary and the blog post URL to make it easy to go from Teams to the blog post.

https://i2.wp.com/dbakevlar.com/wp-content/uploads/2018/10/flow3.jpg

Add Email

After filling out the fields I could save, but instead I click Next Step to add the Outlook email step.  If you don’t have Outlook, there are connectors for Gmail and even Mailchimp!  As I stated, I’m eating my own dogfood, so I have Outlook and just need to fill in the pertinent information as requested.  I dynamically populate some of the fields with the blog summary and URL link so I can easily connect to the site when the email arrives.

https://i0.wp.com/dbakevlar.com/wp-content/uploads/2018/10/flow4.jpg

Test and Savor

Now I click on Test in the upper right-hand corner.  It tests all the steps and the connections to verify everything has been filled out correctly, and once that’s completed you can exit; the workflow creation is complete.

https://i1.wp.com/dbakevlar.com/wp-content/uploads/2018/10/flow5.jpg

Now, when there’s a new blog post from Chris, I’ll know about it, both via email and Teams!  I may have lost Prism, but that doesn’t mean I can’t get the information that’s important to me without seeing all the rubbish that’s put out on most sites!  Yes, I’m adding a ton more workflows to automate more of my day, including adding contacts from coworker emails, automating refreshes on datasets, etc.  Don’t let the preview status of the product scare you; the way a product like this comes up to speed fast is people working with it and using it to its full potential.

 




Kamil Stawiarski

ODBV3 and ASM

At Trivadis Performance Days 2018 (awesome event by the way) I promised to deliver ODBV3 with support for ASM – and here it is!

Uwe Hesse

Understanding Distribution in #Exasol

Exasol doesn’t need much administration but getting distribution right matters

Exasol uses a clustered shared-nothing architecture with many sophisticated internal mechanisms to deliver outstanding performance without requiring much administration. Getting the distribution of rows between cluster nodes right is one of the few critical tasks left, though. To explain this, let’s say we have two tables t1 and t2:

https://uhesse.files.wordpress.com/2018/10/tablest1t2.png

The two tables are joined on the column JoinCol, while WHERE conditions for filtering are done with the column WhereCol. Other columns are not shown to keep the sketches small and simple. Now say these two tables are stored on a three-node cluster. Again, for simplicity only active nodes are on the sketch – no reserve nodes or license nodes. We also ignore the fact that small tables will be replicated across all active nodes.

Distribution will be random if no distribution key is specified

Without specifying a distribution key, the rows of the tables are distributed randomly across the nodes like this:

https://uhesse.files.wordpress.com/2018/10/randomdistribution.png

Absence of proper distribution keys: global joins

The two tables are then joined:

SELECT * FROM t1 JOIN t2 ON t1.JoinCol = t2.JoinCol;

Internally, this is processed as a global join which means network communication between the nodes on behalf of the join is required. This is the case because some rows do not find local join partners on the same node:

https://uhesse.files.wordpress.com/2018/10/globaljoin.png

Distribution on join columns: local joins

If the two tables were distributed on their join columns with statements like these

ALTER TABLE t1 DISTRIBUTE BY JoinCol;

ALTER TABLE t2 DISTRIBUTE BY JoinCol;

then the same query can be processed internally as a local join:

https://uhesse.files.wordpress.com/2018/10/localjoin.png

Here every row finds a local join partner on the same node so no network communication between the nodes on behalf of the join is required. The performance with this local join is much better than with the global join although it’s the same statement as before.

Why you shouldn’t distribute on WHERE-columns

While it’s generally a good idea to distribute on JOIN-columns, it’s by contrast a bad idea to distribute on columns that are used for filtering with WHERE conditions. If both tables had been distributed on their WhereCol columns, it would look like this:

https://uhesse.files.wordpress.com/2018/10/wherecols.png

This distribution is actually worse than the initial random distribution! Not only does it cause global joins between the two tables, as already explained, but statements like e.g.

SELECT * FROM t1 JOIN t2 ON t1.JoinCol = t2.JoinCol WHERE t2.WhereCol='A';

will utilize only one node (the first with this WHERE condition) and that effectively disables one of Exasol’s best strengths, the Massive Parallel Processing (MPP) functionality. This distribution leads to poor performance because all other nodes in the cluster have to stand by being idle while one node has to do all the work alone.

Examine existing distribution with iproc()

The function iproc() helps investigating the existing distribution of rows across cluster nodes. This statement shows the distribution of the table t1:

SELECT iproc(),COUNT(*) FROM t1 GROUP BY 1 ORDER BY 1;

Evaluate the effect of distribution keys with value2proc()

The function value2proc() can be used to display the effect that a (new) distribution key would have:

SELECT home_node,COUNT(*) FROM (SELECT value2proc(JoinCol) AS home_node FROM t1) GROUP BY 1 ORDER BY 1;
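
The two functions can also be combined in a single statement to see, node by node, how the rows would move if you switched the distribution key. A minimal sketch for t1:

SELECT iproc() AS current_node,
       value2proc(JoinCol) AS proposed_node,
       COUNT(*)
FROM t1
GROUP BY 1,2
ORDER BY 1,2;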

Conclusion

Distribution on JOIN-columns leads to local joins which perform better than global joins: Do that!

Distribution on WHERE-columns leads to global joins and disables the MPP functionality, both causing poor performance: Don’t do that!
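
And if you already know the join column when you design the table, you don’t have to wait for an ALTER TABLE at all; the distribution key can be declared at creation time. A minimal sketch (the column types here are invented for the example):

CREATE TABLE t1 (
    JoinCol  DECIMAL(18,0),
    WhereCol VARCHAR(1),
    DISTRIBUTE BY JoinCol
);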

Richard Foote

“Hidden” Efficiencies of Non-Partitioned Indexes on Partitioned Tables Part I (The Jean Genie)

When it comes to indexing a partitioned table, many automatically opt for Local Indexes, as it’s often assumed they’re simply easier to manage and more efficient than a corresponding Global Index. Having smaller index structures that are aligned to how the table is partitioned certainly has various advantages. The focus in this little series is on […]

martin.bach

Ansible tips’n’tricks: a different output option

When running ansible scripts, occasionally you wonder why a given task has failed. I found out more than once that it’s commonly a problem with the script, not the engine ;) Finding out exactly where in the script I made the mistake can be more of a challenge.

With the default ansible settings, output can be a bit hard to read. Consider this example: I do quite a bit of patching in my lab, and this almost always requires an upgrade of OPatch (d’oh!). So instead of connecting to each of my hosts and performing the same unzip command over and over again, I thought of using something else. Why not use ansible for this task? It won’t get tired copying/unzipping OPatch to all the destinations I indicate in my configuration. And it won’t introduce a mistake when dealing with the fifth ORACLE_HOME on the third server…

Before replacing $ORACLE_HOME/OPatch with the new version, I want to take a backup of the current OPatch just in case. I don’t want to keep more than 1 backup around in this particular lab environment, so I decided to check for an existing backup first, before creating a new one. If one exists, I remove it. Or at least, that’s the plan.

So I was happily coding away and in my usual trial-and-error approach was ready to test the script I wrote for the first time. Here’s the result (as shown in my 80×24 terminal):

[martin@controller environment]$ ansible-playbook -i inventory.yml broken.yml

PLAY [blogpost] ****************************************************************

TASK [Gathering Facts] *********************************************************
ok: [server1]

TASK [check if there is an old backup] *****************************************
ok: [server1]

TASK [remove old OPatch backup] ************************************************
fatal: [server1]: FAILED! => {"msg": "The conditional check 'backup_present.exis
ts' failed. The error was: error while evaluating conditional (backup_present.ex
ists): 'dict object' has no attribute 'exists'\n\nThe error appears to have been
 in '/home/martin/ansible/blogpost/environment/broken.yml': line 20, column 11, 
but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe 
offending line appears to be:\n\n\n        - name: remove old OPatch backup\n  
        ^ here\n"}

PLAY RECAP *********************************************************************
server1                    : ok=2    changed=0    unreachable=0    failed=1   

[martin@controller environment]$ 

It doesn’t really matter what I was trying to do here; what matters is the somewhat illegible formatting of the output. The listing above shows exactly how the error was displayed in my terminal. I haven’t quite understood yet why there are linebreaks (\n) in the output that don’t result in a carriage return on screen.

So I did a little bit of digging around and found a global setting named stdout_callback. This is usually defined in /etc/ansible/ansible.cfg which would be bad news for developers if we couldn’t override it. Thankfully you can – using $HOME/.ansible.cfg or even an ansible.cfg file in your project directory. Setting stdout_callback to “debug” reveals a much more readable version of the error:

TASK [remove old OPatch backup] ************************************************
fatal: [server1]: FAILED! => {}

MSG:

The conditional check 'backup_present.exists' failed. The error was: error while
 evaluating conditional (backup_present.exists): 'dict object' has no attribute 
'exists'

The error appears to have been in '/home/martin/ansible/blogpost/environment/bro
ken.yml': line 20, column 11, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:


        - name: remove old OPatch backup
          ^ here


PLAY RECAP *********************************************************************
server1                    : ok=2    changed=0    unreachable=0    failed=1

I find this much easier to read, and by setting stdout_callback to a non-default value in my project directory I don’t break anything inadvertently. It also immediately revealed the mistake: I should have been checking backup_present.stat.exists, but I used backup_present.exists. Pretty-printing the output helped me debug the mistake much more quickly. Later on, when your script is ready to be deployed, it’s probably a good idea not to use the debug callback ;)
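
For reference, the project-local override is tiny; a minimal ansible.cfg in the project directory looks something like this:

# ./ansible.cfg - shadows /etc/ansible/ansible.cfg for anything run from this directory
[defaults]
stdout_callback = debug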

The ansible version in this post is 2.6.4 by the way.

Happy scripting!

Jonathan Lewis

Join Cardinality

Following up my “Hacking for Skew” article from a couple of days ago, Chinar Aliyev has written an article about a method for persuading the optimizer to calculate the correct cardinality estimate without using any undocumented, or otherwise dubious, mechanisms. His method essentially relies on the optimizer’s mechanism for estimating join cardinality when there are histograms at both ends of the join, so I thought I’d write a short note describing the simplest possible example of the calculation – an example where the query is a single-column equi-join with no nulls in either column and a perfect frequency histogram at both ends of the join.  (For a detailed description of more general cases I always refer to the work done by Alberto Dell’Era a few years ago). We start with two data sets that exhibit a strong skew in their data distributions:

rem
rem     Script:         freq_hist_join_02.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem

execute dbms_random.seed(0)

create table t1
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        trunc(3 * abs(dbms_random.normal))      n1
from
        generator       v1
;

create table t2
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        trunc(3 * abs(dbms_random.normal))      n1
from
        generator       v1
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 254'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 254'
        );
end;
/


I’ve generated two tables of 10,000 randomly generated values using the dbms_random.normal() function, but I’ve scaled the value up by a factor of three and taken the absolute value – which has given me a range of 12 distinct integer values with a nicely skewed distribution. Then I’ve gathered stats requesting histograms of up to 254 buckets. Since I’ve tested this only on versions from 11.2.0.4 onwards this means I’ll get a perfect histogram on the n1 columns on both tables.

Now I’m going to run a query that reports the values and frequencies from the two tables by querying user_tab_histograms, using a variant of an analytic query I published a long time ago to convert the cumulative frequencies recorded as the endpoint values into simple frequencies. If, for some reason, this query doesn’t run very efficiently in your tests you could always /*+ materialize */ the two factored subqueries (CTEs – common table expressions):


prompt  =======================================================================
prompt  Multiply and sum matching frequencies. An outer join is NOT needed
prompt  because rows that don't match won't contribute to the join cardinality
prompt  =======================================================================

break on report skip 1
compute sum of product on report
column product format 999,999,999

with f1 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'N1'
order by
        endpoint_value
),
f2 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'N1'
order by
        endpoint_value
)
select
        f1.value, f1.frequency, f2.frequency, f1.frequency * f2.frequency product
from
        f1, f2
where
        f2.value = f1.value
;


     VALUE  FREQUENCY  FREQUENCY      PRODUCT
---------- ---------- ---------- ------------
         0       2658       2532    6,730,056
         1       2341       2428    5,683,948
         2       1828       1968    3,597,504
         3       1305       1270    1,657,350
         4        856        845      723,320
         5        513        513      263,169
         6        294        249       73,206
         7        133        117       15,561
         8         40         54        2,160
         9         23         17          391
        10          5          5           25
        11          4          2            8
                                 ------------
sum                                18,746,698

As you can see, the two columns do have a highly skewed data distribution. The pattern of the two data sets is similar though the frequencies aren’t identical, of course. The total I get from this calculation is (I claim) the cardinality (rows) estimate that the optimizer will produce for doing an equi-join on these two tables – so let’s see the test:


set serveroutput off
alter session set statistics_level = all;
alter session set events '10053 trace name context forever';

select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
alter session set statistics_level = typical;
alter session set events '10053 trace name context off';

And the resulting output:

Session altered.
Session altered.


  COUNT(*)
----------
  18746698


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0wxytnyqs4b5j, child number 0
-------------------------------------
select  count(*) from  t1, t2 where  t1.n1 = t2.n1

Plan hash value: 906334482
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:03.23 |      40 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:03.23 |      40 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |     18M|     18M|00:00:02.96 |      40 |  2616K|  2616K| 2098K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |  10000 |  10000 |00:00:00.01 |      20 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |  10000 |  10000 |00:00:00.01 |      20 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")

As we can see, the estimate for the hash join is “18M”, which is in the right ballpark but, in its current format, isn’t entirely helpful, which is why I’ve enabled the 10053 trace to get an exact figure from the trace file, and this is what we see:


***********************
Best so far:  Table#: 0  cost: 4.352468  card: 9487.000000  bytes: 28461.000000
              Table#: 1  cost: 378.482370  card: 18467968.000000  bytes: 110807808.000000
***********************

The optimizer’s estimate is exactly the sum of the products of the frequencies of matching values from the (frequency) histogram data. There is a simple rationale for this – it gets the right answer. For each row in t1 with value ‘X’ the (frequency) histogram on t2 tells Oracle how many rows will appear in the join, so multiplying the frequency of ‘X’ in t1 by the frequency of ‘X’ in t2 tells Oracle how many rows the ‘X’s will contribute to the join. Repeat for every distinct value that appears in both (frequency) histograms and sum the results.

As a refinement on this (very simple) example, let’s delete data from the two tables so that we have rows in t1 that won’t join to anything in t2, and vice versa – then re-gather stats, query the histograms, and check the new prediction. We want to check whether a value that appears in the t1 histogram contributes to the join cardinality estimate even if there are no matching values in the t2 histogram (and vice versa):


delete from t1 where n1 = 4;
delete from t2 where n1 = 6;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 254', no_invalidate=>false)
execute dbms_stats.gather_table_stats(user,'t2',method_opt=>'for all columns size 254', no_invalidate=>false)

with f1 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'N1'
order by
        endpoint_value
),
f2 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T2'
and     column_name = 'N1'
order by
        endpoint_value
)
select
        f1.value, f1.frequency, f2.frequency, f1.frequency * f2.frequency product
from
        f1, f2
where
        f2.value = f1.value
;


set serveroutput off
alter session set statistics_level = all;
alter session set events '10053 trace name context forever';

select
        count(*)
from
        t1, t2
where
        t1.n1 = t2.n1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
alter session set statistics_level = typical;
alter session set events '10053 trace name context off';

And the output – with a little cosmetic tidying:


856 rows deleted.
249 rows deleted.

PL/SQL procedure successfully completed.
PL/SQL procedure successfully completed.


     VALUE  FREQUENCY  FREQUENCY      PRODUCT
---------- ---------- ---------- ------------
         0       2658       2532    6,730,056
         1       2341       2428    5,683,948
         2       1828       1968    3,597,504
         3       1305       1270    1,657,350
         5        513        513      263,169
         7        133        117       15,561
         8         40         54        2,160
         9         23         17          391
        10          5          5           25
        11          4          2            8
                                 ------------
sum                                17,950,172


Session altered.
Session altered.


  COUNT(*)
----------
  17950172


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0wxytnyqs4b5j, child number 0
-------------------------------------
select  count(*) from  t1, t2 where  t1.n1 = t2.n1

Plan hash value: 906334482
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:02.89 |      40 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:02.89 |      40 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |     17M|     17M|00:00:02.61 |      40 |  2616K|  2616K| 2134K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |   9144 |   9144 |00:00:00.01 |      20 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |   9751 |   9751 |00:00:00.01 |      20 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")


From the 10053 trace file:
***********************
Best so far:  Table#: 0  cost: 4.340806  card: 9144.000000  bytes: 27432.000000
              Table#: 1  cost: 368.100010  card: 17950172.000000  bytes: 107701032.000000
***********************

You can see from the frequency histogram report that we “lost” values 4 and 6 from the report; then the total from the report matches the actual number of rows returned by the query, and the cardinality estimate in the plan is again in the right ballpark – with the trace file showing an exact match.

I’ve run this test on 11.2.0.4,  12.1.0.2,  12.2.0.1 and  18.3.0.0 (which generated a different set of random values) – and there’s an anomaly that appears in 11.2.0.4 (though maybe that should be “disappeared from”): the optimizer’s estimate for the cardinality was a little larger than the value generated in the query against user_tab_histograms. [Now explained (probably)]

Conclusion:

For an incredibly simple class of queries with perfect frequency histograms there’s a very simple way to calculate the cardinality estimate that the optimizer will predict. Match up rows from the two frequency histograms, multiply the corresponding frequencies (making sure you don’t multiply the cumulative frequencies), and sum.

This is, of course, only a tiny step in the direction of seeing how Oracle uses histograms and covers only a type of query that is probably too simple to appear in a production system, but it’s a basis on which I may build in future notes over the next few weeks.

Update (5th Oct)

The “error” in the 11g calculation irritated me a little, and I woke up this morning with an idea about the solution. In 10.2.0.4 Oracle changed the way the optimizer calculated for a predicate that used a value that did not appear in the frequency histogram: it did the arithmetic for  “half the least frequently occurring value”. So I thought I’d run up a test where for my “sum of products” query I emulated this model. I had to change my query to an “ANSI”-style full outer join, and here it is:

with f1 as (
select 
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from 
        user_tab_histograms 
where 
        table_name  = 'T1' 
and     column_name = 'N1'
),
f2 as (
select 
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from 
        user_tab_histograms 
where 
        table_name  = 'T2' 
and     column_name = 'N1'
)
select
        f1.value, f2.value,
        nvl(f1.frequency, 0)                t1_frequency,
        nvl(f2.frequency, 0)                t2_frequency,
        nvl(f1.frequency, &t1_least / 2) *
        nvl(f2.frequency, &t2_least / 2)    product
from
        f1
full outer join
        f2
on
        f2.value = f1.value
order by
        coalesce(f1.value, f2.value)
;

Running this code, and noting that the least frequency in t1 was 4 while the least frequency in t2 was 2, I got the following results (with the 10053 trace file summary following the output):


     VALUE      VALUE T1_FREQUENCY T2_FREQUENCY      PRODUCT
---------- ---------- ------------ ------------ ------------
         0          0         2658         2532    6,730,056
         1          1         2341         2428    5,683,948
         2          2         1828         1968    3,597,504
         3          3         1305         1270    1,657,350
                    4            0          845        1,690
         5          5          513          513      263,169
         6                     294            0          294
         7          7          133          117       15,561
         8          8           40           54        2,160
         9          9           23           17          391
        10         10            5            5           25
        11         11            4            2            8
                      ------------ ------------ ------------
sum                           9144         9751   17,952,156


Join Card:  17952157.000000 = outer (9751.000000) * inner (9144.000000) * sel (0.201341)
Join Card - Rounded: 17952157 Computed: 17952157.00
 

That’s a pretty good match to the trace file result – and the difference of 1 may simply be a rounding error (despite the trace file’s text suggesting it is accurate to 6 d.p.).

Footnote

Following an exchange of email with Chinar Aliyev, it’s fairly clear that the “half the least frequency” can actually be derived as “table.num_rows * column.density”.
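
A quick way to see that equivalence from the dictionary is to report num_rows * density for the join column of each table and compare it with half the lowest frequency in the corresponding histogram. A sketch (nothing more than the product of two stored statistics):

select
        t.table_name,
        t.num_rows,
        c.density,
        t.num_rows * c.density          derived_half_least
from
        user_tables     t,
        user_tab_cols   c
where
        t.table_name    in ('T1','T2')
and     c.table_name    = t.table_name
and     c.column_name   = 'N1'
order by
        t.table_name
;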

 

connor_mc_d

Easy as pi…. hole.

A slight digression from my normal database-focussed content today.

In internet bandwidth and latency strapped Perth (Western Australia), every last drop of internet counts. Recently I stumbled upon this blog post by Troy Hunt about using a Raspberry Pi to serve as a local DNS to filter out unnecessary content. Personally, I don’t care about the content as such (I don’t like ads, but I do acknowledge that they are generally a necessary “evil”), but for me it is about getting the most performance out of my lowly internet connection until the technology evolves in Australia.

So I headed over to https://pi-hole.net/ and downloaded the installation script. I don’t have a Raspberry Pi, which is its intended platform, but I do have Virtualbox floating around, because I use that extensively for the Oracle VM’s that I need for tackling AskTOM questions on various versions of the database. I initially tried to install pi-hole on an existing Oracle Enterprise Linux installation, but the script makes a fair few assumptions about platform, and I had a lot of dramas. So I just downloaded a fresh install of Fedora 28, got it up and running in a small VM and voila! Pi-hole installed pretty much straight out of the gate.

Subjectively, things definitely seem a lot snappier now. My console suggests that a quarter of my traffic is being filtered out!


And I’m only using the pi-hole on a few machines at home currently. I’ll give it a few more days before I take the big jump and make it the DNS for my router so that all my devices can get the benefit.

But so far, so good. If you’re laden with slow clunky internet like I am, perhaps give pi-hole a look.
