Super Scaling Queues Using the LMax Disruptor Pattern

One of the many things I hope to get around to blogging about, time permitting, are the challenges of building “Web scale” platforms using SQL Server. Of the many challenges this presents, one is coming up with a platform which can service OLTP and OLAP workloads equally well when the platform cannot tolerate the latency of traditional style ETL processes. SQL Server 2016 introduces operational analytics a potential solution to this, however this has two key limitations:

  • The in-memory OTLP engine does not have feature parity with the legacy engine (yet).
  • The batch engine will never out perform queries using the legacy engine against pre-aggregated data.

A different tact is to use some form of queue to copy the data from a normalized to a de-normalized data model. Great care has to be taken when creating this queueing mechanism in order to avoid it becoming a bottleneck in itself.

Thomas Kejser has blogged ( here ) on the use of the LMax disruptor pattern as a means to implement a high performance queue on a relational database engine. In this post I want to explore this in depth, this will include:

  • How well a LMax disruptor based queuing mechanism scales on SQL Server.
  • What the bottle necks are for this approach.
  • Where a SQL Server LMax disruptor queue burns its CPU time.

Implementing A Queue: The Naive Approach

The naive approach to implementing a queue is to use a table with a clustered index, push the data using a conventional INSERT and pop it off using a DELETE with an OUTPUT clause. Unfortunately this creates the perfect “Last page problem” storm in which all the insert threads contend on the right most leaf node in the b-tree and throughput is throttled by page latching:

last page

This approach might scale well with in-memory OLTP engine which is latch free and uses hashing to distribute rows across buckets.

Introducing The “LMax Disruptor Pattern”

One solution to this proposed by Thomas is the use of the LMAX disruptor pattern, the crux of which is the implementation of a ring buffer:

queue

Firstly we need to create the queue table:

CREATE TABLE dbo.MyQLmax (
     [Slot]            [bigint]    NOT NULL
    ,[message_id]      [bigint]    NULL
    ,[time]            [datetime]  NOT NULL
    ,[message]         [char](300) NOT NULL
    ,[reference_count] [tinyint]   NOT NULL
)

This will be seeded with 2000,000 messages:

SET NOCOUNT ON
DECLARE  @QueueSize int = 2000000
        ,@i         int = 1;
WHILE @i <= @QueueSize
BEGIN
    INSERT MyQLmax ( Slot
                    ,time
                    ,message
                    ,reference_count)
            VALUES ( @i
                    ,'2050-01-01'
                    ,'dummy'
                    ,0)
    SET @i = @i + 1
END;

To complete the queue, a clustered index will be created on the queue table:

CREATE UNIQUE CLUSTERED INDEX CIX ON dbo.MyQLmax (Slot)
WITH (FILLFACTOR = 100)

A stored procedure and sequence object are required to pop messages onto the queue:

CREATE SEQUENCE dbo.PushSequence AS BIGINT
START WITH 1
INCREMENT BY 1
MINVALUE 1
MAXVALUE 2000000
CYCLE
CACHE 1000;
GO

CREATE TABLE [dbo].[PushStats] (
     [SPID]               [int] NULL
    ,[PushedMessageCount] [int] NULL
)
GO

CREATE PROCEDURE [dbo].[LmaxPush] AS
BEGIN
    DECLARE  @PushedMessageCount int = 0
            ,@QueueSize          int = 2000000
            ,@Slot               bigint
            ,@i                  int = 1;

    SET NOCOUNT ON;

    INSERT INTO PushStats ( [SPID]
                           ,[PushedMessageCount] )
                   VALUES ( @@SPID
                           ,0);

    WHILE @i <= @QueueSize
    BEGIN
        SET @Slot = NEXT VALUE FOR dbo.PushSequence

        UPDATE dbo.MyQLmax WITH (ROWLOCK)
        SET     [time]          = GETDATE()
               ,[message]       = 'Hello World'
               ,[message_id]    = @Slot
               ,reference_count = reference_count + 1
        WHERE  Slot = @Slot
        AND    reference_count = 0 /* Don’t overwrite! */;

        IF @@ROWCOUNT = 1
        BEGIN
            SET @PushedMessageCount += 1;
        END
        ELSE
        BEGIN
            SET @i += @QueueSize;
        END;
        SET @i += 1;
    END;

    UPDATE PushStats
    SET    [PushedMessageCount] = @PushedMessageCount
    WHERE  [SPID]               = @@SPID;
END;

The final pieces in the jigsaw are a sequence and stored procedure for popping messages off the queue:

CREATE SEQUENCE dbo.PopSequence AS BIGINT
START WITH 1
INCREMENT BY 1
MINVALUE 1
MAXVALUE 2000000
CYCLE
CACHE 1000;
GO

CREATE TABLE [dbo].[PopStats](
     [SPID]               [int] NULL
    ,[PushedMessageCount] [int] NULL
) ON [FG_01]
GO

CREATE PROCEDURE [dbo].[LmaxPop] AS
BEGIN
    DECLARE  @MessagesPopped int    = 0
            ,@OutputMessage  char(300)
            ,@QueueReady     int    = 0
            ,@QueueSize      int    = 2000000
            ,@Slot           bigint = 0
            ,@i              int    = 1

    SET NOCOUNT ON;
    /*
     * Allow the push procedure to build up a head of steam
     * in front of the pop procedure.
     */
    WHILE <= 20000
    BEGIN
        SELECT @QueueReady = SUM([reference_count])
        FROM   dbo.MyQLmax WITH (NOLOCK);

        IF @QueueReady <= 20000
        BEGIN
            WAITFOR DELAY '00:00:00.001';
        END;
    END;

    INSERT INTO PopStats ( [SPID]
                          ,[MessagesPopped] )
                  VALUES ( @@SPID
                          ,0);

    WHILE @i &amp;amp;lt;= @QueueSize
    BEGIN
        SET @Slot = NEXT VALUE FOR dbo.PopSequence
        /*
         * Find slot
         */
        UPDATE dbo.MyQLmax WITH (ROWLOCK)
        SET     [time]            = GETDATE()
               ,@OutputMessage    = [message]
               ,[message_id]      = NULL
               ,[reference_count] = [reference_count] - 1
        WHERE  [Slot]             = @Slot
        AND    [reference_count]  > 0;

        IF @@ROWCOUNT = 0
        BEGIN
            WAITFOR DELAY '00:00:00.001';
        END;
        ELSE
        BEGIN
            SET @MessagesPopped += 1;
        END;

        SET @i += 1;
    END;

    UPDATE PopStats
    SET    MessagesPopped = @MessagesPopped
    WHERE  [SPID]         = @@SPID;
END;

Disclaimer

The code provided in this blog post is free for anyone to use as long as the following two points are observed:

  • The exact performance you will get is entirely dependent on the infrastructure you use for running SQL Server and on whatever else might be running on your instance at the time, the expectation for obtaining identical performance figures as those cited in this post is an unrealistic one.
  • The initial ‘Cut’ of the code in this blog was written with the intention to provide a means of stress testing the LMax disruptor queue pattern and not as finished code fit for production code, however should you require this, please read the addendum below:

Addendum 12/01/2015

For anyone wishing to use this code in anger, it is is recommended that it be modified thus, my initial version was written purely as a means to stress test the LMax disruptor pattern and not with production purposes in mind, similarly in the code which follows which uses a memory optimised table and IDENTITY column, IDENT_CURRENT should be used to obtain the initial slot value, the memory optimised table should only then be inserted into if the ‘Pop’ update results in a @@ROWCOUNT value of 1. Similarly, a version of the push code which is fit for production purposes should iterate with a delay in the event that the ‘Queue’ is full until a free slot becomes available.

CREATE PROCEDURE [dbo].[LmaxPop] AS
BEGIN
    DECLARE  @MessagesPopped int    = 0
            ,@OutputMessage  char(300)
            ,@QueueReady     int    = 0
            ,@QueueSize      int    = 2000000
            ,@Slot           bigint = 0
            ,@i              int    = 1

    SET NOCOUNT ON;

    INSERT INTO PopStats ( [SPID]
                          ,[MessagesPopped] )
                  VALUES ( @@SPID
                          ,0);

    WHILE 1=1
    BEGIN
        SELECT @Slot = current_value
        FROM   sys.sequences
        WHERE  name  = 'PopSequence';
        /*
         * Find slot
         */
        UPDATE dbo.MyQLmax WITH (ROWLOCK)
        SET     [time]            = GETDATE()
               ,@OutputMessage    = [message]
               ,[message_id]      = NULL
               ,[reference_count] = [reference_count] - 1
        WHERE  [Slot]             = @Slot
        AND    [reference_count]  > 0;

        IF @@ROWCOUNT = 0
        BEGIN
            WAITFOR DELAY '00:00:00.001';
        END;
        ELSE
        BEGIN
            SET @Slot = NEXT VALUE FOR PopSequence;
            SET @MessagesPopped += 1;
        END;

        SET @i += 1;
    END;

    UPDATE PopStats
    SET    MessagesPopped = @MessagesPopped
    WHERE  [SPID]         = @@SPID;
END;

The pop procedure has been coded with logic to allow the push procedure to create 20,000 messages in the queue before it will start removing them. Anyone wishing to use this code in anger for a practical application may want to reduce this number or remove this logic completely.

Time For A Test Drive !

Lets give this queue a test drive, with this test environment:

  • 2 x 10 core Xeon E5 2660 2.2 Ghz hyperthreading enabled
  • 64GB DDR4
  • 4 x SanDisk Extreme Pro 480GB SSD
  • Windows server 2012 R2
  • SQL Server 2016 CTP 2.4
  • Trace flags 8008 and 8048 enabled

ostress -E -S2016CTP24 -dLmax -Q”exec LmaxPush” -n1

This first run takes 6 minutes and 59 seconds to complete and this is what sys.dm_os_wait_stats has to say:

waits

Overcoming The WRITELOG Barrier

For purely academic purposes, lets repeat this test with delayed durability. Were the queue to be used in anger in a real world application all message pushes to the queue should be durable. Had my server had a flash PCIe card with a NVMe driver, I would have got closer to eliminating WRITELOG waits without having to go down the delayed durability path.

This change results in a dramatic drop in elapsed time; a single threaded execution drops to 1 minutes and 5 seconds. sys.dm_os_wait_stats now reports:

waits

But how well does this scale ?, the answer is not very well 😦

graph

page latching which is the whole idea behind avoiding the naive approach has come back to haunt us:

waits

Re-Thinking The Message Slot Key

The PAGELATCH_EX wait event accrues time when more than one thread requires access to the same page in the buffer cache for data modification purposes:

pageiolatch_sh

The resolution to this is to prevent this situation from taking place to begin with. Lets try changing the physical order of the queue pages on disk whilst preserving their logical order, this will require a new push procedure:

CREATE PROCEDURE [dbo].[LmaxPushV2] AS
BEGIN
    DECLARE  @PushedMessageCount int    = 0
            ,@QueueSize          int    = 2000000
            ,@1billion           bigint = 1000000000
            ,@2billion           bigint = 2000000000
            ,@3billion           bigint = 3000000000
            ,@4billion           bigint = 4000000000
            ,@5billion           bigint = 5000000000
            ,@6billion           bigint = 6000000000
            ,@7billion           bigint = 7000000000
            ,@8billion           bigint = 8000000000
            ,@9billion           bigint = 9000000000
            ,@Slot               bigint
            ,@i                  int    = 1;

    SET NOCOUNT ON;

    INSERT INTO PushStats ( [SPID]
                           ,[PushedMessageCount])
                   VALUES ( @@SPID
                           ,0 );

    WHILE @i &lt;= @QueueSize
    BEGIN
        SET @Slot = NEXT VALUE FOR dbo.PushSequence

        SET @Slot = ( CASE RIGHT(@Slot, 1)
                          WHEN 0
                              THEN @Slot
                          WHEN 1
                              THEN @Slot + @5billion
                          WHEN 2
                              THEN @Slot + @1billion
                          WHEN 3
                              THEN @Slot + @8billion
                          WHEN 4
                              THEN @Slot + @2billion
                          WHEN 5
                              THEN @Slot + @7billion
                          WHEN 6
                              THEN @Slot + @3billion
                          WHEN 7
                              THEN @Slot + @6billion
                          WHEN 8
                              THEN @Slot + @4billion
                          WHEN 9
                              THEN @Slot + @9billion
                      END );

        UPDATE dbo.MyQLmax WITH (ROWLOCK)
        SET     [time]          = GETDATE()
               ,[message]       = 'Hello World'
               ,[message_id]    = @Slot
               ,reference_count = reference_count + 1
        WHERE  Slot             = @Slot
        AND    reference_count  = 0 /* Don’t overwrite! */

        IF @@ROWCOUNT = 1
        BEGIN
            SET @PushedMessageCount += 1;
        END
        ELSE
            SET @i += @QueueSize;
        END;

        SET @i += 1;
    END;

    UPDATE PushStats
    SET    [PushedMessageCount] = @PushedMessageCount
    WHERE  [SPID]               = @@SPID;
END;

In addition to this a new INSERT statement is also required to seed create the slots in the queue table:

TRUNCATE TABLE MyQLmax
GO
NOCOUNT ON
DECLARE  @QueueSize int    = 2000000
        ,@Slot      bigint
        ,@i         int = 1;
WHILE @i &lt;= @QueueSize
BEGIN
    INSERT MyQLmax ( Slot
                    ,[time]
                    ,[message]
                    ,reference_count)
    SELECT (CASE RIGHT(@i, 1)
                WHEN 0
                    THEN @i
                WHEN 1
                    THEN @i + 5000000000
                WHEN 2
                    THEN @i + 1000000000
                WHEN 3
                    THEN @i + 8000000000
                WHEN 4
                    THEN @i + 2000000000
                WHEN 5
                    THEN @i + 7000000000
                WHEN 6
                    THEN @i + 3000000000
                WHEN 7
                    THEN @i + 6000000000
                WHEN 8
                    THEN @i + 4000000000
                WHEN 9
                    THEN @i + 9000000000
            END )
           ,'2050-01-01'
           ,'dummy'
           ,0
    SET @i = @i + 1
END;

Eliminating PAGELATCH_EX waits results in much improved scalability !!!:

graph.png

Takeaway #1 a naive approach to implementing the lmax disruptor pattern with slot numbers based on monotonically increasing values does not scale well.

SQL Server and Reverse Key Indexes

The method used to preserve the logical slot order but disrupt the physical order is a crude one and it needs  to be deterministic, i.e. each value generated by the push sequence always has to correspond to the same pseudo random value, thus ruling out the use of GUIDs. A more elegant approach is to use a “Reverse key” index, this is a conventional b-tree index which stores the values of the indexed column(s) after they have under gone bit reversion. Oracle furnishes reverse key indexes out of the box, with SQL Server we have to craft reverse key indexes ourselves. I would refer anyone wishing to create reverse key indexes with SQL Server do this blog post which details how to do it.

Where Are Our CPU Cycles Being Burned ?

Based on what we know, we get diminishing returns as the number of threads pushing messages onto the queue increases beyond 8:

graph 1

Therefore. lets take a look at where the CPU time is going for 8 threads, at module level to begin with:

module weight

sqlmin.dll encapsulates the database engine run time and sqllang.dll is used for T-SQL language processing and query optimization. Given that the push stored procedure is quite simple, sqllang.dll is burning up quite a lot of CPU time ( weight = sampled CPU time in milliseconds ). This should provide a hint as to what in-memory OLTP natively compiled procedures might provide in terms of a performance boost 😉

Lets drill down into sqlmin.dll:

function weight

Nothing in particular stands out as an excessive consumer of CPU time. SQLServerLogMgr::AppendLogRequest is the function used to copy logging information into the log buffer. The three function calls I have highlighted relate to locking and latching, two things that use of the in memory OTLP engine should eliminate.

Where Has The Bottleneck Moved To ?

The graph begs the question; why does throughput drop at 11 threads ?, wait statistics provides some clues:

waits

. . . and sys.dm_os_latch_stats  will reveal the exact latch:

latch

Taken on face value this suggests that the latch is used to synchronize access to the sequence, windows performance toolkit wait analysis can provide a definitive answer as to what the METADATA_SEQUENCE_GENERATOR latch protects:

wait analysis

Immediately after the call to:

sqlmin.dll!CMEDSeqGen::GenerateNewValue

there is a call to:

sqlmin.dll!LatchBase::Suspend

I was expecting this to be a call to sqlmin.dll!LatchBase::Acquire. Information on sqlmin.dll!LatchBase::Suspend in the public domain is non existent bar for this blog post, the post implies that suspend is used to release orphaned latches.

Performance and Scalability In Two Pictures

If I was to summarize all my material, be it slide decks or blog posts in two images it would be these two; firstly the CPU cycle latency’s of accessing different parts of the memory hierarchy:

The Good The Bad and The Ugly

For any piece of software running on a modern processor to run fast trips off the CPU to main memory need to be minimized as much as possible. Performance ( the ability for something to “Run fast” ) and scalability are related, but are not the same

The second image represents scale and anti-scale patterns, essentially whenever a shared resource is contended on, be it a lock, latch or spinlock, a situation exists which is not conducive to scalability:

anti scale

The same principle applies to execution plans when the gather and distribute streams exchange iterators are present in the plan.

Can We Do Better ?

At this point we have two options for increasing the scalability of our code, use:

  • Processors with a faster clock speed and / or superior single threaded performance which allows the latch protecting the sequence to be acquired and released faster.
  • Something other than a sequence object to obtain monotonically increasing values from.

The ideal solution would be the availability of an in-memory sequence object protected by compare and swap (CAS) instructions. All of this begs the question. if a sequence does not scale that well with the legacy database engine, has the SQL Server engine team made some special provision for the in memory OLTP engine ?, at the time of writing this blog post I do not have an answer for this.

Addendum 08/01/2016
SQL Server, up to and including version 2016 does support the retrieval of the NEXT value from a sequence in natively compiled procedure.

One potential solution is to create a single column memory optimized table with an IDENTITY like this:

CREATE TABLE [dbo].[NonBlockingSequence]
(
      [ID] [bigint] IDENTITY(1,1) NOT NULL
     ,PRIMARY KEY NONCLUSTERED HASH ( [ID] )
          WITH ( BUCKET_COUNT = 524288)
) WITH ( MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA )

a sequence number ( identity ) is obtained from using this code excerpt:

BEGIN TRANSACTION
    INSERT INTO dbo.NonBlockingSequence
    DEFAULT VALUES;
    SELECT @Slot = SCOPE_IDENTITY() % @QueueSize;
ROLLBACK TRANSACTION;

Note that this trick only works with an in-memory table due to its latch free nature.

How does this fly ?

graph 2Scalability is improved from 8 to 14 threads and the highest throughput obtained is 192,715 messages pushed / second with 20 threads. For completeness, below is the data table from which the graph was constructed:

data table

Takeaway #2 under load a sequence object has limited scalability which can be overcome using a memory optimized table with an identity column. However, even though this approach eliminates latching, access to the IDENTITY value is protected internally via the DBTABLE spinlock.

This is the wait activity for the test with 22 threads:

waits_n22

Microsoft does not give much away as to what the WAIT_XTP_OFFLINE_CKPT_LOG_IO wait event represents other than to state that it relates to reads on the log file required in order to perform offline check points. Because a row shows up in sysprocesses for a thread waiting on this wait type that accumulates CPU time whilst the in memory table is not in use, WAIT_XTP_OFFLINE_CKPT_LOG_IO can probably be safely ignored.

Can The Throughput Curve Be Pushed Still Further ?

I’m in the process of writing a continuation of this post involving the in memory OLTP engine, this allows the LMax disruptor pattern to scale to over 550,000 messages pushes per second:

graph 2

Wrap Up

In this post I’ve dug into using the lmax disruptor pattern with the legacy database engine and addressed several issues which inhibit its scalability:

  • log write performance
  • page latch contention
  • sequence generation scalability

Of these points, the last one is worthy of its own blog post. I suspect that there is a point at which the in memory table technique hits a scalability brick wall due to the sheer volume of DBTABLE spinlock spins. I also hope to cover the topic of queuing when using the in memory OLTP engine.

 

11 thoughts on “Super Scaling Queues Using the LMax Disruptor Pattern

  1. > This approach might scale well with in-memory OLTP engine

    Actually, it should have the same kind of scalability issues to a lesser extent. You can’t build a queue out of a hash table so you need a range index. And there should be contention on the last page of it.

    Also, CAS on the same data does not scale perfectly, either. CAS is merely a functionally constrained hardware lock.

    Great work in this blog post. Would be interesting to see the throughput curves on a 128 core box. Probably, the perfect scalability is no longer present.

    • If I can address each of your comments in turn:

      1. You can’t build a queue on a hash table.

      If you were building your own hash table and if your choice of hashing algorithm was a poor one, this would cause problems. However, if we are talking about getting the database engine to do the leg work for us, via the in memory OLTP engine which uses hash buckets I cannot see this being a problem. Also if a good implementation of a reverse key index behaves as I think it should this would work also, although bit reversion is not technically the same as a hashing algorithm, at a high level you are effectively doing the same thing as hashing values.

      2. Also CAS on the same data does not scale perfectly.

      Agreed, however because a CAS instruction is a single machine instruction ( which is obvious ) it is much more efficient than a latch in terms of the CPU cycles it burns compared to a latch.

      3. Would be interesting to see the throughput curves on a 128 core box.

      For the legacy engine I suspect a combination of heavy spin contention on the XDESMGR and LOGCACHE_ACCESS spinlocks would kill the scalability curve long before you got to 128 threads. For the in memory OLTP engine and natively compiled stored procedures, the XDESMGR is no longer used, however you would either hit high LOGBUFFER waits or heavy spin contention on the LOGCACHE_ACCESS spinlock first.

      Thanks for you comments and taking the time to read this blog post, I appreciate that my stuff is quite niche, but its still nice to have an audience.

  2. Your stuff is awesome and is at the far edge of SQL Server tuning competence in this niche.

    1. At the time of writing the comment I did not understand that you are not using range accesses to the queue. Equality predicates are enough. Hash indexes can therefore be used.

    This should have no contention whatsoever on the hash buckets, you are correct in that. But each retrieval will result in at least one cache miss because the write happened on a different core. Memory transfers should now be a machine-wide limiting resource. Unclear, when that comes into play in practice.

    2. True. Just want to emphasize my point: Concurrent writes, including CAS, on the same cache line do not scale at all with the number of threads doing them. I don’t think this contradicts anything of what you have said.

    3. Hm who knows… Would be a cool test to run. Maybe transaction aborts for sequence number generation will be the highest contention cost?! That is my guess.

  3. Fascinating article, thanks! Looking forward to the next installment. A couple minor questions/comments:

    * I interpreted Thomas Kejser’s pseudo-code to mean that both push and pop would try again at the same slot in cases where the queue is full (push) or empty (pop). Was your change to increment the slot in such cases intentional? Does that change impact performance?

    * In a real-world situation, it seems that pop would be constantly polling (with a small delay), but a push would occur only as new messages are generated. I think there may be some pretty bad worst-case behaviors that can come along with incrementing the pop slot even if there is no message to pop. For one, if the queue ever becomes empty, pop will quickly move in front of push. Once/if push catches up, the messages that were pushed in the meantime will be processed out of order because the first message to be processed will be the one where push has finally closed the gap on pop. Another edge case is that if the queue becomes empty and then only a single message is pushed, it will be ~30 minutes before the message is processed (2 million times the 1 ms wait) as the pop slot is in front of the pushed message and has to cycle all the way around to reach it. I could be misinterpreting, but in any case I suppose these are the sorts of issues that would need to be ironed out before using the code “in anger” (a new phrase for me as a non-Brit!).

    * I think the “CREATE SEQUENCE dbo.PushSequence” should read “CREATE SEQUENCE dbo.PopSequence” in the pop section of the code

    • Thanks for reading the post, “My change” on reflection was most probably an oversight ( bug ) that crept in in my haste to get the LMax disrupt-or pattern to scale. The pop procedure should ideally get the current value of the sequence from sys.sequences, however as this is approach is proven to have limited scalability, the use of the memory optimised table is recommended in which case IDENT_CURRENT should be used to the current identity value.

      • The code in the addendum makes sense, thanks! (Sometimes it’s hard to take off my code reviewer hat, sorry about that.)

        Overall, I love the blog posts and it’s great to see the level of detail of your analysis. Definitely inspirational for those of us who are very comfortable with basic query plan tuning and in some cases wait stats analysis, but not yet debugging or tuning at such an advanced level as these blog posts.

  4. Question regarding the use of a pseudo-hash on the slot value: this will speed up pushes under high traffic, as you prove, but won’t it slow down pops, particularly if they’re done in less-frequent batches? I.e., having to read lots of pages with a single message each vs. a small series of pages full of messages?

    • There is a trade off here between logical IO and page latching. The implicit assumption behind implementing a queue in this manner is that you messages will be pushed and popped at a high frequency, therefore the design errs on the side of reducing waits on pagelatch_ex at the expense of increasing logical IO. The other consideration is that as you increase the size of the queue you risk it spilling out of the CPU cache and thrashing the lookaside buffer (TLB).

Leave a Reply to datatake Cancel reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s