Kadane's solution to the Maximum Subarray problem

One interesting problem that you can come across occasionally is to find the contiguous sequence of numbers in a one-dimensional array of numbers that gives you the greatest sum value. Now that doesn’t sound particularly interesting until you are asked for a particular report like ‘How long was the period in which we were trading most profitably last year?’, or you maybe need to find the brightest part of an image. It is one of those techniques that like linear regression or ANOVA that you find all sorts of uses for once you understand it.

It is called the Maximum subarray problem, Kadane’s algorithm, The ‘Largest Sum Contiguous Subarray’ or the ‘Greatest subsequential sum’ algorithm. Learn it and you’ve aced four technical interview questions! I’ve never seen a SQL Solution.

Normally, the answer you’d give when you see a one-dimensional array of numbers like 45,12,32,90,48 would be ‘the whole lot contribute to the sum’, but that would assume that all the numbers were positive. What if they are not? They might, for example represent the difference from the mean. They might represent both input and output of a black-box system. Imagine that you were doing cashbook entries where cash was going in, and payments were going out, and you wanted to know the longest sequence of entries where you were accumulating money? What if you were doing just-in-time stocking of components and wanted to know those periods in which you were piling up components?

In the sequence of numbers −2, 1, −3, 4, −1, 2, 1, −5, 4 – the contiguous subarray with the largest sum is 4, −1, 2, 1, – with sum of 6.

for the sequence -1, 0, 15, 3, -9, 12, -4 – it would be 0, 15, 3, -9, 12 – with a sum of 21

for the sequence -1, -2, 3, 5, 6, -2, -1, 4, -4, 2, -2 – it would be 3, 5, 6, -2, -1, 4 – with a sum of 15

There are plenty of brute force ways of doing this but Kadane’s algorithm succeeds in being linear. This is given in the Wikipedia. Although it is used mostly for one-dimensional arrays, it will work with two-dimensional arrays as well if you simply sum up the rows, and then let the basic kadane algorithm run on the resulting one-dimensional array.

So how is this done in SQL? The simplest thing is to translate the algorithm into SQL directly, because it is very efficient. I’ve added to it slightly to give you the subarray as well as the sum. This works well-enough on small arrays and is linear. Ideally, you’d want a set-based method but I’ve never outgrown an iterative solution

IF(OBJECT_ID('tempDB..#MaxSubarray') IS NOT NULL)

DROP TABLE #MaxSubarray;

--if the temp tables exists then delete it

CREATE TABLE #MaxSubarray

(

TheOrder INT IDENTITY PRIMARY KEY,

value INT

);

--create a table representing the list with the sequence no. as the primary key

INSERT INTO #MaxSubarray (value)

SELECT x.VALUE FROM (VALUES (-2),(1),(-3),(4),(-1),(2),(1),(-5),(4)) x(VALUE);

--and now fill it with the list items

DECLARE @max_ending_here INT, --the total so far (the partial sum accumulator)

@max_so_far INT, --The Maximum of the current solution

@ii INT, --loop variable

@iiMax INT, --length of the array

@x INT, --the value of the current array element in the sequence (current row)

@start INT, --the start point of the current solution

@end INT; --the end - so far- of the current best solution

--so now we intialise all our variables

SELECT @start = -1, @end = -1, @max_so_far = 0, @ii = 1,

@iiMax = MAX(#MaxSubarray.TheOrder)

FROM #MaxSubarray;

--iterate through the array (I know!)

WHILE @ii <= @iiMax

BEGIN --get the value of the next value in the sequence

SELECT @x = #MaxSubarray.value FROM #MaxSubarray

WHERE #MaxSubarray.TheOrder = @ii;

--update the running sum total (partial sum accumulator)

SELECT @max_ending_here = COALESCE(@max_ending_here, 0) + @x;

--IF it IS greater value than anything so far update our maximum

--and update the index of the end of the run

IF @max_ending_here > @max_so_far

SELECT @max_so_far = @max_ending_here, @end = @ii;

SELECT @ii = @ii + 1;

--Now, if our accumulator has gone negative then we start again

--at the next value following

IF @max_ending_here < 0

SELECT @max_ending_here = 0, @start = @ii;

END;

SELECT @max_so_far; -- our highest positive sum fr4om a subsequence

SELECT TheOrder,value --and list the subsequence that gave the highest sum

FROM #MaxSubarray

WHERE #MaxSubarray.TheOrder BETWEEN @start AND @end

ORDER BY #MaxSubarray.TheOrder;

--if nothing in the array, the subarray is empty and the total is 0

Now there are several ways you might want to use this, but we want to have a version that we can unit-test easily as we work with the algorithm so we’d want to feed it lists whare the solution has been done already and see if it gives the correct solution. Normally, you’d probably take a table as a parameter and return a table result representing the subarray, but that is a bit messier to unit-test.

IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.ROUTINES

WHERE ROUTINES.ROUTINE_TYPE='procedure'

AND ROUTINES.ROUTINE_SCHEMA='dbo'

AND ROUTINES.ROUTINE_NAME = 'MaximumValueSubarray')

DROP PROCEDURE MaximumValueSubarray

CREATE procedure dbo.MaximumValueSubarray

/**

summary: >

This function takes a sequence as a list of numbers and returns both a list

and a value. The list represents the subsequence as an ascll string representing

a sequenced list of integers in the array and the value is the maximum sum value

from aggregating that subsequence

Author: Phil Factor

Revision: 1.0

date: 19 Jan 2017

example: >

DECLARE @SubArray VARCHAR(MAX), @MaxValue int

EXECUTE MaximumValueSubarray '−2, 1, −3, 4, −1, 2, 1, −5, 4', @SubArray OUTPUT, @MaxValue OUTPUT

SELECT @SubArray as 'subarray', @MaxValue as 'Max_Value'

returns: >

The subsequence as an ascll string representing a sequenced list of integers in the array

The maximum sum value from aggregating a subsequence

**/

@Sequence VARCHAR(MAX), --eg '−2, 1, −3, 4, −1, 2, 1, −5, 4'

@SubArray VARCHAR(MAX) OUTPUT,

@MaxValue INT OUTPUT

SELECT @Subarray='', @MaxValue=0 --initialise the output variables

IF COALESCE(@Sequence,'')='' RETURN --nothing to do

DECLARE @XMLVersionOfList XML

--we convert the list into XML

SELECT @XMLVersionOfList =

'<list><i>' + REPLACE(@Sequence, ',', '</i><i>') + '</i></list>';

--we now take each component and insert it into a row of a table variable

DECLARE @MaxSubarray TABLE

(

TheOrder INT IDENTITY PRIMARY KEY,

value INT

);

INSERT INTO @MaxSubarray(Value)

SELECT x.y.value('.', 'varchar(10)') AS IDs

FROM @XMLVersionOfList.nodes('/list/i/text()') AS x(y);

DECLARE @max_ending_here INT, --the total so far (the partial sum accumulator)

@max_so_far INT, --The Maximum of the current solution

@ii INT, --loop variable

@iiMax INT, --length of the array

@x INT, --the value of the current array element in the sequence (current row)

@start INT, --the start point of the current solution

@end INT; --the end - so far- of the current best solution

--so now we intialise all our variables

SELECT @start = -1, @end = -1, @max_so_far = 0, @ii = 1,

@iiMax = MAX(TheOrder)

FROM @MaxSubarray;

--iterate through the array (I know!)

WHILE @ii <= @iiMax

BEGIN --get the value of the next value in the sequence

SELECT @x = value FROM @MaxSubarray

WHERE TheOrder = @ii;

--update the running sum total (partial sum accumulator)

SELECT @max_ending_here = COALESCE(@max_ending_here, 0) + @x;

--IF it IS greater value than anything so far update our maximum

--and update the index of the end of the run

IF @max_ending_here > @max_so_far

SELECT @max_so_far = @max_ending_here, @end = @ii;

SELECT @ii = @ii + 1;

--Now, if our accumulator has gone negative then we start again

--at the next value following

IF @max_ending_here < 0

SELECT @max_ending_here = 0, @start = @ii;

END;

SELECT @MaxValue=@max_so_far; -- our highest positive sum from a subsequence

SELECT @SubArray= @SubArray+','+CONVERT(VARCHAR(10),value) --and list the subsequence that gave the highest sum

FROM @MaxSubarray

WHERE TheOrder BETWEEN @start AND @end

ORDER BY TheOrder;

SELECT @SubArray=STUFF(@SubArray,1,1,'')

--if nothing in the array, the subarray is empty and the total is 0

--now we test it.

DECLARE @SubArray VARCHAR(MAX), @MaxValue INT

EXECUTE MaximumValueSubarray '−2, 1, −3, 4, −1, 2, 1, −5, 4', @SubArray OUTPUT, @MaxValue OUTPUT

IF @SubArray <>'4,-1,2,1' or @MaxValue <> 6

RAISERROR ('Test 1 failed with ''%s'' and ''%d''',16,1,@SubArray,@MaxValue)

EXECUTE MaximumValueSubarray '-1, 0, 15, 3, -9, 12, -4 ', @SubArray OUTPUT, @MaxValue OUTPUT

IF @SubArray <>'0,15,3,-9,12' or @MaxValue <>21

RAISERROR ('Test 2 failed with ''%s'' and ''%d''',16,1,@SubArray,@MaxValue)

EXECUTE MaximumValueSubarray ' -1, -2, 3, 5, 6, -2, -1, 4, -4, 2, -2', @SubArray OUTPUT, @MaxValue OUTPUT

IF @SubArray <>'3,5,6,-2,-1,4' or @MaxValue <>15

RAISERROR ('Test 3 failed with ''%s'' and ''%d''',16,1,@SubArray,@MaxValue)

DECLARE @SubArray VARCHAR(MAX), @MaxValue int

EXECUTE MaximumValueSubarray '1', @SubArray OUTPUT, @MaxValue OUTPUT

SELECT @SubArray as 'subarray', @MaxValue as 'Max_Value'

So there you have it. As you can see, it is iterative, but grows linearly with the length of the array. but you can use Quirky Update to speed it up considerably. I don’t know of a fast set-based algorithm but I’d be pleasantly surprised if you can come up with something faster than Kadane’s Algorithm.

Register for Simple Talk

Kadane’s solution to the Maximum Subarray problem

About the author

Phil Factor

Phil Factor's contributions

Articles

Books

Top topics

Phil Factor's latest contributions:

Accessibility, and the Need for Ingenious Simplicity in Computer Scripts, Languages and Programs

Data Documents and Common Sense

AI and Databases

Recommended

About the author

Phil Factor

Phil Factor's contributions

Articles

Books

Top topics

Phil Factor's latest contributions:

Accessibility, and the Need for Ingenious Simplicity in Computer Scripts, Languages and Programs

Data Documents and Common Sense

AI and Databases