Job-execution-rejected

PR started here (currently in draft, might still change): feat(engine): Add property for skipping locked jobs in the job executor query [#264] by javahippie · Pull Request #326 · operaton/operaton · GitHub

Tests with the Spring Boot config were successful, but I’m not happy with the property’s name and location, and I will also have to adapt all the other runtimes that are not Spring Boot to accept the config:

operaton.bpm:
  job-executor-acquire-with-skip-locked: true

I think it’s OK if you follow the bpm.job-executor prefix that already exists, something like this:

bpm:
  job-executor:
    acquire-doing-row-lock: true

Maybe “row lock” says more about what we do (we lock the row on acquisition).
If we only “skip locked” but never lock the row, it wouldn’t do anything.
If the property name talks about what we want to do, it’s easier to implement it in another way if some databases don’t support SKIP LOCKED or don’t behave the same way with it.
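For illustration only, and not the engine’s actual acquisition SQL: the point of the property is that with a plain row lock a second job executor would block on rows the first one already holds, while SKIP LOCKED lets it move on to the next free candidates. A minimal MySQL-style sketch of that behavior:

-- Illustrative sketch only, not the real job acquisition query:
-- lock a batch of due jobs and skip rows another executor already holds.
select ID_
from ACT_RU_JOB
where DUEDATE_ is null or DUEDATE_ <= now()
order by DUEDATE_
limit 3
for update skip locked; -- without SKIP LOCKED this select would wait on locked rows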

While we are talking about optimization of a large number of executions, I already found that my cockpit is freezing with around 300k instances…
I think that the cockpit only needs counts (maybe grouped by activity to display the tokens)… did you ever find any changes to improve the cockpit load time?
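Just to make concrete what I mean by counts (a sketch, not the actual Cockpit query; the definition id is the example used further down): one aggregated row per activity would be enough to draw the token badges:

-- Rough sketch, not the actual Cockpit query: one row per activity with the
-- number of active executions ("tokens") for one process definition.
select E.ACT_ID_, count(*) as TOKEN_COUNT_
from ACT_RU_EXECUTION E
where E.PROC_DEF_ID_ = 'teste-processo:1:7ce661a4-b3ff-11ef-936a-fe1819a0d98d'
  and E.IS_ACTIVE_ = 1
  and E.ACT_ID_ is not null
group by E.ACT_ID_;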

Was doing some tuning and already got better results.
Was using a t3.medium MySQL instance with 650k process instances, some of them with open incidents, and accessed the main cockpit page and a process definition page too.
There were two slow queries:

select count(distinct RES.ID_)
from ACT_RU_EXECUTION RES
inner join ACT_RE_PROCDEF P on RES.PROC_DEF_ID_ = P.ID_
where RES.PARENT_ID_ is null
  and P.ID_ = 'teste-processo:1:7ce661a4-b3ff-11ef-936a-fe1819a0d98d'
  and RES.TENANT_ID_ is null

It took 17s

Added this index and the query now runs in 1s:

CREATE INDEX PROC_DEF_ID_PARENT_ID_TENANT_ID USING BTREE ON camunda.ACT_RU_EXECUTION (PROC_DEF_ID_,PARENT_ID_,TENANT_ID_);
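One way to double-check it (plain MySQL EXPLAIN, nothing Camunda-specific) is to confirm the optimizer now picks that index for the count query:

-- Sketch: the "key" column of the EXPLAIN output should show
-- PROC_DEF_ID_PARENT_ID_TENANT_ID for the execution table.
explain
select count(distinct RES.ID_)
from ACT_RU_EXECUTION RES
inner join ACT_RE_PROCDEF P on RES.PROC_DEF_ID_ = P.ID_
where RES.PARENT_ID_ is null
  and P.ID_ = 'teste-processo:1:7ce661a4-b3ff-11ef-936a-fe1819a0d98d'
  and RES.TENANT_ID_ is null;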

Also found that the count query for the activities shown on a process definition page took 32s:

select distinct RES.*
from (
  select ACTID.ACT_ID_ as ID_, INSTANCE.INSTANCE_COUNT_, INC.INCIDENT_TYPE_, INC.INCIDENT_COUNT_
  from (
    select distinct ACT_ID_
    from (
      select distinct E.ACT_ID_
      from ACT_RU_EXECUTION E
      where E.PROC_DEF_ID_ = 'teste-processo:1:7ce661a4-b3ff-11ef-936a-fe1819a0d98d'
        and E.IS_ACTIVE_ = 1
        and E.ACT_ID_ is not null
        and (E.TENANT_ID_ is null)
      UNION
      select distinct I.ACTIVITY_ID_ as ACT_ID_
      from ACT_RU_INCIDENT I
      where I.PROC_DEF_ID_ = 'teste-processo:1:7ce661a4-b3ff-11ef-936a-fe1819a0d98d'
        and I.ACTIVITY_ID_ is not null
    ) ACTID
  ) ACTID
  left outer join (
    select E.ACT_ID_, count(distinct E.ID_) as INSTANCE_COUNT_
    from ACT_RU_EXECUTION E
    where E.PROC_DEF_ID_ = 'teste-processo:1:7ce661a4-b3ff-11ef-936a-fe1819a0d98d'
      and E.IS_ACTIVE_ = 1
      and (E.TENANT_ID_ is null)
    group by E.ACT_ID_
  ) INSTANCE on ACTID.ACT_ID_ = INSTANCE.ACT_ID_
  left outer join (
    select I.ACTIVITY_ID_, I.INCIDENT_TYPE_, count(distinct I.ID_) as INCIDENT_COUNT_
    from ACT_RU_INCIDENT I
    where I.PROC_DEF_ID_ = 'teste-processo:1:7ce661a4-b3ff-11ef-936a-fe1819a0d98d'
      and (I.TENANT_ID_ is null)
    group by I.ACTIVITY_ID_, I.INCIDENT_TYPE_
  ) INC on ACTID.ACT_ID_ = INC.ACTIVITY_ID_
) RES
order by RES.ID_ asc
LIMIT 2147483647 OFFSET 0

The main reason was that the optimizer always chose a very bad index on TENANT_ID_, and my tenant is always null…
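A quick way to confirm that (standard MySQL statements, nothing engine-specific) is to look at how little the tenant column actually discriminates:

-- Sketch: if every row has TENANT_ID_ = null, an index on that column alone
-- cannot narrow anything down; SHOW INDEX also reports its cardinality.
select count(*) as total_rows, count(TENANT_ID_) as rows_with_tenant
from ACT_RU_EXECUTION;

show index from ACT_RU_EXECUTION where Key_name = 'ACT_IDX_EXEC_TENANT_ID';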

After I dropped these two indexes, it went down to only 6s:

alter table ACT_RU_INCIDENT drop index ACT_IDX_INC_TENANT_ID;
alter table ACT_RU_EXECUTION drop index ACT_IDX_EXEC_TENANT_ID;

Maybe I’ll find a better way instead of dropping the default ones.


Created these new indexes and optimized these two tables, and now the query runs in 3s:

CREATE INDEX idx_execution ON ACT_RU_EXECUTION (PROC_DEF_ID_, IS_ACTIVE_, ACT_ID_, TENANT_ID_);
CREATE INDEX idx_incident ON ACT_RU_INCIDENT (PROC_DEF_ID_, ACTIVITY_ID_, TENANT_ID_);

OPTIMIZE TABLE ACT_RU_EXECUTION;
OPTIMIZE TABLE ACT_RU_INCIDENT;
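As a spot-check (same EXPLAIN approach as above, run only against the execution sub-select of the big activity query), the new composite index should now show up in the plan:

-- Sketch: the "key" column should now report idx_execution for the
-- per-activity instance counts.
explain
select E.ACT_ID_, count(distinct E.ID_) as INSTANCE_COUNT_
from ACT_RU_EXECUTION E
where E.PROC_DEF_ID_ = 'teste-processo:1:7ce661a4-b3ff-11ef-936a-fe1819a0d98d'
  and E.IS_ACTIVE_ = 1
  and E.TENANT_ID_ is null
group by E.ACT_ID_;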


The web apps, and therefore the cockpit, are getting a full rewrite, so we’ll see how well they perform. But it is a good reminder to test them with hundreds of thousands of processes sooner rather than later.


Adding indexes for performance reasons was always “allowed” by Camunda, also in Enterprise settings. I did this for a client with 3 TB of history data (please don’t ask). The optimizations you did seem reasonable to me and should not impact write performance at all, thanks for sharing them!
