Transport Failure Detection

Here I try to explain, in a more general fashion, how transport failure detection depends on the deployment of the network. I will explain this with the help of an example.




Example

Suppose there are two nodes, Node-1 and Node-2. A peer connection is already established between them, and they are exchanging messages on that connection. Now Node-1 sends a message MESSAGE-X to Node-2 and does not receive the response to MESSAGE-X. How long Node-1 should wait for the response (say 10 ms), whether Node-1 should retry (say yes), and how many times Node-1 should retry (say twice): all these things are deployment specific.
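The deployment-specific part of the example (wait 10 ms, retry twice) can be expressed as a small policy object. This is a minimal Python sketch, not any particular stack's API: `RequestPolicy` and `send_with_retries` are hypothetical names, and `send` stands in for whatever transport hook a real Diameter stack provides.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RequestPolicy:
    # Deployment-specific values; the defaults mirror the example above.
    answer_timeout_ms: int = 10  # how long to wait for the answer
    max_retries: int = 2         # retransmissions after the first attempt


def send_with_retries(send: Callable[[str, int], Optional[str]],
                      message: str,
                      policy: RequestPolicy) -> Optional[str]:
    """Send `message` and retry on timeout.

    `send(message, timeout_ms)` is a hypothetical hook that returns the
    answer, or None if nothing arrived within `timeout_ms`.
    Returns the answer, or None once every attempt has timed out.
    """
    for _ in range(1 + policy.max_retries):
        answer = send(message, policy.answer_timeout_ms)
        if answer is not None:
            return answer
    # Every attempt timed out: the next step is the DWR/DWA link check.
    return None
```

With the defaults, a peer that never answers is tried three times (one send plus two retries) before the caller moves on to the watchdog check.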

After all the deployment-specific conditions are satisfied (the waits and retries are exhausted), Node-1 checks whether the network connection is broken. For this, Node-1 sends a DWR message to Node-2. If it does not receive a DWA within a specific period of time, it retries the DWR, up to 3 times in total (including the first DWR). If no DWA is received for any of the DWRs, Node-1 treats the situation as a connection failure and sends the other messages to the secondary peer.
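A sketch of the DWR check as described in this post (up to three DWRs in total, then failover to the secondary peer). The function names, the 3-attempt default, and the `send_dwr` hook are assumptions for illustration; RFC 3539 defines its own watchdog algorithm based on the Tw timer, which real stacks may follow instead.

```python
from typing import Callable, Optional


def transport_is_up(send_dwr: Callable[[int], Optional[int]],
                    dwa_timeout_ms: int,
                    attempts: int = 3) -> bool:
    """Probe the peer with DWRs; `attempts` counts the first DWR too.

    `send_dwr(timeout_ms)` is a hypothetical hook that sends one DWR and
    returns the Result-Code of the DWA, or None if no DWA arrived in time.
    """
    for _ in range(attempts):
        if send_dwr(dwa_timeout_ms) is not None:
            return True  # a DWA came back over this connection
    return False         # no DWA for any DWR: connection failure


def select_peer(primary: str, secondary: str, primary_up: bool) -> str:
    """Route further messages to the secondary peer after a failure."""
    return primary if primary_up else secondary
```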

If Node-1 receives a DWA with an error, that does not mean a connection failure, because Node-1 has received the DWA over the very connection whose transport it was checking. A DWA with an error, such as DIAMETER_TOO_BUSY or any other error, just informs Node-1 of the status of Node-2.
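The distinction above fits in a few lines: receiving any DWA means the transport is up, and the Result-Code only describes the peer's condition. This sketch uses the Result-Code classes of the base protocol (1xxx informational, 2xxx success, 3xxx protocol errors, 4xxx transient failures, 5xxx permanent failures) and DIAMETER_TOO_BUSY (3004); the function name `interpret_dwa` is hypothetical.

```python
from typing import Tuple

DIAMETER_TOO_BUSY = 3004

RESULT_CLASSES = {
    1: "informational",
    2: "success",
    3: "protocol error",
    4: "transient failure",
    5: "permanent failure",
}


def interpret_dwa(result_code: int) -> Tuple[bool, str]:
    """Return (transport_up, peer_status) for a DWA that was received."""
    if result_code == DIAMETER_TOO_BUSY:
        status = "too busy"
    else:
        status = RESULT_CLASSES.get(result_code // 1000, "unknown")
    # The DWA arrived on the connection being checked, so the transport
    # is up whatever the Result-Code says.
    return True, status
```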

Failover

The process of detecting a transport connection failure with a peer and forwarding all pending messages to the secondary peer node (alternate node) is known as failover.
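A minimal sketch of failover: requests still waiting for an answer on the failed peer are moved to the alternate peer. RFC 6733 sets the T ("potentially retransmitted message") flag on requests resent after a link failover, and the End-to-End Identifier stays unchanged so duplicates can be detected. The `PendingRequest` and `Peer` classes are hypothetical, and appending to the alternate's list stands in for actually resending the request.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PendingRequest:
    end_to_end_id: int    # unchanged on resend; used for duplicate detection
    command: str
    t_flag: bool = False  # "potentially retransmitted message" flag


@dataclass
class Peer:
    name: str
    # Requests sent to this peer that have not been answered yet.
    pending: List[PendingRequest] = field(default_factory=list)


def failover(failed: Peer, alternate: Peer) -> int:
    """Move every pending request from `failed` to `alternate`.

    Returns the number of requests moved.
    """
    moved = 0
    while failed.pending:
        request = failed.pending.pop(0)
        request.t_flag = True              # mark as a possible duplicate
        alternate.pending.append(request)  # stands in for resending it
        moved += 1
    return moved
```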

AVP Structure of DWR and DWA
Device-Watchdog-Request
<DWR> ::= < Diameter Header: 280, REQ >
                { Origin-Host }
                { Origin-Realm }
                [ Origin-State-Id ]

Device-Watchdog-Answer
<DWA> ::= < Diameter Header: 280 >
                { Result-Code }
                { Origin-Host }
                { Origin-Realm }
                [ Error-Message ]
                * [ Failed-AVP ]
                [ Original-State-Id ]   (= [ Origin-State-Id ])
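To make the grammar concrete, here is a sketch that builds a DWR on the wire: a 20-byte Diameter header (version 1, R flag set, command code 280, Application-Id 0) followed by Origin-Host (264), Origin-Realm (296) and the optional Origin-State-Id (278) AVPs, each padded to a 4-byte boundary. A DWA uses the same command code with the R flag cleared. It also shows how a Failed-AVP (279) can wrap a zero-filled example of a missing AVP, as RFC 6733 describes for DIAMETER_MISSING_AVP. Only the M flag is set on the AVPs, vendor-specific AVPs are not handled, and the function names are hypothetical.

```python
import struct
from typing import Optional

CMD_DEVICE_WATCHDOG = 280
FLAG_REQUEST = 0x80        # R bit in the command flags
AVP_FLAG_MANDATORY = 0x40  # M bit in the AVP flags
AVP_ORIGIN_HOST = 264
AVP_ORIGIN_REALM = 296
AVP_ORIGIN_STATE_ID = 278
AVP_FAILED_AVP = 279


def encode_avp(code: int, data: bytes, flags: int = AVP_FLAG_MANDATORY) -> bytes:
    """AVP header (code, flags, 24-bit length) + data, padded to 4 bytes."""
    length = 8 + len(data)  # the AVP Length excludes the padding
    header = struct.pack("!IB", code, flags) + length.to_bytes(3, "big")
    return header + data + b"\x00" * (-length % 4)


def encode_dwr(origin_host: str, origin_realm: str,
               hop_by_hop: int, end_to_end: int,
               origin_state_id: Optional[int] = None) -> bytes:
    avps = encode_avp(AVP_ORIGIN_HOST, origin_host.encode("ascii"))
    avps += encode_avp(AVP_ORIGIN_REALM, origin_realm.encode("ascii"))
    if origin_state_id is not None:  # [ Origin-State-Id ] is optional
        avps += encode_avp(AVP_ORIGIN_STATE_ID, struct.pack("!I", origin_state_id))
    length = 20 + len(avps)  # the Diameter header is 20 bytes
    header = (bytes([1]) + length.to_bytes(3, "big")             # version, length
              + bytes([FLAG_REQUEST]) + CMD_DEVICE_WATCHDOG.to_bytes(3, "big")
              + struct.pack("!III", 0, hop_by_hop, end_to_end))  # Application-Id 0
    return header + avps


def encode_failed_avp_for_missing(missing_code: int, min_length: int) -> bytes:
    """Failed-AVP wrapping a zero-filled example of a missing AVP."""
    example = encode_avp(missing_code, b"\x00" * min_length)
    return encode_avp(AVP_FAILED_AVP, example)
```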
AVP Description

Failed-AVP:- a grouped AVP that provides debugging information when a message is rejected or an error occurs during processing, for example when an AVP is not supported.

Error-Message:- provides the error in human-readable form.



Original-State-Id:- is a misprint in the RFC; it should read Origin-State-Id.

Origin-State-Id :- Origin-State-Id lets a node infer the state of the sessions/connection with its peer. Whenever a node loses its state, for instance because of a reboot, the rebooted node increases the value so that the other node becomes aware that the peer's state has changed and that all previous sessions are no longer valid. Origin-State-Id is stored in non-volatile memory on all nodes.




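Every time the node is rebooted (or otherwise restarts with loss of its previous state), this Origin-State-Id is monotonically increased. Both communicating nodes store the last Origin-State-Id received from their peer, so that such a change can be detected. Answers are not matched to requests with Origin-State-Id; that is done with the Hop-by-Hop Identifier in the Diameter header.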
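A sketch of both sides of Origin-State-Id: a node advancing its own value on restart and keeping it in non-volatile storage (a file here), and a peer remembering the last value it saw per Origin-Host so it can tell when earlier sessions are gone. The file path, class, and function names are hypothetical. A value of 0 is treated as "no inference intended", following RFC 6733.

```python
import os
import tempfile
from typing import Dict


def advance_origin_state_id(path: str) -> int:
    """Read the last value from non-volatile storage, increment, persist."""
    try:
        with open(path) as f:
            value = int(f.read().strip() or "0")
    except FileNotFoundError:
        value = 0
    value += 1  # starts at 1, because 0 means "do not draw inferences"
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(value))
    os.replace(tmp_path, path)  # rename, so a crash mid-write cannot truncate it
    return value


class PeerStateTracker:
    """Remembers the last Origin-State-Id seen from each Origin-Host."""

    def __init__(self) -> None:
        self.last_seen: Dict[str, int] = {}

    def peer_restarted(self, origin_host: str, origin_state_id: int) -> bool:
        """True if the value grew, i.e. the peer's old sessions are gone."""
        if origin_state_id == 0:
            return False  # the sender does not want the inference made
        previous = self.last_seen.get(origin_host)
        self.last_seen[origin_host] = origin_state_id
        return previous is not None and origin_state_id > previous


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "origin-state-id")
        print(advance_origin_state_id(path), advance_origin_state_id(path))  # 1 2
```

On a `True` from `peer_restarted`, a node would free the resources of the sessions it held with that peer.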



Your comments, suggestions and questions are always welcome. I will try to clarify doubts to the best of my knowledge, so feel free to ask questions.

79 comments:

  1. Hi Vinay,

    Thanks for this article. I've a query though.

    Please let me know: when no Origin-State-Id is sent in the DWR, what Origin-State-Id value should we expect in the DWA message?

    I'm facing an issue where an Origin-State-Id with invalid AVP bits is received in the DWA when no Origin-State-Id is sent in the DWR. The error is shown below:

    #### <> <> <> <1322126427209>
    180.20.100.90
    origin.com
    N/A
    2001


    Regards,
    Rishi

    1. Hi Rishi,

      If there is no Origin-State-Id in DWR then there should not be any Origin-State-Id in DWA.

      Thanks for your query.
      Happy to help you again.
      Team-Diameter

  2. Hi Vinay,
    Thank you for the article.
    Let's take peer 1 configured to send a DWR every 30 seconds if no traffic is detected.
    Peer 2 is configured the same way.
    I'd like to verify something:

    At t0 peer 1 sends DWR
    at t0+30 peer2 sends DWR
    at T0+60 peer1 sends DWR

    Is the DWR itself considered traffic? In that case, would peer 1, on receiving the DWR at t0+30, wait another 30 seconds before sending its second DWR, that is, at t0+60?

    Thank you
    Nicolas.

    1. Hi Nicolas

      DWR messages are exchanged when there is no traffic between two nodes for a given period of time (i.e. suppose we have configured 30 seconds as the DWR time; if no message is exchanged between the two nodes for 30 seconds, a DWR will be triggered).

      Hence, under load there will not be any case where no message is exchanged for such a long time (the time configured for DWR is generally 2-5 seconds). Therefore, DWR is not part of the load.

      Under load, the system is busy with messages, therefore DWR will not occur.


      Thanks for your query.
      Happy to help you again.
      Team-Diameter

    2. Hi,

      I think Nicolas was asking whether the DWR itself is considered a traffic message that could reset the other peer's watchdog timer. If not, the DWR frequency would not be influenced by the other peer and you would have something like this:

      At t0 peer 1 sends DWR
      at t0+10 peer2 sends DWR
      at T0+30 peer1 sends DWR
      at t0+40 peer2 sends DWR
      ...

      Could you clarify ?
      BRs

    3. Hello,

      Both peers independently send DWR messages when there is no traffic.
      BR
      Aleksandar

  3. Hello Vinay,

    Thanks for the nice article. Let's say there is an x-request message waiting for a y-answer message. How long will the device wait for the answer? Is it application specific or session specific (depending on a particular session, say an IP-CAN session for Gx)?

    1. Hi Moumita Barman,

      It should wait until it times out.

      The operator configures a time (generally in milliseconds) at the client node that specifies how long the client should wait for a reply from the server. If the client receives the answer from the server after that time frame, it shall discard the answer, because as soon as the request times out, the session ID corresponding to the request message is no longer valid.


      Thanks for your query.
      Happy to help you again.
      Team-Diameter

    2. Hi,

      I have a situation where DWR/DWA is exchanged between the Diameter stack and the peer, but the peer is not responding to other messages like CCR.

      DWR is configured as 20 sec. From the stats I see that DWR/DWA is happening but the CCR is not answered. How can this happen, as both DWR and CCR travel in the same TCP stream, i.e. how can the peer answer the DWR and not send the CCA?

      Thanks,
      Achal

    3. Another query:
      DWR is configured as 20sec.
      1st DWR is sent at T0sec
      Should the client wait for the DWA until T20sec? Where in the RFC can I find the DWR retransmission logic?

  4. Hi, I am Kavin. This is my first time commenting anywhere; when I read this post I thought I should comment because of this sensible paragraph.

  5. Hi Vinay,

    Does the watchdog timer need to be enabled separately, or are DWR/DWA triggered by default?

    1. Hi Kamal,

      It is Diameter stack dependent. It depends entirely on the stack vendor and how they provide it. Generally there is a provision to change the default time span of DWR/DWA messages.

      The standard says the two nodes shall check whether the link is up or not.

  6. Hi,

    For the first DWR I got a DWA message, and immediately after getting the DWA the client sends a second DWR; after that I get the error "SCTP: ABORT: User Initiated Abort". Is the issue with the DWR timer value or with the association?

    1. Hi Bharath

      DWR/DWA messages are used to check whether the SCTP/TCP link between two nodes is up or not (specifically the TCP link, because TCP has no mechanism for checking the health of the link).

      There is no association between the DWR time and the SCTP abort. If no message is exchanged for a certain period of time (the DWR time), a node shall send a DWR to check whether the link and the other node are up or not.

      Thanks for your query.
      Happy to help you again.
      Team-Diameter

  7. Hi,

    What if Node-1 does not send the DWR? It only sends a CER, receives a CEA, and that's all.

  8. I have a problem: Node-1 does not send the DWR. Does someone know what is happening? Node-1 sends the CER and receives the CEA, but that's all; the connection does not get established.

    1. Hi Cruz,

      This issue happens for one of the following reasons.

      1) The CEA does not come with DIAMETER_SUCCESS, or there is no common application.

      Kindly check the CEA, or post the trace captured using tshark; the following link shall help you.
      http://diameter-protocol.blogspot.in/2013/04/capture-diameter-messages-without-wire.html

      2) Two peer nodes of NODE-1 or NODE-2 have the same Diameter Identity.
      In this case the connection toggles: the node drops the earlier connection, then the earlier connection retries and the new connection is dropped.

      Kindly check the Diameter Identity of each node.

      3) (Unusual case) Any other message is received before the CEA; then the connection sometimes goes into an unknown state.


      If you could share some more details, it would help everyone solve it. Sometimes these issues are implementation specific.

      Thanks for your query.
      Happy to help you again.
      Team-Diameter

  9. A question on transport failure detection in Diameter.
    Say I have a Diameter peer connection established and my watchdog timer is 30 seconds.
    Now I do an ifconfig down on the IP interface over which the peer connection is established.
    How long will it take my local Diameter layer to detect that the IP interface has gone down? Will this be immediate, or will it have to go through the watchdog procedure?

    thanks,
    Vijaya

    1. Hi VV,

      Consider the following cases.

      1) Load condition: Under load, the watchdog request does not come into the picture. As stated in the article, the watchdog exchange happens only when no message is exchanged between the peers for 30 seconds (the watchdog time), but here the system is heavily loaded; therefore detection of the transport failure would be immediate.


      2) Lean-hour condition: If no message is exchanged between the nodes for 30 seconds, the failure would only be detected with the DWR message, i.e. either the stack fails to send the DWR, or the DWR times out because the DWA is not received in the expected time. So the detection time would be 30 seconds + the timeout.


      Regards
      Ajay

  10. If Origin-State-Id is sent in CER with value 0, is it mandatory to send the Origin-State-Id set to value 0 in the CEA message?

    1. Hi Vijay,


      Origin-State-Id set to zero shall be treated as Origin-State-Id not being present in the request.

  11. Hi Vinay, I've a couple of questions re: transport failure

    Lets say as per your example we have Node 1 and Node 2 connected and exchanging messages.

    If I understand RFC 3539 correctly, the Tw timer is reset (with jitter) for every answer message. So, as you say, in the busy hour the DWR is never sent.

    So let's say Node 1 has sent a CCR request to Node 2 and the response timeout (10 ms in your example) expires. Node 1 checks whether it should retry ('Yes', and twice, as per your example), so we would see two more attempts before Node 1 stops retrying the request. Each retry would reset the Tw timer.

    A couple of things I need some help with:
    - I'm not sure I understand why, after 3 failures (as per local config), the DWR would be initiated. I assume this is because Tw is reset on answers and not on requests, so although more requests may be sent, the lack of answers means that Tw will expire?
    - How does the Credit-Control Tx timer overlay onto the base response timeout? I.e. if Tx was 5 ms and we set the Credit-Control application to Terminate so that no further attempts are made, does this override the base config?
    - Lean hour vs busy hour: RFC 3539 suggests that in a busy hour it may take 2Tw to fail over. I assume this is because only a DWR/DWA failure can be used to infer that the peer is down?

    Kind regards Jim





    1. Hi Jim

      We hope we have understood your point of view correctly and are not deviating from it.

      If a DWA is not received for a DWR within the given time (the timeout), it implies that there is a transport-layer failure between the two adjacent nodes, called peers.

      In a strict implementation of RFC 6733,
      a CCA not being received doesn't imply a transport failure between peers, because there can be an intermediate node between the CCR client and the CCR server. For the CCR client, the peer is the intermediate node.

      The following link can help you.
      http://diameter-protocol.blogspot.in/2013/08/diameter-connection-establishment.html

      Our team has also inserted an image on this blog explaining DWR.

      Thanks for your query.

      Happy to help you again.
      Team-Diameter

    2. Many thanks, much appreciated.
      Kind regards
      Jim

  12. I want to understand how the DWR exchange is different from the SCTP HEARTBEAT mechanism. A Diameter node using SCTP as the transport layer will anyhow detect transport failures using the HEARTBEAT messages exchanged between the two SCTP endpoints, so why is there still a need to exchange DWR/DWA messages to detect transport failures?

    1. Yes Vijay

      You are right.
      If we are using TCP, there is no heartbeat mechanism in TCP. A Diameter node can use either transport; that is why DWR is there in Diameter implementations.

    2. Thank you Ethan for the clarification. Does this mean that a Diameter node using SCTP as transport layer need/should not use DWR/DWA messages? May I know if this is documented anywhere in the RFC?

    3. @ Ethan

      Your clarification is correct.

      @ Vijay

      DWR is a proactive solution to detect transport failure. No reference document says that an SCTP-based node should not implement it.

      A node acting as a server MUST support both TCP and SCTP connections. A client can use TCP or SCTP.

  13. I have a query regarding the Failed-AVP AVP content to be encoded whenever a Diameter node returns the DIAMETER_MISSING_AVP error. The RFC describes the following:
    7.1.5. Permanent Failures
    DIAMETER_MISSING_AVP 5005
    The request did not contain an AVP that is required by the Command
    Code definition. If this value is sent in the Result-Code AVP, a
    Failed-AVP AVP SHOULD be included in the message. The Failed-AVP
    AVP MUST contain an example of the missing AVP complete with the
    Vendor-Id if applicable. The value field of the missing AVP
    should be of correct minimum length and contain zeroes.

    7.5. Failed-AVP AVP
    ……
    A Diameter message SHOULD contain one Failed-AVP AVP, containing the
    entire AVP that could not be processed successfully. If the failure
    reason is omission of a required AVP, an AVP with the missing AVP
    code, the missing Vendor-Id, and a zero-filled payload of the minimum
    required length for the omitted AVP will be added.

    I am confused about the value to be encoded as defined in the above two sections (one section says it should contain zeroes and the other says it should be a zero-filled payload).
    May I know what the expected result is? Should the Value field be left empty, or should it be encoded with the value "00", which is one byte, followed by the padding bytes?

    1. Failed-AVP is a grouped AVP.
      It is implied that the Data field of the missing AVP shall be filled with zeroes up to the minimum length.
      ::= < AVP Header: 279 >
      1* {Missing-AVP Header: - - - [Data]} Data shall be filled with ZERO

      Thanks for your query.

      Happy to help you again.
      Team-Diameter

    2. OK, can you confirm whether the following encoding is correct? For example, for the "Origin-Realm" AVP it would look as below:
      + Failed-AVP
      ::= < AVP Header: 279 >
      ::= Origin-Realm
      AVP Code: 296
      AVP Flags: 0x40
      AVP Length: 8

      ---> Data field is empty

    3. Wireshark/tshark is the tool to check the format.


      Thanks for your query.

      Happy to help you again.
      Team-Diameter

  14. Hello,

    In the example above, suppose there is an underlying transport link failure between Node-1 and Node-2, but Node-1 has not yet marked Node-2 as a suspect Diameter peer, because Tw has not expired between Node-1 and Node-2 and no DWR/DWA exchange has taken place to conclude that Node-2 is suspect and that there is a transport link failure.

    Questions:

    1) I believe the Tx timer in Node-1 keeps expiring and Node-1 will keep sending the CCR to Node-2, setting the T-bit on each retransmission, until the configured number of retransmissions is reached by Node-1?

    2) If, during this time window, Tw expires and Node-1 starts to send a DWR towards Node-2 while Node-1 has not exhausted its configured number of CCR retransmissions, can the CCR and the DWR be sent by Node-1 towards Node-2 simultaneously?

    Thanks.

    Sam

  16. Hi all,
    can anyone help with this?

    1) Have you ever used the Seagull tool as a client for pumping an Sy call flow?
    When I use Seagull as a client, my requirement is to put in a timeout. During that time a DWR message is received from the server by the Seagull client, and Seagull responds with a DWA; after that, subsequent DWR messages are sent by the server but Seagull never sends a DWA.

    Has anyone faced this problem? Kindly provide a solution for this.

    2) Actually, when no traffic is exchanged between two nodes within 30 min, DWR and DWA will be initiated. Is this time configurable on both the server and the client?

    Point 2 applies to 3GPP standards: can we configure the DWR/DWA time on both the client and the server side?

    Please correct me if I am wrong.

    Thanks in advance

  17. Hi Team-Diameter,

    I have two questions.

    1. If a connection is already established to the Diameter server and we try to open a second connection to the Diameter server using the same client identity, how will the server react?

    2. If 'new Origin-State-Id > older Origin-State-Id' in the CER, will the server clear any old socket with the same Diameter client (if any, where the server is using the watchdog mechanism to determine the connection state but the watchdog timer has not yet expired)?

    1. Hi Devesh,

      Implementation of our suggestion could vary between vendors' Diameter stacks; here we explain what RFC 6733 says.

      1) If a Diameter server receives a CER again on an established connection with the same Diameter identity, the server would respond to the second CER with a CEA and establish the Diameter connection on the basis of the second CER, and it shall disconnect the first connection created by the first CER. In this scenario the server would assume that the client might have been rebooted and is sending a fresh request to create a Diameter connection with the same Diameter identity. As we know, CER is the first message exchanged to establish a Diameter connection.

      We have observed the following with different vendor stacks in the context of the above explanation; do share if you have observed anything new.

      a) The stack does not allow another node to connect with the same Diameter identity.
      b) The Diameter connection fluctuates between two clients, because the second client breaks the connection of the first by sending a CER with the same identity, and the first client, retrying its broken connection, breaks the connection created by the second client.


      2) We are working on it.

      We hope our suggestions help you.

      Thanks for your query.
      Happy to help you again.
      Team-Diameter

    2. Hi Devesh,

      If a Diameter entity receives a new Origin-State-Id higher than the previous one, it is an indication that none of the previous sessions exist any more. Resources associated with the previous sessions can be freed.

      Thanks for your query.
      Happy to help you again
      Team-Diameter



    3. This is the same scenario I'm facing with one of my nodes.
      In case of failover with Node-1, it tries to establish the connection with Node-2 (CER) with the same host name, but Node-2 is probably not accepting the connection, and sessions are dropping.
      For the answer you posted based on the RFC, can you please give the exact reference (RFC/section)?

      “If a DIAMETER server receives CER message again on established connection with same DIAMETER identity, then server would respond to second CER with CEA and establish the diameter connection on the basis of second CER, It shall disconnect First connection created by first CER. Because in this scenario Server would assume that client might have been rebooted and sending a fresh request to create DIAMETER connection with same DIAMETER IDENTITY. As we know CER is the first message exchanged to establish a DIAMETER connection.”


    4. Hi,

      After a server restart, the client initiates a CER. Can it use the same Origin-State-Id, or shall the client use an incremented Origin-State-Id?


      br,
      Neeraj Surana

  18. Hi Team,

    Actually I am getting the "DIAMETER_LOGOUT" error.

    Could anyone please let me know what the reason could be?

    Regards,
    Harish

    1. Hi Harish,

      As far as we understand the scenario, you are working on a session-based application and the client has logged out (signed out); therefore the client sends an STR (Session-Termination-Request) to the server with the reason in the Termination-Cause AVP, i.e. the user has logged out, indicating to the server that it should close the session.

      Thanks for your query.

      Happy to help you again.
      Team-Diameter

  19. HI Team ,

    I have a scenario where Node A sent a Capabilities-Exchange-Request and Node B sent a Capabilities-Exchange-Answer with the DIAMETER_SUCCESS result code. Then, after 29.79 sec, Node B initiates a watchdog request, but Node A does not send any response to it.
    In addition, after 30.28 sec, Node A initiates an SCTP abort with the error cause "User Initiated Abort".

    User Initiated Abort (12)

    Cause of error
    --------------

    This error cause MAY be included in ABORT chunks which are send
    because of an upper layer request. The upper layer can specify
    an Upper Layer Abort Reason which is transported by SCTP
    transparently and MAY be delivered to the upper layer protocol
    at the peer.

    now questions :)
    1. Why did Node A send an SCTP ABORT (user-initiated)? Is it because the upper layer, i.e. Diameter, did not receive the watchdog request, so Diameter requested SCTP to initiate the abort?
    2. What can be the reason for Diameter to request SCTP to initiate an abort (is it a transport-layer failure detected by Diameter)?
    3. After a successful capabilities exchange request and answer, which node initiates the watchdog request if there is no Diameter traffic?

    Thanks in advance .
    Regards
    Victor

    1. HI Team,

      I was really expecting an answer on this. It would be a great help to me if I received some comments.

      Regards
      Victor

    2. Hi Victor

      Sorry for delayed response.

      3) It does not matter which node initiates the DWR first. DWR is used to check the status of the transport.

      Here we see a strange thing: why is your DWR time set as long as 29.79 seconds?

      As we know, Diameter is an application-layer protocol that runs over a transport-layer protocol (TCP or SCTP), so we first need to check whether the transport is working or not.

      So kindly tell us which SCTP messages were exchanged during those 29.79 seconds.
      Check whether SCTP heartbeat messages are exchanged or not.
      Kindly try to reduce the DWR time to some milliseconds.

      Kindly reply.

      Thanks for your query.

      Happy to help you again.
      Team-Diameter



  20. HI Team,

    Thanks for your reply. I agree with you that there is some problem with the transport layer.
    Yes, there is an SCTP heartbeat message sent from Node A to which Node B did not respond.

    SCTP message exchanged between two nodes are

    Node A                        Node B
    init -------------------------->
    <------------------------ init_ack
    cookie_echo --------------->
    <------------------------ cookie_ack
    (after this, the Diameter connection is set up)
    CER -------------------------------->
    <---------------------------- CEA

    SCTP heartbeat ------------->
    <----------------------- DWR
    SCTP abort ------------------>

    So from the above, as Node B did not respond to the SCTP heartbeat message, Node A sent the SCTP abort message.
    But just one last question :) why did Node A not respond to the DWR? Is it because of the transport layer, i.e. Node A did not receive the DWR message, and could the same be the reason why Node B did not respond to the heartbeat message?

    Am I right? Kindly let me know your views too.

    Thanks and regards
    Victor

  21. This is one of the best and most informative blogs I have ever seen. Thanks for such nice and unique content with many tips, ideas and guidance. Thanks again.

    1. We appreciate your support.

      Thanks for your valuable time.
      Team-Diameter

  22. Hi,
    Since DWR/DWA are not part of the load, is it possible for a Diameter peer to combine a DWA with another application message?
    I do see such combined messages in the same packet.

    Thank you

    1. Hi Dave

      Kindly share the use case for the above, so that we can understand your point of view.

      Thanks for your query.

      Happy to help you again.
      Team-Diameter

  23. Thank you for your reply.
    It's a case where, for example, I see the ULR message and within the same packet I also see the DWR/A, so it appears (in Wireshark) as follows:
    DIAMETER 970 cmd=3GPP-Update-Location Answer(316) flags=-P-- appl=3GPP S6a/S6d(16777251) h2h=1326b498 e2e=1326b498 | cmd=Device-Watchdog Answer(280) flags=---- appl=Diameter Common Messages(0) h2h=2987f1 e2e=2987f1 |

    Is this normal?

    1. Hi Dave,

      We see the above-mentioned issue as a filter issue. We feel you have not applied a detailed filter. Kindly use the command below.


      tshark -Y diameter -V | grep 'Frame\|Arrival Time:\|Internet Protocol Version\|Src Port:\|Diameter Protocol\|Request:\|Command Code:\|AVP:\|Result-Code'


      http://diameter-protocol.blogspot.in/2013/04/capture-diameter-messages-without-wire.html

      Thanks for your query.

      Happy to help you again.
      Team-Diameter

  25. Hi, what happens when you have a DRA in place? Because NODE1 sends messages to NODE2 through the DRA, if NODE2 is down, does NODE1 have no idea about NODE2?

    1. Hi Naseem Rahman

      The DRA shall return an answer with the Result-Code set to DIAMETER_UNABLE_TO_DELIVER (3002). The following link shall help you.
      http://diameter-protocol.blogspot.in/2011/05/diameter-errors.html

      Thanks for your query.

      Happy to help you again.
      Team-Diameter

  26. As per RFC 3539 section 3.4.1:

    Suppose there are 2 nodes, Node A and Node B. Node A has detected inactivity (no request/response received for up to Tw) and initiated a DWR to Node B. Suppose Node A hasn't received anything in another Tw (so 2 Tw has elapsed). What should the behaviour of Node A be now?
    1. Node A should fail over the traffic towards the secondary node (if available).
    2. Node A should fail over the traffic and again initiate a DWR, without breaking the transport connection (with the primary node B).
    3. Node A should fail over the traffic and directly break the transport connection (and then it will try re-connecting to this node).

    I presume that this behaviour is same for TCP and SCTP.

    1. Hi Gaurav,

      Any of the above-mentioned behaviours is possible. The behaviour of Node A depends entirely on the vendor's node configuration and deployment strategy.

      Thanks for your query.
      Happy to help you again.
      Team-Diameter

  27. Hi,
    Suppose node A sends a DWR to node B, receives nothing within another Tw, and sets the pending flag.
    After that, node A sends another DWR and receives no DWA within another Tw, but it continuously receives CCR messages within Tw.

    In this case, according to the RFC implementation, Tw will be continuously reset but the pending flag stays set, and failover will happen some time later.
    I think the pending flag should be reset when receiving non-DWA messages. What do you think?

    1. Hi s.c Yang,


      If Node A is receiving CCRs before Tw expires, then Node A should not initiate the next DWR message, but rather process the CCRs.

      The basic idea of DWR is to check whether the transport connection between the nodes is up or not. If a node is continuously receiving messages from the other node, then DWR doesn't come into the picture.


      Thanks for your query.
      Happy to help you again
      Team-Diameter

  28. Yes, I know that.
    But I supposed that the first DWR failed and the second DWR was sent without a DWA response, while other messages were still being received at that time.
    It's a supposition about a certain case.
    I am curious about this RFC implementation logic.
    I am working in a telecom company.

    1. Hi S.C.Yang,

      RFC 6733 gives us just the constraints; the implementation is deployment specific to a vendor.

      In the situation described above, kindly do the following to reach the cause of the misbehaviour.
      1) Kindly capture a trace and share it.
      2) How are you so sure that the DWR from Node A is reaching Node B?
      3) Kindly share the logs of both nodes.


      If the above is a hypothetical situation, then there can be multiple ways to handle it.
      1) Process the CCR straight away and send the response, since, as you say, CCRs are received continuously even before Tw expires.
      Receiving other messages continuously before Tw expires means the transport is up. No DWR in the load case.


      Thanks for your query.
      Happy to help you again
      Team-Diameter

  29. Really Thank you for the fast response.
    I didn't mean to argue; I'm just curious and want to understand the vendor-specific implementation. I just wonder about the RFC failover algorithm.
    Here is a hypothetical situation.


    This is just a hypothetical situation for verifying the failover algorithm based on the RFC (not a real situation).



    If two DWRs fail, failover occurs.



    Node-1 Node-2
    ----------------------------

    pending flag=0
    (no load on link)

    DWR_1 ------------->
    <---x(fail)--- DWA_1

    pending flag=1
    (sudden load applied, e.g. continuous incoming CCR or CCA)
    (the timer is continuously reset, so no second DWR_2 will be triggered,
    but the pending flag is still set to 1)




    (( 1 month later ))




    pending flag=1
    (no load on link)

    DWR_2 ------------->
    <---x(fail)--- DWA_2

    pending flag=2
    (failover occured because pending flag setted already
    1 month earlyer)




    Problem is, just one DWR fail cause failover situation because of the setted pending flag 1month earlyer.

    1. Hello S.C. Yang,

      Your hypothetical situation is good, but again you missed the basic idea of DWR/DWA.
      When a DWR is missed the first time, it should be retransmitted whether or not load arrives.
      Other logic: if you missed the first DWR and load suddenly arrives, you must reset the flag, because your transport connection is working and messages are being exchanged.

      Hopefully it helped you to understand the situation.
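The "other logic" in the reply above can be sketched as a small state object that forgets earlier DWR misses whenever any message arrives. This is a minimal Python sketch under stated assumptions, not the RFC's exact state machine: the class name `Watchdog`, the `max_unanswered` limit of 2 (taken from the hypothetical "if two DWRs fail, then failover occurs"), and the rule that any received message clears earlier misses are illustrative choices.

```python
class Watchdog:
    """Hypothetical DWR/DWA watchdog for one peer connection.

    Assumption (the "other logic" in the reply): ANY message received
    from the peer clears the count of unanswered DWRs, not only a DWA.
    """

    def __init__(self, max_unanswered: int = 2):
        self.max_unanswered = max_unanswered  # deployment specific
        self.unanswered = 0   # DWRs sent that have not been answered yet
        self.dwr_sent = 0
        self.failed_over = False

    def on_receive(self) -> None:
        # DWA, CCR, CCA, ...: the transport is up, forget older misses.
        self.unanswered = 0

    def on_timer_expired(self) -> None:
        # Called when Tw expires with no traffic received from the peer.
        if self.failed_over:
            return
        if self.unanswered >= self.max_unanswered:
            # Every DWR in this idle period went unanswered: fail over
            # and send pending messages to the secondary peer.
            self.failed_over = True
            return
        self.dwr_sent += 1      # (re)transmit DWR
        self.unanswered += 1


# The hypothetical scenario from the comment above:
wd = Watchdog()
wd.on_timer_expired()          # DWR_1 sent, DWA_1 lost
for _ in range(1000):          # sudden load: CCR/CCA keep arriving
    wd.on_receive()
wd.on_timer_expired()          # one month later: DWR_2 sent, DWA_2 lost
print(wd.unanswered, wd.failed_over)  # 1 False

# A genuinely idle, dead link still fails over after two unanswered DWRs.
idle = Watchdog()
for _ in range(3):
    idle.on_timer_expired()
print(idle.dwr_sent, idle.failed_over)  # 2 True
```

Replaying the scenario, the DWA lost a month earlier no longer counts, so a single later miss leads to a retransmission rather than a failover.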

    2. Very good explanation based on a good understanding of the RFC. Thank you :)

  30. I understand the significance of Origin-State-Id in CER, but I do not understand how it is handled if it is sent in a DWR. What is the significance of sending it in a DWR (or, in fact, in any other application message such as CCR)?

    1. The Origin-State-Id AVP (AVP Code 278), of type Unsigned32, is a monotonically increasing value that is advanced whenever a Diameter entity restarts with loss of previous state, for example, upon reboot. Origin-State-Id MAY be included in any Diameter message, including CER.

      Use of Origin-State-Id:
      1. It allows other Diameter entities to infer that sessions associated with a lower Origin-State-Id are no longer active. If an access device does not intend for such inferences to be made, it MUST either not include Origin-State-Id in any message or set its value to 0.
      2. An access device/client can also include the Origin-State-Id in request messages other than the CER if there are relays or proxies in between the access device and the server.
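Point 1 above is what makes Origin-State-Id useful in a DWR or CCR: whatever message carries it, the receiver can compare it with the last value seen from that peer and drop sessions tied to a lower value. A minimal Python sketch of that bookkeeping follows; `OriginStateTracker`, its methods and the example host and session names are hypothetical, and a value lower than the stored one is simply not handled specially here.

```python
from collections import defaultdict
from typing import List, Optional


class OriginStateTracker:
    """Hypothetical per-peer bookkeeping of Origin-State-Id (AVP 278)."""

    def __init__(self):
        self.current = {}                  # Origin-Host -> highest Origin-State-Id seen
        self.sessions = defaultdict(dict)  # Origin-Host -> {Session-Id: Origin-State-Id}

    def observe(self, host: str, state_id: int,
                session_id: Optional[str] = None) -> List[str]:
        """Record one message and return Session-Ids now inferred to be inactive."""
        if state_id == 0:
            return []  # value 0: the sender asks peers not to make inferences
        stale = []
        prev = self.current.get(host)
        if prev is None or state_id > prev:
            # Higher value: the peer restarted and lost state, so sessions
            # tied to a lower Origin-State-Id are no longer active.
            self.current[host] = state_id
            stale = [sid for sid, sid_state in self.sessions[host].items()
                     if sid_state < state_id]
            for sid in stale:
                del self.sessions[host][sid]
        # A value lower than `prev` gets no special treatment (assumption).
        if session_id is not None:
            self.sessions[host][session_id] = state_id
        return stale


tracker = OriginStateTracker()
tracker.observe("client.example.com", 7, "sess-1")   # e.g. a CCR
tracker.observe("client.example.com", 7, "sess-2")   # another CCR
print(tracker.observe("client.example.com", 8))      # a DWR after reboot
# ['sess-1', 'sess-2']
```

The DWR carries no session of its own, yet its higher Origin-State-Id is enough for the receiver to clean up both sessions.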

  32. Hello,

    My query concerns the standard: it is mentioned in RFC 6733, page 66, that when a transport failure is detected, a DWR message MUST NOT be sent to an alternate peer. Could you please elaborate on this?

  33. Hello Team-Diameter,
    I have a simple question.
    Can a Diameter server, such as a credit control server, send a DWR to the client?
    The client team said their node cannot accept a DWR from the server. Is that true?
    I looked for an answer in the RFC documents but did not find any reference to it.
    Best regards
    Golan

  34. Hi,

    I am getting an error while exchanging Diameter messages. I am acting as a Diameter server. The CER/CEA exchange happens properly, and so does DWR/DWA, but in between I get the error "connection reset by peer". What could be the possible reason for this?

  35. Nice article.
    How is a transport-layer failure detected in the case of a DWR timeout?

  36. Can you please explain the 3 DPR causes that are configurable in a DRA/DSC? What are the possible reasons for each, with examples?

    1. Hi Pankaj Pandey,

      Kindly explain your requirement in detail.
      The 3 causes of DPR are explained in the following link:
      https://diameter-protocol.blogspot.com/2011/05/diameter-peer-connection-and.html


      Cause DO_NOT_WANT_TO_TALK_TO_YOU: can be sent on a connection if the agreement between two operators has ended with respect to policies or validity. It totally depends on the operator's requirements.
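The three DPR causes correspond to the Disconnect-Cause AVP values defined in RFC 6733. The sketch below lists them and shows one hypothetical way a DRA/DSC policy could pick among them; `choose_cause` and its boolean inputs are assumptions for illustration, since the actual choice depends on the operator's requirements.

```python
from enum import IntEnum


class DisconnectCause(IntEnum):
    """Disconnect-Cause AVP values carried in DPR (RFC 6733)."""
    REBOOTING = 0                   # a scheduled reboot is imminent
    BUSY = 1                        # the node's internal resources are constrained
    DO_NOT_WANT_TO_TALK_TO_YOU = 2  # no traffic with this peer is expected


def choose_cause(restarting: bool, overloaded: bool,
                 agreement_active: bool) -> DisconnectCause:
    """Hypothetical operator policy for picking the DPR cause on a DRA/DSC."""
    if restarting:
        return DisconnectCause.REBOOTING
    if overloaded:
        return DisconnectCause.BUSY
    if not agreement_active:
        # e.g. the agreement with the other operator has ended
        return DisconnectCause.DO_NOT_WANT_TO_TALK_TO_YOU
    raise ValueError("no reason to send DPR under this policy")


print(choose_cause(restarting=False, overloaded=False,
                   agreement_active=False).name)
# DO_NOT_WANT_TO_TALK_TO_YOU
```

The order of the checks is itself a policy decision: here a planned restart takes precedence over overload or an expired agreement.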



      Thanks for your query.
      Happy to help you again.
      Team-Diameter

  37. Hi,

    Consider that Client-A has two links to Client-B, each with a different realm. If one of the links to Client-B is disconnected, Client-A sends the same request/update message with the same Session-Id over the redundant link or realm.

    Question:
    1. Will Client-B detect it as a new request, since the Hop-by-Hop Identifier changes for the other realm while the Session-Id stays the same?

    2. How will Client-B identify a request/update as unique: by the Diameter Session-Id alone, or by a combination of Session-Id and Hop-by-Hop Identifier?

  38. What will be the issue if the new Origin-State-Id is lower than the current one?

    Thanks,
    Robin
