[aerogear-dev] Simplify the metrics for sanity


[aerogear-dev] Simplify the metrics for sanity

Matthias Wessendorf
Hi,

We do have a problem with our current metrics processing. It's complicated (lots of CDI events and two different JMS messaging approaches...), slow (JPQL/JDBC), and it consumes a lot of memory and processing time. This leads to bugs (incorrect stats) and eventually causes downtime due to heavy processing.

I'd like to dramatically simplify our metrics processing... to something like:
Success -> could connect to 3rd party, to deliver tokens
Failure -> something went wrong when talking to 3rd party service.
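
As a rough sketch of that reduced model (Python is used only for illustration; `DeliveryStatus`, `classify`, and the idea that a submission signals problems by raising are assumptions, not existing UPS code), the outcome of one submission could collapse to two states:

```python
from enum import Enum
from typing import Callable


class DeliveryStatus(Enum):
    SUCCESS = "success"  # could connect to the 3rd party and hand over the tokens
    FAILURE = "failure"  # something went wrong when talking to the 3rd party


def classify(submit: Callable[[], object]) -> DeliveryStatus:
    """Run one submission and reduce its outcome to Success or Failure.

    `submit` is a hypothetical callable that raises on any connection or
    protocol problem (invalid certificate, unreachable host, ...).
    """
    try:
        submit()
    except Exception:
        return DeliveryStatus.FAILURE
    return DeliveryStatus.SUCCESS
```

Note that there is no per-device accounting here at all; the status only says whether the hand-over to the 3rd party worked.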


Right now we do have metrics on push delivery:
Pending -> the submission to the 3rd party provider is in flight
Success -> we were able to connect, and could deliver *something*
Failure -> something obvious, like invalid certificate (APNs), no connection to 3rd party possible, etc

Besides that, we also count the targeted devices. I don't think this has much value. For instance, if APNs rejects some tokens, we do not track those; we just show how many tokens our DB found, nothing more. We don't show anything of real interest. We could improve this (see below), but I doubt that the current implementation is able to handle this well.

Also, on Android/FCM the numbers are even worse. Internally we leverage FCM topics, so we usually end up sending exactly one push to FCM, regardless of how many Android device tokens we have in the DB. The counter says 1 (one), because the server targeted one topic (not n devices).
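
To make the discrepancy concrete, here is a small illustration with made-up numbers. `targets_for` and the topic name are hypothetical and not the UPS implementation; the function simply returns what the server addresses in one push job, which is what the counter reflects:

```python
def targets_for(device_tokens: list[str], use_topic: bool) -> list[str]:
    """Return what the server addresses in one push job (illustrative only)."""
    if use_topic:
        # The topic fans out on the FCM side; the server sees a single target.
        return ["/topics/example-variant"]
    return list(device_tokens)


tokens = [f"token-{i}" for i in range(5000)]

print(len(targets_for(tokens, use_topic=False)))  # 5000
print(len(targets_for(tokens, use_topic=True)))   # 1
```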

So, for now, I'd like to dramatically simplify the code, and go with the above Success/Failure solution.

However, I honestly think in the long run, we should get something pluggable, that allows us to process the metrics independently, outside of the UPS code base. I think my previous Kafka mail is addressing this partially: The actual response and details about the push job should be logged to some Kafka system, and an independent process should be able to process those. 
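
A minimal sketch of that split, under stated assumptions: an in-memory queue stands in for the Kafka topic so the example runs without a broker, and the event fields (`jobId`, `network`, `status`, `detail`) as well as the function names are invented for illustration. The point is only that the sender logs raw outcomes and a separate consumer does the counting:

```python
import json
import queue
from collections import Counter

# Stand-in for a Kafka topic so the sketch runs without a broker.
push_events = queue.Queue()


def log_push_result(job_id: str, network: str, status: str, detail: str = "") -> None:
    """Producer side (inside UPS): record the raw outcome and nothing else."""
    push_events.put(json.dumps(
        {"jobId": job_id, "network": network, "status": status, "detail": detail}
    ))


def process_metrics() -> Counter:
    """Consumer side (independent process): derive stats from the logged events."""
    stats = Counter()
    while not push_events.empty():
        event = json.loads(push_events.get())
        stats[(event["network"], event["status"])] += 1
    return stats


log_push_result("job-1", "apns", "success")
log_push_result("job-1", "fcm", "failure", "connection refused")
log_push_result("job-2", "apns", "success")

stats = process_metrics()
print(stats[("apns", "success")])  # 2
```

Because the consumer only reads events, a different kind of stat later means changing or adding a consumer, not touching the sender.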

This will give us much more freedom and flexibility. Perhaps, in the future, we will also want some different stats, and something like Prometheus/Grafana.

A more flexible system, with independent metrics 'calculation' processing will help us here.

Any thoughts?

-Matthias


_______________________________________________
aerogear-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/aerogear-dev

Re: [aerogear-dev] Simplify the metrics for sanity

Oleg Matskiv
Hi Matthias,
I agree with your idea. I think that the device counter for Android is really confusing, so let's remove it. And as you described it, the pending state doesn't add much value.

Cheers,
Oleg

On Wed, May 24, 2017 at 10:30 AM, Matthias Wessendorf <[hidden email]> wrote:



--
Oleg Matskiv
Associate Quality Engineer
Red Hat Mobile Application Platform
[hidden email]


Re: [aerogear-dev] Simplify the metrics for sanity

Leigh Griffin
+1 to removing it and rethinking the value in what is presented!

It could also lead to false assumptions about end device delivery, when in reality it's delivering it to the gateway. 

On Wed, May 24, 2017 at 4:58 PM, Oleh Mackiv <[hidden email]> wrote:



--

LEIGH GRIFFIN

ENGINEERING MANAGER, MOBILE

Red Hat Ireland

Communications House, Cork Road

Waterford City, Ireland X91NY33

[hidden email]    M: +353877545162    IM: lgriffin



Re: [aerogear-dev] Simplify the metrics for sanity

Jose Miguel Gallas Olmedo
I say,



and then rethinking what value we want to give and how to do it properly.

Just one thing, we need the "pending" state for the UI as a "loading" state, from the moment we click the button "send notification" until one of the two states you propose is reached.


On 25 May 2017 at 13:26, Leigh Griffin <[hidden email]> wrote:



--

JOSE MIGUEL GALLAS OLMEDO

ASSOCIATE QE, mobile

Red Hat 

M: +34618488633



Re: [aerogear-dev] Simplify the metrics for sanity

Matthias Wessendorf


On Fri, May 26, 2017 at 9:48 AM, Jose Miguel Gallas Olmedo <[hidden email]> wrote:
> I say,
>
> and then rethinking what value we want to give and how to do it properly.
>
> Just one thing, we need the "pending" state for the UI as a "loading" state, from the moment we click the button "send notification" until one of the two states you propose is reached.

OK, the server has an "All Batches" loaded event; this one could be used to implement that.

One problem is that "loading" means nasty polling of the server until it is "done".
Unfortunately, the queries for the "metrics" are not that cool; they are a mess.

Also, one part of the problem is that the UI naively aims to be a real-time UI, which the current architecture does not allow.
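
For reference, the polling that a "loading" state implies looks roughly like the sketch below. `fetch_status` is a hypothetical callable standing in for whatever request the UI would make; the interval and the poll limit are arbitrary assumptions:

```python
import time
from typing import Callable

TERMINAL_STATES = {"success", "failure"}


def wait_until_done(fetch_status: Callable[[], str],
                    interval_s: float = 1.0,
                    max_polls: int = 30) -> str:
    """Poll until the job leaves the 'pending' state or we give up."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval_s)
    return "pending"  # still loading after max_polls attempts


# Simulated server answers: two polls still pending, then done.
responses = iter(["pending", "pending", "success"])
print(wait_until_done(lambda: next(responses), interval_s=0.0))  # success
```

Every poll is one more metrics query against the server, which is why the cost of those queries matters here.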

-M
 



Re: [aerogear-dev] Simplify the metrics for sanity

Summers Pittman
In reply to this post by Matthias Wessendorf


On Wed, May 24, 2017 at 4:30 AM, Matthias Wessendorf <[hidden email]> wrote:


What if we remove the current metrics UI and replace them with webhooks that emit events?  It lets us add events easily, somewhat simplifies debugging, and gives integrators a lot more control and hooks into our process.  We can even turn the current metrics into a microservice project as an example.  (Doubly so when we get Keycloak broken out and properly integrated)
 

Re: [aerogear-dev] Simplify the metrics for sanity

Matthias Wessendorf


On Tue, May 30, 2017 at 2:06 PM, Summers Pittman <[hidden email]> wrote:



> What if we remove the current metrics UI

For sanity, we are also simplifying the UI:

> and replace them with webhooks that emit events?

In the long run, I am open to anything else. I mainly care about the actual push delivery and the events that we will be submitting to a centralized data hub/pipeline, such as Kafka.

From there, a consumer process (written in whatever language) can offer webhooks etc.

> It lets us add events easily, somewhat simplifies debugging, and gives integrators a lot more control and hooks into our process. We can even turn the current metrics into a microservice project as an example. (Doubly so when we get Keycloak broken out and properly integrated)

The overall idea is to break the server into a more modular system:
* push-sender.war
* metrics-processor.war (or jar)
* device-registration.war
* UI process

I think a decoupled Keycloak would also be key to this, or what do you mean?

 
 
-Matthias



Re: [aerogear-dev] Simplify the metrics for sanity

Summers Pittman


On May 30, 2017 3:23 PM, "Matthias Wessendorf" <[hidden email]> wrote:


> I think decoupled keycloak would be also key to this, or what do you mean ?

What I meant was that it is easier to secure services with a decoupled Keycloak.

 
 
Re: [aerogear-dev] Simplify the metrics for sanity

Matthias Wessendorf
+9001 

On Wed, May 31, 2017 at 11:46 AM, Summers Pittman <[hidden email]> wrote:

