Monitoring Tools

Page history last edited by Ken Wasetis 12 years, 1 month ago Saved with comment

The first thing you should do when setting up your system is spend some time on monitoring tools. How can you make quantitative changes if you aren't measuring? Here is an overview of tools that have worked well for people and what they do.


System Monitoring

  1. Munin is simply an amazing tool, whether you have 1 machine or 10000. It measures system statistics over time and will help you grasp what a "normal" system state looks like. It also allows custom plugins, and the Plone community has already responded with things like munin.zope; there is also a Zope thread watcher called ZopeHealthWatcher. Ganglia is a similar package that offers much of the same functionality. Others?
  2. Monit and Munin are best friends. Monit watches the same kinds of things as Munin, except that it doesn't collect data over time, and if something looks fishy it takes corrective action. What kind of action, you say? Anything you ask it to! You can email alerts, automatically restart downed processes, monitor disk space, run bash scripts, and the list goes on. How many times have you forgotten to rotate logs and run out of disk space? Monit could have told you weeks before that happened. What about Zope using too much memory? No problem, just have Monit restart Zope when it reaches a certain percentage (you can get some sweet performance this way). I put some examples here, but please don't copy them word for word - they are just for ideas! Similar products include Nagios and Supervisor, but most people will agree that Monit will win your heart here.
  3. Zope Health Watcher is perfect for finding out exactly which pages are taking a super long time (e.g. did an addIndex operation tie up your Zope for all eternity?). It's simple in that it just tells you, at any time, which threads are rendering which requests. You'll be surprised how useful this can be.
  4. Just found out about this gem that monitors the length of requests in Zope 2.12+ and the top-like functionality that goes with it. Haven't tried it but it looks hot hot hot!
  5. Zenoss includes a lot of the features of Munin and Monit, and adds a bunch of network monitoring too. Again, I have not tried it, but if there are opinions out there feel free to share.
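
To give a feel for the custom plugins Munin allows, here is a minimal sketch of one written in Python. It relies only on the basic plugin convention: Munin calls the script with the argument config to learn how to draw the graph, and with no argument to fetch the current values. The MOUNT_POINTS mapping, the field name root and the 80/95 thresholds are assumptions made up for illustration; this is not how munin.zope or any existing plugin is written.

```python
#!/usr/bin/env python3
"""Sketch of a Munin plugin that graphs disk usage as a percentage."""
import shutil
import sys

# Hypothetical mapping of Munin field name -> mount point to watch.
MOUNT_POINTS = {"root": "/"}


def config_lines():
    """Lines printed when Munin calls the plugin with 'config'."""
    lines = [
        "graph_title Disk usage (sketch)",
        "graph_vlabel percent used",
        "graph_category disk",
        "graph_args --lower-limit 0 --upper-limit 100",
    ]
    for field, path in MOUNT_POINTS.items():
        lines.append(f"{field}.label {path}")
        # Illustrative thresholds only; pick values that suit your system.
        lines.append(f"{field}.warning 80")
        lines.append(f"{field}.critical 95")
    return lines


def value_lines():
    """Lines printed on a normal run: one 'field.value N' per mount point."""
    lines = []
    for field, path in MOUNT_POINTS.items():
        usage = shutil.disk_usage(path)
        percent = 100.0 * usage.used / usage.total
        lines.append(f"{field}.value {percent:.1f}")
    return lines


def main(argv):
    if len(argv) > 1 and argv[1] == "config":
        print("\n".join(config_lines()))
    else:
        print("\n".join(value_lines()))


if __name__ == "__main__":
    main(sys.argv)
```

Run by hand, the script prints a single root.value line, which is a quick way to check a plugin before handing it to Munin.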


Error Monitoring

If you want to be hardcore, you can write sweet scripts that pipe through all of your logs and crunch them down to statistics or maybe put them in databases. That's great, but for everyone else try these tips:

  1. Install PloneErrorMonitoring and/or customize your error pages to be, you know, useful.
  2. Put Google Analytics JavaScript tracking on any 4xx and 5xx page renders. Then you can just log in to your analytics account and see where bad links are, where things timed out, etc. I don't recommend this on every page since it has the potential to slow down your final render time by up to a second.
  3. Soup up the logging module to send emails (or do something else) when an error is triggered. Check out the mailinglogger package for quick and easy setup. For those who want to roll their own, there are things to consider. You really need to think when you are coding: is this really an error worthy of ending up in my inbox? If not, downgrade that message to a warning. The goal is to have a system so stable you get as few emails as possible, and it is possible! Also remember that sending an email is by no means free. If your system is hitting the crapper and triggers 1000 emails per minute, not only is your email admin going to kill you, but the system is going to double over on itself. Buffering in memory helps ease that pain by chunking those emails so you know what's wrong, just not * 1000. The downside is that important errors may not get to you until either the buffer is filled or you have a restart. In my experience though, most really important errors come in 100s, if not 1000s. If you keep this code nice and clean, you can use it in all of your packages, not just Zope and Plone.
  4. If you don't want to get into the code to filter through the logs, check out Arecibo and the Plone buildout plugin, which we believe has now been moved to Andy's GitHub space at: https://github.com/andymckay/arecibo.
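
For those rolling their own, the in-memory buffering described in tip 3 can be sketched with the standard library's logging.handlers.BufferingHandler. This is only an illustration under assumptions: the class name DigestHandler, the send callback, and the addresses and SMTP host in send_via_smtp are all made up here, and this is not how mailinglogger itself is implemented.

```python
import logging
import logging.handlers
import smtplib
from email.message import EmailMessage


class DigestHandler(logging.handlers.BufferingHandler):
    """Collect records in memory and hand them off as one batch."""

    def __init__(self, capacity, send):
        super().__init__(capacity)
        # Hypothetical hook: any callable taking (subject, body).
        self.send = send

    def flush(self):
        # Called when the buffer reaches capacity and when the handler closes.
        self.acquire()
        try:
            if not self.buffer:
                return
            body = "\n".join(self.format(record) for record in self.buffer)
            subject = f"{len(self.buffer)} error(s) logged"
            self.buffer.clear()
        finally:
            self.release()
        self.send(subject, body)


def send_via_smtp(subject, body):
    """One email per batch. Addresses and host are placeholders."""
    message = EmailMessage()
    message["Subject"] = subject
    message["From"] = "plone@example.com"
    message["To"] = "admin@example.com"
    message.set_content(body)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(message)


def install(logger, capacity=100, send=send_via_smtp):
    """Attach a digest handler that only accepts ERROR and above."""
    handler = DigestHandler(capacity, send)
    handler.setLevel(logging.ERROR)
    logger.addHandler(handler)
    return handler
```

Calling install(logging.getLogger()) would batch up to 100 errors into a single message. Warnings never reach the handler, which is why downgrading noisy messages keeps them out of your inbox. The trade-off mentioned in tip 3 is visible in flush: nothing is sent until the buffer fills or the handler is closed at shutdown.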



Comments (7)

Nate Aune said

at 3:11 pm on Dec 18, 2009

Hey Elizabeth -- great tips! For monitoring the error log across multiple Plone sites with a unified dashboard and email/text notifications, you should have a look at Andy McKay's Arecibo app. http://www.areciboapp.com
There is a Plone plugin here: http://www.areciboapp.com/listener/docs/plone/

Jean said

at 9:57 pm on Apr 24, 2011

I don't think ZopeHealthWatcher is working for current releases.

Ken Wasetis said

at 11:23 am on Jun 5, 2012

I believe Arecibo app moved to: https://github.com/andymckay/arecibo, so I'll update page accordingly, but has anyone used it recently? Still useful or anything more highly recommended?

Elizabeth Leddy said

at 12:16 pm on Jun 5, 2012

I still use it, but in my own fork https://github.com/eleddy/clearwind.arecibo

Jean said

at 10:03 pm on Jun 5, 2012

Looks like Andy's site is down, but from the syndicated copy at http://planetdjango.org/ on May 08, 2012 I read: "A few years ago David Cramer started Sentry. It's now surpassed passed Arecibo in terms of functionality. The real winner for us was the addition of UDP support, we don't really care about storing every single error and having something non-blocking is crucial. So we've started to shift away from Arecibo towards to Sentry at Mozilla web development." https://www.getsentry.com/welcome/

Ken Wasetis said

at 10:27 am on Jun 6, 2012

Thanks for the latest update, all!

Jean said

at 7:20 pm on Jun 6, 2012

Another candidate, via twitter: ‏@mgedmin "Compared to Zilch, Sentry is big and complicated." https://github.com/bbangert/zilch
And then via Zilch, a pointer to Raven (standalone Python client for Sentry, no need for Django). https://github.com/dcramer/raven
