We use Cognos products, and all of our ETL jobs run in Cognos Data Manager. Our BI group within IT and everybody in Finance rely on several ETL jobs that run throughout the day. Our company is international, and several people in Europe also rely on these jobs, so 24/7 uptime is critical.

These jobs take anywhere from 30 minutes to 3.5 hours to run, and several of them have dependencies on the database, which means they have to run separately rather than at the same time. Because of the number of jobs and their interdependencies, they are spread throughout the day and night with a small window for error. After the jobs complete, the OLAP cubes get built. For example:

- ETL Job 1: 8:00 AM (2 hours)
- ETL Job 2: 11:00 AM (30 minutes)
- ETL Job 3: 12:30 PM (3 hours)
- ETL Job 4: 6:00 PM (30 minutes)
- etc.

Problems:

- We have no way of knowing whether a job is taking longer than usual unless we manually check the log files. If a job hangs, the next job starts anyway and both jobs go to hell. This has caused lots of headaches in several departments, all the way up to the CFO.
- If a cube build fails, an e-mail is sent to the BI team. However, if an ETL job fails, no notification is sent. Again, the only way to know whether a job succeeded or failed is to check the logs.
- Because of the manual nature of these jobs, one person on the team is "on support" for a week at a time. This means staying up to see whether the jobs run well and the cubes build successfully. Needless to say, this sucks.
- Because of some recent job failures and the lack of notification, my manager told me to monitor these jobs EVERY DAY and report their status via e-mail. Never mind that I only work in BI 33% of the time.

Question: What does your company do to ensure 24/7 availability?
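To make the "small window for error" concrete, here is a minimal Python sketch that computes how much a job can overrun its typical duration before it collides with the next scheduled start. The start times and durations are the example schedule from the post; the function name `slack_between_jobs` is hypothetical, and the sketch assumes every job starts at a fixed clock time on the same day and normally runs for its quoted duration. It is a back-of-the-envelope calculation, not a Cognos Data Manager feature.

```python
from datetime import datetime, timedelta

# Example schedule from the post: (job name, start time "HH:MM", typical duration).
SCHEDULE = [
    ("ETL Job 1", "08:00", timedelta(hours=2)),
    ("ETL Job 2", "11:00", timedelta(minutes=30)),
    ("ETL Job 3", "12:30", timedelta(hours=3)),
    ("ETL Job 4", "18:00", timedelta(minutes=30)),
]


def _at(hhmm):
    """Parse an "HH:MM" clock time; all jobs are assumed to run on the same day."""
    return datetime.strptime(hhmm, "%H:%M")


def slack_between_jobs(schedule):
    """Hypothetical helper: return (job, next_job, slack) tuples, where slack is
    how long a job can overrun its typical duration before the next job starts."""
    result = []
    for (name, start, duration), (next_name, next_start, _) in zip(schedule, schedule[1:]):
        expected_end = _at(start) + duration
        result.append((name, next_name, _at(next_start) - expected_end))
    return result


if __name__ == "__main__":
    for name, next_name, slack in slack_between_jobs(SCHEDULE):
        print(f"{name} -> {next_name}: {slack} of slack")
```

With these numbers, ETL Job 1 and ETL Job 2 each have one hour of slack and ETL Job 3 has two and a half hours, so a hung job would silently run into the next one well within a single support shift.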