This paper addresses structured video indexing and retrieval, and proposes an approach that fuses visual and aural features with domain knowledge to detect, for the first time, the structure or form of story narration in broadcast news. News producers employ established techniques to transform a news story into a captivating visual and aural narration. Our system detects the initial form of a news broadcast using low-level processing of scene content in news videos, in conjunction with domain knowledge. Higher-level processing, directed by this initial structure, then improves and extends the preliminary classification. The detected structure breaks the broadcast into segments, each of which pertains to a single topic of discussion. Further, the segments are labelled as (a) anchor person or reporter, (b) voice-over, or (c) sound-bite footage. This labelling may then be used for automatic annotation and for constructing topical video summaries for efficient browsing. Experimental results on CNN news videos demonstrate the effectiveness of our system.
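To make the output described above concrete, the following is a minimal sketch of how labelled segments might be represented and grouped into topical stories. It is illustrative only and is not the detection method of the paper: the `Segment` type, the label strings, and the `group_into_stories` helper are hypothetical names, and the rule that a new story begins whenever an anchor segment follows non-anchor footage is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical label names for the three segment classes in the abstract.
LABELS = {"anchor", "voice_over", "sound_bite"}


@dataclass
class Segment:
    start: float  # start time in seconds
    end: float    # end time in seconds
    label: str    # one of LABELS


def group_into_stories(segments: List[Segment]) -> List[List[Segment]]:
    """Group time-ordered labelled segments into stories.

    Assumption (not from the paper): a story starts when an anchor
    segment follows a non-anchor segment, or at the first segment.
    """
    stories: List[List[Segment]] = []
    previous_label = None
    for seg in segments:
        if seg.label not in LABELS:
            raise ValueError(f"unknown label: {seg.label}")
        starts_story = seg.label == "anchor" and previous_label != "anchor"
        if not stories or starts_story:
            stories.append([seg])
        else:
            stories[-1].append(seg)
        previous_label = seg.label
    return stories


# Toy input with made-up timings, for illustration only.
example = [
    Segment(0.0, 12.0, "anchor"),
    Segment(12.0, 40.0, "voice_over"),
    Segment(40.0, 55.0, "sound_bite"),
    Segment(55.0, 63.0, "anchor"),
    Segment(63.0, 80.0, "sound_bite"),
]
stories = group_into_stories(example)
```

Each inner list could then serve as the unit for annotation or for a topical summary; a real system would derive the labels and boundaries from the audio-visual analysis rather than from hand-written input.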