Without metadata, business users will be like tourists left in a new city without any information about the city, and data warehouse administrators will be like town administrators who have no idea about the size of the city or how fast it is growing. Despite its criticality, metadata continues to be the most neglected part of many data warehousing projects. "We shall worry about it later" is usually the approach.
Different data warehousing systems have different structures. Some may have an ODS (operational data store), while some may have multiple data marts. Some may have a small number of data sources, while some may have dozens. In view of this, it is far more reasonable to present the different layers of a data warehouse architecture than to discuss the specifics of any one system.
In general, all data warehouse systems have the following layers:
- Data Source Layer
- Data Extraction Layer
- Staging Area
- ETL Layer
- Data Storage Layer
- Data Logic Layer
- Data Presentation Layer
- Metadata Layer
- System Operations Layer
The picture below shows the relationships among the different components of the data warehouse architecture.
Metadata Layer
This is where information about the data stored in the data warehouse system is kept. A logical data model would be an example of something that's in the metadata layer. A metadata tool is often used to manage metadata.
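As a rough illustration of what one entry in the metadata layer might record about a warehouse table (the fields and values below are assumptions for illustration, not a standard):

```python
# One hypothetical metadata record describing a single warehouse table.
table_metadata = {
    "table": "fact_orders",
    "source_system": "orders OLTP database",
    "columns": {
        "order_key": "surrogate key generated during ETL",
        "amount_eur": "order amount, conformed to EUR",
    },
    "load_frequency": "daily",
    "owner": "data warehouse team",
}
```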
Q. 3. Write briefly about four ETL tools. What are transformations? Briefly explain the basic transformation types.
Ans: ETL (Extract-Transform-Load)
ETL comes from data warehousing and stands for Extract-Transform-Load. ETL covers the process of how the data is loaded from the source system into the data warehouse. Currently, ETL encompasses a cleaning step as a separate step; the sequence is then Extract-Clean-Transform-Load. Let us briefly describe each step of the ETL process.
Extract
The Extract step covers the data extraction from the source system and makes it accessible for further processing.
The main objective of the extract step is to retrieve all the required data from the source system with as few resources as possible. The extract step should be designed in a way that it does not negatively affect the source system in terms of performance, response time, or any kind of locking.
Clean
The cleaning step is one of the most important, as it ensures the quality of the data in the data warehouse.
Transform
The transform step applies a set of rules to transform the data from the source to the target. This includes converting any measured data to the same dimension (i.e., a conformed dimension) using the same units so that they can later be joined. The transformation step also involves joining data from several sources, generating aggregates, generating surrogate keys, sorting, deriving new calculated values, and applying advanced validation rules.
Load
During the load step, it is necessary to ensure that the load is performed correctly and with as few resources as possible. The target of the load process is often a database. In order to make the load process efficient, it is helpful to disable any constraints and indexes before the load and to enable them again only after the load completes. Referential integrity needs to be maintained by the ETL tool to ensure consistency.
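A minimal sketch of the four steps described above; sqlite3 stands in for both the source system and the warehouse, and the orders table, column names, and currency conversion are invented for illustration:

```python
import sqlite3

def extract(source: sqlite3.Connection):
    # Extract: pull the raw rows out of the source system in one pass,
    # keeping the footprint on the source as small as possible.
    return source.execute(
        "SELECT order_id, customer, amount, currency FROM orders"
    ).fetchall()

def clean(rows):
    # Clean: reject rows that would hurt data quality in the warehouse
    # (here: a missing key or a non-positive amount).
    return [r for r in rows if r[0] is not None and r[2] and r[2] > 0]

def transform(rows, fx_rates):
    # Transform: convert all amounts to one conformed unit (EUR here)
    # and generate a surrogate key independent of the source key.
    out = []
    for key, (order_id, customer, amount, currency) in enumerate(rows, 1):
        out.append((key, order_id, customer,
                    round(amount * fx_rates[currency], 2)))
    return out

def load(warehouse: sqlite3.Connection, rows):
    # Load: bulk-insert into the target table; in a real warehouse the
    # indexes/constraints would be disabled first and rebuilt afterwards.
    warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    warehouse.commit()
```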
In all these cases, it is possible to view the new picture as really a new one and use algorithms to draw it, but a better method is, given its present form, to obtain its new counterpart by operating on the existing data. This concept is called transformation. The three basic transformations are (i) translation, (ii) rotation, and (iii) scaling. Translation refers to the shifting of a point to some other place whose distance with regard to the present point is known. Rotation, as the name suggests, is to rotate a point about an axis. The axis can be any of the coordinate axes or simply any other specified line.
Scaling is the concept of increasing (or decreasing) the size of a picture, in one or in both directions. (When it is done in both directions, the increase or decrease in the two directions need not be the same.) To change the size of the picture, we increase or decrease the distance between the end points of the picture and also change the intermediate points as per requirements.
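A minimal sketch of the three basic transformations applied to 2D points; NumPy and the sample unit square are assumptions for illustration:

```python
import numpy as np

def translate(points, tx, ty):
    # Translation: shift every point by a known distance (tx, ty).
    return points + np.array([tx, ty])

def rotate(points, angle_deg):
    # Rotation: rotate every point about the origin by angle_deg.
    a = np.radians(angle_deg)
    r = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])
    return points @ r.T

def scale(points, sx, sy):
    # Scaling: stretch or shrink along x and y; sx and sy need not be equal.
    return points * np.array([sx, sy])

# A unit square, shifted, turned through 90 degrees, then doubled in x only.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
print(scale(rotate(translate(square, 2, 3), 90), 2, 1))
```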
Q. 4. What are ROLAP, MOLAP and HOLAP? What is multidimensional analysis? How do we achieve it?
Ans: Cubes in a data warehouse are stored in three different modes. A relational storage model is called Relational Online Analytical Processing mode, or ROLAP, while a multidimensional storage model is called Multidimensional Online Analytical Processing mode, or MOLAP. When dimensions are stored in a combination of the two modes, it is known as Hybrid Online Analytical Processing mode, or HOLAP.
MOLAP
This is the traditional mode in OLAP analysis. In MOLAP, data is stored in the form of multidimensional cubes and not in relational databases. The advantage of this mode is that it provides excellent query performance, as the cubes are built for fast data retrieval.
All calculations are pre-generated when the cube is created and can be easily applied while querying data. The disadvantage of this model is that it can handle only a limited amount of data. Since all calculations are pre-built when the cube is created, the cube cannot be derived from a large volume of data. This deficiency can be bypassed by including only summary-level calculations while constructing the cube. This model also requires huge additional investment, as cube technology is proprietary and the knowledge base may not exist in the organization.
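A toy sketch of the MOLAP idea of pre-generating calculations at cube-build time so that queries become simple lookups; the dimensions and figures are invented for illustration:

```python
from collections import defaultdict
from itertools import product

sales = [  # hypothetical fact rows: (region, product, amount)
    ("EU", "widget", 10.0), ("EU", "gadget", 4.0),
    ("US", "widget", 7.5), ("US", "gadget", 2.5),
]

# Build time: pre-aggregate every (region, product) combination,
# including the 'ALL' roll-ups, before any query is run.
cube = defaultdict(float)
for region, prod, amount in sales:
    for r, p in product((region, "ALL"), (prod, "ALL")):
        cube[(r, p)] += amount

# Query time: no scanning of the fact rows, just a dictionary lookup.
print(cube[("EU", "ALL")])   # 14.0
print(cube[("ALL", "ALL")])  # 24.0
```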
ROLAP
The underlying data in this model is stored in relational databases. Since the data is stored in relational databases, this model gives the appearance of traditional OLAP's slicing and dicing functionality. The advantage of this model is that it can handle a large amount of data and can leverage all the functionalities of the relational database. The disadvantages are that the performance is slow and that each ROLAP report is an SQL query, with all the limitations of the genre. It is also limited by SQL functionalities.
ROLAP vendors have tried to mitigate this problem by building complex out-of-the-box functions into the tool, as well as by providing the users with an ability to define their own functions.
HOLAP
HOLAP technology tries to combine the strengths of the above two models. For summary-type information, HOLAP leverages cube technology; for drilling down into details, it uses the ROLAP method.
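A minimal sketch of the point that each ROLAP report is simply an SQL query run against the relational store at request time; the table and column names are assumed for illustration:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE fact_sales (region TEXT, product TEXT, amount REAL)")
db.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
               [("EU", "widget", 10.0), ("US", "widget", 7.5)])

# A 'slice and dice' report in ROLAP is generated as a GROUP BY query,
# executed against the relational database each time it is requested.
for row in db.execute(
        "SELECT region, SUM(amount) FROM fact_sales GROUP BY region"):
    print(row)  # ('EU', 10.0) then ('US', 7.5)
```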
Q. 5. Explain the testing process for data warehouses with the necessary diagram?
Ans: Data Warehouse Testing
Increasingly, businesses are focusing on the collection and organization of data for strategic decision making. The ability to review historical trends and monitor near real-time operational data has become a key competitive advantage. The recommendations below for testing extract, transform, and load (ETL) applications are based on years of experience testing data warehouses in the financial services and consumer retailing areas. There is a significantly escalating cost connected with discovering software defects late in the development lifecycle. In data warehousing, this is worsened by the added expense of using incorrect data to make important business decisions.
Given the importance of early detection of software defects, here are some general goals of testing an ETL application:
- Data completeness. Ensures that all expected data is loaded.
- Data transformation. Ensures that all data is transformed correctly according to business rules and/or design specifications.
- Data quality. Makes sure that the ETL software accurately rejects, substitutes default values for, fixes or disregards, and reports incorrect data.
- Scalability and performance. Makes sure that data loads and queries are executed within anticipated time frames and that the technical design is scalable.
- Integration testing. Ensures that the ETL process functions well with other upstream and downstream processes.
- User-acceptance testing. Makes sure that the solution satisfies your current expectations and anticipates your future expectations.
- Regression testing. Makes sure that existing functionality stays intact whenever new code is released.
Q. 6. What is testing? Differentiate between data warehouse testing and traditional software testing.
Ans: Testing:
1. To take measures to check the quality, performance, or reliability of (something), especially before putting it into widespread use or practice.
2. To reveal the strengths or capabilities of (someone or something) by putting them under strain: "such behavior would test any marriage".
Traditional application testing does not look at both acceptance testing of an application and the data content. A data warehouse project must ensure that acceptance testing is executed against the data (loading, value accuracy, transformation, completeness, etc.) as well as against the application functionality (reporting accuracy, display completeness, response time, etc.). Testing a data warehouse is quite different from testing the development of OLTP systems. The main areas of testing for OLTP include testing user input for valid data types, edge values, etc. Testing for a data warehouse, on the other hand, cannot and should not duplicate all of the error checks done in the source system. Even though there are some data quality improvements that are practical to do, such as making sure postal codes are associated with the correct city and state, data warehouse implementations must pretty much take in what the OLTP system has produced.
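A minimal sketch of the kind of practical data-quality check mentioned above, validating postal codes against a city/state reference; the reference data and record layout are invented for illustration:

```python
# Hypothetical reference data: postal code -> (city, state).
REFERENCE = {"10001": ("New York", "NY"), "94103": ("San Francisco", "CA")}

def check_postal_codes(records):
    # Flag records whose postal code does not match the expected
    # city/state; everything else is accepted as the OLTP system produced it.
    bad = []
    for rec in records:
        expected = REFERENCE.get(rec["zip"])
        if expected is None or expected != (rec["city"], rec["state"]):
            bad.append(rec)
    return bad

rows = [{"zip": "10001", "city": "New York", "state": "NY"},
        {"zip": "94103", "city": "Oakland", "state": "CA"}]
print(check_postal_codes(rows))  # only the mismatched second row
```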
Testing for a data warehouse falls into three general categories: testing the ETL, testing that reports and other artifacts in the data warehouse provide correct answers, and, lastly, testing that the performance of all the data warehouse components is acceptable. Here are some main areas of testing that should be done for the ETL process: making sure that all the records in the source system that should be brought into the data warehouse actually are extracted into the data warehouse: no more, no less.
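A minimal sketch of that "no more, no less" completeness check, reconciling record counts and business keys between source and target; the connections and table names are assumptions for illustration:

```python
import sqlite3

def completeness_check(source: sqlite3.Connection,
                       target: sqlite3.Connection) -> bool:
    # Compare row counts first: a cheap test that catches gross
    # under- or over-extraction.
    src_n = source.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    tgt_n = target.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
    if src_n != tgt_n:
        print(f"count mismatch: source={src_n} target={tgt_n}")
        return False
    # Then compare the sets of business keys: equal counts with different
    # keys would still mean records were dropped or duplicated.
    src_keys = {r[0] for r in source.execute("SELECT order_id FROM orders")}
    tgt_keys = {r[0] for r in target.execute("SELECT order_id FROM fact_orders")}
    return src_keys == tgt_keys
```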