Modern data warehousing solutions are often highly distributed and cloud-based. Users, usually authenticated through a directory service such as Active Directory, need secure access to a data warehouse or data lake that may be hosted anywhere and on any platform (Windows, Linux).
Technologies like Apache Spark or Microsoft Azure HDInsight do not provide an easy way to configure Windows Integrated Authentication (WIA). Some other data sources do not support WIA at all.
Increasingly, innovative business projects need to integrate several databases to extract information. Those of us who have been in the industry long enough have seen it all: relational and non-relational databases, CSV files, Excel files. Most likely, those sources were never designed to be used together.
Traditionally, some kind of ETL project had to be created. It would load the data into “stage” tables, apply the necessary transformations (such as data type unification), and finally load the data into its destination. Such a project would take days or weeks rather than hours.
Moreover, it requires qualified staff who know every source system as well as the destination system.
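To make the traditional stage, transform, and load flow concrete, here is a minimal sketch in Python using an in-memory SQLite database. Everything in it is an assumption made for illustration: the two sources, the table names `stage_orders` and `orders`, the column names, and the helper `normalize_date` are all hypothetical and do not come from any particular product. A real ETL project would deal with far more sources, data types, and error cases.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export, where every value arrives as text.
CSV_SOURCE = (
    "order_id,amount,order_date\n"
    "1,19.90,2020-01-15\n"
    "2,5.00,2020-01-16\n"
)

# Hypothetical source 2: rows from another system that uses
# a numeric amount and a DD/MM/YYYY date format.
OTHER_SOURCE = [
    {"order_id": "3", "amount": 12, "order_date": "17/01/2020"},
]


def normalize_date(value: str) -> str:
    """Unify DD/MM/YYYY and YYYY-MM-DD into ISO YYYY-MM-DD."""
    if "/" in value:
        day, month, year = value.split("/")
        return f"{year}-{month}-{day}"
    return value


def run_etl(conn: sqlite3.Connection) -> list:
    # Extract: land the raw values, as text, in a staging table.
    conn.execute(
        "CREATE TABLE stage_orders (order_id TEXT, amount TEXT, order_date TEXT)"
    )
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE))) + OTHER_SOURCE
    conn.executemany(
        "INSERT INTO stage_orders VALUES (?, ?, ?)",
        [(str(r["order_id"]), str(r["amount"]), str(r["order_date"])) for r in rows],
    )

    # Transform: unify the data types and the date format.
    staged = conn.execute(
        "SELECT order_id, amount, order_date FROM stage_orders"
    ).fetchall()
    cleaned = [(int(i), float(a), normalize_date(d)) for i, a, d in staged]

    # Load: write the unified rows into the typed destination table.
    conn.execute(
        "CREATE TABLE orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", cleaned)
    conn.commit()
    return conn.execute(
        "SELECT order_id, amount, order_date FROM orders ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    for row in run_etl(sqlite3.connect(":memory:")):
        print(row)
```

Even in this toy version, the transformation step has to know the quirks of each source (here, the date format), which hints at why such projects need staff familiar with every system involved.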