Our MDM platform is a next-generation enterprise application built on a Web 2.0 framework. It uses a true service-oriented architecture (SOA) on the Microsoft .NET Framework to deliver a flexible and robust suite of enterprise applications. Our MDM platform helps in:
- Building and sharing highly efficient knowledge repositories
- Searching and retrieving data more easily
- Increasing visibility through enhanced collaboration between users, partners, and customers
- Integrating easily with other custom applications
The four major components of the MDM product are:
- Extraction module
- Deduplication module
- Standardization module
- Classification module
In addition to the modules listed above, a workflow module integrates the MDM process with the customer's workflow.
The extraction module is part of the ETL technology that is used to extract data from a wide range of data formats.
- Supports CSV, tab-delimited, Excel, and other file formats
- Includes spend-specific data types such as Money and Date; each data type has its own set of pre-processing rules
- Imports data into a structured SQL database with a relational data model and enhanced security features
- Stores and retrieves data based on unique identifiers; supports all types of complex queries
- Displays all erroneous records separately
- Exports error records separately so they can be returned to the client
- Supports data received in multiple batches
- Flexible enough to support custom fields
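The type-specific pre-processing and error routing described above can be sketched as follows. This is a minimal illustration, not the platform's actual code: the `parse_money` and `parse_date` rules, the sample schema, and the column names are all hypothetical.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical pre-processing rule for the Money data type:
# strip currency symbols and thousands separators, return a Decimal.
def parse_money(value):
    return Decimal(value.strip().lstrip("$").replace(",", ""))

# Hypothetical pre-processing rule for the Date data type:
# accept a couple of common date conventions.
def parse_date(value):
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            pass
    raise ValueError(f"unrecognised date: {value!r}")

def extract(raw_csv, schema):
    """Apply per-column rules; route failing rows to a separate error list."""
    good, errors = [], []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        try:
            good.append({col: rule(row[col]) for col, rule in schema.items()})
        except (ValueError, InvalidOperation, KeyError):
            errors.append(row)  # erroneous records are kept separately
    return good, errors

raw = 'amount,invoice_date\n"$1,234.50",2023-01-15\nbad,2023-13-99\n'
schema = {"amount": parse_money, "invoice_date": parse_date}
records, error_records = extract(raw, schema)
```

Keeping the rejected rows intact, rather than discarding them, is what allows the error records to be exported and returned to the client for correction.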
The deduplication module is part of the data cleansing activity and is the first process run after extraction. Deduplication has two parts:
- Automatic deduplication
- Manual deduplication
- Highly flexible to support configuration of parameters for deduplication
- Automatically applies filters that convert commonly used conventions such as Ltd., Pvt., etc. into standardized naming conventions
- Automatically deduplicates records based on set parameters
- Partial duplicates are shown as "Probable Duplicates" and sent for manual analysis
- Platform learns from manual analysis and thereafter automatically treats manually tagged records as duplicates
- Identifies partial duplicates through split-word analysis as well
- Identifies duplicates using the Soundex algorithm
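Two of the techniques above, suffix normalization and Soundex matching, can be illustrated together. This is a sketch under assumptions: the suffix table is an illustrative subset (the platform's real filter list would be configurable), and the Soundex routine is the classic four-character algorithm, which catches phonetic near-duplicates such as "Smith" vs "Smyth".

```python
import re

# Illustrative subset of the suffix-normalization filter:
# commonly used conventions mapped to one standard form.
SUFFIX_MAP = {"ltd": "limited", "ltd.": "limited",
              "pvt": "private", "pvt.": "private"}

def normalize(name):
    """Lower-case the name and expand common company suffixes."""
    return " ".join(SUFFIX_MAP.get(t, t) for t in name.lower().split())

def soundex(word):
    """Classic four-character Soundex code, e.g. Smith -> S530."""
    groups = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3",
              "l": "4", "mn": "5", "r": "6"}
    def code(ch):
        for letters, digit in groups.items():
            if ch in letters:
                return digit
        return ""  # vowels, h, w, y carry no code
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return ""
    out, prev = word[0].upper(), code(word[0])
    for ch in word[1:]:
        d = code(ch)
        if d and d != prev:
            out += d
        if ch not in "hw":  # h and w do not separate duplicate codes
            prev = d
    return (out + "000")[:4]
```

Records whose normalized names match exactly would be automatic duplicates; records that only agree on the Soundex code would be the "Probable Duplicates" sent for manual analysis.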
The standardization module is part of the data cleansing activity. This process follows the deduplication phase.
- Standardization is based on reference to knowledge repositories
- The standardization engine refers to:
  - the "Company alias name" table, to identify different company name conventions and standardize them accordingly
  - the "Product lexicon" table, to identify different product name conventions and standardize them accordingly
- Supports manual analysis and standardization of records that are not covered by the existing knowledge repositories
- The standardization engine automatically learns from the manual standardization process and updates the knowledge repository accordingly
- Simple user interface to standardize records
- Maintains a complete audit trail of all activities, with record-wise details of user actions on each data field
- Supports subject-matter-based work allocation for product standardization
After data standardization, an automatic deduplication process is run to ensure removal of duplicate records.
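The repository-backed lookup and learning loop described above might look as follows. This is a minimal sketch, assuming the "Company alias name" table behaves as a simple alias-to-standard-name mapping; the alias entries, company names, and function names are all hypothetical.

```python
# Hypothetical "Company alias name" table: alias -> standard name.
company_aliases = {
    "ibm corp": "IBM Corporation",
    "i.b.m.": "IBM Corporation",
}

unresolved = []  # records queued for manual analysis

def standardize_company(name):
    """Resolve a name via the knowledge repository, or queue it for review."""
    key = name.strip().lower()
    if key in company_aliases:
        return company_aliases[key]
    unresolved.append(name)  # not covered by the repository yet
    return None

def learn(alias, standard_name):
    """Feed a manual standardization decision back into the repository."""
    company_aliases[alias.strip().lower()] = standard_name

standardize_company("IBM Corp")                # resolved from the repository
standardize_company("Intl Business Machines")  # queued for manual analysis
learn("Intl Business Machines", "IBM Corporation")
# Subsequent batches now resolve the new alias automatically.
```

Because `learn` writes straight back into the alias table, each manual decision shrinks the manual workload for every later batch, which is the point of the self-learning engine described above.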
The classification module is the most important module of the MDM platform. It provides clear visibility into the details of any transaction or master data record. All BI functions that users run downstream depend on the accuracy of classification performed in this module.
- Supports all standard industrial taxonomies, such as UNSPSC, SIC, etc.
- Supports all custom provided taxonomies
- Supports creation of new taxonomies
- Provision to define a thesaurus, classification code, and definition for each node
- Supports automated classification based on Boolean logic
- All rules developed for automated classification are tested with sample sets, prior to classifying live data
- Records that cannot be automatically classified are tagged as "Ambiguous records" and routed to a manual analyst for classification
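Boolean-logic classification with ambiguity routing, as described above, can be sketched like this. The rule shape (`all`/`any`/`none` keyword lists) and the UNSPSC-style codes are illustrative assumptions, not the platform's actual rule format or authoritative taxonomy values.

```python
# A rule is a dict of keyword lists: every "all" keyword must appear,
# at least one "any" keyword must appear, and no "none" keyword may appear.
def matches(description, rule):
    text = description.lower()
    return (all(k in text for k in rule.get("all", []))
            and (not rule.get("any") or any(k in text for k in rule["any"]))
            and not any(k in text for k in rule.get("none", [])))

# Hypothetical rules mapping descriptions to UNSPSC-style taxonomy codes.
RULES = [
    ("44120000", {"all": ["paper"], "any": ["a4", "letter"], "none": ["toilet"]}),
    ("43210000", {"all": ["laptop"]}),
]

def classify(description):
    hits = [code for code, rule in RULES if matches(description, rule)]
    # Exactly one rule must fire; anything else is tagged for a manual analyst.
    return hits[0] if len(hits) == 1 else "Ambiguous"

classify("A4 copier paper, 80gsm")  # -> "44120000"
classify("toilet paper rolls")      # -> "Ambiguous"
```

Testing each rule against a sample set before live classification, as the module does, amounts to running `classify` over labelled descriptions and checking that no rule fires on records outside its intended node.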