Showing posts with label transform. Show all posts
Showing posts with label transform. Show all posts

Wednesday, March 7, 2012

Only transform unique rows

Hi,

I have created a SSIS package which includes a Data Flow that will transform my data. The Data Flow Source for this Data Flow is a SQL Server table with approx. 60 mill. rows. Theses rows ought to be unique but isn't quite, so I would like to ensure that only one of each row makes it throug to my Data Flow Destination, another SQL Server table.

What is the most efficient way to ensure this - is there some Data Flow Transformation that can assure this, should I write a SQL statement that deletes all duplicates before the Data Flow or should I take a third approch?

- SuneI can see two ways to do so:

1. OLE DB source -> Sort -> OLE DB dest

Configure Sort transform to "Remove rows with duplicate sort values".

2. OLE DB source -> Aggregate -> OLE DB dest

Configure Aggregate transform to group by all columns.

No.2 might be more efficient. Would you like to try and tell us the results? I'm very interested to know.|||Actually there is a third way and that is to use the T-SQL statement to do the dedupe. This can be much more efficient than either of the 2 mentioned above since the engine is optimized to do this especially if it can use an index. I think you would have to do some benchmarks to determine which is optimal for your particular scenario.

Thanks,
Matt|||I have actually been using a T-SQL statement up until now and I has been working okay. However all of a sudden performance have degraded significantly which made me wonder if there was any alternative ways.

I hope to get around to make some benchmarks during the weekend and promise to post the results.

Thanks for your help so far.

- Sune

Monday, February 20, 2012

Online Data Transform

Dear All,

I am new to SSIS, and just watched some of the webcasts and take a look at SSIS Hands on Lab on the MSDN site.

I want to handle an online data transformation scenario which has some data mappings between. I want the SSIS to add each new record in source table, insert in the destination tables online. If this is possible, please let me know where can I find a sample package and configurations.

Happy new year,
Sassan

What is an "online data transformation"?

What do you mean by "insert into the destination table online"?

Do you mean that you want each record in a source table to be inserted into a destination table? If so, what exactly are you having trouble achieving?

Please be more specific.

-Jamie

|||

Dear Jamie,

Thanks for taking time to answer my question.

Your assumption from my scenario is right. I just wanted to know whether SQL 2005 support this scenario and if yes, I need a sample package and its configuration.

Look forward to hearing from you ASAP.

Regards,
Sassan

PS. I use BizTalk Server 2004 now and I want to change it using SSIS

|||

There are lots of SSIS samples provided with SQL 2005, take a look.

The task you are describing seems to fit exactly Import/Export Wizard goals - just run the wizard, follow it, and it will create the package you need.

If needed, you may then open this package in BI Dev Studio and modify it further.

|||

Dear Michael

As I am working with the BI Dev Studio and ran some of the Hands On Labs, I see that after I create the package, I must run it (Or schedule to run it regurarly) and then all the data wil be transformed by the rules I set.

I need something like trigger in order my package run by each record addition in source.

Therefore I need a sample for a scenario I explained.

Regards,

Sassan

|||

So, you mean you want the SSIS package to execute whenever a certain event occurs? Well, you can execute a SSIS package using a command line utility called dtexec.exe. It is up to you how you capture the events that you need trigger the call to dtexec.exe

-Jamie

|||

Dear Jamie,

You are right. I have two questions which needs help:

1- I have an application server which stores data via its Data Access Layer and My DB Server is not on the same server, Therefore is there another way except using triggers?

2- How can I call the package for transforming just the new record that I want to be added in the old system. I see that running package cause all data to be transformed. The only solution I see is that I set a flag in the source record after transformation and then in my package use a conditional block to just transform the records with the flag unset.

Look forward to hearing from you ASAP.

Regards,
Sassan

|||

sassankz wrote:

You are right. I have two questions which needs help:

1- I have an application server which stores data via its Data Access Layer and My DB Server is not on the same server, Therefore is there another way except using triggers?

Don't really understand what you're trying to do. When a record arrives in your "data access layer" (which I presume is a database), do you want to automatically put it into your DB server? If so....SSIS is NOT the tool that you should be using to do this. SQL Server has a number of methods of achieving this. Look at replication, log shipping or data mirroring.

sassankz wrote:

2- How can I call the package for transforming just the new record that I want to be added in the old system. I see that running package cause all data to be transformed. The only solution I see is that I set a flag in the source record after transformation and then in my package use a conditional block to just transform the records with the flag unset.

If you are going to continue using SSIS (which I don't think you should) then that sounds like a possible solution.

-Jamie

|||

Dear Jamie,

Let me explain the whole scenario. I am developing a Human Resource Management System for my client. Therefore I have my database design based on the customer requirements. The client has a payroll application which is working now and will continue to use it also. Therefore only HRM handles the Employee Introduction and I must feed the Payroll required data in order to work properly parallel with HRM.

Now there exists some data transformation from HRM to Payroll and also the data must be synchronized. It means that if I even can delete an employee from HRM, for sure I can do that it in payroll and must do that.

Now I want to have SSIS for this scenario. I want to develop some packages which handles the transformation and like you said some how data mirroring.

Please let me know whether you see SSIS useful for the above task or not.

Regards,
Sassan

|||

SSIS is inherently a tool for batch operations. If you want the changes between databases to be propogated immediately then you should be loking at some other tool in my opinion.

If the data has to be transformed in some way then perhaps you may want to look at BizTalk. I don't know much about BizTalk but this seems to be the sort of scenario for which it is designed.

-Jamie

|||

Dear Jamie,

Thanks for your recommendations. I decided to schedule package runs.

Is there a solution in SSIS in order to update destination source if the row exists, since it truncate the whole table or append data in its wizard? (I know I can write SQL Codes, I just want to know whether it has a ready configuration to support this.)

Regards,
Sassan

|||

Sassan,

SSIS is really not good in updates... Doing that in SQL is much better... So it might be a good idea to design something like an "update table" which is filled within the data flow where you "buffer" all the changes and commit that updates to the "real" table using a SQL update...

|||

Sassan,

just to add a little thing...

I just wrote something about the updates on the whislist maintained by Kirk in this forum... So I thought about how to implement that "update table"...

There might be (at least) to approaches, one is "clean" and one is "dirty", but might be a little bit more performant...

The "clean" way would be using a lookup to find out if a record already exists in the destination. If it does, then it's an update (redirect the record to the update table), if not it's new and you can send it to the destination.

The "dirty" way would be to send all records to the destination. If the write fails (because the primary key already exists) you redirect the error records to the update table. I didn't try this approach, yet. But it might work...

You might get into trouble if you can have a record more than once in a run. Then it might be a "new" and an "update" in the same batch. That might be interesting to find out...

Online Data Transform

Dear All,

I am new to SSIS, and just watched some of the webcasts and take a look at SSIS Hands on Lab on the MSDN site.

I want to handle an online data transformation scenario which has some data mappings between. I want the SSIS to add each new record in source table, insert in the destination tables online. If this is possible, please let me know where can I find a sample package and configurations.

Happy new year,
Sassan

What is an "online data transformation"?

What do you mean by "insert into the destination table online"?

Do you mean that you want each record in a source table to be inserted into a destination table? If so, what exactly are you having trouble achieving?

Please be more specific.

-Jamie

|||

Dear Jamie,

Thanks for taking time to answer my question.

Your assumption from my scenario is right. I just wanted to know whether SQL 2005 support this scenario and if yes, I need a sample package and its configuration.

Look forward to hearing from you ASAP.

Regards,
Sassan

PS. I use BizTalk Server 2004 now and I want to change it using SSIS

|||

There are lots of SSIS samples provided with SQL 2005, take a look.

The task you are describing seems to fit exactly Import/Export Wizard goals - just run the wizard, follow it, and it will create the package you need.

If needed, you may then open this package in BI Dev Studio and modify it further.

|||

Dear Michael

As I am working with the BI Dev Studio and ran some of the Hands On Labs, I see that after I create the package, I must run it (Or schedule to run it regurarly) and then all the data wil be transformed by the rules I set.

I need something like trigger in order my package run by each record addition in source.

Therefore I need a sample for a scenario I explained.

Regards,

Sassan

|||

So, you mean you want the SSIS package to execute whenever a certain event occurs? Well, you can execute a SSIS package using a command line utility called dtexec.exe. It is up to you how you capture the events that you need trigger the call to dtexec.exe

-Jamie

|||

Dear Jamie,

You are right. I have two questions which needs help:

1- I have an application server which stores data via its Data Access Layer and My DB Server is not on the same server, Therefore is there another way except using triggers?

2- How can I call the package for transforming just the new record that I want to be added in the old system. I see that running package cause all data to be transformed. The only solution I see is that I set a flag in the source record after transformation and then in my package use a conditional block to just transform the records with the flag unset.

Look forward to hearing from you ASAP.

Regards,
Sassan

|||

sassankz wrote:

You are right. I have two questions which needs help:

1- I have an application server which stores data via its Data Access Layer and My DB Server is not on the same server, Therefore is there another way except using triggers?

Don't really understand what you're trying to do. When a record arrives in your "data access layer" (which I presume is a database), do you want to automatically put it into your DB server? If so....SSIS is NOT the tool that you should be using to do this. SQL Server has a number of methods of achieving this. Look at replication, log shipping or data mirroring.

sassankz wrote:

2- How can I call the package for transforming just the new record that I want to be added in the old system. I see that running package cause all data to be transformed. The only solution I see is that I set a flag in the source record after transformation and then in my package use a conditional block to just transform the records with the flag unset.

If you are going to continue using SSIS (which I don't think you should) then that sounds like a possible solution.

-Jamie

|||

Dear Jamie,

Let me explain the whole scenario. I am developing a Human Resource Management System for my client. Therefore I have my database design based on the customer requirements. The client has a payroll application which is working now and will continue to use it also. Therefore only HRM handles the Employee Introduction and I must feed the Payroll required data in order to work properly parallel with HRM.

Now there exists some data transformation from HRM to Payroll and also the data must be synchronized. It means that if I even can delete an employee from HRM, for sure I can do that it in payroll and must do that.

Now I want to have SSIS for this scenario. I want to develop some packages which handles the transformation and like you said some how data mirroring.

Please let me know whether you see SSIS useful for the above task or not.

Regards,
Sassan

|||

SSIS is inherently a tool for batch operations. If you want the changes between databases to be propogated immediately then you should be loking at some other tool in my opinion.

If the data has to be transformed in some way then perhaps you may want to look at BizTalk. I don't know much about BizTalk but this seems to be the sort of scenario for which it is designed.

-Jamie

|||

Dear Jamie,

Thanks for your recommendations. I decided to schedule package runs.

Is there a solution in SSIS in order to update destination source if the row exists, since it truncate the whole table or append data in its wizard? (I know I can write SQL Codes, I just want to know whether it has a ready configuration to support this.)

Regards,
Sassan

|||

Sassan,

SSIS is really not good in updates... Doing that in SQL is much better... So it might be a good idea to design something like an "update table" which is filled within the data flow where you "buffer" all the changes and commit that updates to the "real" table using a SQL update...

|||

Sassan,

just to add a little thing...

I just wrote something about the updates on the whislist maintained by Kirk in this forum... So I thought about how to implement that "update table"...

There might be (at least) to approaches, one is "clean" and one is "dirty", but might be a little bit more performant...

The "clean" way would be using a lookup to find out if a record already exists in the destination. If it does, then it's an update (redirect the record to the update table), if not it's new and you can send it to the destination.

The "dirty" way would be to send all records to the destination. If the write fails (because the primary key already exists) you redirect the error records to the update table. I didn't try this approach, yet. But it might work...

You might get into trouble if you can have a record more than once in a run. Then it might be a "new" and an "update" in the same batch. That might be interesting to find out...