Home Business Intelligence Generating sample data with Copilot in Fabric Dataflows Gen2

Generating sample data with Copilot in Fabric Dataflows Gen2

by zaki Ghassan


As a Power Query fan I’m naturally interested in Copilot in Dataflows Gen2. But since Power Query is already very easy to use and since I’m pretty experienced at using it, can Copilot really do anything to help me be more productive? Well one thing I need to do a lot for blog posts, demos and testing is to generate tables of sample data. Typing data into the Enter Data source can be quite tedious and writing the M code to generate sample data isn’t straightforward, so I was wondering if Copilot in Dataflows Gen2 could do it for me. It turns out it can do so really well.

After a little bit of experimentation I’ve come up with an example prompt that can be easily modified to return a table of random values with columns of different data types and a given number of rows, and which works fairly reliably:

Return a table consisting of the following five columns: 
Country, containing one of 10 random country names; 
Product, containing one of 15 random fruit names; 
Person, containing one of 20 random first names that are common in the United Kingdom;
Date, containing random dates between January 1st 2019 and December 31st 2024;
Sales, containing random integers between 0 and 500. 
The table should have 1000 rows. Set an appropriate data type for each column.

Here’s an example of the M code generated:

let
    // Generate lists of random values
    Countries = {"United States", "Canada", "United Kingdom", "Germany", "France", "Italy", "Spain", "Australia", "India", "China"},
    Products = {"Apple", "Banana", "Cherry", "Date", "Elderberry", "Fig", "Grape", "Honeydew", "Kiwi", "Lemon", "Mango", "Nectarine", "Orange", "Papaya", "Quince"},
    People = {"James", "Mary", "John", "Patricia", "Robert", "Jennifer", "Michael", "Linda", "William", "Elizabeth", "David", "Barbara", "Richard", "Susan", "Joseph", "Jessica", "Thomas", "Sarah", "Charles", "Karen"},
    StartDate = #date(2019, 1, 1),
    EndDate = #date(2024, 12, 31),
    DaysBetween = Duration.Days(EndDate - StartDate) + 1,

    // Function to generate random dates
    RandomDate = () => Date.From(StartDate + #duration(Number.RoundDown(Number.RandomBetween(0, DaysBetween)), 0, 0, 0)),

    // Generate table records
    Records = List.Transform({1..1000}, each [
        Country = Countries{Number.Mod(_, List.Count(Countries))},
        Product = Products{Number.Mod(_, List.Count(Products))},
        Person = People{Number.Mod(_, List.Count(People))},
        Date = RandomDate(),
        Sales = Number.RoundDown(Number.RandomBetween(0, 500))
    ]),

    // Convert list of records to table and set data types
    Source = Table.FromRecords(Records),
    TypedTable = Table.TransformColumnTypes(Source, {
        {"Country", type text},
        {"Product", type text},
        {"Person", type text},
        {"Date", type date},
        {"Sales", Int32.Type}
    })
in
    TypedTable

And here’s an example of the output:

Definitely a time saver as far as I’m concerned. Is it totally reliable? No: it occasionally produces code that errors or which doesn’t contain genuinely random values, but it’s good enough and it’s faster to try the prompt once or twice than write the code myself. I know there are other, more sophisticated ways of generating sample data like this in Fabric, for example in Python, but as I said I’m a Power Query person.

And of course, for bonus points, we can now send the output of a Dataflow Gen2 to a CSV file in SharePoint which makes this even more useful:

Published



Source link

Related Posts

Leave a Comment