Preparing Custom Data

You can use any kind of text data for this project. In my case, I used Spanish song lyrics from popular Mexican singers.

It is important to mention that you will likely get better results with English text, but you can choose other languages.

You can combine all the data into a single text file or split it across multiple text files in one directory. Use the following delimiter to separate each piece of data (for example, each song). For this example, I named my file lyric.txt.

<|endoftext|>

For example, you could have something like this:

Lorem ipsum dolor sit amet, ante pellentesque quam non ultricies lectus. Sed molestie ut enim id porta, dolor sit, sodales integer magna fermentum, suspendisse aenean nulla sit ligula ultricies fermentum. Urna elit lorem, magna magna sed mi platea duis id, quam sem enim ridiculus nunc proin et. Suspendisse integer, tortor massa rhoncus vel nibh. Interdum viverra faucibus mi turpis voluptas. Eu aliquet per, ea sit vestibulum vulputate, suspendisse nec. Id eget metus volutpat blandit vulputate, tellus ligula vel arcu at turpis risus, volutpat nibh sed. Luctus consequat, sapien a velit nec neque ipsum, ligula vestibulum sed morbi est ac. Aliquam vitae curae rhoncus. Odio ipsum, donec volutpat, nullam class ipsa volutpat in magnis hendrerit, placerat a pharetra in fringilla massa ligula, suspendisse suscipit libero aenean et ipsum ornare.

<|endoftext|>

Mauris lectus platea, eleifend ullamcorper arcu faucibus pellentesque aliquam, adipiscing lectus eget enim tellus quis purus. Mauris cras non suspendisse nunc pellentesque, tincidunt nec, sodales malesuada ligula aliquam porta integer. Mauris tristique quam ullamcorper nullam turpis. Lacinia dolor morbi in, fringilla risus arcu nec lobortis, mus amet a. Congue tellus varius, quis ante, elit in. Ut eu pede sodales ligula inceptos mauris, felis consequat et metus dictum, nulla tempor urna lectus, egestas praesent tellus, accumsan pede proin sit nullam mollis. Velit sit, massa elementum facilisis, quis etiam risus. Consequat porttitor, purus placerat condimentum placerat et, lorem vel aliquam turpis praesent, fringilla eros lacus ac, donec ligula egestas.
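If your data lives in several files, you could also build lyric.txt with a short script instead of pasting everything by hand. Below is a minimal Python sketch, assuming one document (for example, one song) per .txt file in a folder. The folder name lyrics and the function name combine_text_files are hypothetical. It places the <|endoftext|> delimiter on its own line between documents, following the layout of the example above.

```python
from pathlib import Path

DELIMITER = "<|endoftext|>"


def combine_text_files(source_dir, output_file):
    """Hypothetical helper: join every .txt file in source_dir into one file.

    Documents are separated by DELIMITER. Returns the number of documents written.
    """
    # Sort the paths so the output order is the same on every run.
    paths = sorted(Path(source_dir).glob("*.txt"))
    documents = []
    for path in paths:
        text = path.read_text(encoding="utf-8").strip()
        if text:  # skip empty or whitespace-only files
            documents.append(text)
    # Put the delimiter on its own line between documents (not after the last one).
    combined = f"\n\n{DELIMITER}\n\n".join(documents) + "\n"
    Path(output_file).write_text(combined, encoding="utf-8")
    return len(documents)
```

For example, combine_text_files("lyrics", "lyric.txt") writes the combined file and returns how many documents it contains. Keep the output file outside the source folder; otherwise a second run would pick up lyric.txt as one of its own inputs.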

You will probably need to collect the training data manually by copying and pasting it. Try to gather as much data as you can, since more data generally leads to better output.
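To get a rough idea of how much data you have collected, you could count the documents and words in your file. This is a minimal sketch, assuming a file laid out like the example above (documents separated by <|endoftext|>). The function name dataset_stats is hypothetical, and word counts from str.split are only a rough proxy for the number of tokens the model will actually see.

```python
from pathlib import Path

DELIMITER = "<|endoftext|>"


def dataset_stats(path):
    """Hypothetical helper: return (documents, words, characters) for a delimited file."""
    text = Path(path).read_text(encoding="utf-8")
    # Split on the delimiter and drop empty chunks, e.g. after a trailing delimiter.
    documents = [doc.strip() for doc in text.split(DELIMITER)]
    documents = [doc for doc in documents if doc]
    # Whitespace-separated words: a rough size estimate, not a token count.
    words = sum(len(doc.split()) for doc in documents)
    characters = sum(len(doc) for doc in documents)
    return len(documents), words, characters
```

For example, docs, words, chars = dataset_stats("lyric.txt") lets you track whether your dataset is growing as you add more material.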

Once your training data is complete, move the folder or files to the src directory.

The next step is to encode the data so that you can reuse it across multiple runs. To do this, run the following command:

$ python encode.py lyric.txt lyric.npz

In this case, lyric.txt is the file you created earlier, and lyric.npz is the encoded file that will be generated.

How long this takes depends on the size of your data.
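If you want to see how the encoding time relates to your data size, or call the step from a larger Python workflow, you could wrap the command in a small script. This is a sketch under assumptions: run_encode is a hypothetical helper, and it assumes encode.py takes the input and output paths as its only two arguments, exactly as in the command above, and that you run it from the directory containing encode.py.

```python
import subprocess
import sys
import time
from pathlib import Path


def run_encode(input_file, output_file, script="encode.py"):
    """Hypothetical helper: run the encode script and report size and elapsed time.

    Assumes the script accepts <input> <output> as positional arguments.
    Raises subprocess.CalledProcessError if the script exits with an error.
    """
    size_mb = Path(input_file).stat().st_size / 1_000_000
    start = time.perf_counter()
    # Use the current interpreter so the same virtual environment is used.
    subprocess.run(
        [sys.executable, str(script), str(input_file), str(output_file)],
        check=True,
    )
    elapsed = time.perf_counter() - start
    print(f"Encoded {size_mb:.2f} MB in {elapsed:.1f} s into {output_file}")
    return elapsed
```

For example, run_encode("lyric.txt", "lyric.npz") runs the same command as above and prints how long it took for your file size.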
