First, a "string" is text data (words and characters), as opposed to numerical data (numbers).
Use Case: You have to export data from different data sources. The problem is that the same data is stored in a different format in each system (say MySQL, Twitter, or Excel), and you need all your column (field or feature) data to match.

You may ask, how can this happen? It is not unusual for different departments to have different conventions for how they store data. And, of course, there is always "human error" (which is the example below): someone entered the information incorrectly by putting extra spaces before and after the data in a few rows.

What we see: " Green Dent "
What we should see: "Green Dent"

Strip out the spaces at the beginning and the end using Python.

Name = " Green Dent "
x = Name.strip()

Result: "Green Dent"

**********************************************************

What we see: " Turtle Media "
What we should see: "Turtle Media"

Strip out the spaces at the beginning and the end using Python.

Company = " Turtle Media "
x = Company.strip()

Result: "Turtle Media"

***********************************************************

TechCamp Basics: The above code creates a variable (a bucket) to hold the "string" or text. The equal sign tells Python to "assign" the value on its right side to the variable named on its left side. A variable is great because we can then use it instead of the string - consider it a shortcut in your code. "Assigning" to variables or "declaring" variables - these are words you may hear. The .strip() is a method in Python that does all the work for us! It returns a new string with the whitespace at the beginning and the end removed; the space inside the string (between the two words) is left alone, and the cleaned result is stored in x.

Try this in your Jupyter Notebook. Don't forget that you can use Google's Colab to start coding in Python without installing anything on your computer. Here is the link: Google's Colab.
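In practice the extra spaces usually show up in several rows of a column, not in a single value. The sketch below is one possible way to apply .strip() to every row, using only built-in Python. The list company_names and the function strip_column are made-up names for illustration, and the sample rows are invented, not taken from a real data source.

```python
# Hypothetical sample column: a few rows were typed with extra spaces.
company_names = [" Turtle Media ", "Green Dent", "  Green Dent", "Turtle Media  "]


def strip_column(values):
    """Return a new list with leading and trailing whitespace removed from each string."""
    return [value.strip() for value in values]


cleaned = strip_column(company_names)
print(cleaned)

# strip() returns a new string; the original variable is not changed.
name = " Green Dent "
stripped = name.strip()
print(repr(name))      # still has the extra spaces
print(repr(stripped))  # spaces at both ends removed

# lstrip() and rstrip() remove whitespace from one side only.
print(repr(name.lstrip()))  # left side cleaned
print(repr(name.rstrip()))  # right side cleaned
```

Because .strip() does not modify the original string, remember to store its result in a variable (as with x above) or the cleaned value is lost. If you only want to clean one side, .lstrip() and .rstrip() are the one-sided versions.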
Author
Welcome to my blog on "data wrangling". This is the first post on this topic and I will add to it periodically. I write from my experience of being 'deep' in data - from working with customer data in various industries to working with data for academic purposes. This first post is simple - it is just a collection of string functions to manipulate data. I will add use cases as time permits.
Gwendolyn Stripling, Ph.D., Data Driven.
Photo used under Creative Commons from wuestenigel