Count string occurrences in pandas raw data row
up vote
7
down vote
favorite
I have a csv file as follows:
name,age
something
tom,20
And when I put it into a dataframe it looks like:
df = pd.read_csv('file', header=None)
0 1
1 name age
2 something NaN
3 tom 20
How would I get the count of a comma in the raw row data. For example, the answer should look like:
# in pseudocode
df['_count_separators'] = len(df.raw_value.count(','))
0 1 _count_separators
1 name age 1
2 something NaN 0
3 tom 20 1
python python-3.x pandas csv dataframe
add a comment |
up vote
7
down vote
favorite
I have a csv file as follows:
name,age
something
tom,20
And when I put it into a dataframe it looks like:
df = pd.read_csv('file', header=None)
0 1
1 name age
2 something NaN
3 tom 20
How would I get the count of a comma in the raw row data. For example, the answer should look like:
# in pseudocode
df['_count_separators'] = len(df.raw_value.count(','))
0 1 _count_separators
1 name age 1
2 something NaN 0
3 tom 20 1
python python-3.x pandas csv dataframe
do you also want to count the commas if they're in the column value?
– Omkar Sabade
2 hours ago
@OmkarSabade preferably just to get the number of separators thatpandas
inferred -- but either way is acceptable.
– David L
2 hours ago
add a comment |
up vote
7
down vote
favorite
up vote
7
down vote
favorite
I have a csv file as follows:
name,age
something
tom,20
And when I put it into a dataframe it looks like:
df = pd.read_csv('file', header=None)
0 1
1 name age
2 something NaN
3 tom 20
How would I get the count of a comma in the raw row data. For example, the answer should look like:
# in pseudocode
df['_count_separators'] = len(df.raw_value.count(','))
0 1 _count_separators
1 name age 1
2 something NaN 0
3 tom 20 1
python python-3.x pandas csv dataframe
I have a csv file as follows:
name,age
something
tom,20
And when I put it into a dataframe it looks like:
df = pd.read_csv('file', header=None)
0 1
1 name age
2 something NaN
3 tom 20
How would I get the count of a comma in the raw row data. For example, the answer should look like:
# in pseudocode
df['_count_separators'] = len(df.raw_value.count(','))
0 1 _count_separators
1 name age 1
2 something NaN 0
3 tom 20 1
python python-3.x pandas csv dataframe
python python-3.x pandas csv dataframe
edited 1 hour ago
coldspeed
116k18107185
116k18107185
asked 2 hours ago
Henry H
1767
1767
do you also want to count the commas if they're in the column value?
– Omkar Sabade
2 hours ago
@OmkarSabade preferably just to get the number of separators thatpandas
inferred -- but either way is acceptable.
– David L
2 hours ago
add a comment |
do you also want to count the commas if they're in the column value?
– Omkar Sabade
2 hours ago
@OmkarSabade preferably just to get the number of separators thatpandas
inferred -- but either way is acceptable.
– David L
2 hours ago
do you also want to count the commas if they're in the column value?
– Omkar Sabade
2 hours ago
do you also want to count the commas if they're in the column value?
– Omkar Sabade
2 hours ago
@OmkarSabade preferably just to get the number of separators that
pandas
inferred -- but either way is acceptable.– David L
2 hours ago
@OmkarSabade preferably just to get the number of separators that
pandas
inferred -- but either way is acceptable.– David L
2 hours ago
add a comment |
4 Answers
4
active
oldest
votes
up vote
3
down vote
Doing this
df = pd.read_csv('file', header=None)
df2 = pd.read_csv('file', header=None,sep='|') # using another sep for read your csv again
df2['0'].str.findall(',').str.len() # then one row into one cell , using str find
0 1
1 0
2 1
3 5
Name: 0, dtype: int64
df['_count_separators']=df2['0'].str.findall(',').str.len()
Data
name,age
something
tom,20
something,,,,,somethingelse
add a comment |
up vote
3
down vote
Very simply, read your data as a single column series, then split on comma and concatenate with separator count.
# s = pd.read_csv(pd.compat.StringIO(text), sep=r'|', squeeze=True, header=None)
s = pd.read_csv('/path/to/file.csv', sep=r'|', squeeze=True, header=None)
df = pd.concat([
s.str.split(',', expand=True),
s.str.count(',').rename('_count_sep')
], axis=1)
df
0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1
We are on the same road:-) cheers
– W-B
1 hour ago
@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
1 hour ago
1
I read your mind hahahaha:-)
– W-B
1 hour ago
But learn new strcount:-) thanks man
– W-B
1 hour ago
1
Your answers stopped me from thinking otherwise
– Dark
1 hour ago
add a comment |
up vote
0
down vote
Try below code
df = pd.read_csv('file', header=None)
df['_count_separators'] = df.count(axis='columns')
print(df)
output:
0 1 _count_separators
1 name age 1
2 something NaN 0
3 tom 20 1
add a comment |
up vote
0
down vote
One line of code: len(df) - df[1].isna().sum()
Ohk if the nan itself is a part of the dataset then? likesomething,,,something
?
– Dark
2 hours ago
i'm not sure in which instance woulddf = pd.read_csv('file.csv', header=None)
give anan
in his sample.
– Quang Hoang
2 hours ago
This assumes there are only two columns...?
– coldspeed
1 hour ago
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53862765%2fcount-string-occurrences-in-pandas-raw-data-row%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
3
down vote
Doing this
df = pd.read_csv('file', header=None)
df2 = pd.read_csv('file', header=None,sep='|') # using another sep for read your csv again
df2['0'].str.findall(',').str.len() # then one row into one cell , using str find
0 1
1 0
2 1
3 5
Name: 0, dtype: int64
df['_count_separators']=df2['0'].str.findall(',').str.len()
Data
name,age
something
tom,20
something,,,,,somethingelse
add a comment |
up vote
3
down vote
Doing this
df = pd.read_csv('file', header=None)
df2 = pd.read_csv('file', header=None,sep='|') # using another sep for read your csv again
df2['0'].str.findall(',').str.len() # then one row into one cell , using str find
0 1
1 0
2 1
3 5
Name: 0, dtype: int64
df['_count_separators']=df2['0'].str.findall(',').str.len()
Data
name,age
something
tom,20
something,,,,,somethingelse
add a comment |
up vote
3
down vote
up vote
3
down vote
Doing this
df = pd.read_csv('file', header=None)
df2 = pd.read_csv('file', header=None,sep='|') # using another sep for read your csv again
df2['0'].str.findall(',').str.len() # then one row into one cell , using str find
0 1
1 0
2 1
3 5
Name: 0, dtype: int64
df['_count_separators']=df2['0'].str.findall(',').str.len()
Data
name,age
something
tom,20
something,,,,,somethingelse
Doing this
df = pd.read_csv('file', header=None)
df2 = pd.read_csv('file', header=None,sep='|') # using another sep for read your csv again
df2['0'].str.findall(',').str.len() # then one row into one cell , using str find
0 1
1 0
2 1
3 5
Name: 0, dtype: int64
df['_count_separators']=df2['0'].str.findall(',').str.len()
Data
name,age
something
tom,20
something,,,,,somethingelse
answered 2 hours ago
W-B
99.1k73162
99.1k73162
add a comment |
add a comment |
up vote
3
down vote
Very simply, read your data as a single column series, then split on comma and concatenate with separator count.
# s = pd.read_csv(pd.compat.StringIO(text), sep=r'|', squeeze=True, header=None)
s = pd.read_csv('/path/to/file.csv', sep=r'|', squeeze=True, header=None)
df = pd.concat([
s.str.split(',', expand=True),
s.str.count(',').rename('_count_sep')
], axis=1)
df
0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1
We are on the same road:-) cheers
– W-B
1 hour ago
@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
1 hour ago
1
I read your mind hahahaha:-)
– W-B
1 hour ago
But learn new strcount:-) thanks man
– W-B
1 hour ago
1
Your answers stopped me from thinking otherwise
– Dark
1 hour ago
add a comment |
up vote
3
down vote
Very simply, read your data as a single column series, then split on comma and concatenate with separator count.
# s = pd.read_csv(pd.compat.StringIO(text), sep=r'|', squeeze=True, header=None)
s = pd.read_csv('/path/to/file.csv', sep=r'|', squeeze=True, header=None)
df = pd.concat([
s.str.split(',', expand=True),
s.str.count(',').rename('_count_sep')
], axis=1)
df
0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1
We are on the same road:-) cheers
– W-B
1 hour ago
@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
1 hour ago
1
I read your mind hahahaha:-)
– W-B
1 hour ago
But learn new strcount:-) thanks man
– W-B
1 hour ago
1
Your answers stopped me from thinking otherwise
– Dark
1 hour ago
add a comment |
up vote
3
down vote
up vote
3
down vote
Very simply, read your data as a single column series, then split on comma and concatenate with separator count.
# s = pd.read_csv(pd.compat.StringIO(text), sep=r'|', squeeze=True, header=None)
s = pd.read_csv('/path/to/file.csv', sep=r'|', squeeze=True, header=None)
df = pd.concat([
s.str.split(',', expand=True),
s.str.count(',').rename('_count_sep')
], axis=1)
df
0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1
Very simply, read your data as a single column series, then split on comma and concatenate with separator count.
# s = pd.read_csv(pd.compat.StringIO(text), sep=r'|', squeeze=True, header=None)
s = pd.read_csv('/path/to/file.csv', sep=r'|', squeeze=True, header=None)
df = pd.concat([
s.str.split(',', expand=True),
s.str.count(',').rename('_count_sep')
], axis=1)
df
0 1 _count_sep
0 name age 1
1 something None 0
2 tom 20 1
answered 2 hours ago
coldspeed
116k18107185
116k18107185
We are on the same road:-) cheers
– W-B
1 hour ago
@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
1 hour ago
1
I read your mind hahahaha:-)
– W-B
1 hour ago
But learn new strcount:-) thanks man
– W-B
1 hour ago
1
Your answers stopped me from thinking otherwise
– Dark
1 hour ago
add a comment |
We are on the same road:-) cheers
– W-B
1 hour ago
@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
1 hour ago
1
I read your mind hahahaha:-)
– W-B
1 hour ago
But learn new strcount:-) thanks man
– W-B
1 hour ago
1
Your answers stopped me from thinking otherwise
– Dark
1 hour ago
We are on the same road:-) cheers
– W-B
1 hour ago
We are on the same road:-) cheers
– W-B
1 hour ago
@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
1 hour ago
@W-B yup did not see until I posted... great minds.. huh? ;)
– coldspeed
1 hour ago
1
1
I read your mind hahahaha:-)
– W-B
1 hour ago
I read your mind hahahaha:-)
– W-B
1 hour ago
But learn new strcount:-) thanks man
– W-B
1 hour ago
But learn new strcount:-) thanks man
– W-B
1 hour ago
1
1
Your answers stopped me from thinking otherwise
– Dark
1 hour ago
Your answers stopped me from thinking otherwise
– Dark
1 hour ago
add a comment |
up vote
0
down vote
Try below code
df = pd.read_csv('file', header=None)
df['_count_separators'] = df.count(axis='columns')
print(df)
output:
0 1 _count_separators
1 name age 1
2 something NaN 0
3 tom 20 1
add a comment |
up vote
0
down vote
Try below code
df = pd.read_csv('file', header=None)
df['_count_separators'] = df.count(axis='columns')
print(df)
output:
0 1 _count_separators
1 name age 1
2 something NaN 0
3 tom 20 1
add a comment |
up vote
0
down vote
up vote
0
down vote
Try below code
df = pd.read_csv('file', header=None)
df['_count_separators'] = df.count(axis='columns')
print(df)
output:
0 1 _count_separators
1 name age 1
2 something NaN 0
3 tom 20 1
Try below code
df = pd.read_csv('file', header=None)
df['_count_separators'] = df.count(axis='columns')
print(df)
output:
0 1 _count_separators
1 name age 1
2 something NaN 0
3 tom 20 1
answered 2 hours ago
Anjaneyulu Batta
3,23511333
3,23511333
add a comment |
add a comment |
up vote
0
down vote
One line of code: len(df) - df[1].isna().sum()
Ohk if the nan itself is a part of the dataset then? likesomething,,,something
?
– Dark
2 hours ago
i'm not sure in which instance woulddf = pd.read_csv('file.csv', header=None)
give anan
in his sample.
– Quang Hoang
2 hours ago
This assumes there are only two columns...?
– coldspeed
1 hour ago
add a comment |
up vote
0
down vote
One line of code: len(df) - df[1].isna().sum()
Ohk if the nan itself is a part of the dataset then? likesomething,,,something
?
– Dark
2 hours ago
i'm not sure in which instance woulddf = pd.read_csv('file.csv', header=None)
give anan
in his sample.
– Quang Hoang
2 hours ago
This assumes there are only two columns...?
– coldspeed
1 hour ago
add a comment |
up vote
0
down vote
up vote
0
down vote
One line of code: len(df) - df[1].isna().sum()
One line of code: len(df) - df[1].isna().sum()
answered 2 hours ago
Quang Hoang
1,6421913
1,6421913
Ohk if the nan itself is a part of the dataset then? likesomething,,,something
?
– Dark
2 hours ago
i'm not sure in which instance woulddf = pd.read_csv('file.csv', header=None)
give anan
in his sample.
– Quang Hoang
2 hours ago
This assumes there are only two columns...?
– coldspeed
1 hour ago
add a comment |
Ohk if the nan itself is a part of the dataset then? likesomething,,,something
?
– Dark
2 hours ago
i'm not sure in which instance woulddf = pd.read_csv('file.csv', header=None)
give anan
in his sample.
– Quang Hoang
2 hours ago
This assumes there are only two columns...?
– coldspeed
1 hour ago
Ohk if the nan itself is a part of the dataset then? like
something,,,something
?– Dark
2 hours ago
Ohk if the nan itself is a part of the dataset then? like
something,,,something
?– Dark
2 hours ago
i'm not sure in which instance would
df = pd.read_csv('file.csv', header=None)
give a nan
in his sample.– Quang Hoang
2 hours ago
i'm not sure in which instance would
df = pd.read_csv('file.csv', header=None)
give a nan
in his sample.– Quang Hoang
2 hours ago
This assumes there are only two columns...?
– coldspeed
1 hour ago
This assumes there are only two columns...?
– coldspeed
1 hour ago
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53862765%2fcount-string-occurrences-in-pandas-raw-data-row%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
do you also want to count the commas if they're in the column value?
– Omkar Sabade
2 hours ago
@OmkarSabade preferably just to get the number of separators that
pandas
inferred -- but either way is acceptable.– David L
2 hours ago